Stream entities lazily via Select::cursor() #564

Open
roxblnfk wants to merge 4 commits into 2.x from cursor-loader
@roxblnfk
Member

🔍 What was changed

Adds Cycle\ORM\Select::cursor(int $chunkSize = 1000, ?CursorOptions $options = null): \Generator<TEntity>, which streams hydrated entities from large SELECTs without materializing the full result set client-side. It builds on the new Cycle\Database\Driver\CursorableInterface from cycle/database.

New / changed in ORM:

  • Select::cursor() — yields entities lazily. Opens a DBAL cursor, parses rows chunk-by-chunk, hydrates via the existing Iterator + EntityFactoryInterface::make().
  • RootLoader::loadData() split into two public methods reusable by the cursor:
    • parseRows(AbstractNode $node, iterable $rows): void — feeds raw rows through the parser.
    • loadChildren(AbstractNode $node, bool $includeRole): void — runs POSTLOAD relation loaders + loadHierarchy() on the node.
  • RootLoader::getColumnNames(): array — exposes column positions so the cursor can extract the root PK from FETCH_NUM rows.

Algorithm — parent-boundary chunking:

Rather than counting rows, the cursor closes a chunk when the root PK changes and the count of distinct parents has reached $chunkSize. This makes inline HAS_MANY / MANY_TO_MANY joins safe: a parent with N child rows is never split across two chunks. To guarantee contiguous parent rows in the stream, the cursor appends the root PK to the query's ORDER BY (the PK is unique, so as the last sort key it only breaks ties and never alters the existing user-specified order).
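
For reviewers, the boundary rule above can be sketched in plain PHP. This is a minimal illustration, not the code in this PR: chunkByParent() and its $pkIndex parameter are hypothetical names, and the rows are assumed to be positional (FETCH_NUM-style) arrays already ordered so that each parent's rows are contiguous.

```php
<?php

declare(strict_types=1);

/**
 * Groups positional rows into chunks that never split a parent (illustrative helper).
 *
 * @param iterable<array<int, mixed>> $rows      rows with each parent's rows contiguous
 * @param int                         $pkIndex   assumed position of the root PK in a row
 * @param int                         $chunkSize max distinct parents per chunk
 *
 * @return \Generator<int, list<array<int, mixed>>>
 */
function chunkByParent(iterable $rows, int $pkIndex, int $chunkSize): \Generator
{
    if ($chunkSize < 1) {
        throw new \InvalidArgumentException('Chunk size must be positive.');
    }

    $chunk = [];
    $parents = 0;        // distinct parents collected in the current chunk
    $hasCurrent = false; // false until the first row has been seen
    $currentPk = null;

    foreach ($rows as $row) {
        $pk = $row[$pkIndex];

        if (!$hasCurrent || $pk !== $currentPk) {
            // The root PK changed, so a new parent starts at this row.
            if ($parents === $chunkSize) {
                // The chunk already holds $chunkSize parents: close it first.
                yield $chunk;
                $chunk = [];
                $parents = 0;
            }
            $currentPk = $pk;
            $hasCurrent = true;
            $parents++;
        }

        $chunk[] = $row;
    }

    if ($chunk !== []) {
        yield $chunk; // trailing partial chunk
    }
}

// Usage: parent 3 has three child rows and stays in one chunk.
$rows = [[1, 'a'], [1, 'b'], [2, 'c'], [3, 'd'], [3, 'e'], [3, 'f']];
foreach (chunkByParent($rows, 0, 2) as $chunk) {
    echo count($chunk), " rows\n";
}
```

The chunk is only closed when a new parent arrives, which is why a row-count limit is never consulted and a parent's child rows cannot straddle two chunks.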

Per-chunk pipeline:

read N parents from cursor → fresh RootNode → parseRows → loadChildren (POSTLOAD, JTI/STI merge) → Iterator hydrates → yield

Heap registration follows normal make() semantics (identity by PK; fresh row data merges into previously unresolved Reference relations). The heap is not cleaned automatically: the caller decides when to call $orm->getHeap()->clean() between batches.
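
A caller-side batching loop might look like the sketch below. processInBatches() is a hypothetical helper, not part of this PR; it accepts any iterable so it runs without the ORM, and the $afterBatch callback is where a heap cleanup would go.

```php
<?php

declare(strict_types=1);

/**
 * Consumes a lazy entity stream and calls $afterBatch after every $batchSize entities.
 *
 * @param iterable<object>       $entities   e.g. the generator returned by Select::cursor()
 * @param callable(object): void $handle     per-entity work
 * @param callable(): void       $afterBatch e.g. fn () => $orm->getHeap()->clean()
 *
 * @return int number of entities processed
 */
function processInBatches(
    iterable $entities,
    int $batchSize,
    callable $handle,
    callable $afterBatch,
): int {
    if ($batchSize < 1) {
        throw new \InvalidArgumentException('Batch size must be positive.');
    }

    $count = 0;
    foreach ($entities as $entity) {
        $handle($entity);
        if (++$count % $batchSize === 0) {
            $afterBatch(); // a full batch has been handled
        }
    }

    if ($count % $batchSize !== 0) {
        $afterBatch(); // clean up after the trailing partial batch
    }

    return $count;
}
```

With the API proposed here, the wiring would presumably be processInBatches($select->cursor(500), 500, $handle, fn () => $orm->getHeap()->clean()). Using the same value for the batch size and $chunkSize is an assumption of this sketch: it keeps the cleanup aligned with chunk boundaries, since each chunk yields at most $chunkSize parents.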

🤔 Why?

Large exports, migrations, and bulk processors need to walk an entity set without holding all of it in memory and without seeing rows from concurrent writers mid-walk. Select::fetchAll() / getIterator() buffer the whole result in loadData(); Select::limit()/offset() walking is paginated and not snapshot-consistent.

Select::cursor() solves both: it streams lazily and inherits snapshot consistency from the underlying DBAL cursor (PG DECLARE NO SCROLL CURSOR, SQLite engine + transaction, MSSQL STATIC cursor). Calling it on a driver without cursor support throws a clear DriverException from DBAL; there is no silent fallback that would change consistency semantics.

Relation support comes for free thanks to the per-chunk loadChildren call: POSTLOAD relations run with WHERE parent_id IN (chunk_ids), inline HAS_ONE / BELONGS_TO parse alongside the row, and HAS_MANY / M2M inline joins work via parent-boundary chunking. JTI/STI are supported because the parent and subclass joins live on the same row.
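
To make the POSTLOAD step concrete, the sketch below shows how the distinct parent ids of one chunk could drive a single IN (...) query. It is illustrative only: the real loaders build their queries through the ORM, and buildPostloadQuery(), the comments table, and the post_id column are hypothetical names.

```php
<?php

declare(strict_types=1);

/**
 * Builds a parameterized child query for the distinct parents of one chunk.
 * Table and column names are interpolated as-is, so they must be trusted constants.
 *
 * @param list<array<int, mixed>> $chunkRows positional rows of a single chunk
 * @param int                     $pkIndex   assumed position of the root PK in a row
 *
 * @return array{sql: string, params: list<mixed>}|null null when the chunk is empty
 */
function buildPostloadQuery(
    string $childTable,
    string $foreignKey,
    array $chunkRows,
    int $pkIndex,
): ?array {
    // Inline joins repeat the parent PK on every child row, so deduplicate.
    $ids = array_values(array_unique(array_column($chunkRows, $pkIndex)));
    if ($ids === []) {
        return null; // nothing to load for an empty chunk
    }

    $placeholders = implode(', ', array_fill(0, count($ids), '?'));

    return [
        'sql' => sprintf(
            'SELECT * FROM %s WHERE %s IN (%s)',
            $childTable,
            $foreignKey,
            $placeholders,
        ),
        'params' => $ids,
    ];
}

// Usage: two distinct parents in the chunk produce two placeholders.
$query = buildPostloadQuery('comments', 'post_id', [[1, 'a'], [1, 'b'], [2, 'c']], 0);
```

Because the id list is bounded by $chunkSize, the size of each IN list stays bounded as well, regardless of how large the full result set is.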

The signature deliberately keeps chunkSize separate from DBAL's cursor knobs: chunkSize on Select::cursor() is "max distinct parents per node flush" (an ORM-layer concept), while driver tuning (Postgres FETCH FORWARD N size, WITH HOLD, SQL Server cursor type, named cursors) is forwarded via CursorOptions subclasses from cycle/database. Row mode is forced internally to FETCH_NUM because the parser expects positional rows.

📝 Checklist

  • Closes #
  • Requires the corresponding cycle/database PR introducing CursorableInterface to be merged first.
  • How was this tested:
    • Tested manually
    • Unit tests added

Functional tests in tests/ORM/Functional/Driver/Common/Select/CursorTest.php cover: in-order streaming, chunk smaller/equal/larger than dataset, empty table, typecast, heap identity, Heap::clean() between chunks bounding memory, transaction requirement, where/orderBy, early break releasing the cursor, POSTLOAD HAS_MANY, inline HAS_ONE, inline BELONGS_TO, with() on non-multiplying relations, inline HAS_MANY with explicit child orderBy, chunk boundary across a parent with many children, no parent duplication. Driver-specific stubs live in Postgres/Select/, SQLite/Select/, and SQLServer/Select/. Dedicated JTI and STI cursor tests under Postgres/Inheritance/ verify that subclass instances are returned correctly through the streaming path.

📃 Documentation

Docblock on Select::cursor() describes the chunking semantics (parent-boundary vs row-count), the ORDER BY augmentation rule, requirements (active transaction, cursorable driver), the relation contract (POSTLOAD per chunk; inline relations through parent boundaries), heap identity behavior, and how CursorOptions subclasses are forwarded to the DBAL layer.

@codecov

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 83.82353% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.55%. Comparing base (d386f3c) to head (61b6fc9).

Files with missing lines Patch % Lines
src/Select.php 81.66% 11 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##                2.x     #564      +/-   ##
============================================
- Coverage     91.66%   91.55%   -0.11%     
- Complexity     1988     2003      +15     
============================================
  Files           131      131              
  Lines          5134     5200      +66     
============================================
+ Hits           4706     4761      +55     
- Misses          428      439      +11     

