Stream entities lazily via Select::cursor() #564
Open
roxblnfk wants to merge 4 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## 2.x #564 +/- ##
============================================
- Coverage 91.66% 91.55% -0.11%
- Complexity 1988 2003 +15
============================================
Files 131 131
Lines 5134 5200 +66
============================================
+ Hits 4706 4761 +55
- Misses 428 439 +11
🔍 What was changed

Adds `Cycle\ORM\Select::cursor(int $chunkSize = 1000, ?CursorOptions $options = null): \Generator<TEntity>` for streaming hydrated entities from large `SELECT`s without materializing the full result set client-side. Sits on top of the new `Cycle\Database\Driver\CursorableInterface` from cycle/database.

New / changed in ORM:

- `Select::cursor()`: yields entities lazily. Opens a DBAL cursor, parses rows chunk by chunk, hydrates via the existing `Iterator` + `EntityFactoryInterface::make()`.
- `RootLoader::loadData()` split into two public methods reusable by the cursor:
  - `parseRows(AbstractNode $node, iterable $rows): void`: feeds raw rows through the parser.
  - `loadChildren(AbstractNode $node, bool $includeRole): void`: runs POSTLOAD relation loaders + `loadHierarchy()` on the node.
- `RootLoader::getColumnNames(): array`: exposes column positions so the cursor can extract the root PK from `FETCH_NUM` rows.

Algorithm (parent-boundary chunking):
Rather than counting rows, the chunk closes when the root PK changes and the count of distinct parents reaches `$chunkSize`. This makes inline `HAS_MANY` / `MANY_TO_MANY` joins safe: a parent with N child rows is never split across two chunks. To guarantee contiguous parent rows in the stream, the cursor appends the root PK to the query's `ORDER BY` (the PK is unique, so this never alters an existing user-specified order).

Per-chunk pipeline:
Heap registration follows normal `make()` semantics (identity by PK; fresh row data merges into previously unresolved `Reference` relations). The heap is not cleaned automatically: the caller decides when to call `$orm->getHeap()->clean()` between batches.

🤔 Why?
Large exports, migrations, and bulk processors need to walk an entity set without holding all of it in memory and without seeing rows from concurrent writers mid-walk. `Select::fetchAll()` / `getIterator()` buffer the whole result in `loadData()`; `Select::limit()` / `offset()` walking is paginated and not snapshot-consistent. `Select::cursor()` solves both: it streams lazily and inherits snapshot consistency from the underlying DBAL cursor (PG `DECLARE NO SCROLL CURSOR`, SQLite engine + transaction, MSSQL `STATIC` cursor). Calling it on a driver without cursor support throws a clear `DriverException` from DBAL, with no silent fallback that would change consistency semantics.

Relation support comes for free thanks to the per-chunk `loadChildren` call: POSTLOAD relations run with `WHERE parent_id IN (chunk_ids)`, inline `HAS_ONE` / `BELONGS_TO` parse alongside the row, and `HAS_MANY` / M2M inline joins work via parent-boundary chunking. JTI/STI are supported because the parent and subclass joins live on the same row.

The signature deliberately keeps `chunkSize` separate from DBAL's cursor knobs: `chunkSize` on `Select::cursor()` is "max distinct parents per node flush" (an ORM-layer concept), while driver tuning (Postgres `FETCH FORWARD N` size, `WITH HOLD`, SQL Server cursor type, named cursors) is forwarded via `CursorOptions` subclasses from cycle/database. Row mode is forced internally to `FETCH_NUM` because the parser expects positional rows.

📝 Checklist
`CursorableInterface` to be merged first.

Functional tests in `tests/ORM/Functional/Driver/Common/Select/CursorTest.php` cover: in-order streaming, chunk smaller/equal/larger than dataset, empty table, typecast, heap identity, `Heap::clean()` between chunks bounding memory, transaction requirement, `where` / `orderBy`, early `break` releasing the cursor, POSTLOAD `HAS_MANY`, inline `HAS_ONE`, inline `BELONGS_TO`, `with()` on non-multiplying relations, inline `HAS_MANY` with explicit child `orderBy`, chunk boundary across a parent with many children, no parent duplication. Driver-specific stubs in `Postgres/Select/`, `SQLite/Select/`, `SQLServer/Select/`. Dedicated JTI and STI cursor tests under `Postgres/Inheritance/` verifying that subclass instances are returned correctly through the streaming path.

📃 Documentation
Docblock on `Select::cursor()` describes the chunking semantics (parent-boundary vs row-count), the `ORDER BY` augmentation rule, requirements (active transaction, cursorable driver), the relation contract (POSTLOAD per chunk; inline relations through parent boundaries), heap identity behavior, and how `CursorOptions` subclasses are forwarded to the DBAL layer.
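
For readers who want to see the parent-boundary rule in isolation, here is a minimal plain-PHP sketch of the idea described under "What was changed": close a chunk only when the root PK changes and the chunk already holds `$chunkSize` distinct parents. It is not the PR's implementation and does not touch the ORM. `chunkByParent()` and its parameters are hypothetical names introduced for illustration, and the sketch assumes rows arrive as positional arrays already ordered so that each parent's rows are contiguous.

```php
<?php

declare(strict_types=1);

/**
 * Groups an ordered stream of positional rows into chunks that never split a parent.
 * Hypothetical helper for illustration only; not part of Cycle ORM.
 *
 * @param iterable<array<int, mixed>> $rows      Rows with each parent's rows contiguous.
 * @param int                         $pkIndex   Position of the root PK within each row.
 * @param int                         $chunkSize Maximum number of distinct parents per chunk.
 *
 * @return \Generator<int, list<array<int, mixed>>>
 */
function chunkByParent(iterable $rows, int $pkIndex, int $chunkSize): \Generator
{
    if ($chunkSize < 1) {
        throw new \InvalidArgumentException('Chunk size must be at least 1.');
    }

    $chunk = [];         // rows collected for the current chunk
    $parents = 0;        // distinct parents seen in the current chunk
    $hasCurrent = false; // whether any row has been seen yet
    $currentPk = null;   // PK of the parent currently being read

    foreach ($rows as $row) {
        $pk = $row[$pkIndex];

        if (!$hasCurrent || $pk !== $currentPk) {
            // A new parent starts here. Flush first if the chunk is already full,
            // so the previous parent's rows all stay in one chunk.
            if ($parents >= $chunkSize) {
                yield $chunk;
                $chunk = [];
                $parents = 0;
            }
            $currentPk = $pk;
            $hasCurrent = true;
            $parents++;
        }

        $chunk[] = $row;
    }

    // Emit the trailing partial chunk, if any.
    if ($chunk !== []) {
        yield $chunk;
    }
}

// Parent 1 has three child rows, parent 2 has one, parent 3 has two.
$rows = [
    [1, 'a'], [1, 'b'], [1, 'c'],
    [2, 'd'],
    [3, 'e'], [3, 'f'],
];

foreach (chunkByParent($rows, 0, 2) as $i => $chunk) {
    echo $i, ': ', implode(',', array_column($chunk, 1)), PHP_EOL;
}
// prints:
// 0: a,b,c,d
// 1: e,f
```

With a row-count limit of 2 the first parent would be split after `b`; counting distinct parents instead keeps all of parent 1's rows together, which is the property inline `HAS_MANY` joins rely on.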