feat: contrib Delta Change Data Feed native reads [Delta contrib split, part 5]#9
Draft
schenksj wants to merge 2 commits into
…t, part 5]

Part 5 of the Delta Lake contrib PR breakup (tracking: apache#4366).

Native Change Data Feed (`readChangeFeed`) reads. A `RowDataSourceScanExec` over Delta's `DeltaCDFRelation` is read natively via delta-kernel's `TableChanges` instead of vanilla Spark. The Rust `TableChanges` path landed in parts 3a/3b; this wires the JVM side, which earlier units deferred:

- `CometDeltaCdfScanExec.scala` (new): the CDF exec. Splits the inclusive version range into N partitions (one `TableChanges` sub-range each, capped by COMET_DELTA_CDF_MAX_PARTITIONS). Implements `CometScanWithPlanData` so the parent native block's findAllPlanData collects its per-partition sub-ranges (DeltaPlanDataInjector splices them). Without that, every partition read the full feed and rows duplicated N-fold under an `orderBy`.
- `CometDeltaNativeScan.convertCdf`: re-added the serde method that builds the CDF scan (deferred from part 4b). Does not reference `ScanImpl` (lives in DeltaScanMetadata) or the orphaned UTF8String/DateTimeUtils imports.
- `DeltaIntegration`: re-added the CDF members (`isCdfRelation`, `transformCdf`, cached `convertCdfBinding`) deferred from part 4a's review, inserted before the (untouched) cached `scanHandler`.
- `CometExecRule`: re-added the CDF hook `case` arm deferred from part 2, after the marker hook.

All core CDF members are inert on default builds (reflective lookups of the absent contrib serde return None -> vanilla Spark CDF read). Verified on Scala 2.12 (Spark 3.4) as well as 2.13.

Tests: CometDeltaCdcSuite (3, incl. the orderBy + unix_timestamp case that is the red-green for the N-fold dup) and CometDeltaCdfReflectionReproSuite (3); all pass.

Verification: gated JVM test-compile, Scala-2.12 core compile, 6 CDF tests, spotless/scalastyle, check-suites, gate-verify (default build still 0 Delta symbols, only the DeltaIntegration bridge).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
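For readers following the partitioning described in the commit message, here is a minimal sketch of splitting an inclusive version range into at most N contiguous sub-ranges. It is not the code in `CometDeltaCdfScanExec`: the names `VersionRange` and `splitVersions` are hypothetical, the `maxPartitions` parameter only stands in for COMET_DELTA_CDF_MAX_PARTITIONS, and the even-split strategy is an assumption.

```scala
object CdfRangeSplitSketch {

  /** Inclusive version sub-range handed to one partition (hypothetical type). */
  final case class VersionRange(start: Long, end: Long)

  /**
   * Splits the inclusive range [start, end] into contiguous, non-overlapping
   * inclusive sub-ranges. The partition count is the number of versions,
   * capped by maxPartitions. Sizes differ by at most one version.
   */
  def splitVersions(start: Long, end: Long, maxPartitions: Int): Seq[VersionRange] = {
    require(start <= end, s"empty version range: $start > $end")
    require(maxPartitions >= 1, "maxPartitions must be at least 1")

    val total = end - start + 1                          // versions in the range
    val n = math.min(total, maxPartitions.toLong).toInt  // never more partitions than versions
    val base = total / n                                 // minimum versions per partition
    val extra = total % n                                // first `extra` partitions get one more

    (0 until n).map { i =>
      val offset = i.toLong * base + math.min(i.toLong, extra)
      val size = base + (if (i < extra) 1L else 0L)
      VersionRange(start + offset, start + offset + size - 1)
    }
  }
}
```

Because the sub-ranges are disjoint and together cover the whole range, each version is read by exactly one partition. That property is what the per-partition plan data has to preserve: if every partition instead received the full range, rows would be duplicated N-fold, as the commit message describes.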
…anaged [P0+themeB credential, folded into A.5]
8d14cdb to 0ee301f
b94530b to fb47ff3
What this part is
Native Change Data Feed (`readChangeFeed`) reads. A `RowDataSourceScanExec` over Delta's `DeltaCDFRelation` is read natively via delta-kernel's `TableChanges` instead of vanilla Spark. The Rust `TableChanges` path landed in parts 3a/3b; this wires the JVM side that earlier units deferred:

- `CometDeltaCdfScanExec` (new): the CDF exec. Splits the version range into N partitions (one `TableChanges` sub-range each). Implements `CometScanWithPlanData` so the parent native block injects per-partition sub-ranges (without it, every partition read the full feed, causing N-fold row duplication under an `orderBy`).
- `CometDeltaNativeScan.convertCdf`: the serde method that builds the CDF scan (deferred from part 4b).
- `DeltaIntegration`: the CDF members (`isCdfRelation` / `transformCdf` / cached `convertCdfBinding`), deferred from part 4a's review.
- `CometExecRule`: the CDF hook arm, deferred from part 2.

Why it's contained
The CDF members in core (`CometExecRule`, `DeltaIntegration`) are inert on default builds: the reflective lookup of the absent contrib serde returns `None`, so a CDF read falls back to vanilla Spark. Verified on Scala 2.12 (Spark 3.4) as well as 2.13. Default builds carry zero Delta surface (gate-verify confirms 0 Delta symbols).

Verification
Gated JVM test-compile, Scala-2.12 core compile, 6 CDF tests (`CometDeltaCdcSuite`: 3, incl. the `orderBy` + `unix_timestamp` case; `CometDeltaCdfReflectionReproSuite`: 3), spotless/scalastyle, check-suites, gate-verify: all green.

🤖 AI disclosure: this PR was prepared with assistance from Claude Code (Claude Opus 4.8), under the submitter's review and direction.
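The containment argument under "Why it's contained" rests on a reflective lookup that yields `None` when the contrib serde is absent. As a minimal sketch of that pattern only: the class `ReflectiveBridge`, its members, and the class names passed to it are all hypothetical and are not the actual `DeltaIntegration` code.

```scala
import scala.util.Try

/**
 * Hypothetical bridge: resolves an optional class by name once and caches
 * the result, so core code never links against the optional module directly.
 */
final class ReflectiveBridge(serdeClassName: String) {

  // Class.forName throws ClassNotFoundException when the class is not on the
  // classpath; Try turns that into None. `lazy val` caches the lookup result.
  lazy val binding: Option[Class[_]] =
    Try(Class.forName(serdeClassName)).toOption

  /** Which read path would be taken; a label for illustration only. */
  def readPath: String = binding match {
    case Some(_) => "native"
    case None    => "vanilla Spark"
  }
}

object ReflectiveBridgeDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical class name that is assumed to be absent from the classpath.
    val absent = new ReflectiveBridge("com.example.absent.HypotheticalCdfSerde")
    // java.lang.String is always present; it stands in for a loaded contrib class.
    val present = new ReflectiveBridge("java.lang.String")
    println(absent.readPath)
    println(present.readPath)
  }
}
```

With this shape, a build that omits the optional class compiles and runs unchanged, and the caller simply takes the fallback branch. Whether the real bridge caches a `Class`, a method handle, or something else is not stated in this PR description.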