Skip to content

feat: contrib Delta Change Data Feed native reads [Delta contrib split, part 5]#9

Draft
schenksj wants to merge 2 commits into
pr/delta-A4b-scala-execfrom
pr/delta-A5-cdf
Draft

feat: contrib Delta Change Data Feed native reads [Delta contrib split, part 5]#9
schenksj wants to merge 2 commits into
pr/delta-A4b-scala-execfrom
pr/delta-A5-cdf

Conversation

@schenksj

Copy link
Copy Markdown
Owner

Fork-local review draft (Delta-contrib PR split, part 5). Base is pr/delta-A4b-scala-exec so the diff shows only A.5. Stacks on parts 1 (apache#4700), 2, 3a, 3b, 4a, 4b. Tracking umbrella: apache#4366.

What this part is

Native Change Data Feed (readChangeFeed) reads. A RowDataSourceScanExec over Delta's DeltaCDFRelation is read natively via delta-kernel's TableChanges instead of vanilla Spark. The Rust TableChanges path landed in parts 3a/3b; this wires the JVM side that earlier units deferred:

  • CometDeltaCdfScanExec (new) — the CDF exec. Splits the version range into N partitions (one TableChanges sub-range each). Implements CometScanWithPlanData so the parent native block injects per-partition sub-ranges (without it, every partition read the full feed → N-fold row duplication under an orderBy).
  • CometDeltaNativeScan.convertCdf — the serde method that builds the CDF scan (deferred from part 4b).
  • DeltaIntegration — the CDF members (isCdfRelation / transformCdf / cached convertCdfBinding), deferred from part 4a's review.
  • CometExecRule — the CDF hook arm, deferred from part 2.

Why it's contained

The CDF members in core (CometExecRule, DeltaIntegration) are inert on default builds — the reflective lookup of the absent contrib serde returns None, so a CDF read falls back to vanilla Spark. Verified on Scala 2.12 (Spark 3.4) as well as 2.13. Default builds carry zero Delta surface (gate-verify confirms 0 Delta symbols).

Verification

gated JVM test-compile, Scala-2.12 core compile, 6 CDF tests (CometDeltaCdcSuite 3 incl. the orderBy + unix_timestamp case; CometDeltaCdfReflectionReproSuite 3), spotless/scalastyle, check-suites, gate-verify — all green.


🤖 AI disclosure: this PR was prepared with assistance from Claude Code (Claude Opus 4.8), under the submitter's review and direction.

schenksj and others added 2 commits June 29, 2026 09:26
…t, part 5]

Part 5 of the Delta Lake contrib PR breakup (tracking: apache#4366). Native Change Data Feed
(`readChangeFeed`) reads. A `RowDataSourceScanExec` over Delta's `DeltaCDFRelation` is read
natively via delta-kernel's `TableChanges` instead of vanilla Spark. The Rust `TableChanges`
path landed in parts 3a/3b; this wires the JVM side, which earlier units deferred:

- `CometDeltaCdfScanExec.scala` (new) — the CDF exec. Splits the inclusive version range into N
  partitions (one `TableChanges` sub-range each, capped by COMET_DELTA_CDF_MAX_PARTITIONS).
  Implements `CometScanWithPlanData` so the parent native block's findAllPlanData collects its
  per-partition sub-ranges (DeltaPlanDataInjector splices them) -- without that, every partition
  read the full feed and rows duplicated N-fold under an `orderBy`.
- `CometDeltaNativeScan.convertCdf` — re-added the serde method that builds the CDF scan (deferred
  from part 4b). Does not reference `ScanImpl` (lives in DeltaScanMetadata) or the orphaned
  UTF8String/DateTimeUtils imports.
- `DeltaIntegration` — re-added the CDF members (`isCdfRelation`, `transformCdf`, cached
  `convertCdfBinding`) deferred from part 4a's review, inserted before the (untouched) cached
  `scanHandler`.
- `CometExecRule` — re-added the CDF hook `case` arm deferred from part 2, after the marker hook.

All core CDF members are inert on default builds (reflective lookups of the absent contrib serde
return None -> vanilla Spark CDF read). Verified on Scala 2.12 (Spark 3.4) as well as 2.13.

Tests: CometDeltaCdcSuite (3, incl. the orderBy + unix_timestamp case that is the red-green for the
N-fold dup) and CometDeltaCdfReflectionReproSuite (3) -- all pass.

Verification: gated JVM test-compile, Scala-2.12 core compile, 6 CDF tests, spotless/scalastyle,
check-suites, gate-verify (default build still 0 Delta symbols, only the DeltaIntegration bridge).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…anaged [P0+themeB credential, folded into A.5]
@schenksj schenksj force-pushed the pr/delta-A4b-scala-exec branch from 8d14cdb to 0ee301f Compare June 29, 2026 13:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant