feat: contrib Delta driver-side Rust (log replay + predicate pushdown + JNI) [Delta contrib split, part 3a]#5

Draft: schenksj wants to merge 2 commits into pr/delta-A2-buildgate from pr/delta-A3a-rust-driver

@schenksj

Fork-local review draft (Delta-contrib PR split, part 3a). Base is pr/delta-A2-buildgate so the diff shows only A.3a. Stacks on parts 1 (apache#4700) and 2. Opens upstream once its base chain lands. Tracking umbrella: apache#4366.

What this part is

The Rust driver side of the contrib Delta crate: open a table, replay the Delta log via delta-kernel-rs, push predicates down for file skipping, and assemble a DeltaScanTaskList that is returned to the JVM over JNI. It replaces the build-gate stub crate's deps with the real driver modules plus delta_kernel. The executor-side read path stays deferred: the planner is still the stub returning NotImplemented, so a Delta read falls back to vanilla Spark until part 3b adds kernel_scan/dv_reader and the real planner.
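To make the fallback concrete, here is a minimal std-only sketch of a stub planner whose `NotImplemented` error is turned into a "use vanilla Spark" decision rather than a query failure. `NativeScanPlan`, `ScanDecision` and `dispatch` are hypothetical names, and this PR does not say whether the fallback is decided on the JVM side or in the native shim. The real `error.rs` likely derives `Display` via `thiserror` (it is in the dependency list); the impl is written by hand here to stay std-only.

```rust
use std::fmt;

/// Hypothetical stand-in for the crate's DeltaError; only the variant needed here.
#[derive(Debug, PartialEq)]
pub enum DeltaError {
    NotImplemented(String),
}

impl fmt::Display for DeltaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DeltaError::NotImplemented(what) => write!(f, "not implemented: {what}"),
        }
    }
}

impl std::error::Error for DeltaError {}

pub type DeltaResult<T> = Result<T, DeltaError>;

/// Placeholder for the native plan an executor would eventually run (hypothetical).
#[derive(Debug, PartialEq)]
pub struct NativeScanPlan {
    pub table_root: String,
}

/// Stub planner: always refuses, mirroring the part-3a behaviour described above.
pub fn plan_delta_scan(_table_root: &str) -> DeltaResult<NativeScanPlan> {
    Err(DeltaError::NotImplemented(
        "Delta executor read path (part 3b)".to_string(),
    ))
}

/// What the (hypothetical) dispatch layer does with a Delta scan.
#[derive(Debug, PartialEq)]
pub enum ScanDecision {
    Native(NativeScanPlan),
    FallbackToSpark(String),
}

/// NotImplemented becomes a fallback to vanilla Spark, not a failed query.
pub fn dispatch(table_root: &str) -> ScanDecision {
    match plan_delta_scan(table_root) {
        Ok(plan) => ScanDecision::Native(plan),
        Err(e @ DeltaError::NotImplemented(_)) => ScanDecision::FallbackToSpark(e.to_string()),
    }
}
```

Keeping the refusal as a typed error variant means part 3b only has to change the planner body; callers that already match on `NotImplemented` keep working.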

Modules (verbatim from the integration branch, where they pass 54 unit tests)

  • error.rs - DeltaError / DeltaResult.
  • engine.rs - delta-kernel engine + object_store config (S3 / Azure / GCS / local) for log replay.
  • predicate.rs - Catalyst → kernel predicate translation for data skipping.
  • scan.rs - log replay → file list + scan-task assembly.
  • jni.rs - Native_planDeltaScan / Native_planDeltaReadSchemas entry points (the JVM-side Native.scala that calls them lands in part 4b).
  • lib.rs - declares the driver modules, keeps the planner stub, and drops the deferred module decls (dv_reader / kernel_scan, part 3b).
  • Cargo.toml - only the deps the driver set uses; executor deps arrive in 3b.
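As a rough illustration of what scan.rs and predicate.rs do together, here is a std-only toy of log replay with min/max file skipping. It assumes a simplified log where each commit is a list of `add`/`remove` actions and each added file carries min/max stats for a single integer column. `Action`, `Predicate` and `replay` are hypothetical; the real crate delegates replay to delta-kernel-rs, whose types (and handling of checkpoints and deletion vectors) differ.

```rust
use std::collections::HashSet;

/// Simplified Delta log action (hypothetical; real actions carry many more fields).
#[derive(Debug, Clone)]
pub enum Action {
    /// A data file plus min/max stats for one integer column.
    Add { path: String, min: i64, max: i64 },
    /// Tombstones a previously added file.
    Remove { path: String },
}

/// Hypothetical pushed-down predicate on that column.
#[derive(Debug, Clone, Copy)]
pub enum Predicate {
    Gt(i64), // column > v
    Lt(i64), // column < v
}

impl Predicate {
    /// Conservative check: false only if the stats prove no row can match.
    fn may_match(&self, min: i64, max: i64) -> bool {
        match *self {
            Predicate::Gt(v) => max > v,
            Predicate::Lt(v) => min < v,
        }
    }
}

/// Replays commits newest-first; the first action seen for a path wins.
/// Returns live files that survive skipping, sorted for determinism.
pub fn replay(commits: &[Vec<Action>], predicate: Option<Predicate>) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut live: Vec<String> = Vec::new();
    for commit in commits.iter().rev() {
        for action in commit.iter().rev() {
            let (path, stats) = match action {
                Action::Add { path, min, max } => (path, Some((*min, *max))),
                Action::Remove { path } => (path, None),
            };
            // Older actions for an already-seen path are shadowed by newer ones.
            if !seen.insert(path.clone()) {
                continue;
            }
            // A newest action of `add` means the file is live; apply data skipping.
            if let Some((min, max)) = stats {
                if predicate.map_or(true, |p| p.may_match(min, max)) {
                    live.push(path.clone());
                }
            }
        }
    }
    live.sort();
    live
}
```

Skipping only prunes files whose stats rule out every row; the predicate still has to be evaluated on the rows of the files that remain.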

Why it's safe / inert

The driver set is self-contained (jni → scan → engine → error, with predicate standalone) and has zero references to the deferred modules. Core is untouched: the dispatch shim still calls planner::plan_delta_scan (the stub). Default (non-contrib-delta) builds carry zero Delta surface, which the gate-verify script confirms (0 Delta symbols in the default libcomet). The contrib build, now linking the real delta_kernel, is ~11 MB larger.
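One way a default build can carry zero Delta surface is to gate both the module and its call site behind a Cargo feature, so neither is compiled unless the feature is on. This is a sketch only: the feature name `contrib-delta` is assumed from the "non-contrib-delta" wording above, and `delta`, `delta_compiled_in` and `try_plan` are hypothetical names, not this crate's layout.

```rust
/// Compiled only when the (assumed) `contrib-delta` feature is enabled,
/// so a default build emits no symbols from this module.
#[cfg(feature = "contrib-delta")]
pub mod delta {
    /// Stand-in for a real planner entry point (hypothetical).
    pub fn plan_delta_scan(table_root: &str) -> String {
        format!("planning {table_root}")
    }
}

/// Reports which way the crate was built.
pub fn delta_compiled_in() -> bool {
    cfg!(feature = "contrib-delta")
}

/// The call site is gated too, so the default build never references `delta`.
pub fn try_plan(table_root: &str) -> Option<String> {
    #[cfg(feature = "contrib-delta")]
    let planned = Some(delta::plan_delta_scan(table_root));
    #[cfg(not(feature = "contrib-delta"))]
    let planned = {
        // Nothing Delta-related exists in this build, so there is nothing to call.
        let _ = table_root;
        None
    };
    planned
}
```

Built with `--features contrib-delta`, the module and the call are compiled in; without it, both disappear at compile time, which is the kind of property a symbol count on the default library can then confirm.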

Verification

Gated native build, 54 in-crate unit tests (cargo test), unchanged default native build, clippy (both feature states), the gate-verify script, and cargo fmt: all green.


🤖 AI disclosure: this PR was prepared with assistance from Claude Code (Claude Opus 4.8), under the submitter's review and direction.

… + JNI) [Delta contrib split, part 3a]

Part 3a of the Delta Lake contrib PR breakup (tracking: apache#4366). Replaces the build-gate
stub crate's deps with the real driver-side modules and the delta-kernel-rs dependency,
while the executor-side read path stays deferred (the `planner` stub still returns
NotImplemented until part 3b adds `kernel_scan`/`dv_reader` + the real planner).

Driver side (open table, replay log, push predicates, return a DeltaScanTaskList over JNI):
- `error.rs`  - DeltaError / DeltaResult.
- `engine.rs` - delta-kernel engine + object_store config (S3/Azure/GCS/local) for log replay.
- `predicate.rs` - Catalyst -> kernel predicate translation for file skipping.
- `scan.rs`   - log replay -> DeltaFileEntry/DeltaScanPlan; scan-task assembly.
- `jni.rs`    - `Native_planDeltaScan` / `Native_planDeltaReadSchemas` JNI entry points
  (the JVM `Native.scala` that calls them lands in part 4b).
- `lib.rs`    - declares the driver modules + keeps the `planner` stub; drops the
  `dv_reader`/`kernel_scan` module decls (part 3b).
- `Cargo.toml` - only the deps the driver set uses (delta_kernel, object_store, arrow,
  jni, prost, serde_json, url, thiserror, log + jni-bridge); executor deps arrive in 3b.
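The engine.rs entry above mentions object_store config for S3/Azure/GCS/local. A minimal sketch of the first step such config usually needs, choosing a backend from the table URL's scheme, could look like this. `StoreKind` and `store_kind` are hypothetical, and the accepted scheme list (including the Hadoop-style `s3a` and `abfss`) is an assumption rather than what engine.rs actually accepts; the real module builds object_store stores and credentials, which this sketch leaves out.

```rust
/// Which object-store backend a table URL would need (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum StoreKind {
    S3,
    Azure,
    Gcs,
    Local,
}

/// Maps a table URL's scheme to a backend; unknown schemes are an error.
pub fn store_kind(table_url: &str) -> Result<StoreKind, String> {
    let scheme = match table_url.split_once("://") {
        Some((scheme, _)) => scheme.to_ascii_lowercase(),
        // A bare absolute path such as /data/table is treated as local.
        None if table_url.starts_with('/') => return Ok(StoreKind::Local),
        None => return Err(format!("cannot determine scheme of {table_url:?}")),
    };
    match scheme.as_str() {
        "s3" | "s3a" => Ok(StoreKind::S3),
        "abfs" | "abfss" | "az" | "azure" => Ok(StoreKind::Azure),
        "gs" => Ok(StoreKind::Gcs),
        "file" => Ok(StoreKind::Local),
        other => Err(format!("unsupported scheme {other:?}")),
    }
}
```

Rejecting unknown schemes with an error, rather than defaulting to local, keeps a mistyped URL from silently replaying the wrong log.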

The driver set is self-contained (`jni -> scan -> engine -> error`, `predicate` standalone);
nothing references the deferred modules. Core is untouched -- the dispatch shim still calls
`planner::plan_delta_scan` (the stub) so a Delta read falls back to vanilla Spark until 3b.
The gate-verify cargo-tree assertion is re-tightened to require `delta_kernel` (now real).
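The gate-verify script is not shown here, and its language and exact checks are unknown. Purely as an illustration, the re-tightened assertion (contrib build must depend on `delta_kernel`, default build must not) could be expressed over `cargo tree` text output like this; `tree_has_crate` and `check_gate` are hypothetical helpers.

```rust
/// True if `krate` appears as a package name in `cargo tree` text output.
pub fn tree_has_crate(tree: &str, krate: &str) -> bool {
    tree.lines().any(|line| {
        // Strip the box-drawing prefix cargo tree prints before each package.
        let entry =
            line.trim_start_matches(|c: char| matches!(c, '│' | '├' | '└' | '─') || c.is_whitespace());
        // The first token is the package name; the version and markers follow it.
        entry.split_whitespace().next() == Some(krate)
    })
}

/// Hypothetical gate: contrib builds must pull in delta_kernel, default builds must not.
pub fn check_gate(tree: &str, contrib_enabled: bool) -> Result<(), String> {
    let has_kernel = tree_has_crate(tree, "delta_kernel");
    match (contrib_enabled, has_kernel) {
        (true, false) => Err("contrib build is missing delta_kernel".to_string()),
        (false, true) => Err("default build leaks delta_kernel".to_string()),
        _ => Ok(()),
    }
}
```

Matching on the exact first token avoids false positives from crates whose names merely contain `delta`.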

Verification: gated native build, 54 in-crate unit tests (cargo test), default native build
unchanged, clippy (both feature states), gate-verify script, cargo fmt -- all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t in Debug [credential audit P1/P3, folded into A.3a]