feat: contrib Delta driver-side Rust (log replay + predicate pushdown + JNI) [Delta contrib split, part 3a] #5
Draft
schenksj wants to merge 2 commits into
… + JNI) [Delta contrib split, part 3a]

Part 3a of the Delta Lake contrib PR breakup (tracking: apache#4366). Replaces the build-gate stub crate's deps with the real driver-side modules and the delta-kernel-rs dependency, while the executor-side read path stays deferred (the `planner` stub still returns NotImplemented until part 3b adds `kernel_scan`/`dv_reader` + the real planner).

Driver side (open table, replay log, push predicates, return a DeltaScanTaskList over JNI):
- `error.rs` - DeltaError / DeltaResult.
- `engine.rs` - delta-kernel engine + object_store config (S3/Azure/GCS/local) for log replay.
- `predicate.rs` - Catalyst -> kernel predicate translation for file skipping.
- `scan.rs` - log replay -> DeltaFileEntry/DeltaScanPlan; scan-task assembly.
- `jni.rs` - `Native_planDeltaScan` / `Native_planDeltaReadSchemas` JNI entry points (the JVM `Native.scala` that calls them lands in part 4b).
- `lib.rs` - declares the driver modules + keeps the `planner` stub; drops the `dv_reader`/`kernel_scan` module decls (part 3b).
- `Cargo.toml` - only the deps the driver set uses (delta_kernel, object_store, arrow, jni, prost, serde_json, url, thiserror, log + jni-bridge); executor deps arrive in 3b.

The driver set is self-contained (`jni -> scan -> engine -> error`, `predicate` standalone); nothing references the deferred modules. Core is untouched -- the dispatch shim still calls `planner::plan_delta_scan` (the stub) so a Delta read falls back to vanilla Spark until 3b. The gate-verify cargo-tree assertion is re-tightened to require `delta_kernel` (now real).

Verification: gated native build, 54 in-crate unit tests (cargo test), default native build unchanged, clippy (both feature states), gate-verify script, cargo fmt -- all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
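The fallback behaviour described in this commit message (the `planner` stub reports NotImplemented, so the dispatch shim leaves the read to vanilla Spark) can be sketched roughly as below. This is an illustration under assumptions, not the crate's code: `planner::plan_delta_scan`, `DeltaError`, `DeltaResult`, and `DeltaScanPlan` are names taken from this PR, but the `NotImplemented` variant shape, the `files` field, the function signature, and the `Dispatch` / `dispatch_delta_scan` helpers are hypothetical.

```rust
use std::fmt;

// Assumed shape: the real `DeltaError` in `error.rs` likely has more variants.
#[derive(Debug, PartialEq)]
pub enum DeltaError {
    NotImplemented(&'static str),
}

impl fmt::Display for DeltaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DeltaError::NotImplemented(what) => write!(f, "not implemented: {}", what),
        }
    }
}

impl std::error::Error for DeltaError {}

pub type DeltaResult<T> = Result<T, DeltaError>;

// Hypothetical placeholder for the plan produced by the real planner.
#[derive(Debug, PartialEq)]
pub struct DeltaScanPlan {
    pub files: Vec<String>,
}

pub mod planner {
    use super::{DeltaError, DeltaResult, DeltaScanPlan};

    /// Stub: always reports that the executor-side read path is missing.
    pub fn plan_delta_scan(_table_uri: &str) -> DeltaResult<DeltaScanPlan> {
        Err(DeltaError::NotImplemented("executor-side read path (part 3b)"))
    }
}

// Hypothetical outcome type for the dispatch shim.
#[derive(Debug, PartialEq)]
pub enum Dispatch {
    Native(DeltaScanPlan),
    FallBackToSpark,
}

/// Hypothetical shim: a NotImplemented error means "let Spark read the table".
pub fn dispatch_delta_scan(table_uri: &str) -> Dispatch {
    match planner::plan_delta_scan(table_uri) {
        Ok(plan) => Dispatch::Native(plan),
        Err(DeltaError::NotImplemented(_)) => Dispatch::FallBackToSpark,
    }
}
```

With the stub in place, every call to `dispatch_delta_scan` takes the `FallBackToSpark` arm; swapping in a real planner would only change which arm is taken, not the shim itself.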
schenksj added a commit that referenced this pull request on Jun 22, 2026
…flaky re-run) + apache#4366 carveout links
…t in Debug [credential audit P1/P3, folded into A.3a]
What this part is
The Rust driver side of the contrib Delta crate: open a table, replay the Delta log via delta-kernel-rs, push predicates for file skipping, and assemble a `DeltaScanTaskList` returned to the JVM over JNI. It replaces the build-gate stub crate's deps with the real driver modules + `delta_kernel`, while the executor-side read path stays deferred: the `planner` is still the stub returning `NotImplemented`, so a Delta read falls back to vanilla Spark until part 3b adds `kernel_scan`/`dv_reader` + the real planner.

Modules (verbatim from the integration branch, where they pass 54 unit tests)

- `error.rs` - `DeltaError` / `DeltaResult`.
- `engine.rs` - delta-kernel engine + object_store config (S3 / Azure / GCS / local) for log replay.
- `predicate.rs` - Catalyst → kernel predicate translation for data-skipping.
- `scan.rs` - log replay → file list + scan-task assembly.
- `jni.rs` - `Native_planDeltaScan` / `Native_planDeltaReadSchemas` entry points (the JVM `Native.scala` lands in part 4b).
- `lib.rs` - declares the driver modules, keeps the `planner` stub, drops the deferred module decls.
- `Cargo.toml` - only the deps the driver set uses; executor deps arrive in 3b.

Why it's safe / inert

The driver set is self-contained (`jni → scan → engine → error`, `predicate` standalone) with zero references to the deferred modules. Core is untouched: the dispatch shim still calls `planner::plan_delta_scan` (the stub). Default (non-contrib-delta) builds carry zero Delta surface; the gate-verify script confirms it (0 Delta symbols in the default `libcomet`; the contrib build is now ~11 MB larger with real `delta_kernel`).

Verification

Gated native build, 54 in-crate unit tests (`cargo test`), default native build unchanged, clippy (both feature states), the gate-verify script, and `cargo fmt`: all green.

🤖 AI disclosure: this PR was prepared with assistance from Claude Code (Claude Opus 4.8), under the submitter's review and direction.
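For readers unfamiliar with predicate translation for file skipping (the job this PR assigns to `predicate.rs`), here is a minimal sketch of the general idea. Every type and function in it is hypothetical: `CatalystExpr` and `SkipPredicate` are toy stand-ins, not Spark's Catalyst classes or the delta-kernel-rs predicate API, and the real module may handle many more expression kinds. The sketch assumes the usual conservative rule: the translated predicate may be weaker than the original filter, never stronger, so skipping can only drop files that cannot match.

```rust
// Hypothetical, simplified stand-in for a Catalyst filter expression.
#[derive(Debug, Clone, PartialEq)]
pub enum CatalystExpr {
    Eq(String, i64),
    Lt(String, i64),
    Gt(String, i64),
    And(Box<CatalystExpr>, Box<CatalystExpr>),
    Or(Box<CatalystExpr>, Box<CatalystExpr>),
    /// An opaque expression (for example a UDF call) that cannot be pushed down.
    Udf(String),
}

// Hypothetical stand-in for a predicate the log-replay side can evaluate
// against per-file min/max statistics.
#[derive(Debug, Clone, PartialEq)]
pub enum SkipPredicate {
    Eq(String, i64),
    Lt(String, i64),
    Gt(String, i64),
    And(Box<SkipPredicate>, Box<SkipPredicate>),
    Or(Box<SkipPredicate>, Box<SkipPredicate>),
}

/// Translate a filter into a skipping predicate, or `None` if nothing can be pushed.
/// The result is always implied by the input, so it is safe for file skipping.
pub fn translate(expr: &CatalystExpr) -> Option<SkipPredicate> {
    match expr {
        CatalystExpr::Eq(c, v) => Some(SkipPredicate::Eq(c.clone(), *v)),
        CatalystExpr::Lt(c, v) => Some(SkipPredicate::Lt(c.clone(), *v)),
        CatalystExpr::Gt(c, v) => Some(SkipPredicate::Gt(c.clone(), *v)),
        // AND: a single translatable side is still a valid (weaker) predicate.
        CatalystExpr::And(l, r) => match (translate(l), translate(r)) {
            (Some(l), Some(r)) => Some(SkipPredicate::And(Box::new(l), Box::new(r))),
            (Some(p), None) | (None, Some(p)) => Some(p),
            (None, None) => None,
        },
        // OR: both sides must translate, otherwise matching files could be skipped.
        CatalystExpr::Or(l, r) => match (translate(l), translate(r)) {
            (Some(l), Some(r)) => Some(SkipPredicate::Or(Box::new(l), Box::new(r))),
            _ => None,
        },
        CatalystExpr::Udf(_) => None,
    }
}
```

The asymmetry between `And` and `Or` is the design point: dropping an untranslatable conjunct only makes the pushed predicate more permissive, while dropping a disjunct would make it stricter than the query and could skip files that contain matching rows.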