docs: contrib Delta design docs + user guide [Delta contrib split, part 8]#12
Draft
schenksj wants to merge 2 commits into
schenksj
added a commit
that referenced
this pull request
Jun 22, 2026
51e7158 to
77f9032
Compare
schenksj
added a commit
that referenced
this pull request
Jun 22, 2026
…flaky re-run) + apache#4366 carveout links
f0dcb24 to
c65b636
Compare
77f9032 to
59ff67c
Compare
…rt 8]

Adds the Delta contrib documentation: the user-facing guide plus the in-repo design docs. Docs only -- no code.

User guide (docs/source/user-guide/latest/):
- delta.md (new): how to build with -Pcontrib-delta + --features contrib-delta, the supported Spark/Delta/Scala matrix, usage, the four tuning configs (verified against DeltaConf.scala), supported features, and current limitations.
- datasources.md, index.rst: link the new Delta guide into the data-sources page and the user-guide toctree (additive).

Design docs (contrib/delta/docs/, 12 files): overview, planning, native execution, design decisions, build/deploy, fallback/ops, Spark 3.5 feasibility, known limitations, plus the iceberg-style kernel-read migration plan and its coherence/elimination audits. These are internal architecture/history docs linked from delta.md via GitHub URLs.

Audited every config/class/path/proto reference and every user-facing claim against what actually landed (docs were authored for the integration branch, which this split reconstructs). Accuracy fixes:
- delta.md storage: add Azure (abfs/abfss/wasb) and GCS (gs) -- both ship and work via object_store::parse_url (engine.rs); the line previously listed only local / HDFS / S3.
- delta.md limitations: the residual S3-credential gap is explicit Hadoop credential-provider classes (AssumedRole/WebIdentity), NOT "per-bucket chains" (per-bucket static keys are handled); add the narrow far-future (~year 2262) INT96-timestamp overflow caveat (delta-kernel gap, A6).
- delta.md: Java 17 is required for all Spark 4.x builds (4.0 and 4.1), not just 4.1; note Scala 2.12 is offered only for Spark 3.5.
- delta.md usage: clarify the comet-spark jar must be the from-source -Pcontrib-delta build (the published Maven artifact carries no Delta support).
- 12-elimination-evaluation.md: the proto kernel_read (field 25) row said "kept"; it is `reserved 25` and planner.rs no longer reads it -- corrected to "removed".
- 05-build-and-deploy.md: `cargo build -p comet` -> `-p datafusion-comet`.

The four tuning configs (hand-documented because contrib configs are not in the default GenerateDocs output), the version matrix, the native module list, the proto messages, and all inter-doc links verified accurate against HEAD. Docs 10/11/12 are explicitly framed as historical (doc 10 has a "Status: IMPLEMENTED and default" banner).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BtErWgRQKCDRAg8Mk6qR4G

Archived the three point-in-time engineering records (the iceberg-style kernel-read migration plan, its design-coherence audit, and the custom-code elimination evaluation) under contrib/delta/docs/archive/, leaving 01-08 + README as the living doc set. Incoming links (01/03/04/README) repointed to archive/; the README gains an "Archived design history" note. They document how the design was reasoned/pruned, not how the shipped integration works.
…[themeB, folded into A.7]
c65b636 to
6093760
Compare
59ff67c to
cf4f63a
Compare
Part 8 of the Delta Lake contrib PR breakup (stacked on part 7 / #11). Fork-local review draft. Docs only — no code.
What this adds
- `docs/source/user-guide/latest/delta.md` (new): the user-facing guide, covering build instructions (`-Pcontrib-delta` + `--features contrib-delta`), the supported Spark/Delta/Scala matrix, a usage example, the four tuning configs (verified against `DeltaConf.scala`), supported features, and current limitations.
- `datasources.md` / `index.rst`: link the Delta guide into the data-sources page and the user-guide toctree (additive).
- `contrib/delta/docs/` (12 files): internal architecture/history docs: overview, planning, native execution, design decisions, build/deploy, fallback/ops, Spark 3.5 feasibility, known limitations, and the iceberg-style kernel-read migration plan plus its coherence/elimination audits.

Audited against landed code (not just carried over)
The docs were authored for the integration branch (which this split reconstructs), so I cross-checked every config/class/path/proto reference and every user-facing claim against HEAD. Accuracy fixes made:
- Storage: add Azure (`abfs`/`abfss`/`wasb`) and GCS (`gs`); both ship and work via `object_store::parse_url` (`engine.rs`). The line previously listed only local/HDFS/S3.
- Limitations: the residual S3-credential gap is explicit Hadoop credential-provider classes (`fs.s3a.aws.credentials.provider`, e.g. AssumedRole/WebIdentity), not "per-bucket chains" (per-bucket static keys are handled).
- Usage: the `comet-spark` jar must be the from-source `-Pcontrib-delta` build (the published Maven artifact carries no Delta support).
- `12-elimination-evaluation.md`: the proto `kernel_read` (field 25) row said "kept"; it is `reserved 25` and `planner.rs` no longer reads it, so the row is corrected to "removed".
- `05-build-and-deploy.md`: `cargo build -p comet` → `-p datafusion-comet`.

Verified accurate (no change needed)
The four tuning configs + defaults (hand-documented because contrib configs are not in the default `GenerateDocs` output), the version matrix, the native module list, the proto messages, and all inter-doc links. Docs 10/11/12 are explicitly framed as historical (doc 10 carries a "Status: IMPLEMENTED and default" banner).

🤖 This PR was prepared with the assistance of Claude (Anthropic). A human author reviewed and is responsible for the content.
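
For reviewers checking the storage fix, the scheme coverage the description claims (local, HDFS, S3, Azure via abfs/abfss/wasb, GCS via gs) can be pictured as a lookup from URI scheme to backend. The sketch below is only an illustration using the standard library: it is not the code in engine.rs and does not call `object_store`. The `Backend` enum and `backend_for` function are hypothetical names, and treating a bare path as local and accepting both `s3` and `s3a` are assumptions.

```rust
/// Storage backends named in the PR description (illustrative only).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Backend {
    Local,
    Hdfs,
    S3,
    Azure,
    Gcs,
}

/// Hypothetical helper: map a table URI to the backend its scheme implies.
/// Returns `None` for schemes the PR description does not list.
fn backend_for(uri: &str) -> Option<Backend> {
    // Assumption: a URI without "://" is a plain local filesystem path.
    let scheme = match uri.split_once("://") {
        Some((scheme, _rest)) => scheme.to_ascii_lowercase(),
        None => return Some(Backend::Local),
    };
    match scheme.as_str() {
        "file" => Some(Backend::Local),
        "hdfs" => Some(Backend::Hdfs),
        // Assumption: both the bare and the Hadoop-style S3 schemes are accepted.
        "s3" | "s3a" => Some(Backend::S3),
        // The three Azure schemes the description lists.
        "abfs" | "abfss" | "wasb" => Some(Backend::Azure),
        "gs" => Some(Backend::Gcs),
        _ => None,
    }
}

fn main() {
    let uris = [
        "/tmp/delta/events",
        "s3a://bucket/delta/events",
        "abfss://container@account.dfs.core.windows.net/delta/events",
        "gs://bucket/delta/events",
        "ftp://host/delta/events",
    ];
    for uri in uris {
        println!("{uri} -> {:?}", backend_for(uri));
    }
}
```

A table like this makes it easy to diff the documented scheme list against whatever the native engine actually accepts; the authoritative behavior remains whatever `object_store::parse_url` does at HEAD.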