docs: contrib Delta design docs + user guide [Delta contrib split, part 8] #12

Draft
schenksj wants to merge 2 commits into pr/delta-A6b-regression from pr/delta-A7-docs

@schenksj schenksj commented Jun 22, 2026

Part 8 of the Delta Lake contrib PR breakup (stacked on part 7 / #11). Fork-local review draft. Docs only — no code.

What this adds

  • docs/source/user-guide/latest/delta.md (new) — the user-facing guide: build instructions (-Pcontrib-delta + --features contrib-delta), the supported Spark/Delta/Scala matrix, a usage example, the four tuning configs (verified against DeltaConf.scala), supported features, and current limitations.
  • datasources.md / index.rst — link the Delta guide into the data-sources page and the user-guide toctree (additive).
  • contrib/delta/docs/ (12 files) — internal architecture/history docs: overview, planning, native execution, design decisions, build/deploy, fallback/ops, Spark 3.5 feasibility, known limitations, and the iceberg-style kernel-read migration plan + its coherence/elimination audits.

Audited against landed code (not just carried over)

The docs were authored for the integration branch (which this split reconstructs), so I cross-checked every config/class/path/proto reference and every user-facing claim against HEAD. Accuracy fixes made:

  • Storage: added Azure (abfs/abfss/wasb) and GCS (gs) — both ship and work via object_store::parse_url (engine.rs); the line previously listed only local/HDFS/S3.
  • S3 credentials: the real residual gap is explicit Hadoop credential-provider classes (fs.s3a.aws.credentials.provider, e.g. AssumedRole/WebIdentity) — not "per-bucket chains" (per-bucket static keys are handled).
  • INT96: added the narrow far-future (~year 2262) timestamp-overflow caveat (a tracked delta-kernel gap).
  • Java 17: required for all Spark 4.x builds (4.0 and 4.1), not just 4.1; noted Scala 2.12 is offered only for Spark 3.5.
  • Usage: clarified the comet-spark jar must be the from-source -Pcontrib-delta build (the published Maven artifact carries no Delta support).
  • 12-elimination-evaluation.md: the proto kernel_read (field 25) row said "kept" — it is reserved 25 and planner.rs no longer reads it → corrected to "removed".
  • 05-build-and-deploy.md: cargo build -p comet → -p datafusion-comet.
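
The ~year 2262 figure in the INT96 item is consistent with the upper bound of a signed 64-bit count of nanoseconds since the Unix epoch. Assuming that is the representation behind the overflow (the PR does not spell it out), the arithmetic can be sketched as below. The function names are illustrative only and come from neither Comet nor delta-kernel.

```rust
/// Nanoseconds in one second.
const NANOS_PER_SEC: i64 = 1_000_000_000;
/// Seconds in one day.
const SECS_PER_DAY: i64 = 86_400;

/// Whole days after 1970-01-01 that fit in an i64 nanosecond count.
/// ASSUMPTION: timestamps are held as i64 nanoseconds since the Unix epoch.
fn max_days_since_epoch() -> i64 {
    i64::MAX / NANOS_PER_SEC / SECS_PER_DAY
}

/// Convert days since 1970-01-01 into a (year, month, day) date in the
/// proleptic Gregorian calendar, using the well-known era-based
/// days-to-civil conversion.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    // Shift the origin to 0000-03-01 so leap days fall at the end of a year.
    let z = days + 719_468;
    let era = if z >= 0 { z } else { z - 146_096 } / 146_097;
    let day_of_era = z - era * 146_097; // [0, 146096]
    let year_of_era =
        (day_of_era - day_of_era / 1_460 + day_of_era / 36_524 - day_of_era / 146_096) / 365;
    let day_of_year = day_of_era - (365 * year_of_era + year_of_era / 4 - year_of_era / 100);
    let month_index = (5 * day_of_year + 2) / 153; // 0 = March, 11 = February
    let day = day_of_year - (153 * month_index + 2) / 5 + 1;
    let month = if month_index < 10 { month_index + 3 } else { month_index - 9 };
    // January and February belong to the following civil year.
    let year = year_of_era + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month as u32, day as u32)
}
```

Under that assumption, `civil_from_days(max_days_since_epoch())` evaluates to `(2262, 4, 11)`, so any timestamp later than April 11, 2262 cannot be represented and would trigger the overflow the caveat describes.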

Verified accurate (no change needed)

The four tuning configs + defaults (hand-documented because contrib configs are not in the default GenerateDocs output), the version matrix, the native module list, the proto messages, and all inter-doc links. Docs 10/11/12 are explicitly framed as historical (doc 10 carries a "Status: IMPLEMENTED and default" banner).

Docs 10/11/12 (migration plan / coherence audit / elimination evaluation) are development-history artifacts and have been moved to contrib/delta/docs/archive/ (incoming links repointed; README notes the archive). 01-08 + README are the living doc set.


🤖 This PR was prepared with the assistance of Claude (Anthropic). A human author reviewed and is responsible for the content.

schenksj added 2 commits June 29, 2026 09:27
…rt 8]

Adds the Delta contrib documentation: the user-facing guide plus the in-repo
design docs. Docs only -- no code.

User guide (docs/source/user-guide/latest/):
- delta.md (new): how to build with -Pcontrib-delta + --features contrib-delta,
  the supported Spark/Delta/Scala matrix, usage, the four tuning configs (verified
  against DeltaConf.scala), supported features, and current limitations.
- datasources.md, index.rst: link the new Delta guide into the data-sources page
  and the user-guide toctree (additive).

Design docs (contrib/delta/docs/, 12 files): overview, planning, native execution,
design decisions, build/deploy, fallback/ops, Spark 3.5 feasibility, known
limitations, plus the iceberg-style kernel-read migration plan and its
coherence/elimination audits. These are internal architecture/history docs linked
from delta.md via GitHub URLs.

Audited every config/class/path/proto reference and every user-facing claim against
what actually landed (docs were authored for the integration branch, which this
split reconstructs). Accuracy fixes:
- delta.md storage: add Azure (abfs/abfss/wasb) and GCS (gs) -- both ship and work
  via object_store::parse_url (engine.rs); the line previously listed only local /
  HDFS / S3.
- delta.md limitations: the residual S3-credential gap is explicit Hadoop
  credential-provider classes (AssumedRole/WebIdentity), NOT "per-bucket chains"
  (per-bucket static keys are handled); add the narrow far-future (~year 2262)
  INT96-timestamp overflow caveat (delta-kernel gap, A6).
- delta.md: Java 17 is required for all Spark 4.x builds (4.0 and 4.1), not just
  4.1; note Scala 2.12 is offered only for Spark 3.5.
- delta.md usage: clarify the comet-spark jar must be the from-source -Pcontrib-delta
  build (the published Maven artifact carries no Delta support).
- 12-elimination-evaluation.md: the proto kernel_read (field 25) row said "kept";
  it is `reserved 25` and planner.rs no longer reads it -- corrected to "removed".
- 05-build-and-deploy.md: `cargo build -p comet` -> `-p datafusion-comet`.

The four tuning configs (hand-documented because contrib configs are not in the
default GenerateDocs output), the version matrix, the native module list, the proto
messages, and all inter-doc links verified accurate against HEAD. Docs 10/11/12 are
explicitly framed as historical (doc 10 has a "Status: IMPLEMENTED and default"
banner).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BtErWgRQKCDRAg8Mk6qR4G

Archived the three point-in-time engineering records (the iceberg-style kernel-read
migration plan, its design-coherence audit, and the custom-code elimination
evaluation) under contrib/delta/docs/archive/, leaving 01-08 + README as the living
doc set. Incoming links (01/03/04/README) repointed to archive/; the README gains an
"Archived design history" note. They document how the design was reasoned/pruned, not
how the shipped integration works.

@schenksj schenksj force-pushed the pr/delta-A6b-regression branch from c65b636 to 6093760 Compare June 29, 2026 13:29