
test: contrib Delta full test battery + CI workflow [Delta contrib split, part 6] #10

Draft
schenksj wants to merge 2 commits into pr/delta-A5-cdf from pr/delta-A6a-test-battery


@schenksj

Owner

Part 6 of the Delta Lake contrib PR breakup (stacked on part 5 / #9). Fork-local review draft.

Completes the test coverage for the Delta read path landed in parts 1–5. Test-only — no production or native code.

What this adds

  • 23 contrib-delta Scala suites (22 under org.apache.comet.contrib.delta + CometDeltaCheckpointFilterReproSuite under org.apache.spark.sql.delta), carved byte-identical from the integration branch. Behaviour guards: deletion-vector reads, DPP, row tracking, generated-column partition filters, stats skipping, time travel, schema change, nested array/struct, type round-trip, special-char/percent filenames, metadata/credential/filter-pushdown audits.
  • .github/workflows/delta_contrib_test.yml: builds libcomet once with --features contrib-delta, then runs every contrib suite (matched by package prefix) across three cells: (Spark 3.5 + Delta 3.3.2), (Spark 4.0 + Delta 4.0.0) and (Spark 4.1 + Delta 4.1.0). The workflow also includes the build-gate job.
  • dev/ci/check-suites.py: the contrib-suite exclusion is hoisted ahead of class-name extraction. Contrib suites compile only under -Pcontrib-delta and run in their own workflow, so they are exempt from the standard-matrix registration check.
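A minimal Python sketch of the exclusion-before-extraction ordering. It assumes, hypothetically, that suites are identified by source path, that a suite is "contrib" if its path has a contrib/ directory component, and that class names come from the conventional src/test/scala/<package>/<Class>.scala layout. The helper names (is_contrib_suite, extract_class_name, find_unregistered) and the example paths are illustrative only, not the real check-suites.py.

```python
from pathlib import PurePosixPath

# Hypothetical rule: a suite whose path has a "contrib" directory component is
# a contrib suite. It compiles only under -Pcontrib-delta and runs in its own
# workflow, so it is exempt from the standard-matrix registration check.
CONTRIB_DIR = "contrib"


def is_contrib_suite(path: str) -> bool:
    """True if the suite source lives under a contrib/ directory (assumed layout)."""
    return CONTRIB_DIR in PurePosixPath(path).parts


def extract_class_name(path: str) -> str:
    """Derive a fully qualified class name from a Scala test source path.

    Assumes the conventional .../src/test/scala/<package dirs>/<Class>.scala layout.
    """
    parts = PurePosixPath(path).with_suffix("").parts
    root = parts.index("scala")  # ValueError if the assumed layout is absent
    return ".".join(parts[root + 1:])


def find_unregistered(suite_paths, registered):
    """Return class names of non-contrib suites that are missing from `registered`."""
    missing = []
    for path in suite_paths:
        # The exclusion runs first, so contrib suites never reach class-name
        # extraction or the registration comparison below.
        if is_contrib_suite(path):
            continue
        name = extract_class_name(path)
        if name not in registered:
            missing.append(name)
    return missing


# Made-up paths: one standard suite, one contrib suite.
paths = [
    "spark/src/test/scala/org/apache/comet/CometExampleSuite.scala",
    "contrib/delta/src/test/scala/org/apache/comet/contrib/delta/"
    "CometDeltaPercentFileNameReproSuite.scala",
]
print(find_unregistered(paths, registered=set()))  # ['org.apache.comet.CometExampleSuite']
```

In this sketch the contrib suite is reported neither as unregistered nor as a parse error, because it is filtered out before any name extraction happens.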

Workflow hardening (review-driven; improves on the integration branch)

  • Pin each cell's exact Spark patch via -Dspark.version=<matrix.full>. Otherwise the -Pspark-4.1 profile pulls Spark 4.1.2, which dropped IgnoreCachedData and so breaks delta-spark 4.1.0 (which needs 4.1.1). The pom stays at 4.1.2 for default users; the pin is CI-only, per the part-2 decision.
  • Label the Spark 3.5 cell as Scala 2.12 (its real binary version). It is intentionally the project's only 2.12 coverage — it guards 2.12-specific breakage like the existential-type inference in the core DeltaIntegration bridge that 2.13 accepts but 2.12 rejects.
  • Cache contrib/delta/native/target so the standalone contrib crate's cargo test build is incremental across runs (the crate sits outside the native/ workspace).
  • Silent-green guard: ScalaTest treats a wildcardSuites pattern that matches zero suites as a success, so the job now asserts a floor on the number of per-suite surefire reports produced.
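A minimal Python sketch of the silent-green guard idea. It assumes each suite writes one JUnit-style report named TEST-<suite>.xml into a surefire-reports directory (the usual scalatest-maven-plugin JUnit XML layout); the directory, the floor value and the helper names count_suite_reports / check_report_floor are hypothetical, not the workflow's actual step.

```python
from pathlib import Path


def count_suite_reports(reports_dir: Path) -> int:
    """Count per-suite JUnit XML reports (assumed to be named TEST-<suite>.xml)."""
    if not reports_dir.is_dir():
        # A missing directory means nothing ran, which must count as zero.
        return 0
    return sum(1 for _ in reports_dir.glob("TEST-*.xml"))


def check_report_floor(reports_dir: Path, floor: int) -> None:
    """Fail the step if fewer suites reported than expected.

    ScalaTest exits successfully when the wildcardSuites pattern matches
    nothing, so a "green" run with too few reports is turned into a failure.
    """
    found = count_suite_reports(reports_dir)
    if found < floor:
        # SystemExit with a message prints it to stderr and exits with status 1.
        raise SystemExit(
            f"silent-green guard: {found} suite report(s) in {reports_dir}, "
            f"expected at least {floor}"
        )
    print(f"silent-green guard: {found} suite report(s), floor {floor} met")
```

A step might call check_report_floor(Path("<module>/target/surefire-reports"), floor=N), with the path and N as placeholders for the real module directory and expected suite count. Setting the floor slightly below the real suite count keeps the guard from failing on a legitimately skipped suite while still catching a pattern that matched nothing.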

Removed

.github/workflows/delta_build_gate.yml — the part-2 standalone gate workflow is now subsumed by the byte-identical delta-build-gate job inside delta_contrib_test.yml.

Dropped (deferred %-path change)

The local-path %/space production change is not included: CometDeltaPercentFileNameReproSuite and CometDeltaSpecialCharFilenameSuite both pass without it (object_store round-trips percent-encoded local paths), so it is a confirmed no-op.
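To illustrate why the change is a no-op, here is a hedged Python analogy of the round trip the two suites rely on: a local file name containing a space or a literal % is percent-encoded as a URL path segment and decoded back unchanged. It uses Python's urllib.parse, not the Rust object_store crate, so it only demonstrates the encoding property; the helper names and file names are made up.

```python
from urllib.parse import quote, unquote


def encode_segment(name: str) -> str:
    """Percent-encode a local file name for use as a single URL path segment."""
    # safe="" escapes "/" and, crucially, "%" itself (as %25), so a literal
    # "%20" in a file name is never later mistaken for an encoded space.
    return quote(name, safe="")


def decode_segment(segment: str) -> str:
    """Reverse encode_segment."""
    return unquote(segment)


# Made-up names: a space plus a bare '%', an already-encoded-looking '%20',
# and a trailing '%'.
names = ["part-00000 50%off.parquet", "a%20b.parquet", "x y%.parquet"]
for name in names:
    encoded = encode_segment(name)
    assert decode_segment(encoded) == name  # lossless round trip
    print(f"{name!r} -> {encoded!r}")
```

For example, encode_segment("a%20b.parquet") returns "a%2520b.parquet", which decodes back to the original name rather than to "a b.parquet".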

Verification

  • Gated JVM test-compile succeeds for all 31 contrib suites.
  • Full battery green on Spark 4.1 + Delta 4.1.0, using the exact -Dspark.version=4.1.1 command this workflow issues: 157 succeeded, 0 failed, 1 version-gated cancel across 33 suites.
  • spotless and scalastyle clean.
  • check-suites.py exits 0.
  • dev/verify-contrib-delta-gate.sh: all checks pass (the default libcomet contains 0 Delta symbols).


🤖 This PR was prepared with the assistance of Claude (Anthropic). A human author reviewed and is responsible for the content.

schenksj and others added 2 commits June 29, 2026 09:27
…lit, part 6]

Adds the remaining contrib-delta Scala test battery and the dedicated CI
workflow that runs it, completing the test coverage for the Delta read path
landed in parts 1-5.

What this adds (test-only -- no production or native code):
- 23 contrib-delta repro/audit/regression suites (22 under
  org.apache.comet.contrib.delta + CometDeltaCheckpointFilterReproSuite under
  org.apache.spark.sql.delta), copied verbatim from the integration branch.
  These are behaviour guards: deletion-vector reads, DPP, row tracking,
  generated-column partition filters, stats skipping, time travel, schema
  change, nested array/struct, type round-trip, special-char/percent file
  names, metadata/credential/filter-pushdown audits, etc.
- .github/workflows/delta_contrib_test.yml: builds libcomet once with
  --features contrib-delta, then runs every contrib suite (matched by package
  prefix) across (Spark 3.5 + Delta 3.3.2), (Spark 4.0 + Delta 4.0.0) and
  (Spark 4.1 + Delta 4.1.0), plus the build-gate verification job.
- dev/ci/check-suites.py: the contrib-suite exclusion is hoisted ahead of the
  class-name extraction (contrib suites compile only under -Pcontrib-delta and
  run in their own workflow, so they are exempt from the standard-matrix
  registration check).

Workflow hardening (review-driven, improving on the integration branch):
- Pin each cell's exact Spark patch via -Dspark.version=<matrix.full>. Without
  this the -Pspark-4.1 profile pulls Spark 4.1.2, which dropped IgnoreCachedData
  and breaks delta-spark 4.1.0; the contrib needs 4.1.1. (Pom stays at 4.1.2 for
  default users -- the pin is CI-only, per the part-2 decision.)
- Label the Spark 3.5 cell as Scala 2.12 (its real binary version from the
  -Pspark-3.5 profile). It is intentionally the project's only 2.12 coverage --
  it guards 2.12-specific breakage such as the existential-type inference in the
  core DeltaIntegration bridge that 2.13 accepts but 2.12 rejects.
- Cache contrib/delta/native/target so the standalone contrib crate's cargo test
  build is incremental across runs (the crate is outside the native/ workspace).
- Add a silent-green guard: scalatest treats a zero-match wildcardSuites as
  success, so assert a floor on the per-suite surefire reports actually produced.

Removes .github/workflows/delta_build_gate.yml: the minimal standalone gate
workflow from part 2 is now subsumed by the delta-build-gate job inside
delta_contrib_test.yml (byte-identical job), so the full workflow replaces it.

The deferred local-path '%'/space production change is intentionally NOT
included: CometDeltaPercentFileNameReproSuite and CometDeltaSpecialCharFilenameSuite
both pass without it (object_store round-trips percent-encoded local paths),
so the change is a confirmed no-op and is dropped.

Verification: gated JVM test-compile (all 31 contrib suites); full battery
green (157 succeeded, 0 failed, 1 version-gated cancel across 33 suites on
Spark 4.1 + Delta 4.1.0, the cell whose -Dspark.version=4.1.1 command this
workflow now issues); spotless + scalastyle clean; check-suites.py exit 0;
dev/verify-contrib-delta-gate.sh all checks pass (default libcomet 0 Delta
symbols).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01BtErWgRQKCDRAg8Mk6qR4G
schenksj force-pushed the pr/delta-A6a-test-battery branch from 9a1ef06 to f514d2c on June 29, 2026 13:29