feat(java): expose SourceDedupeBehavior in MergeInsertParams #7296

Open
sezruby wants to merge 2 commits into lance-format:main from sezruby:feat/java-source-dedupe-behavior

sezruby (Contributor) commented Jun 16, 2026

What

Expose the Rust core's source_dedupe_behavior (Fail/FirstSeen) on the Java MergeInsertParams. MergeInsertBuilder already supports it in lance-core, but the Java binding never wired it — Java callers were stuck on the default (Fail) with no way to opt into FirstSeen.

Changes

  • rust/lance/src/dataset.rs — re-export SourceDedupeBehavior from lance::dataset so the JNI crate can import it alongside the other merge types.
  • MergeInsertParams.java — add SourceDedupeBehavior { Fail, FirstSeen } enum, withSourceDedupeBehavior() builder, getters, and toString() entry. Defaults to Fail, matching the Rust default.
  • merge_insert.rs (JNI) — add extract_source_dedupe_behavior() helper (mirrors extract_when_matched), passed through to the builder.
  • MergeInsertTest.java — cover both enum values across the JNI boundary: FirstSeen keeps the first duplicate source row; Fail errors on duplicate source keys (asserting the "Ambiguous merge inserts are prohibited" cause) and leaves the dataset unchanged.
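For readers who want a picture of the Java surface the list above describes, here is a minimal sketch of the enum, the builder method, the getter, and the toString() entry. `MergeInsertParamsSketch` is a hypothetical stand-in written for illustration, not the real Lance class; the only details taken from this PR are the enum values, the `withSourceDedupeBehavior` name, and the `Fail` default. The null check and the getter name are assumptions.

```java
import java.util.Objects;

// Hypothetical stand-in for illustration; NOT the actual Lance MergeInsertParams.
final class MergeInsertParamsSketch {
  /** The two values named in the PR description. */
  enum SourceDedupeBehavior { Fail, FirstSeen }

  // Defaults to Fail, the default stated in the PR description.
  private SourceDedupeBehavior sourceDedupeBehavior = SourceDedupeBehavior.Fail;

  /** Fluent setter; rejecting null is an assumption of this sketch. */
  MergeInsertParamsSketch withSourceDedupeBehavior(SourceDedupeBehavior behavior) {
    this.sourceDedupeBehavior = Objects.requireNonNull(behavior, "behavior");
    return this;
  }

  /** Getter name is assumed; the PR only says "getters" were added. */
  SourceDedupeBehavior getSourceDedupeBehavior() {
    return sourceDedupeBehavior;
  }

  @Override
  public String toString() {
    return "MergeInsertParamsSketch{sourceDedupeBehavior=" + sourceDedupeBehavior + "}";
  }

  public static void main(String[] args) {
    MergeInsertParamsSketch params =
        new MergeInsertParamsSketch()
            .withSourceDedupeBehavior(SourceDedupeBehavior.FirstSeen);
    // prints MergeInsertParamsSketch{sourceDedupeBehavior=FirstSeen}
    System.out.println(params);
  }
}
```

Because the field starts at `Fail`, callers who never touch the new builder method would see the same behavior as before the change, which is the compatibility property the PR describes.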

Why

A merge source with duplicate join keys (e.g. a CDC stream with multiple updates per key) currently fails on the Java path with no recourse. FirstSeen is the documented escape hatch in lance-core; this makes it reachable from Java. Keeps the bindings consistent with the Python/Rust surface.
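To make the difference between the two behaviors concrete, the following is a small in-memory model of a source batch with a repeated join key. It is only a sketch of the semantics as described in this PR (Fail rejects the batch, FirstSeen keeps the first row per key); it does not call Lance and says nothing about how lance-core implements the check. All class and method names are hypothetical, and it simplifies by deduplicating on the source key alone rather than on matches against a target row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; all names here are hypothetical, not Lance APIs.
final class SourceDedupeSketch {
  enum Behavior { Fail, FirstSeen }

  /** A source row reduced to its join key and one payload column. */
  static final class Row {
    final int key;
    final String value;

    Row(int key, String value) {
      this.key = key;
      this.value = value;
    }
  }

  /** Returns at most one row per key, or throws under Fail if a key repeats. */
  static List<Row> dedupe(List<Row> source, Behavior behavior) {
    Map<Integer, Row> firstByKey = new LinkedHashMap<>();
    for (Row row : source) {
      Row earlier = firstByKey.putIfAbsent(row.key, row);
      if (earlier != null && behavior == Behavior.Fail) {
        // Throwing before anything is returned models "dataset left unchanged".
        throw new IllegalStateException("duplicate source key: " + row.key);
      }
      // Under FirstSeen the earlier row stays in the map and this one is dropped.
    }
    return new ArrayList<>(firstByKey.values());
  }

  public static void main(String[] args) {
    // A CDC-style batch with two updates for key 1.
    List<Row> cdcBatch = Arrays.asList(
        new Row(1, "v1"), new Row(1, "v2"), new Row(2, "x"));
    for (Row row : dedupe(cdcBatch, Behavior.FirstSeen)) {
      System.out.println(row.key + " -> " + row.value);
    }
  }
}
```

Run with `FirstSeen`, the batch above yields `1 -> v1` and `2 -> x`; with `Fail`, the same batch throws before producing any rows.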

Test

  • MergeInsertTest: 11/11 pass (9 existing + 2 new)
  • cargo fmt / cargo clippy (lance-jni) clean; cargo check -p lance clean
  • ./mvnw spotless:check clean

🤖 Generated with Claude Code

sezruby and others added 2 commits June 16, 2026 12:50
The Rust core's MergeInsertBuilder supports source_dedupe_behavior
(Fail/FirstSeen) to control how duplicate source rows that match the
same target row are handled, but the Java binding never wired it. Java
callers were stuck on the default (Fail) with no way to opt into
FirstSeen.

Add SourceDedupeBehavior enum + withSourceDedupeBehavior() builder to
the Java MergeInsertParams, pass it through the JNI layer, and
re-export SourceDedupeBehavior from lance::dataset so the binding can
import it alongside the other merge types.

Tests cover both enum values across the JNI boundary: FirstSeen keeps
the first duplicate source row, Fail errors on duplicate source keys
and leaves the dataset unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Tighten the Fail test to verify the thrown exception carries the
"Ambiguous merge inserts are prohibited" cause, not just that some
exception is raised. Asserts on the stable message substring only,
excluding the volatile file:line suffix and JNI error-class prefix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
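One way to assert on a stable message substring, as the commit message above describes, is to walk the exception's cause chain and check each message with `contains` rather than `equals`. The helper below is a sketch under that assumption and is not the code from MergeInsertTest.java; the wrapper exception, the error-class prefix, and the file:line suffix in the demo are made up to stand in for the volatile parts.

```java
// Hypothetical helper for illustration; not taken from MergeInsertTest.java.
final class CauseMessageSketch {
  /** True if the throwable or any of its causes has a message containing needle. */
  static boolean causeChainContains(Throwable thrown, String needle) {
    for (Throwable cur = thrown; cur != null; cur = cur.getCause()) {
      String message = cur.getMessage();
      if (message != null && message.contains(needle)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Made-up prefix and file:line suffix around the stable substring.
    Throwable inner = new IllegalArgumentException(
        "SomeErrorClass: Ambiguous merge inserts are prohibited, some/file.rs:42");
    Throwable outer = new RuntimeException("merge insert failed", inner);
    // prints true
    System.out.println(
        causeChainContains(outer, "Ambiguous merge inserts are prohibited"));
  }
}
```

Matching on the substring keeps the assertion valid if the line number or the wrapping error class changes between releases.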
github-actions bot added the A-java (Java bindings + JNI) and enhancement (New feature or request) labels Jun 16, 2026
codecov bot commented Jun 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
