feat(java): expose SourceDedupeBehavior in MergeInsertParams #7296
Open
sezruby wants to merge 2 commits into
The Rust core's MergeInsertBuilder supports source_dedupe_behavior (Fail/FirstSeen) to control how duplicate source rows that match the same target row are handled, but the Java binding never wired it. Java callers were stuck on the default (Fail) with no way to opt into FirstSeen.
Add SourceDedupeBehavior enum + withSourceDedupeBehavior() builder to the Java MergeInsertParams, pass it through the JNI layer, and re-export SourceDedupeBehavior from lance::dataset so the binding can import it alongside the other merge types.
Tests cover both enum values across the JNI boundary: FirstSeen keeps the first duplicate source row, Fail errors on duplicate source keys and leaves the dataset unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tighten the Fail test to verify the thrown exception carries the "Ambiguous merge inserts are prohibited" cause, not just that some exception is raised. Asserts on the stable message substring only, excluding the volatile file:line suffix and JNI error-class prefix.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
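The assertion strategy described in this commit can be sketched in plain Java as follows. This is an illustration only, not the code in MergeInsertTest.java: `StableMessageSketch` and `hasStableCause` are hypothetical names, walking the cause chain is an assumption about where the message might sit, and the sample message (its prefix and file:line suffix) is made up to show the shape of the problem.

```java
public class StableMessageSketch {
    // Substring quoted in this PR as the stable part of the error message.
    public static final String STABLE = "Ambiguous merge inserts are prohibited";

    // Returns true if the throwable or any of its causes has a message
    // containing the expected substring. Matching on a substring ignores
    // any volatile prefix or suffix around it.
    public static boolean hasStableCause(Throwable t, String expected) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            String msg = cur.getMessage();
            if (msg != null && msg.contains(expected)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up message: a hypothetical error-class prefix and file:line suffix.
        Throwable inner = new IllegalStateException(
            "SomeErrorClass: " + STABLE + ", some/file.rs:123");
        Throwable outer = new RuntimeException("merge insert failed", inner);

        System.out.println(hasStableCause(outer, STABLE));
        System.out.println(hasStableCause(new RuntimeException("unrelated"), STABLE));
    }
}
```

Matching only the stable substring means the check should keep passing if the Rust source location or the error wrapper text changes, while still failing when a different error is raised.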
Codecov Report: ✅ All modified and coverable lines are covered by tests.
What

Expose the Rust core's `source_dedupe_behavior` (`Fail`/`FirstSeen`) on the Java `MergeInsertParams`. `MergeInsertBuilder` already supports it in lance-core, but the Java binding never wired it, so Java callers were stuck on the default (`Fail`) with no way to opt into `FirstSeen`.

Changes

- `rust/lance/src/dataset.rs`: re-export `SourceDedupeBehavior` from `lance::dataset` so the JNI crate can import it alongside the other merge types.
- `MergeInsertParams.java`: add `SourceDedupeBehavior { Fail, FirstSeen }` enum, `withSourceDedupeBehavior()` builder, getters, and `toString()` entry. Defaults to `Fail`, matching the Rust default.
- `merge_insert.rs` (JNI): `extract_source_dedupe_behavior()` helper (mirrors `extract_when_matched`), passed through to the builder.
- `MergeInsertTest.java`: cover both enum values across the JNI boundary. `FirstSeen` keeps the first duplicate source row; `Fail` errors on duplicate source keys (asserting the "Ambiguous merge inserts are prohibited" cause) and leaves the dataset unchanged.

Why

A merge source with duplicate join keys (e.g. a CDC stream with multiple updates per key) currently fails on the Java path with no recourse. `FirstSeen` is the documented escape hatch in lance-core; this makes it reachable from Java. Keeps the bindings consistent with the Python/Rust surface.

Test

- `MergeInsertTest`: 11/11 pass (9 existing + 2 new)
- `cargo fmt` / `cargo clippy` (lance-jni) clean; `cargo check -p lance` clean
- `./mvnw spotless:check` clean

🤖 Generated with Claude Code
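
For readers unfamiliar with the two behaviors, here is a minimal, self-contained model of the difference in plain Java. It does not call the Lance API: `DedupeSketch`, `Row`, and `dedupeSource` are hypothetical names, the enum is a local stand-in for the one added in this PR, and the exception type and message are invented. The real check runs inside the Rust merge-insert path against matched target rows; this sketch simplifies that to duplicate keys within the source batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupeSketch {
    // Local stand-in mirroring the two values named in this PR.
    public enum SourceDedupeBehavior { Fail, FirstSeen }

    // Hypothetical source row: a join key plus a payload value.
    public static final class Row {
        public final int key;
        public final String value;

        public Row(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Fail: throw on the first repeated key, before producing any output.
    // FirstSeen: keep the first row seen for each key, drop later ones.
    public static List<Row> dedupeSource(List<Row> source, SourceDedupeBehavior behavior) {
        Map<Integer, Row> firstByKey = new LinkedHashMap<>();
        for (Row row : source) {
            Row existing = firstByKey.putIfAbsent(row.key, row);
            if (existing != null && behavior == SourceDedupeBehavior.Fail) {
                throw new IllegalStateException("duplicate source key: " + row.key);
            }
        }
        // LinkedHashMap preserves first-insertion order of the keys.
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        // Key 1 appears twice, as in a CDC stream with multiple updates per key.
        List<Row> source = Arrays.asList(
            new Row(1, "a"), new Row(1, "b"), new Row(2, "c"));

        for (Row row : dedupeSource(source, SourceDedupeBehavior.FirstSeen)) {
            System.out.println(row.key + " -> " + row.value);
        }

        try {
            dedupeSource(source, SourceDedupeBehavior.Fail);
        } catch (IllegalStateException e) {
            System.out.println("Fail rejected the batch: " + e.getMessage());
        }
    }
}
```

In this model `Fail` throws before returning anything, loosely mirroring the tested guarantee that the dataset is left unchanged, while `FirstSeen` keeps the earliest row per key in source order.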