fix(java): expose updatedFragmentOffsets on Update operation for RewriteColumns#6748
Open
jerryjch wants to merge 1 commit into
Summary
RewriteColumns). Required by lance-spark#418.
Update.java: adds a Map<Long, byte[]> updatedFragmentOffsets field, a 7-arg constructor, the accessor updatedFragmentOffsets(), and a Builder.updatedFragmentOffsets(...) setter. Defaults to Collections.emptyMap(). Values are portable RoaringBitmap bytes.
java/lance-jni/src/transaction.rs: two JNI directions updated. FromJava deserializes each byte[] value into a RoaringBitmap and sets updated_fragment_offsets on the Rust operation; IntoJava serializes each bitmap to byte[] and populates a HashMap<Long, byte[]> passed to the 7-arg Update constructor (previously the field was ignored and the 6-arg form was used).
Update.java equals/hashCode: deep-compares byte[] values by content; hashCode added per the Java contract.
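For readers unfamiliar with the map values, the portable RoaringBitmap layout for a small set can be sketched with the standard library alone. The names PortableRoaringSketch and encodeSingleArrayContainer are hypothetical and are not part of Lance or the RoaringBitmap library; real code would presumably call the library's own serializer. The sketch assumes every value is below 65536 (a single container with key 0), at most 4096 distinct values (an array container), and no run containers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class PortableRoaringSketch {
    // Cookie used by the portable format when no run containers are present.
    static final int SERIAL_COOKIE_NO_RUNCONTAINER = 12346;

    /**
     * Hypothetical helper: encodes a set that fits in one array container
     * (values in [0, 65535], between 1 and 4096 distinct values).
     */
    static byte[] encodeSingleArrayContainer(int... values) {
        int[] sorted = Arrays.stream(values).sorted().distinct().toArray();
        if (sorted.length == 0 || sorted.length > 4096) {
            throw new IllegalArgumentException("sketch handles 1..4096 distinct values");
        }
        for (int v : sorted) {
            if (v < 0 || v > 0xFFFF) {
                throw new IllegalArgumentException("sketch handles values in [0, 65535]");
            }
        }
        // Header: cookie (4) + container count (4) + key and cardinality-1 (2 + 2) + offset (4).
        int headerBytes = 16;
        ByteBuffer buf = ByteBuffer.allocate(headerBytes + 2 * sorted.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(SERIAL_COOKIE_NO_RUNCONTAINER);
        buf.putInt(1);                              // number of containers
        buf.putShort((short) 0);                    // container key (high 16 bits)
        buf.putShort((short) (sorted.length - 1));  // cardinality minus one
        buf.putInt(headerBytes);                    // byte offset of the container data
        for (int v : sorted) {
            buf.putShort((short) v);                // low 16 bits, ascending order
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeSingleArrayContainer(1, 3, 5);
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format(" %02x", b & 0xFF));
        }
        // prints: 22 bytes: 3a 30 00 00 01 00 00 00 00 00 02 00 10 00 00 00 01 00 03 00 05 00
        System.out.println(bytes.length + " bytes:" + hex);
    }
}
```

Whether these are the exact bytes hardcoded in the PR's test is not shown in the description; the sketch only illustrates the general shape of such a value.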
Background
PR #6650 added updated_fragment_offsets on the Rust Operation::Update (proto field 9), build_manifest partial refresh logic, and FragmentUpdateResult.getUpdatedRowOffsets(). Two gaps remained:
The Java Update class had no field for these offsets and convert_to_rust_operation always set updated_fragment_offsets: None, so the lance-spark commit path (UpdateColumnsBackfillBatchWrite) had no way to pass offsets to Rust and the partial refresh in build_manifest could never activate from a JVM caller.
convert_to_java_operation_inner still used the old 6-arg constructor signature for new_object. With the 6-arg constructor removed from Update.java (replaced by the 7-arg form), any Rust→Java materialization of Operation::Update (e.g. reading back a transaction) would fail at runtime with NoSuchMethodError.
Implementation notes
stays O(bitmap size) rather than O(n matched rows).
with_local_frame(4, ..) per bitmap entry in IntoJava bounds local-ref growth on large offset maps.
JMap was avoided inside the frame because it holds a JObject with the outer frame's lifetime, causing borrow-checker conflicts; call_method on the outer java_map reference is used instead.
The Vec<u8> buffer for each bitmap is allocated in Rust before entering the frame, so its lifetime is independent of JNI frame scope.
UpdatedFragmentOffsets added to the lance::dataset::transaction import.
Why the protobuf field alone is not enough
lance-spark commits by calling CommitBuilder.execute(transaction), which passes the Java Transaction object to nativeCommitToDataset via JNI. The JNI handler calls convert_to_rust_transaction → convert_to_rust_operation, which reflects on the Java Update object to build the Rust Operation::Update struct. The protobuf field (field 9) is only used when a Transaction is serialized as a proto blob; it has no effect on the reflection-based JNI path unless the Java Update class exposes the field and the JNI deserialization reads it.
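The dependence on the shape of the Java class can be illustrated without JNI. This is only an analogy: it uses java.lang.reflect where the real bridge performs JNI lookups from Rust, and OldUpdate, NewUpdate, and the two helper methods are hypothetical stand-ins rather than Lance classes. It shows that a by-name accessor lookup and a by-signature constructor lookup succeed only when the class actually declares that member.

```java
import java.util.Map;

final class ReflectionPathSketch {
    // Hypothetical stand-in for the class before the change: no offsets member.
    static class OldUpdate {
        OldUpdate(long[] removedFragmentIds) {}
    }

    // Hypothetical stand-in for the class after the change: the old constructor
    // is replaced by one that also takes the offsets map.
    static class NewUpdate {
        private final Map<Long, byte[]> offsets;

        NewUpdate(long[] removedFragmentIds, Map<Long, byte[]> offsets) {
            this.offsets = offsets;
        }

        public Map<Long, byte[]> updatedFragmentOffsets() {
            return offsets;
        }
    }

    /** Returns true if the class declares a no-arg method with this name. */
    static boolean hasAccessor(Class<?> cls, String name) {
        try {
            cls.getDeclaredMethod(name);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Returns true if the class declares a constructor with exactly these parameter types. */
    static boolean hasConstructor(Class<?> cls, Class<?>... paramTypes) {
        try {
            cls.getDeclaredConstructor(paramTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Without the accessor, a reflective reader has nothing to read.
        System.out.println(hasAccessor(OldUpdate.class, "updatedFragmentOffsets")); // false
        System.out.println(hasAccessor(NewUpdate.class, "updatedFragmentOffsets")); // true
        // A caller still asking for the old constructor signature no longer finds it.
        System.out.println(hasConstructor(NewUpdate.class, long[].class));            // false
        System.out.println(hasConstructor(NewUpdate.class, long[].class, Map.class)); // true
    }
}
```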
Test plan
UpdateTest#testUpdatedFragmentOffsetsRoundTrip: commits an Update with a non-empty updatedFragmentOffsets map through CommitBuilder.execute (exercises the FromJava JNI path), reads the transaction back via Dataset.readTransaction() (exercises the IntoJava JNI path), and asserts the offsets match. Map value is hardcoded portable RoaringBitmap bytes encoding {1, 3, 5}; verified with assertArrayEquals after the round-trip.
Compatibility
Update: the updatedFragmentOffsets field did not exist in any prior release. The builder setter is optional and defaults to Collections.emptyMap(), so existing Update.builder()...build() call sites compile and behave identically.
equals/hashCode: equals uses offsetMapsEqual to deep-compare byte[] values via Arrays.equals; hashCode is added per the Java contract.
new_object call is updated from the 6-arg to the 7-arg form in the same PR. Both files must ship together; within that atomic change there is no compatibility gap.
updated_fragment_offsets proto field and Rust struct field were already added in fix: propagate update_columns offsets and partial last_updated for RewriteColumns #6650.
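The builder default and the content-based equality described in this section can be sketched as follows. UpdateSketch is a hypothetical, stripped-down stand-in that carries only the offsets field, and the bodies of offsetMapsEqual and hashCode are assumptions about one reasonable implementation, not the PR's code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class UpdateSketch {
    private final Map<Long, byte[]> updatedFragmentOffsets;

    private UpdateSketch(Map<Long, byte[]> offsets) {
        this.updatedFragmentOffsets = offsets;
    }

    Map<Long, byte[]> updatedFragmentOffsets() {
        return updatedFragmentOffsets;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        // Callers that never touch the setter get an empty map.
        private Map<Long, byte[]> offsets = Collections.emptyMap();

        Builder updatedFragmentOffsets(Map<Long, byte[]> offsets) {
            this.offsets = offsets;
            return this;
        }

        UpdateSketch build() {
            return new UpdateSketch(offsets);
        }
    }

    // Map.equals would compare byte[] values by identity, so compare contents instead.
    static boolean offsetMapsEqual(Map<Long, byte[]> a, Map<Long, byte[]> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (Map.Entry<Long, byte[]> e : a.entrySet()) {
            if (!b.containsKey(e.getKey())
                    || !Arrays.equals(e.getValue(), b.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof UpdateSketch)) {
            return false;
        }
        return offsetMapsEqual(updatedFragmentOffsets, ((UpdateSketch) o).updatedFragmentOffsets);
    }

    @Override
    public int hashCode() {
        // Order-independent sum over entries, hashing array contents to match equals.
        int h = 0;
        for (Map.Entry<Long, byte[]> e : updatedFragmentOffsets.entrySet()) {
            h += Objects.hashCode(e.getKey()) ^ Arrays.hashCode(e.getValue());
        }
        return h;
    }

    public static void main(String[] args) {
        Map<Long, byte[]> m1 = new HashMap<>();
        m1.put(7L, new byte[] {1, 3, 5});
        Map<Long, byte[]> m2 = new HashMap<>();
        m2.put(7L, new byte[] {1, 3, 5});

        UpdateSketch a = builder().updatedFragmentOffsets(m1).build();
        UpdateSketch b = builder().updatedFragmentOffsets(m2).build();

        System.out.println(m1.equals(m2));                 // false: arrays compared by identity
        System.out.println(a.equals(b));                   // true: arrays compared by content
        System.out.println(a.hashCode() == b.hashCode());  // true
        System.out.println(builder().build().updatedFragmentOffsets().isEmpty()); // true
    }
}
```

The first printed line is the reason a plain Map.equals call would not be enough here: two maps holding equal but distinct byte[] instances compare unequal.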