Skip to content

fix(java): expose updatedFragmentOffsets on Update operation for RewriteColumns#6748

Open
jerryjch wants to merge 1 commit into
lance-format:mainfrom
jerryjch:update-with-rewrite-columns-fix-follow-up
Open

fix(java): expose updatedFragmentOffsets on Update operation for RewriteColumns#6748
jerryjch wants to merge 1 commit into
lance-format:mainfrom
jerryjch:update-with-rewrite-columns-fix-follow-up

Conversation

@jerryjch

@jerryjch jerryjch commented May 12, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Follow-up to fix: propagate update_columns offsets and partial last_updated for RewriteColumns #6650 (fix: propagate update_columns offsets and partial last_updated for
    RewriteColumns). Required by lance-spark#418.
  • Update.java: adds Map<Long, byte[]> updatedFragmentOffsets field, 7-arg constructor,
    accessor updatedFragmentOffsets(), and Builder.updatedFragmentOffsets(...) setter.
    Defaults to Collections.emptyMap(). Values are portable RoaringBitmap bytes.
  • java/lance-jni/src/transaction.rs — two JNI directions updated: FromJava deserializes
    each byte[] value into a RoaringBitmap and sets updated_fragment_offsets on the Rust
    operation; IntoJava serializes each bitmap to byte[] and populates a HashMap<Long, byte[]>
    passed to the 7-arg Update constructor (previously the field was ignored and the 6-arg
    form was used).
  • Update.java equals / hashCode: deep-compares byte[] values by content; hashCode
    added per the Java contract.

Background

PR #6650 added updated_fragment_offsets on the Rust Operation::Update (proto field 9),
build_manifest partial refresh logic, and FragmentUpdateResult.getUpdatedRowOffsets().
Two gaps remained:

  1. The Java Update class had no field for these offsets and convert_to_rust_operation
    always set updated_fragment_offsets: None, so the lance-spark commit path
    (UpdateColumnsBackfillBatchWrite) had no way to pass offsets to Rust and the partial
    refresh in build_manifest could never activate from a JVM caller.

  2. convert_to_java_operation_inner still used the old 6-arg constructor signature for
    new_object. With the 6-arg constructor removed from Update.java (replaced by the
    7-arg form), any Rust→Java materialization of Operation::Update (e.g. reading back a
    transaction) would fail at runtime with NoSuchMethodError.

Implementation notes

  • Values are portable RoaringBitmap bytes (little-endian, spec-compliant). The JNI boundary
    stays O(bitmap size) rather than O(n matched rows).
  • with_local_frame(4, ..) per bitmap entry in IntoJava bounds local-ref growth on large
    offset maps. JMap was avoided inside the frame because it holds a JObject with the
    outer frame's lifetime, causing borrow-checker conflicts; call_method on the outer
    java_map reference is used instead.
  • The Vec<u8> buffer for each bitmap is allocated in Rust before entering the frame, so
    its lifetime is independent of JNI frame scope.
  • UpdatedFragmentOffsets added to the lance::dataset::transaction import.

Why the protobuf field alone is not enough

lance-spark commits by calling CommitBuilder.execute(transaction), which passes the Java
Transaction object to nativeCommitToDataset via JNI. The JNI handler calls
convert_to_rust_transactionconvert_to_rust_operation, which reflects on the Java
Update object to build the Rust Operation::Update struct. The protobuf field (field 9)
is only used when a Transaction is serialized as a proto blob; it has no effect on the
reflection-based JNI path unless the Java Update class exposes the field and the JNI
deserialization reads it.

Test plan

  • UpdateTest#testUpdatedFragmentOffsetsRoundTrip — commits an Update with a non-empty
    updatedFragmentOffsets map through CommitBuilder.execute (exercises the FromJava JNI
    path), reads the transaction back via Dataset.readTransaction() (exercises the IntoJava
    JNI path), and asserts the offsets match. Map value is hardcoded portable RoaringBitmap
    bytes encoding {1, 3, 5}; verified with assertArrayEquals after the round-trip.

Compatibility

  • Additive new API on Update — the updatedFragmentOffsets field did not exist in any
    prior release. The builder setter is optional and defaults to Collections.emptyMap(), so
    existing Update.builder()...build() call sites compile and behave identically.
  • Java — equals / hashCode: equals uses offsetMapsEqual to deep-compare byte[]
    values via Arrays.equals; hashCode is added per the Java contract.
  • JNI — constructor signature: the IntoJava new_object call is updated from the 6-arg to
    the 7-arg form in the same PR. Both files must ship together; within that atomic change
    there is no compatibility gap.
  • Rust / proto: no changes. The updated_fragment_offsets proto field and Rust struct
    field were already added in fix: propagate update_columns offsets and partial last_updated for RewriteColumns #6650.

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added bug Something isn't working A-java Java bindings + JNI labels May 12, 2026
@jerryjch jerryjch force-pushed the update-with-rewrite-columns-fix-follow-up branch from c116d01 to 456e6dc Compare May 12, 2026 18:57
@jerryjch

Copy link
Copy Markdown
Contributor Author

cc @jackye1995 @hamersaw

@jerryjch jerryjch force-pushed the update-with-rewrite-columns-fix-follow-up branch from cbf1475 to 1884a7d Compare June 10, 2026 07:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-java Java bindings + JNI bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant