
fix: schema-aware Arrow struct accessor for partial nested projections #531

Open
wombatu-kun wants to merge 2 commits into lance-format:main from wombatu-kun:fix/499-nested-struct-projection-schema-aware-accessor

@wombatu-kun
Contributor

Fixes #499: a partial projection of nested struct children (e.g. SELECT s.b, s.c FROM t over struct<a, b, c, d>) crashed the Lance vectorized reader with an UnsupportedOperationException thrown from ArrowVectorAccessor.getLong.
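
The failure mode can be sketched in plain Java. This is a hypothetical model, not lance-spark code: strings stand in for the Arrow child vectors, and OrdinalMismatchDemo, bindByOrdinal, bindByName and names are illustrative names. It compares resolving each pruned-schema ordinal as a physical child index (how the PR describes LanceStructAccessor behaving before the fix) with resolving it by field name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of issue #499: strings stand in for the Arrow child vectors of
// struct<a, b, c, d>, and PRUNED is Spark's read schema for SELECT s.b, s.c.
final class OrdinalMismatchDemo {
    // Arrow children in physical (on-disk) order; nested projection is not pushed down.
    static final List<String> PHYSICAL = List.of("a", "b", "c", "d");
    // Pruned schema: Spark's projection asks for ordinal 0 (b) and ordinal 1 (c).
    static final List<String> PRUNED = List.of("b", "c");

    // Ordinal binding: pruned ordinal i is read from physical child i.
    static List<Integer> bindByOrdinal() {
        List<Integer> ordinals = new ArrayList<>();
        for (int i = 0; i < PRUNED.size(); i++) {
            ordinals.add(i);
        }
        return ordinals;
    }

    // Name binding: each pruned field is read from the physical child with the same name.
    static List<Integer> bindByName() {
        List<Integer> ordinals = new ArrayList<>();
        for (String field : PRUNED) {
            int idx = PHYSICAL.indexOf(field);
            if (idx < 0) {
                throw new IllegalArgumentException("no Arrow child named " + field);
            }
            ordinals.add(idx);
        }
        return ordinals;
    }

    // Maps physical ordinals back to child names for display.
    static List<String> names(List<Integer> ordinals) {
        List<String> out = new ArrayList<>();
        for (int i : ordinals) {
            out.add(PHYSICAL.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("by ordinal: " + names(bindByOrdinal())); // by ordinal: [a, b]
        System.out.println("by name:    " + names(bindByName()));    // by name:    [b, c]
    }
}
```

With ordinal binding the query's two columns are served from a and b; if a is not a long column, a typed read such as getLong on it is the kind of mismatch the issue reports. Binding by name returns b and c as the query intends.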

  • Lance's native scan does not push down nested struct projection, so the Arrow StructVector always carries every on-disk child in physical order. LanceStructAccessor bound children by physical Arrow ordinal, but Spark's generated projection (and external consumers such as Hudi's LanceRecordIterator) index by the pruned schema's ordinal. For SELECT s.b, s.c, pruned ordinal 0 (b) therefore resolved to Arrow child 0 (a), so a typed getter such as getLong could land on a child of a different type.
  • Adds schema-aware constructors, LanceArrowColumnVector(ValueVector, DataType) and LanceStructAccessor(StructVector, StructType), that bind Arrow children to a Spark StructType by field name and recurse into nested structs. LanceFragmentColumnarBatchScanner.loadNextBatch now passes the input partition's schema through, so lance-spark's own scan also takes the schema-aware path.
  • Keeps ReadSchemaNestedStructWidening from #442 (fix: widen pruned nested struct schemas to preserve Arrow child ordinals) as defense-in-depth for the standard scan path.
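
Name-based binding with recursion into nested structs can be sketched with simplified stand-in types. Node, Binding and NameBinder are hypothetical; the real constructors take Arrow ValueVector/StructVector and Spark DataType/StructType, and this sketch assumes exact, case-sensitive name matching, which may differ from how the PR resolves names. For each requested field it looks up the physical child with the same name, records that child's ordinal, and recurses when the field is itself a struct.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for both an Arrow child vector and a Spark StructField:
// a name plus, for struct types, an ordered list of children.
final class Node {
    final String name;
    final List<Node> children; // empty for leaf (non-struct) columns

    Node(String name, List<Node> children) {
        this.name = name;
        this.children = children;
    }

    static Node leaf(String name) {
        return new Node(name, List.of());
    }

    static Node struct(String name, Node... kids) {
        return new Node(name, List.of(kids));
    }
}

// One requested field bound to a physical child ordinal, with nested bindings for structs.
final class Binding {
    final String name;
    final int physicalOrdinal;
    final List<Binding> children;

    Binding(String name, int physicalOrdinal, List<Binding> children) {
        this.name = name;
        this.physicalOrdinal = physicalOrdinal;
        this.children = children;
    }

    @Override
    public String toString() {
        String self = name + "@" + physicalOrdinal;
        return children.isEmpty() ? self : self + children;
    }
}

final class NameBinder {
    // Binds each requested child, in requested (pruned-schema) order, to the
    // physical child with the same name, recursing into nested structs.
    static List<Binding> bind(Node physical, Node requested) {
        List<Binding> out = new ArrayList<>();
        for (Node want : requested.children) {
            int ordinal = indexOf(physical.children, want.name);
            if (ordinal < 0) {
                throw new IllegalArgumentException(
                        "no child named " + want.name + " under " + physical.name);
            }
            Node have = physical.children.get(ordinal);
            List<Binding> nested = want.children.isEmpty() ? List.of() : bind(have, want);
            out.add(new Binding(want.name, ordinal, nested));
        }
        return out;
    }

    // Exact, case-sensitive lookup (an assumption of this sketch).
    private static int indexOf(List<Node> nodes, String name) {
        for (int i = 0; i < nodes.size(); i++) {
            if (nodes.get(i).name.equals(name)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Physical layout: struct<a, b, inner: struct<x, y, z>, d>
        Node physical = Node.struct("s", Node.leaf("a"), Node.leaf("b"),
                Node.struct("inner", Node.leaf("x"), Node.leaf("y"), Node.leaf("z")),
                Node.leaf("d"));
        // Pruned read schema, in a different order: struct<inner: struct<z>, b>
        Node requested = Node.struct("s",
                Node.struct("inner", Node.leaf("z")), Node.leaf("b"));
        System.out.println(bind(physical, requested)); // [inner@2[z@2], b@1]
    }
}
```

In this sketch, ordinals come from the requested schema while lookups go by name against the physical children, so the result does not depend on the two orderings agreeing. That is the property that lets schema widening act as a secondary safeguard rather than the only protection.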

@github-actions (bot) added the bug label May 14, 2026
@wombatu-kun force-pushed the fix/499-nested-struct-projection-schema-aware-accessor branch 2 times, most recently from cac7b86 to 45e6e0f on May 25, 2026 03:05
@wombatu-kun force-pushed the fix/499-nested-struct-projection-schema-aware-accessor branch from 45e6e0f to d6c66eb on June 5, 2026 09:33
Vova Kolmakov and others added 2 commits June 14, 2026 08:23
@wombatu-kun force-pushed the fix/499-nested-struct-projection-schema-aware-accessor branch from d6c66eb to 7b166ca on June 14, 2026 01:36

Development

Successfully merging this pull request may close these issues.

Vectorized reader fails on partial nested-struct projection — UnsupportedOperationException in getLong