Skip to content

from_protobuf (part 1): Add ordinal remapping for GetStructField/GetArrayStructFields#14499

Open
thirtiseven wants to merge 3 commits intoNVIDIA:mainfrom
thirtiseven:struct-field-meta-infra
Open

from_protobuf (part 1): Add ordinal remapping for GetStructField/GetArrayStructFields#14499
thirtiseven wants to merge 3 commits intoNVIDIA:mainfrom
thirtiseven:struct-field-meta-infra

Conversation

@thirtiseven
Copy link
Copy Markdown
Collaborator

Part 1 of #14354

Description

This PR:
Introduce PRUNED_ORDINAL_TAG (TreeNodeTag) and named Meta classes that support runtime ordinal remapping for struct field extraction expressions.
This enables future schema projection optimizations (e.g., protobuf nested pruning) to rewrite field ordinals at GPU conversion time without modifying the generic runtime expression classes.

Changes:

  • Add GpuStructFieldOrdinalTag with PRUNED_ORDINAL_TAG TreeNodeTag
  • Add GpuGetStructFieldMeta with effectiveOrdinal that reads the tag, falling back to the original ordinal when no tag is set
  • Extend GpuGetArrayStructFieldsMeta with effectiveOrdinal and effectiveNumFields for pruned array-of-struct access
  • Register GetStructField with GpuGetStructFieldMeta instead of an anonymous UnaryExprMeta in GpuOverrides
  • Fix Alias typeMeta to delegate to child expression, ensuring overrideDataType propagates correctly through aliases.

Checklists

  • This PR has added documentation for new or modified features or behaviors.
  • This PR has added new tests or modified existing tests to cover new code paths.
    (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
  • Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.

Made-with: Cursor

…tFields

Introduce PRUNED_ORDINAL_TAG (TreeNodeTag) and named Meta classes that
support runtime ordinal remapping for struct field extraction expressions.
This enables future schema projection optimizations (e.g., protobuf nested
pruning) to rewrite field ordinals at GPU conversion time without modifying
the generic runtime expression classes.

Changes:
- Add GpuStructFieldOrdinalTag with PRUNED_ORDINAL_TAG TreeNodeTag
- Add GpuGetStructFieldMeta with effectiveOrdinal that reads the tag,
  falling back to the original ordinal when no tag is set
- Extend GpuGetArrayStructFieldsMeta with effectiveOrdinal and
  effectiveNumFields for pruned array-of-struct access
- Register GetStructField with GpuGetStructFieldMeta instead of an
  anonymous UnaryExprMeta in GpuOverrides
- Fix Alias typeMeta to delegate to child expression, ensuring
  overrideDataType propagates correctly through aliases

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
Made-with: Cursor
@thirtiseven thirtiseven self-assigned this Mar 31, 2026
@thirtiseven
Copy link
Copy Markdown
Collaborator Author

@greptile review

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 1, 2026

Greptile Summary

This PR is Part 1 of the from_protobuf schema-pruning work. It introduces the PRUNED_ORDINAL_TAG TreeNodeTag and a pair of named meta classes — GpuGetStructFieldMeta and updated GpuGetArrayStructFieldsMeta — that read the tag at GPU conversion time and substitute a remapped ordinal (and derived numFields) in place of the originals. No tag set = identical behaviour to the previous anonymous UnaryExprMeta. A small companion fix delegates Alias.typeMeta to the child expression so that future overrideDataType calls on child metas propagate correctly through aliases.

Changes

  • GpuStructFieldOrdinalTag.PRUNED_ORDINAL_TAG — singleton TreeNodeTag[Int] for ordinal remapping
  • GpuGetStructFieldMeta / GpuGetStructFieldMeta.effectiveOrdinal — reads tag, falls back to expr.ordinal
  • GpuGetArrayStructFieldsMeta.convertToGpu / GpuGetArrayStructFieldsMeta.effectiveNumFields — reads tag, derives numFields from the (pruned) child array element schema
  • GpuOverrides: GetStructField wired to GpuGetStructFieldMeta; Alias typeMeta delegates to child
  • New StructFieldOrdinalTagSuite with 4 pure unit tests covering both helper methods

Minor observations (no blockers)

  • The Alias.typeMeta override silently drops any overrideType set directly on the Alias meta. No current call site sets it, but a brief comment would prevent future surprises.
  • effectiveNumFields relies on an implicit contract that the child's schema is already pruned when the tag is active; documenting this coupling would aid the Part 2 author.

Confidence Score: 5/5

  • Safe to merge; all remaining findings are documentation/clarity suggestions with no runtime impact.
  • The core logic is correct: no-tag path is a strict no-op relative to the previous anonymous meta, tag path correctly remaps ordinal and derives numFields from the pruned child schema. GPU doColumnar code is untouched. The two P2 comments flag a missing contract comment and a silent discard of an unused overrideType field — neither causes a bug today or in any current code path. Test coverage is appropriate for the added utility methods.
  • No files require special attention; all changes are in infrastructure/meta code that is not exercised at runtime until Part 2 sets the tag.

Important Files Changed

Filename Overview
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/complexTypeExtractors.scala Adds GpuStructFieldOrdinalTag, GpuGetStructFieldMeta, and updates GpuGetArrayStructFieldsMeta with effectiveOrdinal/effectiveNumFields support for future schema pruning remapping; logic is sound and GPU doColumnar paths are unaffected by numFields.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala Replaces anonymous UnaryExprMeta for GetStructField with the new GpuGetStructFieldMeta, and adds typeMeta delegation to child for Alias; both changes are minimal and correct.
sql-plugin/src/test/scala/com/nvidia/spark/rapids/StructFieldOrdinalTagSuite.scala Unit tests for effectiveOrdinal and effectiveNumFields helper methods; covers no-tag, tagged, and non-array-child scenarios without needing SparkSession.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[GetStructField / GetArrayStructFields\nSpark expression node] -->|set PRUNED_ORDINAL_TAG| B{PRUNED_ORDINAL_TAG set?}
    B -- No --> C[Use expr.ordinal\nUse expr.numFields]
    B -- Yes --> D[Use tagged ordinal\nDerive numFields from child schema]
    C --> E[GpuGetStructField / GpuGetArrayStructFields\nGPU expression]
    D --> E
    E --> F[doColumnar: extract field at ordinal\nfrom GPU column view]

    G[Alias meta] -->|typeMeta| H[Delegate to childExprs.head.typeMeta]
    H --> I[Child effective DataType\npropagates through Alias]
Loading

Reviews (2): Last reviewed commit: "Remove redundant tests and dead effectiv..." | Re-trigger Greptile

thirtiseven and others added 2 commits April 2, 2026 17:23
…st ordinal

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@thirtiseven thirtiseven marked this pull request as ready for review April 2, 2026 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants