Skip to content

feat(parser): update to Substrait v0.87.0 with enum arg support#222

Merged
benbellick merged 25 commits intomainfrom
feat/substrait-v0.87.0-enum-arg-support
Apr 14, 2026
Merged

feat(parser): update to Substrait v0.87.0 with enum arg support#222
benbellick merged 25 commits intomainfrom
feat/substrait-v0.87.0-enum-arg-support

Conversation

@benbellick
Copy link
Copy Markdown
Member

@benbellick benbellick commented Apr 9, 2026

Upgrades the substrait dependency from v0.85.0 to v0.87.0 and adds parser support for the enumArg syntax introduced in that version (IDENTIFIER::enum, e.g. SAMPLE::enum in std_dev).

The interesting change is widening CaseLiteral.Value from expr.Literal to types.FuncArg. This is necessary because enum args don't map to an Expression.Literal in the substrait proto. FunctionArgument is a oneof of {string enum, Type type, Expression value}, making enum a sibling of a literal expression rather than a subtype. So types.Enum cannot implement expr.Literal, and the field needs to hold the broader types.FuncArg interface instead.

std_dev and variance test cases remain skipped because the v0.87.0 test files use an invalid single-column compact table format fixed in substrait v0.88.0 (substrait-io/substrait#1043).

benbellick and others added 4 commits April 9, 2026 14:49
- Regenerate testcase parser grammar from v0.87.0 (adds enumArg rule:
  IDENTIFIER::enum syntax for function enum arguments)
- Implement VisitEnumArg in visitor, returning types.Enum as FuncArg
- Change CaseLiteral.Value from expr.Literal to types.FuncArg to
  accommodate enum values alongside regular literals
- Fix getAggregateFuncTableSchema to index by ColumnIndex rather than
  arg position, preventing nil dereference when enum args precede
  column args (e.g. std_dev(SAMPLE::enum, col0::fp32))
- Skip std_dev/variance tests pending upstream fix for single-column
  compact aggregate format
- Bump github.com/substrait-io/substrait dep to v0.87.0

Partially closes #220

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… field

Enum args (YEAR::enum) are not literals — they don't belong in
CaseLiteral.Value (expr.Literal). Instead, VisitEnumArg stores only
ValueText and Type=CommonEnumType, and a new FuncArg() method on
CaseLiteral converts to types.FuncArg at call sites, returning
types.Enum(ValueText) for enum args and Value for everything else.

This reverts the CaseLiteral.Value type widening (types.FuncArg → back
to expr.Literal) from the previous commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Enum args (YEAR::enum) are a valid FunctionArgument in the substrait
proto — siblings of expression literals, not subtypes. types.Enum
can't implement expr.Literal because there's no Expression.Literal
proto variant for enums; they appear as FunctionArgument.enum (a
string field). types.FuncArg is the right interface: it covers all
three FunctionArgument kinds (value/literal, type, enum).

This reverts the FuncArg() helper method added in the previous commit
in favour of the simpler direct field access.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 68.71%. Comparing base (2242c28) to head (858423d).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
testcases/parser/nodes.go 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #222      +/-   ##
==========================================
+ Coverage   68.56%   68.71%   +0.15%     
==========================================
  Files          47       47              
  Lines       10847    10856       +9     
==========================================
+ Hits         7437     7460      +23     
+ Misses       3055     3042      -13     
+ Partials      355      354       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

benbellick and others added 17 commits April 9, 2026 15:50
Scalar decimal args are already normalized by updateLiteralType in
VisitArguments, so applying WithType directly in VisitDecimalArg is
redundant. Keep this PR focused on enum arg support.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…onInvocation

normalizeArgsForVariant and enumSubstitutedArgTypes existed to handle old
test files that represented enum options as string literals (e.g. 'YEAR'::str).
v0.87.0 test files use the proper YEAR::enum syntax, which VisitEnumArg now
handles natively, so these workarounds are no longer needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…hema

TODO(#223): these test files are skipped until the compact table format
and enum arg schema issues are resolved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes the hasNilType guard in tests since enum arg types are now
correctly reflected in function invocation arg types.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… helper, assert enum type in test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…itations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The single-column compact table format used in these test files was
corrected in substrait-io/substrait#1043, released in v0.88.0.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…arounds

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…tring()

Also fixes CaseLiteral.String() to render enum args as "VALUE::enum"
rather than "VALUE::" (CommonEnumType.Name is empty).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…g, add coverage

- Fix CaseLiteral.String() and AsAggregateArgumentString() to render
  enum args as "VALUE::enum" rather than "VALUE::" since CommonEnumType.Name
  is empty
- Add enum case to TestParseTestWithVariousTypes and
  TestParseAggregateTestWithVariousTypes
- Strengthen result literal assertion in TestParseTestWithVariousTypes

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@bvolpato bvolpato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM -- nice writeup on the CaseLiteral.Value widening from expr.Literal to types.FuncArg, the reasoning around the proto oneof is spot on. Good that extract.test gets unblocked too.

One thing worth fixing before merge: in testcases/parser/parse_test.go:436-438, the new if lit, ok := tc.AggregateArgs[0].Argument.Value.(expr.Literal); ok { assert.Equal(t, listType, lit.GetType()) } silently no-ops if the cast fails. Previously that was an unconditional assert.Equal(t, listType, tc.AggregateArgs[0].Argument.Value.GetType()), so a regression would now pass the test instead of failing it. The value here is always a float32 list literal, so use require.True(t, ok, ...) like you already do at parse_test.go:218-219.

Comment thread types/types.go
@benbellick benbellick marked this pull request as ready for review April 14, 2026 01:47
benbellick and others added 4 commits April 14, 2026 09:33
…entString

The non-list expr.Literal branch was dead code since AsAggregateArgumentString
is only called for singleArgAggregateFuncCall column data, which is always
a ListLiteral. Also strengthen aggregate arg type assertion in test per
review feedback.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@benbellick benbellick merged commit 0de3c9f into main Apr 14, 2026
12 checks passed
@benbellick benbellick deleted the feat/substrait-v0.87.0-enum-arg-support branch April 14, 2026 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants