feat(parser): update to Substrait v0.87.0 with enum arg support#222
feat(parser): update to Substrait v0.87.0 with enum arg support#222benbellick merged 25 commits intomainfrom
Conversation
- Regenerate testcase parser grammar from v0.87.0 (adds enumArg rule: IDENTIFIER::enum syntax for function enum arguments) - Implement VisitEnumArg in visitor, returning types.Enum as FuncArg - Change CaseLiteral.Value from expr.Literal to types.FuncArg to accommodate enum values alongside regular literals - Fix getAggregateFuncTableSchema to index by ColumnIndex rather than arg position, preventing nil dereference when enum args precede column args (e.g. std_dev(SAMPLE::enum, col0::fp32)) - Skip std_dev/variance tests pending upstream fix for single-column compact aggregate format - Bump github.com/substrait-io/substrait dep to v0.87.0 Partially closes #220 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… field Enum args (YEAR::enum) are not literals — they don't belong in CaseLiteral.Value (expr.Literal). Instead, VisitEnumArg stores only ValueText and Type=CommonEnumType, and a new FuncArg() method on CaseLiteral converts to types.FuncArg at call sites, returning types.Enum(ValueText) for enum args and Value for everything else. This reverts the CaseLiteral.Value type widening (types.FuncArg → back to expr.Literal) from the previous commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Enum args (YEAR::enum) are a valid FunctionArgument in the substrait proto — siblings of expression literals, not subtypes. types.Enum can't implement expr.Literal because there's no Expression.Literal proto variant for enums; they appear as FunctionArgument.enum (a string field). types.FuncArg is the right interface: it covers all three FunctionArgument kinds (value/literal, type, enum). This reverts the FuncArg() helper method added in the previous commit in favour of the simpler direct field access. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #222 +/- ##
==========================================
+ Coverage 68.56% 68.71% +0.15%
==========================================
Files 47 47
Lines 10847 10856 +9
==========================================
+ Hits 7437 7460 +23
+ Misses 3055 3042 -13
+ Partials 355 354 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Scalar decimal args are already normalized by updateLiteralType in VisitArguments, so applying WithType directly in VisitDecimalArg is redundant. Keep this PR focused on enum arg support. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…onInvocation normalizeArgsForVariant and enumSubstitutedArgTypes existed to handle old test files that represented enum options as string literals (e.g. 'YEAR'::str). v0.87.0 test files use the proper YEAR::enum syntax, which VisitEnumArg now handles natively, so these workarounds are no longer needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…hema TODO(#223): these test files are skipped until the compact table format and enum arg schema issues are resolved. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes the hasNilType guard in tests since enum arg types are now correctly reflected in function invocation arg types. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… helper, assert enum type in test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…itations Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The single-column compact table format used in these test files was corrected in substrait-io/substrait#1043, released in v0.88.0. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…arounds Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…tring() Also fixes CaseLiteral.String() to render enum args as "VALUE::enum" rather than "VALUE::" (CommonEnumType.Name is empty). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…g, add coverage - Fix CaseLiteral.String() and AsAggregateArgumentString() to render enum args as "VALUE::enum" rather than "VALUE::" since CommonEnumType.Name is empty - Add enum case to TestParseTestWithVariousTypes and TestParseAggregateTestWithVariousTypes - Strengthen result literal assertion in TestParseTestWithVariousTypes Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
bvolpato
left a comment
There was a problem hiding this comment.
LGTM -- nice writeup on the CaseLiteral.Value widening from expr.Literal to types.FuncArg, the reasoning around the proto oneof is spot on. Good that extract.test gets unblocked too.
One thing worth fixing before merge: in testcases/parser/parse_test.go:436-438, the new if lit, ok := tc.AggregateArgs[0].Argument.Value.(expr.Literal); ok { assert.Equal(t, listType, lit.GetType()) } silently no-ops if the cast fails. Previously that was an unconditional assert.Equal(t, listType, tc.AggregateArgs[0].Argument.Value.GetType()), so a regression would now pass the test instead of failing it. The value here is always a float32 list literal, so use require.True(t, ok, ...) like you already do at parse_test.go:218-219.
…entString The non-list expr.Literal branch was dead code since AsAggregateArgumentString is only called for singleArgAggregateFuncCall column data, which is always a ListLiteral. Also strengthen aggregate arg type assertion in test per review feedback. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Upgrades the substrait dependency from v0.85.0 to v0.87.0 and adds parser support for the
enumArgsyntax introduced in that version (IDENTIFIER::enum, e.g.SAMPLE::enuminstd_dev).The interesting change is widening
CaseLiteral.Valuefromexpr.Literaltotypes.FuncArg. This is necessary because enum args don't map to anExpression.Literalin the substrait proto.FunctionArgumentis a oneof of{string enum, Type type, Expression value}, making enum a sibling of a literal expression rather than a subtype. Sotypes.Enumcannot implementexpr.Literal, and the field needs to hold the broadertypes.FuncArginterface instead.std_devandvariancetest cases remain skipped because the v0.87.0 test files use an invalid single-column compact table format fixed in substrait v0.88.0 (substrait-io/substrait#1043).