test(coverage): strict type-parameter checking, catch 2 wrong test results#1048
Open
bvolpato wants to merge 1 commit intosubstrait-io:mainfrom
Open
test(coverage): strict type-parameter checking, catch 2 wrong test results#1048bvolpato wants to merge 1 commit intosubstrait-io:mainfrom
bvolpato wants to merge 1 commit intosubstrait-io:mainfrom
Conversation
…sults The coverage checker's `is_same_type` compared only the base type name and stripped all parameters, so decimal precision/scale, varchar length, and list/map element types were never actually validated against the extension YAML return formulas. Two wrong test cases had been sitting in-tree because of this: - `sum((2.5, 0, 5.0, -2.5, -7.5)::dec<2, 1>) = -2.5::dec<38, 2>` in `tests/cases/arithmetic_decimal/sum_decimal.test`. `sum` returns `DECIMAL?<38,S>`, so with input scale 1 the output must be `dec?<38, 1>`, not `dec?<38, 2>`. - `nullif(null::dec?<38, 0>, null::dec?<38, 0>) = null::bool?` in `tests/cases/comparison/nullif.test`. `nullif` is `any1, any1 -> any1?`, so with both args `dec<38, 0>` the result must be `dec?<38, 0>`, not `bool?`. Looks like a copy-paste from the bool cases above. Both wrong results date back to the original BFT port (substrait-io#738). They are not undoing prior work: substrait-io#913 only touched the `basic` i8/i16 block at the top of `nullif.test`, and substrait-io#989 rewrote both lines only to add `?` nullability markers, preserving the underlying wrong `38,2` and wrong `bool` base type. This change keeps every `?` marker from substrait-io#989 and only fixes the parameter substrait-io#989 wasn't looking at. To prevent regressions, this adds `tests/coverage/type_checker.py` — a symbolic unifier plus evaluator for the YAML return-formula mini-language (assignments, `min`/`max`, `cond ? a : b` ternary). The new module: - parses type strings like `decimal<P1,S1>`, `list<any2>`, `STRUCT<...>`, `func<any1 -> boolean?>` into tagged tuples; - unifies an impl-declared type against a concrete test type, binding variables like `P1`, `S1`, and polymorphic `any1`/`any2`; - evaluates multi-line return formulas (add/sub/mul/div/mod, min/max, sum, `any1?`, etc.) with the bindings and compares structurally to the test's declared result type. `FunctionOverload`/`FunctionVariant` now carry the raw YAML arg types and return formula alongside the existing short-form fingerprint. `FunctionRegistry.get_function` runs the strict check after the legacy loose match, so the wider test suite's base-type fallback still applies when the caller hasn't supplied full parameterized types. When the formula cannot be evaluated (unbound variable, unusual syntax), the strict check falls back to success, preserving compatibility with the current extensions and leaving room to tighten further. `test_type_checker.py` covers parsing, unification, formula evaluation for add/divide, the two specific bugs above, and the tolerant behavior for tests that omit optional decimal parameters (e.g. `power(dec, dec<38,0>)`).
086041e to
85d16cd
Compare
benbellick
reviewed
Apr 15, 2026
| return _SHORT_TO_LONG.get(base, base) | ||
|
|
||
|
|
||
| def parse_type(s): |
Member
There was a problem hiding this comment.
I will try and find time to review this more thoroughly, but on first glance, it feels unnecessary to hand-write a parser for a type language with a formal grammar and generated parsers.
There is something brittle about the current way the code is written before this PR which I can't quite recall, but I remember in the past trying to rewrite this code to better take advantage of the antlr grammar and stopping because I ran into some hurdles. Might be worth reconsidering a larger rewrite if it means we can avoid having to maintain this.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The coverage checker's
FunctionRegistry.is_same_typeonly compares base type names — both sides are stripped of<...>before comparison, so decimal precision/scale, varchar length, and list/map element types have never actually been validated against the extension YAML return formulas.That hole was already hiding two wrong test cases in-tree:
tests/cases/arithmetic_decimal/sum_decimal.test:7sumis declaredDECIMAL?<38,S>, so with input scale 1 the result must bedec?<38, 1>, notdec?<38, 2>.tests/cases/comparison/nullif.test:21nullifisany1, any1 -> any1?, so with both argsdec<38, 0>the result must bedec?<38, 0>. Looks like a copy-paste from the bool cases above.Relation to prior work
Both wrong result types have been in-tree since the original BFT port —
d84ccd1 feat: port function testcases from bft (#738)(Nov 2024) introduced them. This PR is not reverting or weakening:fix(extensions): nullif output should always be nullable #913 —
fix(extensions): nullif output should always be nullable— that PR fixed the YAML definition ofnullifand only touched thebasici8/i16 examples at the top ofnullif.test. It never touched line 21 (thenull::dec?<38,0>vsnull::bool?line). This PR preserves the nullable extension fix and only corrects the return type.fix: enforce nullable types for null literals in test cases #989 —
fix: enforce nullable types for null literals in test cases— that PR added?nullability markers to every null-literal test across the wholetests/cases/tree. On the two lines I'm changing, fix: enforce nullable types for null literals in test cases #989 kept the pre-existing wrong non-nullability information intact (dec<38,2>→dec?<38,2>for sum;::bool→::bool?for nullif) because its scope was the nullability dimension, not precision/scale or return base type. This PR preserves every?marker fix: enforce nullable types for null literals in test cases #989 introduced and only fixes the parameter that fix: enforce nullable types for null literals in test cases #989 wasn't looking at:-2.5::dec?<38, 2>→-2.5::dec?<38, 1>(scale from 2 → 1;?preserved)null::bool?→null::dec?<38, 0>(base type bool → decimal;?preserved)Confirmed by
git log -p -S ... -- tests/cases/...on both lines: d84ccd1 introduces the bug, ba9b0ff (#989) preserves it in its nullability-enforcement rewrite, and this PR is the first change to correct it.What the PR does
tests/coverage/type_checker.py— a symbolic unifier plus evaluator for the YAML return-formula mini-language (assignments,min/max,cond ? a : bternary). It:decimal<P1,S1>,list<any2>,STRUCT<...>,func<any1 -> boolean?>into tagged tuples;P1,S1,any1,any2);any1?, nested types, func bodies) with the bindings and compares structurally against the test's declared result type.FunctionRegistry.get_function.FunctionOverload/FunctionVariantnow carry the raw impl args and the full return formula alongside the existing short-form fingerprint. The strict check runs after the legacy loose match, so the fast path is preserved and the check only fires when callers supply full parameterized types (newfull_arg_types/full_return_typekwargs;TestCase.get_full_arg_types()added as the source).tests/coverage/test_type_checker.pycovers parsing, unification, formula evaluation for add/divide, the two specific bugs above, and the tolerant behavior for tests that intentionally omit optional decimal parameters (e.g.power(dec, dec<38,0>)).I verified the new check actually catches the bugs by reverting the two test-file fixes and running
pytest tests/test_extensions.py::test_substrait_extension_coverage:Test plan
pytest tests/coverage/test_type_checker.py— 33 passedpytest tests/test_extensions.py::test_substrait_extension_coverage— passes with the fixes, fails clearly without thempytest tests/ --ignore=tests/test_proto_example_validator.py— 152 passed (ignored module needsbuf generatewhich is unrelated)black --checkon touched filesflake8on touched filesThis change is