test(coverage): strict type-parameter checking, catch 2 wrong test results by bvolpato · Pull Request #1048 · substrait-io/substrait

bvolpato · 2026-04-13T19:46:31Z

Summary

The coverage checker's FunctionRegistry.is_same_type only compares base type names — both sides are stripped of <...> before comparison, so decimal precision/scale, varchar length, and list/map element types have never actually been validated against the extension YAML return formulas.
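To make the gap concrete, here is a minimal sketch of a base-name-only comparison next to a parameter-aware one. It is not the repository's code: `base_name`, `loose_same_type`, and `strict_same_type` are hypothetical helpers, and the stripping rule is an assumption based on the description above.

```python
import re


def base_name(type_str: str) -> str:
    """Drop a trailing <...> parameter list and any '?' marker (assumed rule)."""
    return re.sub(r"<.*>$", "", type_str.strip()).rstrip("?").lower()


def loose_same_type(a: str, b: str) -> bool:
    """Hypothetical stand-in for a comparison that only looks at base names."""
    return base_name(a) == base_name(b)


def strict_same_type(a: str, b: str) -> bool:
    """Compare the full type string, ignoring whitespace and case only."""
    def normalize(s: str) -> str:
        return re.sub(r"\s+", "", s).lower()
    return normalize(a) == normalize(b)


# The two decimal types differ in scale, but the loose check cannot see it.
print(loose_same_type("dec?<38, 2>", "dec?<38, 1>"))   # True
print(strict_same_type("dec?<38, 2>", "dec?<38, 1>"))  # False
```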

That hole was already hiding two wrong test cases in-tree:

tests/cases/arithmetic_decimal/sum_decimal.test:7
```
sum((2.5, 0, 5.0, -2.5, -7.5)::dec<2, 1>) = -2.5::dec?<38, 2>
```
sum is declared DECIMAL?<38,S>, so with input scale 1 the result must be dec?<38, 1>, not dec?<38, 2>.
tests/cases/comparison/nullif.test:21
```
nullif(null::dec?<38, 0>, null::dec?<38, 0>) = null::bool?
```
nullif is any1, any1 -> any1?, so with both args dec<38, 0> the result must be dec?<38, 0>. Looks like a copy-paste from the bool cases above.

Relation to prior work

Both wrong result types have been in-tree since the original BFT port — d84ccd1 feat: port function testcases from bft (#738) (Nov 2024) introduced them. This PR is not reverting or weakening:

#913 (fix(extensions): nullif output should always be nullable): that PR fixed the YAML definition of nullif and only touched the basic i8/i16 examples at the top of nullif.test. It never touched line 21 (the null::dec?<38,0> vs null::bool? line). This PR preserves the nullable extension fix and only corrects the return type.
#989 (fix: enforce nullable types for null literals in test cases): that PR added ? nullability markers to every null-literal test across the whole tests/cases/ tree. On the two lines I'm changing, #989 added the ? marker but left the pre-existing wrong values intact (dec<38,2> → dec?<38,2> for sum; ::bool → ::bool? for nullif) because its scope was the nullability dimension, not precision/scale or return base type. This PR preserves every ? marker #989 introduced and only fixes the parameter that #989 wasn't looking at:
- sum: -2.5::dec?<38, 2> → -2.5::dec?<38, 1> (scale from 2 → 1; ? preserved)
- nullif: null::bool? → null::dec?<38, 0> (base type bool → decimal; ? preserved)

Confirmed by git log -p -S ... -- tests/cases/... on both lines: d84ccd1 introduces the bug, ba9b0ff (#989) preserves it in its nullability-enforcement rewrite, and this PR is the first change to correct it.

What the PR does

Fixes both wrong test cases.
Adds tests/coverage/type_checker.py — a symbolic unifier plus evaluator for the YAML return-formula mini-language (assignments, min/max, cond ? a : b ternary). It:
- parses type strings like decimal<P1,S1>, list<any2>, STRUCT<...>, func<any1 -> boolean?> into tagged tuples;
- unifies an impl-declared type against a concrete test type, binding variables (P1, S1, any1, any2);
- evaluates multi-line return formulas (add/subtract/multiply/divide/modulus, min/max/sum, any1?, nested types, func bodies) with the bindings and compares structurally against the test's declared result type.
Wires the new checker into FunctionRegistry.get_function. FunctionOverload/FunctionVariant now carry the raw impl args and the full return formula alongside the existing short-form fingerprint. The strict check runs after the legacy loose match, so the fast path is preserved and the check only fires when callers supply full parameterized types (new full_arg_types / full_return_type kwargs; TestCase.get_full_arg_types() added as the source).
When a formula cannot be evaluated (unbound variable or unusual syntax), the strict check falls back to success. That keeps every currently-passing test green and leaves room to tighten further in a follow-up.
tests/coverage/test_type_checker.py covers parsing, unification, formula evaluation for add/divide, the two specific bugs above, and the tolerant behavior for tests that intentionally omit optional decimal parameters (e.g. power(dec, dec<38,0>)).
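The parse, unify, evaluate flow described above can be sketched roughly as follows. This is a simplified illustration, not the code in tests/coverage/type_checker.py: every name here (`ALIASES`, `parse_type`, `unify`, `evaluate_return`, `strict_check`) is hypothetical, only flat parameter lists are handled (no nested types, no min/max or ternary formulas), nullability is ignored when comparing, and the `decimal<P,S>` argument declaration for sum is an assumption.

```python
import re

ALIASES = {"dec": "decimal", "bool": "boolean"}  # assumed short-to-long names


def parse_type(s):
    """Parse 'dec?<38, 1>' into (base, nullable, params); flat params only."""
    m = re.fullmatch(r"\s*(\w+)(\?)?\s*(?:<(.*)>)?\s*", s)
    if m is None:
        raise ValueError(f"cannot parse type: {s!r}")
    base = m.group(1).lower()
    base = ALIASES.get(base, base)
    params = ()
    if m.group(3):
        params = tuple(
            int(p) if p.strip().isdigit() else p.strip()
            for p in m.group(3).split(",")
        )
    return base, m.group(2) is not None, params


def unify(declared, concrete, bindings):
    """Bind variables in the declared type against a concrete test type."""
    dbase, _, dparams = parse_type(declared)
    cbase, _, cparams = parse_type(concrete)
    if dbase.startswith("any"):
        # Polymorphic variable: bind the whole concrete type, check consistency.
        return bindings.setdefault(dbase, (cbase, cparams)) == (cbase, cparams)
    if dbase != cbase:
        return False
    if not cparams:
        return True  # test omitted parameters, so there is nothing to bind
    if len(dparams) != len(cparams):
        return False
    for d, c in zip(dparams, cparams):
        if isinstance(d, int):
            if d != c:
                return False
        elif bindings.setdefault(d, c) != c:
            return False
    return True


def evaluate_return(formula, bindings):
    """Substitute bindings into a one-line return type; None if unbound."""
    base, _, params = parse_type(formula)
    if base.startswith("any"):
        return bindings.get(base)
    resolved = []
    for p in params:
        if isinstance(p, str):
            if p not in bindings:
                return None
            p = bindings[p]
        resolved.append(p)
    return base, tuple(resolved)


def strict_check(impl_args, impl_return, test_args, test_return):
    bindings = {}
    if len(impl_args) != len(test_args):
        return False
    if not all(unify(d, c, bindings) for d, c in zip(impl_args, test_args)):
        return False
    expected = evaluate_return(impl_return, bindings)
    if expected is None:
        return True  # tolerant fallback when the formula cannot be evaluated
    abase, _, aparams = parse_type(test_return)
    return expected == (abase, aparams)


# The two cases from this PR, before and after the fix.
print(strict_check(["decimal<P,S>"], "DECIMAL?<38,S>", ["dec<2, 1>"], "dec?<38, 2>"))  # False
print(strict_check(["decimal<P,S>"], "DECIMAL?<38,S>", ["dec<2, 1>"], "dec?<38, 1>"))  # True
print(strict_check(["any1", "any1"], "any1?", ["dec?<38, 0>"] * 2, "bool?"))           # False
print(strict_check(["any1", "any1"], "any1?", ["dec?<38, 0>"] * 2, "dec?<38, 0>"))     # True
```

Returning success when a variable stays unbound mirrors the fallback described above: a test that omits decimal parameters binds nothing, so the return formula cannot be evaluated and the check does not fail.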

I verified the new check actually catches the bugs by reverting the two test-file fixes and running pytest tests/test_extensions.py::test_substrait_extension_coverage:

ERROR: Strict parameter check failed for sum(dec<2,1>) -> dec?<38,2>: return: expected decimal<38, 1> but test declares decimal<38, 2>
ERROR: Strict parameter check failed for nullif(dec?<38,0>, dec?<38,0>) -> bool?: return: expected decimal<38, 0> but test declares boolean

Test plan

pytest tests/coverage/test_type_checker.py — 33 passed
pytest tests/test_extensions.py::test_substrait_extension_coverage — passes with the fixes, fails clearly without them
pytest tests/ --ignore=tests/test_proto_example_validator.py — 152 passed (the ignored module needs buf generate, which is unrelated to this change)
black --check on touched files
flake8 on touched files


…sults

The coverage checker's `is_same_type` compared only the base type name and stripped all parameters, so decimal precision/scale, varchar length, and list/map element types were never actually validated against the extension YAML return formulas.

Two wrong test cases had been sitting in-tree because of this:

- `sum((2.5, 0, 5.0, -2.5, -7.5)::dec<2, 1>) = -2.5::dec<38, 2>` in `tests/cases/arithmetic_decimal/sum_decimal.test`. `sum` returns `DECIMAL?<38,S>`, so with input scale 1 the output must be `dec?<38, 1>`, not `dec?<38, 2>`.
- `nullif(null::dec?<38, 0>, null::dec?<38, 0>) = null::bool?` in `tests/cases/comparison/nullif.test`. `nullif` is `any1, any1 -> any1?`, so with both args `dec<38, 0>` the result must be `dec?<38, 0>`, not `bool?`. Looks like a copy-paste from the bool cases above.

Both wrong results date back to the original BFT port (substrait-io#738). They are not undoing prior work: substrait-io#913 only touched the `basic` i8/i16 block at the top of `nullif.test`, and substrait-io#989 rewrote both lines only to add `?` nullability markers, preserving the underlying wrong `38,2` and wrong `bool` base type. This change keeps every `?` marker from substrait-io#989 and only fixes the parameter substrait-io#989 wasn't looking at.

To prevent regressions, this adds `tests/coverage/type_checker.py`, a symbolic unifier plus evaluator for the YAML return-formula mini-language (assignments, `min`/`max`, `cond ? a : b` ternary). The new module:

- parses type strings like `decimal<P1,S1>`, `list<any2>`, `STRUCT<...>`, `func<any1 -> boolean?>` into tagged tuples;
- unifies an impl-declared type against a concrete test type, binding variables like `P1`, `S1`, and polymorphic `any1`/`any2`;
- evaluates multi-line return formulas (add/sub/mul/div/mod, min/max, sum, `any1?`, etc.) with the bindings and compares structurally to the test's declared result type.

`FunctionOverload`/`FunctionVariant` now carry the raw YAML arg types and return formula alongside the existing short-form fingerprint. `FunctionRegistry.get_function` runs the strict check after the legacy loose match, so the wider test suite's base-type fallback still applies when the caller hasn't supplied full parameterized types. When the formula cannot be evaluated (unbound variable, unusual syntax), the strict check falls back to success, preserving compatibility with the current extensions and leaving room to tighten further.

`test_type_checker.py` covers parsing, unification, formula evaluation for add/divide, the two specific bugs above, and the tolerant behavior for tests that omit optional decimal parameters (e.g. `power(dec, dec<38,0>)`).

benbellick · 2026-04-15T14:31:16Z

+    return _SHORT_TO_LONG.get(base, base)
+
+
+def parse_type(s):


I will try and find time to review this more thoroughly, but on first glance, it feels unnecessary to hand-write a parser for a type language with a formal grammar and generated parsers.

Something about the way the code was written before this PR is brittle, though I can't quite recall what. I remember trying in the past to rewrite it to take better advantage of the ANTLR grammar and stopping because I ran into some hurdles. It might be worth reconsidering a larger rewrite if it means we can avoid having to maintain this.

bvolpato-dd force-pushed the fix/strict-type-parameter-checker branch from 086041e to 85d16cd Compare April 14, 2026 04:35

bvolpato marked this pull request as ready for review April 14, 2026 15:04

bvolpato requested review from EpsilonPrime, benbellick, cpcloud, jacques-n, nielspardon, vbarua, westonpace and yongchul as code owners April 14, 2026 15:04

benbellick reviewed Apr 15, 2026

bvolpato wants to merge 1 commit into substrait-io:main from bvolpato:fix/strict-type-parameter-checker
