Skip to content

test(coverage): strict type-parameter checking, catch 2 wrong test results#1048

Open
bvolpato wants to merge 1 commit intosubstrait-io:mainfrom
bvolpato:fix/strict-type-parameter-checker
Open

test(coverage): strict type-parameter checking, catch 2 wrong test results#1048
bvolpato wants to merge 1 commit intosubstrait-io:mainfrom
bvolpato:fix/strict-type-parameter-checker

Conversation

@bvolpato
Copy link
Copy Markdown
Member

@bvolpato bvolpato commented Apr 13, 2026

Summary

The coverage checker's FunctionRegistry.is_same_type only compares base type names — both sides are stripped of <...> before comparison, so decimal precision/scale, varchar length, and list/map element types have never actually been validated against the extension YAML return formulas.

That hole was already hiding two wrong test cases in-tree:

  • tests/cases/arithmetic_decimal/sum_decimal.test:7

    sum((2.5, 0, 5.0, -2.5, -7.5)::dec<2, 1>) = -2.5::dec?<38, 2>
    

    sum is declared DECIMAL?<38,S>, so with input scale 1 the result must be dec?<38, 1>, not dec?<38, 2>.

  • tests/cases/comparison/nullif.test:21

    nullif(null::dec?<38, 0>, null::dec?<38, 0>) = null::bool?
    

    nullif is any1, any1 -> any1?, so with both args dec<38, 0> the result must be dec?<38, 0>. Looks like a copy-paste from the bool cases above.

Relation to prior work

Both wrong result types have been in-tree since the original BFT port — d84ccd1 feat: port function testcases from bft (#738) (Nov 2024) introduced them. This PR is not reverting or weakening:

Confirmed by git log -p -S ... -- tests/cases/... on both lines: d84ccd1 introduces the bug, ba9b0ff (#989) preserves it in its nullability-enforcement rewrite, and this PR is the first change to correct it.

What the PR does

  1. Fixes both wrong test cases.
  2. Adds tests/coverage/type_checker.py — a symbolic unifier plus evaluator for the YAML return-formula mini-language (assignments, min/max, cond ? a : b ternary). It:
    • parses type strings like decimal<P1,S1>, list<any2>, STRUCT<...>, func<any1 -> boolean?> into tagged tuples;
    • unifies an impl-declared type against a concrete test type, binding variables (P1, S1, any1, any2);
    • evaluates multi-line return formulas (add/subtract/multiply/divide/modulus, min/max/sum, any1?, nested types, func bodies) with the bindings and compares structurally against the test's declared result type.
  3. Wires the new checker into FunctionRegistry.get_function. FunctionOverload/FunctionVariant now carry the raw impl args and the full return formula alongside the existing short-form fingerprint. The strict check runs after the legacy loose match, so the fast path is preserved and the check only fires when callers supply full parameterized types (new full_arg_types / full_return_type kwargs; TestCase.get_full_arg_types() added as the source).
  4. When a formula cannot be evaluated (unbound variable or unusual syntax), the strict check falls back to success. That keeps every currently-passing test green and leaves room to tighten further in a follow-up.
  5. tests/coverage/test_type_checker.py covers parsing, unification, formula evaluation for add/divide, the two specific bugs above, and the tolerant behavior for tests that intentionally omit optional decimal parameters (e.g. power(dec, dec<38,0>)).

I verified the new check actually catches the bugs by reverting the two test-file fixes and running pytest tests/test_extensions.py::test_substrait_extension_coverage:

ERROR: Strict parameter check failed for sum(dec<2,1>) -> dec?<38,2>: return: expected decimal<38, 1> but test declares decimal<38, 2>
ERROR: Strict parameter check failed for nullif(dec?<38,0>, dec?<38,0>) -> bool?: return: expected decimal<38, 0> but test declares boolean

Test plan

  • pytest tests/coverage/test_type_checker.py — 33 passed
  • pytest tests/test_extensions.py::test_substrait_extension_coverage — passes with the fixes, fails clearly without them
  • pytest tests/ --ignore=tests/test_proto_example_validator.py — 152 passed (ignored module needs buf generate which is unrelated)
  • black --check on touched files
  • flake8 on touched files

This change is Reviewable

…sults

The coverage checker's `is_same_type` compared only the base type name
and stripped all parameters, so decimal precision/scale, varchar length,
and list/map element types were never actually validated against the
extension YAML return formulas. Two wrong test cases had been sitting
in-tree because of this:

- `sum((2.5, 0, 5.0, -2.5, -7.5)::dec<2, 1>) = -2.5::dec<38, 2>` in
  `tests/cases/arithmetic_decimal/sum_decimal.test`. `sum` returns
  `DECIMAL?<38,S>`, so with input scale 1 the output must be
  `dec?<38, 1>`, not `dec?<38, 2>`.
- `nullif(null::dec?<38, 0>, null::dec?<38, 0>) = null::bool?` in
  `tests/cases/comparison/nullif.test`. `nullif` is `any1, any1 -> any1?`,
  so with both args `dec<38, 0>` the result must be `dec?<38, 0>`, not
  `bool?`. Looks like a copy-paste from the bool cases above.

Both wrong results date back to the original BFT port (substrait-io#738). They are
not undoing prior work: substrait-io#913 only touched the `basic` i8/i16 block at
the top of `nullif.test`, and substrait-io#989 rewrote both lines only to add `?`
nullability markers, preserving the underlying wrong `38,2` and wrong
`bool` base type. This change keeps every `?` marker from substrait-io#989 and
only fixes the parameter substrait-io#989 wasn't looking at.

To prevent regressions, this adds `tests/coverage/type_checker.py` — a
symbolic unifier plus evaluator for the YAML return-formula mini-language
(assignments, `min`/`max`, `cond ? a : b` ternary). The new module:

- parses type strings like `decimal<P1,S1>`, `list<any2>`, `STRUCT<...>`,
  `func<any1 -> boolean?>` into tagged tuples;
- unifies an impl-declared type against a concrete test type, binding
  variables like `P1`, `S1`, and polymorphic `any1`/`any2`;
- evaluates multi-line return formulas (add/sub/mul/div/mod, min/max,
  sum, `any1?`, etc.) with the bindings and compares structurally to
  the test's declared result type.

`FunctionOverload`/`FunctionVariant` now carry the raw YAML arg types
and return formula alongside the existing short-form fingerprint.
`FunctionRegistry.get_function` runs the strict check after the legacy
loose match, so the wider test suite's base-type fallback still applies
when the caller hasn't supplied full parameterized types. When the
formula cannot be evaluated (unbound variable, unusual syntax), the
strict check falls back to success, preserving compatibility with the
current extensions and leaving room to tighten further.

`test_type_checker.py` covers parsing, unification, formula evaluation
for add/divide, the two specific bugs above, and the tolerant behavior
for tests that omit optional decimal parameters (e.g. `power(dec, dec<38,0>)`).
@bvolpato-dd bvolpato-dd force-pushed the fix/strict-type-parameter-checker branch from 086041e to 85d16cd Compare April 14, 2026 04:35
@bvolpato bvolpato marked this pull request as ready for review April 14, 2026 15:04
return _SHORT_TO_LONG.get(base, base)


def parse_type(s):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try and find time to review this more thoroughly, but on first glance, it feels unnecessary to hand-write a parser for a type language with a formal grammar and generated parsers.

There is something brittle about the current way the code is written before this PR which I can't quite recall, but I remember in the past trying to rewrite this code to better take advantage of the antlr grammar and stopping because I ran into some hurdles. Might be worth reconsidering a larger rewrite if it means we can avoid having to maintain this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants