From 5b06392c6b4ff17538b453184a99b72712f144e5 Mon Sep 17 00:00:00 2001 From: Alasdair Allan Date: Fri, 12 Jun 2026 12:06:19 +0100 Subject: [PATCH 1/3] fix: E501 messages render the precondition in call-site terms MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The description gains an 'At this call site:' line with the callee's precondition rendered against the actual arguments (callee parameter slots substituted via slot_table's De Bruijn resolution, formatted with format_expr), and the Fix: text becomes concrete code — the guard with the rendered call and the exact requires(...) to add — instead of the generic two-option advice. Module-qualified callees and unmappable slots keep the generic wording. Tests pin the substituted rendering (string_length("") > 0), the De Bruijn shuffle (@Int.1 > @Int.0 at cmp(1, 2) renders 1 > 2), the unmappable-slot fallback, and the upgraded LSP instruction-contract assertions. 4,321 tests total. Folds into the v0.0.170 release (no version bump; the existing CHANGELOG [0.0.170] section gains the bullet; the tag moves to the merge commit on merge). Co-Authored-By: Claude --- CHANGELOG.md | 1 + README.md | 2 +- ROADMAP.md | 2 +- TESTING.md | 6 +-- tests/test_lsp.py | 7 ++- tests/test_obligations.py | 79 +++++++++++++++++++++++++++++ vera/verifier.py | 102 ++++++++++++++++++++++++++++++++++++-- 7 files changed, 190 insertions(+), 9 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index fcf71f7..dc57269 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -12,6 +12,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). - **[#727](https://github.com/aallan/vera/issues/727) — duplicate E501 diagnostics.** A violating call site could be recorded once per translation pass: the primary body translation, the `@Nat`-subtraction walker's let-RHS environment rebuild, and the walker's operand discharge each re-translate the same call (a `@Nat`-subtraction operand in a `let` RHS recorded **three** identical E501s). The SMT layer now dedups at recording time by (call node, precondition) identity, so every topology collapses to exactly one diagnostic and one `call_pre` obligation per violating site — while sites that only the walker ever translates — a violating call inside a `@Nat`-subtraction operand in statement position, checked via the walker's operand discharge — keep their single recording rather than being suppressed. (Bare statement-position calls with no enclosing subtraction remain unchecked either way; that pre-existing gap is tracked as [#730](https://github.com/aallan/vera/issues/730).) Translation behaviour is unchanged and the warm/cold differential oracle is untouched. - **[#728](https://github.com/aallan/vera/issues/728) — LSP diagnostics now carry the full instruction contract.** The language server's diagnostic mapping put only the description into the LSP message, so editor hovers said what broke but not how to fix it. The message now appends the rationale paragraph and the `Fix:` paragraph exactly as `--json` carries them; diagnostics without those fields map to the bare description, unchanged. +- **E501 messages now speak in call-site terms.** The precondition is rendered with the actual arguments substituted for the callee's parameter slots (`At this call site: string_length("") > 0`), and the `Fix:` text shows concrete code — the guard with the rendered call (`if string_length("") > 0 then { classify_sentiment("") } else { ... }`) and the exact `requires(...)` to add — instead of generic advice. Substitution honours De Bruijn most-recent-first resolution; module-qualified callees and unmappable slots keep the generic wording. ## [0.0.169] - 2026-06-11 diff --git a/README.md b/README.md index 2a7e33f..27e301e 100644 --- a/README.md +++ b/README.md @@ -210,7 +210,7 @@ cp /path/to/vera/SKILL.md ~/.claude/skills/vera-language/SKILL.md ## Project status -Vera is in **active development** at v0.0.170 — 810+ commits, 171 releases, 4,318 tests, 96% code coverage, 89 conformance programs, 35 examples, and a 13-chapter specification. See **[HISTORY.md](HISTORY.md)** for how the compiler was built. +Vera is in **active development** at v0.0.170 — 810+ commits, 171 releases, 4,321 tests, 96% code coverage, 89 conformance programs, 35 examples, and a 13-chapter specification. See **[HISTORY.md](HISTORY.md)** for how the compiler was built. The reference compiler — parser, AST, type checker, contract verifier (Z3), WASM code generator, module system, browser runtime, and runtime contract insertion — is working. The language specification is in draft across [13 chapters](spec/). diff --git a/ROADMAP.md b/ROADMAP.md index 8376775..cf4f0f6 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -4,7 +4,7 @@ Where the project is going. See [HISTORY.md](HISTORY.md) for what's been built ## Where we are -4,318 tests, 89 conformance programs, 35 examples, 13 spec chapters. +4,321 tests, 89 conformance programs, 35 examples, 13 spec chapters. ## What's next diff --git a/TESTING.md b/TESTING.md index 9868445..c699365 100644 --- a/TESTING.md +++ b/TESTING.md @@ -6,7 +6,7 @@ This is the single source of truth for Vera's testing infrastructure, coverage d | Metric | Value | |--------|-------| -| **Tests** | 4,318 across 34 files (~55,721 lines of test code; 4,287 passed + 16 stress, 15 skipped) | +| **Tests** | 4,321 across 34 files (~55,830 lines of test code; 4,290 passed + 16 stress, 15 skipped) | | **Compiler code coverage** | 96% of 15,149 statements (CI minimum: 80%) | | **Conformance programs** | 89 programs across 9 spec chapters, validating every language feature | | **Example programs** | 35, all validated through `vera check` + `vera verify` | @@ -58,7 +58,7 @@ python scripts/fix_allowlists.py --fix # auto-fix stale allowlists | `test_parser.py` | 129 | 968 | Grammar rules, operator precedence, parse errors | | `test_ast.py` | 128 | 1,130 | AST transformation, node structure, serialisation, string escape sequences, ability declarations | | `test_checker.py` | 521 | 5,939 | Type synthesis, slot resolution, effects, effect subtyping, contracts, exhaustiveness, cross-module typing, visibility, error codes, string built-ins, generic rejection, IO operation types, Markdown types, Regex types, abilities, Map collection, Set collection, Decimal type, Json type, Html type, Http effect, Inference effect, removed legacy name regression | -| `test_obligations.py` | 263 | 772 | Reified proof obligations + warm `VerificationSession` (#222 Phase A): full-corpus differential oracle (warm session == cold `verify()` on diagnostics, summary, and obligation stream, plus warm-twice determinism, across all 35 examples and every verify/run-level conformance program), summary↔obligation tier-bookkeeping consistency, per-kind unit tests (requires / ensures / decreases / nat_sub / call_pre statuses, counterexamples, error codes), content-key stability + same-text-two-sites span disambiguation, session solver reuse, type-error short-circuit, ADT-registry resync between programs; plus the Phase B incremental suite — identical-source full replay, callee-body-edit replays callers while callee-contract-edit invalidates them, span-shift and ADT-edit conservative invalidation, cross-program isolation, timeout-status never cached (monkeypatched solver), FIFO eviction bound; plus the #727 dedup pin — a violating call in a let RHS records exactly one E501 diagnostic and one call_pre obligation | +| `test_obligations.py` | 266 | 851 | Reified proof obligations + warm `VerificationSession` (#222 Phase A): full-corpus differential oracle (warm session == cold `verify()` on diagnostics, summary, and obligation stream, plus warm-twice determinism, across all 35 examples and every verify/run-level conformance program), summary↔obligation tier-bookkeeping consistency, per-kind unit tests (requires / ensures / decreases / nat_sub / call_pre statuses, counterexamples, error codes), content-key stability + same-text-two-sites span disambiguation, session solver reuse, type-error short-circuit, ADT-registry resync between programs; plus the Phase B incremental suite — identical-source full replay, callee-body-edit replays callers while callee-contract-edit invalidates them, span-shift and ADT-edit conservative invalidation, cross-program isolation, timeout-status never cached (monkeypatched solver), FIFO eviction bound; plus the #727 dedup pin — a violating call in a let RHS records exactly one E501 diagnostic and one call_pre obligation | | `test_verifier.py` | 146 | 2,128 | Z3 verification, counterexamples, tier classification, call-site preconditions, branch-aware preconditions, pipe operator, cross-module contracts, match/ADT verification, decreases verification, mutual recursion, refined Bool/String/Float64 param sorts, **@Nat subtraction underflow obligation** (#520 — Path-A obligation discharge via requires/path-conditions/path-aware Z3 refutation, pure-literal exclusion, Int-Int and Nat-Int → Int exemptions) | | `test_codegen.py` | 1,134 | 19,570 | WASM compilation, arithmetic, Float64, Byte, arrays (incl. compound element types), ADTs, match (incl. nested patterns), generics, State\, Exn\ handlers, control flow, strings, string escape sequences, IO (read\_line, read\_file, write\_file, args, exit, get\_env, sleep, time, stderr), bounds checking, quantifiers, assert/assume, refinement type aliases, pipe operator, string built-ins, built-in shadowing, parse\_nat Result, GC, Markdown host bindings, Regex host bindings, Map collection, Set collection, Decimal type, Json type, Html type, Http effect, Inference effect, Random effect, example round-trips, GC shadow stack overflow, **WASM tail-call optimization** (#517 — `return_call` emission for tail-position calls, 50K- and 1M-iteration stress, structural assertions on `return_call`/plain `call` boundary, **GC-aware TCO for allocating fns (#549 — `$gc_sp` restore before each `return_call`)**, postcondition-fallback regression (still reverts to plain `call`), analyzer unit tests covering Block-trailing / IfExpr-both-branches / MatchExpr-arm-bodies / let-value-NOT-marked / call-args-NOT-marked / ExprStmt-statement-NOT-marked / IfExpr-condition-NOT-marked / MatchExpr-scrutinee-NOT-marked) | | `test_codegen_contracts.py` | 32 | 576 | Runtime pre/postconditions, contract fail messages, old/new state postconditions | @@ -80,7 +80,7 @@ python scripts/fix_allowlists.py --fix # auto-fix stale allowlists | `test_tester.py` | 17 | 445 | Contract-driven testing: tier classification, input generation, test execution, skip message content | | `test_tester_coverage.py` | 34 | 903 | Tester coverage gaps: String/Float64/ADT parameter input generation, Bool/Byte parameters, unsatisfiable preconditions, type expression edge cases | | `test_markdown.py` | 59 | 394 | Markdown parser: block/inline parsing, rendering, round-trips, edge cases | -| `test_lsp.py` | 94 | 1205 | LSP transport + coordinate layer (#222 Phase C) and language features (#222 Phase D): parametrized code-point↔UTF-16 goldens incl. astral-plane fixtures and surrogate-pair snapping, Span (1-based, exclusive-end) and SourceLocation (0-based col) → LSP Range conversions, point→token-range widening, DocumentStore open/change/close + index invalidation, an in-process handler-drive test, and one stdio end-to-end round-trip against the real `vera lsp` subprocess (initialize → didOpen → shutdown → exit) pinning serverInfo + textDocumentSync capabilities; plus the Phase D feature suite — parse-error single-diagnostic path, type-error verification short-circuit, tier=3 in E520 diagnostic data, per-function tier Hint synthesis (and its suppression for functions with violated obligations), smallest-enclosing-span hover, De Bruijn slot goto (most-recent-parameter jump, out-of-range None, off-slot None), and typed-hole completion (inside/after hole, away-from-hole None); plus the Phase E speculativeEdit suite — identical-text all-unchanged, breaking edit surfaces newly_undischarged (violated nat_sub) with canonical state untouched, strengthening edit surfaces newly_discharged, parse/type errors report ok:false, deleted functions report removed, proof_delta purity; plus the Phase F1 proposeEdit suite — the apply gate (clean and strengthening edits apply, breaking and non-compiling edits refuse), force overriding both gates with the delta still reported, wiring against a structural fake server (apply round-trip with exact full-document replacement range, refuse touches no canonical state, unopened-URI clamp sentinel), and full-document-range goldens (trailing-newline virtual line, UTF-16 end column); plus the Phase F2 strengthenContract suite — splice goldens (first-clause-only replacement with byte-identical remainder, ensures variant, unknown-fn None), the call-site audit pin (tightened precondition refused with newly_undischarged call_pre items, canonical state untouched), provable-ensures strengthening applies, and the three splice-target refusal paths (no analysis, unparseable document, unknown function); plus the Phase F3 addEffect suite — transitive-caller closure goldens (diamond in declaration order, leaf, unknown-fn None, recursion appears once), effect-row rewrite goldens (pure to singleton set, source-preserving append, already-present None, base-name identity blocking State next to State), diamond propagation applying one multi-site candidate with the bystander untouched, mixed append/replace rows with already-satisfied callers skipped, the fully-satisfied no-op shape, and the two refusal paths; plus the #728 instruction-contract suite — the LSP message carries description, rationale, and the Fix: paragraph (also pinning single E501 emission at the LSP surface), and a bare diagnostic maps to the description alone | +| `test_lsp.py` | 94 | 1210 | LSP transport + coordinate layer (#222 Phase C) and language features (#222 Phase D): parametrized code-point↔UTF-16 goldens incl. astral-plane fixtures and surrogate-pair snapping, Span (1-based, exclusive-end) and SourceLocation (0-based col) → LSP Range conversions, point→token-range widening, DocumentStore open/change/close + index invalidation, an in-process handler-drive test, and one stdio end-to-end round-trip against the real `vera lsp` subprocess (initialize → didOpen → shutdown → exit) pinning serverInfo + textDocumentSync capabilities; plus the Phase D feature suite — parse-error single-diagnostic path, type-error verification short-circuit, tier=3 in E520 diagnostic data, per-function tier Hint synthesis (and its suppression for functions with violated obligations), smallest-enclosing-span hover, De Bruijn slot goto (most-recent-parameter jump, out-of-range None, off-slot None), and typed-hole completion (inside/after hole, away-from-hole None); plus the Phase E speculativeEdit suite — identical-text all-unchanged, breaking edit surfaces newly_undischarged (violated nat_sub) with canonical state untouched, strengthening edit surfaces newly_discharged, parse/type errors report ok:false, deleted functions report removed, proof_delta purity; plus the Phase F1 proposeEdit suite — the apply gate (clean and strengthening edits apply, breaking and non-compiling edits refuse), force overriding both gates with the delta still reported, wiring against a structural fake server (apply round-trip with exact full-document replacement range, refuse touches no canonical state, unopened-URI clamp sentinel), and full-document-range goldens (trailing-newline virtual line, UTF-16 end column); plus the Phase F2 strengthenContract suite — splice goldens (first-clause-only replacement with byte-identical remainder, ensures variant, unknown-fn None), the call-site audit pin (tightened precondition refused with newly_undischarged call_pre items, canonical state untouched), provable-ensures strengthening applies, and the three splice-target refusal paths (no analysis, unparseable document, unknown function); plus the Phase F3 addEffect suite — transitive-caller closure goldens (diamond in declaration order, leaf, unknown-fn None, recursion appears once), effect-row rewrite goldens (pure to singleton set, source-preserving append, already-present None, base-name identity blocking State next to State), diamond propagation applying one multi-site candidate with the bystander untouched, mixed append/replace rows with already-satisfied callers skipped, the fully-satisfied no-op shape, and the two refusal paths; plus the #728 instruction-contract suite — the LSP message carries description, rationale, and the Fix: paragraph (also pinning single E501 emission at the LSP surface), and a bare diagnostic maps to the description alone | | `test_browser.py` | 102 | 1,990 | Browser parity: Python/wasmtime vs Node.js/JS-runtime output equivalence across IO, State, contracts, Markdown, Regex, and all compilable examples | | `test_conformance.py` | 445 | 102 | Parametrized conformance suite: parse, check, verify, run, format idempotency across 89 programs | | `test_prelude.py` | 24 | 422 | Prelude injection: Option/Result/array operation detection, combinator shadowing, type aliases, end-to-end compilation | diff --git a/tests/test_lsp.py b/tests/test_lsp.py index 958c25f..5605840 100644 --- a/tests/test_lsp.py +++ b/tests/test_lsp.py @@ -1181,7 +1181,12 @@ def test_message_carries_rationale_and_fix(self) -> None: message = e501[0].message assert "may violate the callee's precondition" in message assert "Fix:" in message - assert "guard the call" in message + # The fix is concrete code in call-site terms, not generic + # advice: the guard renders the actual call and the + # substituted precondition. + assert "Guard the call so the precondition holds" in message + assert "if 0 > 0 then { need_pos(0) } else { ... }" in message + assert "requires(0 > 0)" in message # The rationale paragraph travels too. assert "SMT solver could not prove" in message diff --git a/tests/test_obligations.py b/tests/test_obligations.py index f1741dd..f832b81 100644 --- a/tests/test_obligations.py +++ b/tests/test_obligations.py @@ -474,6 +474,85 @@ def test_pipe_call_violation_records_once(self) -> None: ] assert len(call_pres) == 1 + def test_e501_renders_precondition_in_call_site_terms(self) -> None: + """E501's message states the precondition with the actual + arguments substituted, and the fix shows concrete code — the + guard with the rendered call, and the requires() to add. + """ + source = ( + "private fn need_pos(@String -> @String)\n" + " requires(string_length(@String.0) > 0)\n" + " ensures(true)\n" + " effects(pure)\n" + "{\n" + " @String.0\n" + "}\n" + "\n" + "public fn caller(-> @Int)\n" + " requires(true)\n" + " ensures(true)\n" + " effects(pure)\n" + "{\n" + ' let @String = need_pos("");\n' + " 0\n" + "}\n" + ) + result = self._verify_source(source) + e501 = [d for d in result.diagnostics if d.error_code == "E501"] + assert len(e501) == 1 + d = e501[0] + assert 'At this call site: string_length("") > 0' in d.description + assert d.fix is not None + assert ( + 'if string_length("") > 0 then { need_pos("") } else { ... }' + in d.fix + ) + assert "requires(string_length(\"\") > 0)" in d.fix + + def test_e501_substitution_resolves_de_bruijn_order(self) -> None: + """Slot substitution must honour De Bruijn most-recent-first: + for callee (@Int, @Int) with requires(@Int.1 > @Int.0), + @Int.1 is parameter 1 and @Int.0 is parameter 2.""" + source = ( + "private fn cmp(@Int, @Int -> @Int)\n" + " requires(@Int.1 > @Int.0)\n" + " ensures(true)\n" + " effects(pure)\n" + "{\n" + " @Int.0\n" + "}\n" + "\n" + "public fn caller(-> @Int)\n" + " requires(true)\n" + " ensures(true)\n" + " effects(pure)\n" + "{\n" + " let @Int = cmp(1, 2);\n" + " @Int.0\n" + "}\n" + ) + result = self._verify_source(source) + e501 = [d for d in result.diagnostics if d.error_code == "E501"] + assert len(e501) == 1 + assert "At this call site: 1 > 2" in e501[0].description + + def test_e501_unmappable_slot_keeps_generic_fix(self) -> None: + """When a precondition slot cannot be mapped to an argument, + the message keeps the generic wording instead of guessing.""" + from vera import ast as A + from vera.verifier import ContractVerifier + + v = ContractVerifier.__new__(ContractVerifier) + pre = A.Requires( + expr=A.SlotRef( + type_name="String", type_args=None, index=5, span=None, + ), + span=None, + ) + call = A.FnCall(name="f", args=(), span=None) + out = v._pre_at_call_site((), call, pre) + assert out is None + def test_content_key_stable_across_runs(self) -> None: source = ( "public fn f(@Nat -> @Nat)\n" diff --git a/vera/verifier.py b/vera/verifier.py index 06e40d7..4b44c04 100644 --- a/vera/verifier.py +++ b/vera/verifier.py @@ -29,6 +29,7 @@ ProofObligation, expr_text_for, ) +from vera.slots import slot_table from vera.smt import CallViolation, SlotEnv, SmtContext from vera.types import ( BOOL, @@ -1510,6 +1511,69 @@ def _report_violation( error_code="E500", ) + def _pre_at_call_site( + self, + callee_params: tuple[ast.TypeExpr, ...], + call_node: ast.FnCall | ast.ModuleCall, + precondition: ast.Requires, + ) -> str | None: + """The precondition rendered in CALL-SITE terms, or None. + + Callee-parameter slot references are replaced by the actual + argument expressions of *call_node* (De Bruijn resolution via + :func:`vera.slots.slot_table`), and the result is rendered + with ``format_expr`` — turning, e.g., + ``requires(string_length(@String.0) > 0)`` at the call + ``f("")`` into ``string_length("") > 0``. Returns None when + any slot cannot be mapped (unknown type in the table, index + out of range, arity mismatch), in which case the caller keeps + the generic wording. + """ + import dataclasses as _dc + + table = slot_table(callee_params) + + class _NoSubstitution(Exception): + pass + + def rebuild(node: ast.Expr) -> ast.Expr: + if isinstance(node, ast.SlotRef): + positions = table.get(node.type_name) + if not positions or node.index >= len(positions): + raise _NoSubstitution + pos = positions[node.index] + if pos > len(call_node.args): + raise _NoSubstitution + return call_node.args[pos - 1] + changes: dict[str, object] = {} + for f in _dc.fields(node): + value = getattr(node, f.name) + if isinstance(value, ast.Expr): + new_value = rebuild(value) + if new_value is not value: + changes[f.name] = new_value + elif ( + isinstance(value, tuple) + and value + and all(isinstance(x, ast.Expr) for x in value) + ): + new_tuple = tuple(rebuild(x) for x in value) + if any( + a is not b for a, b in zip(new_tuple, value) + ): + changes[f.name] = new_tuple + if changes: + # dataclasses.replace's typeshed overload can't see + # the per-subclass field types through **dict. + return _dc.replace(node, **changes) # type: ignore[arg-type] + return node + + try: + rebuilt = rebuild(precondition.expr) + except _NoSubstitution: + return None + return ast.format_expr(rebuilt) + def _report_call_violation( self, caller: ast.FnDecl, @@ -1521,6 +1585,26 @@ def _report_call_violation( """Report a call site where the callee's precondition may not hold.""" pre_text = self._contract_source_text(precondition) + # Render the precondition in call-site terms (callee slots + # replaced by the actual arguments) so the message states + # exactly what could not be proven — and the fix can show + # concrete code instead of generic advice. Falls back to the + # generic wording when the callee or a slot cannot be + # resolved (e.g. module-qualified callees). + site_pre: str | None = None + callee_info = self.env.lookup_function(callee_name) + if callee_info is not None and callee_info.param_type_exprs: + from typing import cast + + site_pre = self._pre_at_call_site( + cast( + "tuple[ast.TypeExpr, ...]", + callee_info.param_type_exprs, + ), + call_node, + precondition, + ) + # Build counterexample description ce_lines: list[str] = [] if counterexample: @@ -1537,6 +1621,8 @@ def _report_call_violation( ) if pre_text: description += f"\n Precondition: {pre_text}" + if site_pre: + description += f"\n At this call site: {site_pre}" if ce_text: description += f"\n {ce_text}" @@ -1550,9 +1636,19 @@ def _report_call_violation( "its contract." ), fix=( - f"Add a precondition to '{caller.name}' or guard the call " - f"with an if-expression that ensures the callee's " - f"precondition is satisfied." + ( + f"Guard the call so the precondition holds, e.g. " + f"if {site_pre} then {{ " + f"{ast.format_expr(call_node)} }} else {{ ... }} " + f"— or strengthen '{caller.name}' with " + f"requires({site_pre})." + ) + if site_pre + else ( + f"Add a precondition to '{caller.name}' or guard " + f"the call with an if-expression that ensures the " + f"callee's precondition is satisfied." + ) ), spec_ref='Chapter 6, Section 6.4.2 "Call-Site Verification"', error_code="E501", From 5f175147d629c0eb7a794b454d398aa4975a8f49 Mon Sep 17 00:00:00 2001 From: Alasdair Allan Date: Fri, 12 Jun 2026 12:51:56 +0100 Subject: [PATCH 2/3] fix: ModuleCall guard for call-site substitution; gate tests for fold-in (#731 round 1) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The substitution now short-circuits for module-qualified callees before the bare-name lookup (a local function sharing the callee's name would supply the wrong parameter table) — enforcing the generic fallback the docstring already promised. The LSP instruction-contract test pins the 'At this call site:' description line. The changelog gate's own test suite is reworked for the fold-in rule: the frozen-history pin moves to an older-than-latest section (still rejected) and the latest-section fold-in case gains its own accepting pin. Skip-changelog: amends behaviour shipped in this PR's own fold-in bullet --- README.md | 2 +- ROADMAP.md | 2 +- TESTING.md | 6 ++-- scripts/check_changelog_updated.py | 31 +++++++++++++++--- tests/test_check_changelog_updated.py | 47 ++++++++++++++++++++------- tests/test_lsp.py | 1 + vera/verifier.py | 9 ++++- 7 files changed, 76 insertions(+), 22 deletions(-) diff --git a/README.md b/README.md index 27e301e..cb8f98f 100644 --- a/README.md +++ b/README.md @@ -210,7 +210,7 @@ cp /path/to/vera/SKILL.md ~/.claude/skills/vera-language/SKILL.md ## Project status -Vera is in **active development** at v0.0.170 — 810+ commits, 171 releases, 4,321 tests, 96% code coverage, 89 conformance programs, 35 examples, and a 13-chapter specification. See **[HISTORY.md](HISTORY.md)** for how the compiler was built. +Vera is in **active development** at v0.0.170 — 810+ commits, 171 releases, 4,322 tests, 96% code coverage, 89 conformance programs, 35 examples, and a 13-chapter specification. See **[HISTORY.md](HISTORY.md)** for how the compiler was built. The reference compiler — parser, AST, type checker, contract verifier (Z3), WASM code generator, module system, browser runtime, and runtime contract insertion — is working. The language specification is in draft across [13 chapters](spec/). diff --git a/ROADMAP.md b/ROADMAP.md index cf4f0f6..40a9a0c 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -4,7 +4,7 @@ Where the project is going. See [HISTORY.md](HISTORY.md) for what's been built ## Where we are -4,321 tests, 89 conformance programs, 35 examples, 13 spec chapters. +4,322 tests, 89 conformance programs, 35 examples, 13 spec chapters. ## What's next diff --git a/TESTING.md b/TESTING.md index c699365..fd7dcc0 100644 --- a/TESTING.md +++ b/TESTING.md @@ -6,7 +6,7 @@ This is the single source of truth for Vera's testing infrastructure, coverage d | Metric | Value | |--------|-------| -| **Tests** | 4,321 across 34 files (~55,830 lines of test code; 4,290 passed + 16 stress, 15 skipped) | +| **Tests** | 4,322 across 34 files (~55,862 lines of test code; 4,291 passed + 16 stress, 15 skipped) | | **Compiler code coverage** | 96% of 15,149 statements (CI minimum: 80%) | | **Conformance programs** | 89 programs across 9 spec chapters, validating every language feature | | **Example programs** | 35, all validated through `vera check` + `vera verify` | @@ -80,14 +80,14 @@ python scripts/fix_allowlists.py --fix # auto-fix stale allowlists | `test_tester.py` | 17 | 445 | Contract-driven testing: tier classification, input generation, test execution, skip message content | | `test_tester_coverage.py` | 34 | 903 | Tester coverage gaps: String/Float64/ADT parameter input generation, Bool/Byte parameters, unsatisfiable preconditions, type expression edge cases | | `test_markdown.py` | 59 | 394 | Markdown parser: block/inline parsing, rendering, round-trips, edge cases | -| `test_lsp.py` | 94 | 1210 | LSP transport + coordinate layer (#222 Phase C) and language features (#222 Phase D): parametrized code-point↔UTF-16 goldens incl. astral-plane fixtures and surrogate-pair snapping, Span (1-based, exclusive-end) and SourceLocation (0-based col) → LSP Range conversions, point→token-range widening, DocumentStore open/change/close + index invalidation, an in-process handler-drive test, and one stdio end-to-end round-trip against the real `vera lsp` subprocess (initialize → didOpen → shutdown → exit) pinning serverInfo + textDocumentSync capabilities; plus the Phase D feature suite — parse-error single-diagnostic path, type-error verification short-circuit, tier=3 in E520 diagnostic data, per-function tier Hint synthesis (and its suppression for functions with violated obligations), smallest-enclosing-span hover, De Bruijn slot goto (most-recent-parameter jump, out-of-range None, off-slot None), and typed-hole completion (inside/after hole, away-from-hole None); plus the Phase E speculativeEdit suite — identical-text all-unchanged, breaking edit surfaces newly_undischarged (violated nat_sub) with canonical state untouched, strengthening edit surfaces newly_discharged, parse/type errors report ok:false, deleted functions report removed, proof_delta purity; plus the Phase F1 proposeEdit suite — the apply gate (clean and strengthening edits apply, breaking and non-compiling edits refuse), force overriding both gates with the delta still reported, wiring against a structural fake server (apply round-trip with exact full-document replacement range, refuse touches no canonical state, unopened-URI clamp sentinel), and full-document-range goldens (trailing-newline virtual line, UTF-16 end column); plus the Phase F2 strengthenContract suite — splice goldens (first-clause-only replacement with byte-identical remainder, ensures variant, unknown-fn None), the call-site audit pin (tightened precondition refused with newly_undischarged call_pre items, canonical state untouched), provable-ensures strengthening applies, and the three splice-target refusal paths (no analysis, unparseable document, unknown function); plus the Phase F3 addEffect suite — transitive-caller closure goldens (diamond in declaration order, leaf, unknown-fn None, recursion appears once), effect-row rewrite goldens (pure to singleton set, source-preserving append, already-present None, base-name identity blocking State next to State), diamond propagation applying one multi-site candidate with the bystander untouched, mixed append/replace rows with already-satisfied callers skipped, the fully-satisfied no-op shape, and the two refusal paths; plus the #728 instruction-contract suite — the LSP message carries description, rationale, and the Fix: paragraph (also pinning single E501 emission at the LSP surface), and a bare diagnostic maps to the description alone | +| `test_lsp.py` | 94 | 1211 | LSP transport + coordinate layer (#222 Phase C) and language features (#222 Phase D): parametrized code-point↔UTF-16 goldens incl. astral-plane fixtures and surrogate-pair snapping, Span (1-based, exclusive-end) and SourceLocation (0-based col) → LSP Range conversions, point→token-range widening, DocumentStore open/change/close + index invalidation, an in-process handler-drive test, and one stdio end-to-end round-trip against the real `vera lsp` subprocess (initialize → didOpen → shutdown → exit) pinning serverInfo + textDocumentSync capabilities; plus the Phase D feature suite — parse-error single-diagnostic path, type-error verification short-circuit, tier=3 in E520 diagnostic data, per-function tier Hint synthesis (and its suppression for functions with violated obligations), smallest-enclosing-span hover, De Bruijn slot goto (most-recent-parameter jump, out-of-range None, off-slot None), and typed-hole completion (inside/after hole, away-from-hole None); plus the Phase E speculativeEdit suite — identical-text all-unchanged, breaking edit surfaces newly_undischarged (violated nat_sub) with canonical state untouched, strengthening edit surfaces newly_discharged, parse/type errors report ok:false, deleted functions report removed, proof_delta purity; plus the Phase F1 proposeEdit suite — the apply gate (clean and strengthening edits apply, breaking and non-compiling edits refuse), force overriding both gates with the delta still reported, wiring against a structural fake server (apply round-trip with exact full-document replacement range, refuse touches no canonical state, unopened-URI clamp sentinel), and full-document-range goldens (trailing-newline virtual line, UTF-16 end column); plus the Phase F2 strengthenContract suite — splice goldens (first-clause-only replacement with byte-identical remainder, ensures variant, unknown-fn None), the call-site audit pin (tightened precondition refused with newly_undischarged call_pre items, canonical state untouched), provable-ensures strengthening applies, and the three splice-target refusal paths (no analysis, unparseable document, unknown function); plus the Phase F3 addEffect suite — transitive-caller closure goldens (diamond in declaration order, leaf, unknown-fn None, recursion appears once), effect-row rewrite goldens (pure to singleton set, source-preserving append, already-present None, base-name identity blocking State next to State), diamond propagation applying one multi-site candidate with the bystander untouched, mixed append/replace rows with already-satisfied callers skipped, the fully-satisfied no-op shape, and the two refusal paths; plus the #728 instruction-contract suite — the LSP message carries description, rationale, and the Fix: paragraph (also pinning single E501 emission at the LSP surface), and a bare diagnostic maps to the description alone | | `test_browser.py` | 102 | 1,990 | Browser parity: Python/wasmtime vs Node.js/JS-runtime output equivalence across IO, State, contracts, Markdown, Regex, and all compilable examples | | `test_conformance.py` | 445 | 102 | Parametrized conformance suite: parse, check, verify, run, format idempotency across 89 programs | | `test_prelude.py` | 24 | 422 | Prelude injection: Option/Result/array operation detection, combinator shadowing, type aliases, end-to-end compilation | | `test_readme.py` | 2 | 79 | README code sample parsing | | `test_html.py` | 4 | 189 | HTML landing page code samples: parse, check, verify | | `test_build_site.py` | 17 | 213 | `_abs_links` unit tests: relative link rewriting, fenced block immunity (backtick and tilde fences, inline backticks inside fences), http/https/fragment pass-through, Vera effect syntax not mis-parsed | -| `test_check_changelog_updated.py` | 66 | 638 | `check_changelog_updated.py` unit + end-to-end tests: file classification (incl. file-style exact-match vs directory-style prefix-match), CHANGELOG diff parsing with `[Unreleased]` section tracking, bare-heading rejection, and full-file context (regression test for bullets far below the heading), `Skip-changelog:` trailer detection, temp-repo integration covering substantive/exempt/label/trailer paths | +| `test_check_changelog_updated.py` | 67 | 663 | `check_changelog_updated.py` unit + end-to-end tests: file classification (incl. file-style exact-match vs directory-style prefix-match), CHANGELOG diff parsing with `[Unreleased]` section tracking, bare-heading rejection, and full-file context (regression test for bullets far below the heading), `Skip-changelog:` trailer detection, temp-repo integration covering substantive/exempt/label/trailer paths | | `test_runtime_traps.py` | 66 | 2,405 | Runtime trap categorisation (#516 Stage 1), stdout/stderr-on-trap preservation (#522), `IO.print` live tee (#543), and trap source backtrace (#516 Stage 2): `_classify_trap` per-`kind` mapping (`divide_by_zero`/`out_of_bounds`/`stack_exhausted`/`unreachable`/`overflow`/`contract_violation`/`unknown`), `WasmTrapError` shape + `RuntimeError` substitutability, end-to-end `cmd_run` text + JSON envelopes including `trap_kind`, captured `stdout`, captured `stderr`, JSON-mode "no stderr leak" invariant, cross-stream code-order regression using merged `redirect_stdout`/`redirect_stderr`, the v0.0.123 tee suite (live streaming, write-count + order preservation, JSON-mode tee suppression, trap preservation invariant under tee, per-write flush count, default-execute silence), and the v0.0.124 source-mapping suite — `_resolve_trap_frames` unit tests covering user-fn / built-in / built-in-prefix / monomorphized base-name fallback / unknown-name / no-frames-attribute / leaf-first ordering preservation; end-to-end `cmd_run` text-mode + JSON-mode backtrace including the **leaf-first** ordering invariant; contract-violation backtrace in both text and JSON modes; direct `execute()` `WasmTrapError.frames` attachment; **suppression marker** for collapsed leading runtime-helper frames (mocked `vera.codegen.execute` with synthetic `is_builtin=True` leaf frames so the collapse logic is testable deterministically); source-map population for top-level fns + lifted closures (with span-value assertion against the closure literal's exact line range); and the no-builtin-leakage regression that pins built-in helpers (`alloc` / `gc_collect` / `contract_fail`) NOT being registered in `fn_source_map`; plus the v0.0.125 Stage 3 suite (`#547`) — text-mode `Fix:` block surfacing with position-ordering invariant (Fix appears after the source backtrace), text-mode block suppression for `contract_violation` (no empty header noise), JSON-mode `fix` field always-present (schema stability) including the empty-string case, `_TRAP_FIX_PARAGRAPHS` table-completeness assertion (every kind in the taxonomy has a Fix paragraph entry), and the column-wrap invariant (~76 chars max per line, two-space indent under the `Fix:` heading); plus the UTF-8 hardening suite **`TestHostPrintInvalidUtf8589`** (`#589`) — six structural decode-site assertions pinning `errors="replace"` at every UTF-8 decode path in the host runtime (`host_print` / `host_stderr` / `host_contract_fail` / `_read_wasm_string` / `vera/wasm/markdown.py::_read_string` / the String-return decoder in `execute()`), plus one synthetic-WAT end-to-end test that imports `vera.print` and calls it with raw invalid UTF-8 bytes to pin the wasmtime-trampoline contract (a Python `UnicodeDecodeError` inside a host import escapes as a "python exception" cause iff the host decode is strict); plus the Ctrl-C-during-host-import suite **`TestHostSleepKeyboardInterrupt`** ([#595](https://github.com/aallan/vera/issues/595) / [#599](https://github.com/aallan/vera/issues/599)) — after the v0.0.160 relocation to a single `except KeyboardInterrupt` handler in `execute()` (enabled by `wasmtime>=45.0.0`'s `except BaseException` trampoline fix): one structural assertion that the four per-host-import `raise _VeraExit(130)` guards are gone and the centralized handler maps to `exit_code=130`, plus two end-to-end tests that compile real Vera programs calling `IO.sleep(...)` and `IO.read_char(())`, raise `KeyboardInterrupt` from inside the blocking call, and assert the program exits with `ExecuteResult.exit_code == 130` (pre-interrupt stdout preserved) instead of a raw Python traceback escaping wasmtime's trampoline | ## Conformance Suite diff --git a/scripts/check_changelog_updated.py b/scripts/check_changelog_updated.py index d71da1e..9930c62 100644 --- a/scripts/check_changelog_updated.py +++ b/scripts/check_changelog_updated.py @@ -172,18 +172,23 @@ def is_substantive(path: str) -> bool: def _changelog_has_new_entry(base: str) -> bool: """Return True if the CHANGELOG.md diff adds a new entry. - A "new entry" is either: + A "new entry" is any of: 1. An added line introducing a new version heading (``+## [X.Y.Z]`` or ``+## [Unreleased]``). 2. An added bullet (``+- ...``) that appears *inside* the ``[Unreleased]`` section. + 3. An added bullet inside the LATEST released section (the first + version heading after ``[Unreleased]``) — the tag-move fold-in + pattern: a follow-up PR amends the most recent release's + section and the tag moves to its merge commit, so bullets + there are release-notes-bearing. Section-tracking is necessary because CHANGELOG.md commonly has prose edits (typo fixes, wording tweaks) inside released-version - entries. Those edits sometimes add bulleted lines, but they - shouldn't count as "adding a new entry" for the purposes of this - check — only bullets under ``[Unreleased]`` do. + entries. Those edits sometimes add bulleted lines; for sections + OLDER than the latest release they don't count as "adding a new + entry" — that history is frozen. Pure whitespace or cosmetic changes to CHANGELOG.md don't count. """ @@ -205,6 +210,7 @@ def _changelog_has_new_entry(base: str) -> bool: # lines (prefixed with ``+``); we update current_section on both # so the tracker reflects the post-diff state of the file. current_section: str | None = None + latest_release: str | None = None for line in diff.splitlines(): if line.startswith("+++") or line.startswith("---"): # File headers — ignore entirely. @@ -231,6 +237,14 @@ def _changelog_has_new_entry(base: str) -> bool: end = content.find("]", 4) if end > 4: current_section = content[4:end] + # The first versioned heading in file order is the + # most recent release (the diff spans the whole file + # via --unified=9999, so diff order == file order). + if ( + latest_release is None + and current_section != "Unreleased" + ): + latest_release = current_section # An *added* versioned heading (e.g. ``+## [0.0.114]``) is # itself a new entry — that's the release-cutting pattern. # But a bare ``+## [Unreleased]`` heading is just structural @@ -241,10 +255,17 @@ def _changelog_has_new_entry(base: str) -> bool: return True continue - # Added bullets only count inside the [Unreleased] section. + # Added bullets count inside [Unreleased], or inside the + # latest released section (the fold-in pattern; see the + # docstring). if line.startswith("+") and content.startswith("- "): if current_section == "Unreleased": return True + if ( + current_section is not None + and current_section == latest_release + ): + return True return False diff --git a/tests/test_check_changelog_updated.py b/tests/test_check_changelog_updated.py index 2ba9120..a4d6763 100644 --- a/tests/test_check_changelog_updated.py +++ b/tests/test_check_changelog_updated.py @@ -270,34 +270,59 @@ def test_ignores_cosmetic_changes( monkeypatch.setattr(_mod, "_run", lambda cmd: diff) assert _mod._changelog_has_new_entry("origin/main") is False - def test_bullet_outside_unreleased_does_not_count( + def test_bullet_in_older_released_section_does_not_count( self, monkeypatch: pytest.MonkeyPatch, ) -> None: - """Added bullets inside a *released* version entry are not new entries. + """Added bullets inside sections OLDER than the latest release + are not new entries — that history is frozen. - A prose fix that inserts a bullet into an existing version's - description isn't "adding a new entry" — only bullets under the - [Unreleased] section (or a new ``## [X.Y.Z]`` heading) count. - - Regression test: prior to the fix, any added ``+- `` line - counted, so editing v0.0.111's description would have satisfied - the check for any substantive change on the branch. + (Bullets in the LATEST released section DO count — the + tag-move fold-in pattern; see the companion test below.) """ diff = textwrap.dedent("""\ diff --git a/CHANGELOG.md b/CHANGELOG.md - @@ -5,7 +5,10 @@ + @@ -5,7 +5,12 @@ ## [Unreleased] + ## [0.0.112] - 2026-04-11 + + ### Added + + - The latest release's existing bullet. + ## [0.0.111] - 2026-04-10 ### Fixed + - +- Additional clarification bullet on an existing fix. + +- Additional clarification bullet on an old fix. - **SMT translator: String/Float64 parameters**... """) monkeypatch.setattr(_mod, "_run", lambda cmd: diff) assert _mod._changelog_has_new_entry("origin/main") is False + def test_bullet_in_latest_released_section_counts( + self, monkeypatch: pytest.MonkeyPatch, + ) -> None: + """Added bullets in the FIRST version section after + [Unreleased] count: the tag-move fold-in pattern amends the + most recent release's section and re-releases it.""" + diff = textwrap.dedent("""\ + diff --git a/CHANGELOG.md b/CHANGELOG.md + @@ -5,7 +5,10 @@ + ## [Unreleased] + + ## [0.0.112] - 2026-04-11 + + ### Fixed + + + +- Fold-in bullet for the re-released version. + - An existing bullet. + + ## [0.0.111] - 2026-04-10 + """) + monkeypatch.setattr(_mod, "_run", lambda cmd: diff) + assert _mod._changelog_has_new_entry("origin/main") is True + def test_bullet_under_unreleased_counts( self, monkeypatch: pytest.MonkeyPatch, ) -> None: diff --git a/tests/test_lsp.py b/tests/test_lsp.py index 5605840..6e0604a 100644 --- a/tests/test_lsp.py +++ b/tests/test_lsp.py @@ -1180,6 +1180,7 @@ def test_message_carries_rationale_and_fix(self) -> None: assert len(e501) == 1 # also pins #727 at the LSP surface message = e501[0].message assert "may violate the callee's precondition" in message + assert "At this call site:" in message assert "Fix:" in message # The fix is concrete code in call-site terms, not generic # advice: the guard renders the actual call and the diff --git a/vera/verifier.py b/vera/verifier.py index 4b44c04..bccea7c 100644 --- a/vera/verifier.py +++ b/vera/verifier.py @@ -1592,7 +1592,14 @@ def _report_call_violation( # generic wording when the callee or a slot cannot be # resolved (e.g. module-qualified callees). site_pre: str | None = None - callee_info = self.env.lookup_function(callee_name) + # Module-qualified callees are excluded outright: lookup_function + # resolves the BARE name, so a local function sharing the + # callee's name would supply the wrong parameter table. + callee_info = ( + None + if isinstance(call_node, ast.ModuleCall) + else self.env.lookup_function(callee_name) + ) if callee_info is not None and callee_info.param_type_exprs: from typing import cast From f70b9c16b0190c7406b7492862a7c4048c3bee3a Mon Sep 17 00:00:00 2001 From: Alasdair Allan Date: Fri, 12 Jun 2026 13:11:46 +0100 Subject: [PATCH 3/3] docs: changelog-gate docstring matches the bare-Unreleased logic; pin rendered substitution at the LSP layer (#731 round 2) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The docstring's claim that an added bare [Unreleased] heading counts was a pre-existing inaccuracy (the implementation has always required content beneath it); item 1 now describes versioned headings only. The LSP instruction-contract test pins 'At this call site: 0 > 0' — the rendered expression, not just the label. Skip-changelog: docstring + test-assertion amendments within this PR --- scripts/check_changelog_updated.py | 6 ++++-- tests/test_lsp.py | 2 +- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/scripts/check_changelog_updated.py b/scripts/check_changelog_updated.py index 9930c62..2a54952 100644 --- a/scripts/check_changelog_updated.py +++ b/scripts/check_changelog_updated.py @@ -174,8 +174,10 @@ def _changelog_has_new_entry(base: str) -> bool: A "new entry" is any of: - 1. An added line introducing a new version heading (``+## [X.Y.Z]`` - or ``+## [Unreleased]``). + 1. An added line introducing a new *versioned* heading + (``+## [X.Y.Z]``). A bare ``+## [Unreleased]`` heading is + structural scaffolding and does not count by itself — it only + satisfies the check via content beneath it (rule 2). 2. An added bullet (``+- ...``) that appears *inside* the ``[Unreleased]`` section. 3. An added bullet inside the LATEST released section (the first diff --git a/tests/test_lsp.py b/tests/test_lsp.py index 6e0604a..d572462 100644 --- a/tests/test_lsp.py +++ b/tests/test_lsp.py @@ -1180,7 +1180,7 @@ def test_message_carries_rationale_and_fix(self) -> None: assert len(e501) == 1 # also pins #727 at the LSP surface message = e501[0].message assert "may violate the callee's precondition" in message - assert "At this call site:" in message + assert "At this call site: 0 > 0" in message assert "Fix:" in message # The fix is concrete code in call-site terms, not generic # advice: the guard renders the actual call and the