refactor(output): pin schema_version + int/float wire invariants (closes #152, #153) by CiroGamboa · Pull Request #174 · Toloka/tolokaforge

CiroGamboa · 2026-07-05T01:01:03Z

TL;DR

Closes #152 and #153. Both are wire-format invariants the future writer migration (models replacing dicts in RunAggregateWriter) will depend on. Landing them now — before the migration — means the migration PR only has to verify against tests that already exist.

What ships

1. `schema_version` always survives dumps (#152)

RunAggregate.schema_version gains a @model_serializer(mode="wrap") that unconditionally injects the field into the dumped dict when the wrapped handler omitted it — including the exclude_unset=True case where the field was left at its Pydantic default.

Why: the current writer path (json.dump(dict)) relies on the orchestrator manually stamping aggregate["schema_version"] = 1 before write. Once the writer migrates to model.model_dump_json(exclude_unset=True), a RunAggregate whose schema_version was left at the model default would silently drop the envelope field — every downstream dashboard that dispatches on schema_version would break. The serializer closes that gap without requiring callers to remember the invariant. The default stays 1 and can still be passed explicitly.

2. Type-preserving numeric fields (#153)

Numeric fields with naturally-integer semantics on PerTaskMetrics and AggregateMetrics change from float to int | float. Rate-shaped fields (success_rate, pass@k, avg_score, and their _micro / _macro aggregates) stay narrow float.

Fields widened (natural integer inputs):

Token counts: avg_*_tokens, total_*_tokens on both models
Latency percentiles: latency_p50/90/99_s, api_call_latency_p50/90/99_s, latency_p50/90/99_s_macro
Cost sums: total_cost_usd, avg_cost_usd, judge_cost_usd, total_cost_incl_judge_usd
Turn / tool-call counts: avg_turns, avg_tool_calls, avg_latency_s
stuck_rate (empirical count, not a rate)

Fields kept narrow (always division results at the producer):

PerTaskMetrics.success_rate, avg_score, pass@k, pass_hat@k
AggregateMetrics.success_rate_micro/macro, avg_score_micro/macro, pass_at_*_macro, pass_hat_at_*_macro

Rationale: Pydantic v2's smart-union picks the more specific type on validation, so a source int 42 round-trips to 42 (not 42.0) in the JSON dump. A pre-fix float-only model coerces int 42 → 42.0 on validation and emits 42.0 on the wire — invisible to Python-dict comparison (42 == 42.0) but real when downstream tooling diffs the JSON byte-for-byte.

Honesty note: today's producers in metrics.py happen to always emit float for every widened field (both metric-calc functions short-circuit on empty input rather than dividing by zero). The union is defensive future-proofing — if a future refactor introduces an int return path (e.g. a counting aggregator that returns 0 for empty), the model preserves it without a coordinated model change.

3. Writer docstring explicitness

RunAggregateWriter.write_aggregate documents both invariants in its docstring so future consumers can rely on them without needing to read the model source.

Tests

Four canonical tests pin the invariants (14 total on the file after this PR, up from 10):

test_schema_version_survives_exclude_unset — constructs RunAggregate from a payload that omits schema_version, dumps with exclude_unset=True, asserts the field is present in the output. Also verifies the JSON round-trip preserves the value.
test_int_valued_numeric_fields_preserve_int_type — hand-injects int values into every widened field of PerTaskMetrics and asserts the model preserves int at both the Python level (isinstance(value, int)) and the JSON wire level. Would fail if any widened field is narrowed back to float.
test_int_valued_aggregate_fields_preserve_int_type — same invariant for AggregateMetrics. Also asserts that success_rate_micro STAYS narrow float — pinning the symmetry with PerTaskMetrics.success_rate.
test_current_producer_output_matches_model_dump_byte_for_byte — round-trip guard for the current live producer output across weighted and unweighted aggregates. Catches accidental coercion in the narrow rate fields.

Tests exercise both branches of the int | float union — the producer's actual float output round-trips as float, and hand-injected int inputs round-trip as int.

Test plan

make lint clean.
uv run pytest tests/canonical/test_run_aggregate_models_snapshot.py -v — 14 pass.
uv run pytest tests/ -m "unit or canonical" -q — 2449 pass.
Consumer safety: grep for PerTaskMetrics|AggregateMetrics|RunAggregate usage across tolokaforge/ and tests/ — zero live consumers today (the models exist alongside the current dict-typed writer), so the widening is safe to land ahead of the migration.
Automated Claude review round.

Review-driven follow-ups (this PR)

Local review of the initial commit flagged three real issues, all addressed in the second commit:

Widening was inconsistent — AggregateMetrics.success_rate_micro/macro, avg_score_micro/macro, pass_at_*_macro were widened but their PerTaskMetrics siblings stayed float. Narrowed back for symmetry.
The int/float test didn't exercise int inputs — the producer path only ever emits float today. Rewrote with hand-injected int values so the test would actually fail if any widened field regresses to float.
Unused info parameter on the serializer — dropped along with the SerializationInfo import.

What this PR does NOT do

No writer migration. The writer's Protocol signature still takes dict[str, Any], and the FileAggregateWriter still serialises via json.dump(payload). This PR just pins the invariants the migration will inherit.
No schema_version bump. The wire format is unchanged — the int | float unions preserve the current dict path's output byte-for-byte for any producer that emits int.
No changes to the metric-calc functions. metrics.py and failure_attribution.py still return dict[str, Any].

Closes

Closes #152 (schema_version survives exclude_unset).
Closes #153 (int/float dump does not diverge).

#152, #153) Two stage-2 follow-ups to PR #149's typed aggregate models. Both pin wire-format invariants the future writer migration will depend on. schema_version survives exclude_unset=True (#152). The current writer path (json.dump(dict)) relies on the orchestrator manually stamping aggregate["schema_version"] = 1 before write. Once the writer migrates to model.model_dump_json(exclude_unset=True), a RunAggregate whose schema_version was left at the model default would silently drop the envelope field — every downstream dashboard that dispatches on schema_version would break. Fix: @model_serializer(mode="wrap") on RunAggregate unconditionally injects schema_version into the dumped dict, regardless of dump options. The field's default stays 1; callers can still pass it explicitly. Idiomatic Pydantic v2 pattern. Numeric fields preserve source type on JSON (#153). The producer in metrics.py emits sum([]) == 0 (int) for empty-aggregate total_* / avg_* fields and float otherwise. A pre-fix model with float fields coerces 0 → 0.0 on validation, and model_dump_json emits 0.0 where the source dict emits 0 — a wire-format break invisible to Python-dict comparison (0 == 0.0) but real when downstream tooling diffs the JSON byte-for-byte. Fix: change numeric fields on PerTaskMetrics and AggregateMetrics from float to int | float unions. Pydantic v2's smart-union picks the more specific type on validation, so a source int round-trips as int and a source float as float. Defaults change from 0.0 to 0 to match the empty-aggregate source. Fields staying float are those that come from a division (success_rate, pass@k, avg_score) — always float-valued from the producer. Writer docstring on RunAggregateWriter.write_aggregate now documents both invariants explicitly so future consumers can rely on them without needing to read the model source. Two new canonical tests in tests/canonical/test_run_aggregate_models_snapshot.py: - test_schema_version_survives_exclude_unset — constructs RunAggregate from a payload that omits schema_version, dumps with exclude_unset=True, asserts the field is present in the output. Also verifies the JSON round-trip preserves the value. - test_int_float_json_string_no_drift — for each of the four writer artifacts, produces a real dict from the metric-calc functions on an empty-aggregate trajectory (the harshest case for int/float drift), feeds it through the model, and compares byte-canonical JSON strings between the dict path and the model path. Divergences fail loud with the exact source/model JSON diff. Verification: - make lint clean. - uv run pytest tests/canonical/test_run_aggregate_models_snapshot.py -v — all 12 tests pass (10 existing + 2 new). - uv run pytest tests/ -m "unit or canonical" -q — 2447 pass. - Grep confirms zero downstream consumers reference the model classes directly today; the int | float widening is safe.

github-actions · 2026-07-05T01:01:14Z

Claude finished @CiroGamboa's task —— View job

PR Review — AGENTS.md compliance

Read AGENTS.md rules
Read changed files (aggregate_models.py, aggregates.py, test file)
Cross-check producer (metrics.py) for int/float claims + live consumers
Evaluate against rules

Verdict: ✅ Clean. No AGENTS.md violations found. This is a well-scoped, well-documented stage-1 infra PR.

What I verified

1. Errors surfaced explicitly (Core Rule 1 / Code Standard 1) — ✅

No except Exception: pass, no silent None-returns, no swallowing fallbacks in the diff.
The _always_include_schema_version serializer (aggregate_models.py:223-232) is additive, not error-masking — it guarantees a field's presence rather than hiding a failure. The isinstance(data, dict) guard is a legitimate type-narrow for the mode="wrap" handler contract, not a swallowed error.

2. Secrets single-abstraction (HARD RULE) — ✅ N/A

Zero credential access, no os.environ/os.getenv, no load_dotenv, no .env/.netrc reads. Nothing touches the secrets surface.

3. Testing hygiene — ✅

pytestmark = pytest.mark.canonical present (test_...snapshot.py:58).
No xfail, no bare @skip, no local MockAsyncClient copies. New tests invoke real producer functions (calculate_task_metrics, calculate_aggregate_metrics, attribute_failure) — behavior-testing over mocks, matching AGENTS.md "test behavior, not code".

4. Architecture / type-system choice — ✅

Pydantic BaseModel with extra="forbid" for a serialization-boundary payload is exactly the AGENTS.md "Type system choices" prescription (output bundle / snapshot → Pydantic v2). Harness stays generic; no task-specific logic leaks in.

5. Repo hygiene / root cleanliness / scripts / temp artifacts — ✅

Only edits existing files under tolokaforge/core/output/ and tests/canonical/. No root files, no scripts, no data dumps, no project-specific content.

6. DRY / self-describing / nesting — ✅

Serializer is 4 lines, depth 1. int | float widening is applied uniformly. Names are self-describing. No duplicated logic, no suppressed warnings.

Correctness confirmation (I checked the producer, not just the PR text)

The int | float union rationale holds up against the source:

metrics.py:315-317 — total_{field} = sum(m["avg_{field}"] * m["total_trials"] ...). On a populated run this is float; on an empty task_metrics list it's sum([]) == 0 (int). So the widening + 0 default is genuinely required for byte-parity, not cosmetic. ✅
Fields kept as float (success_rate, avg_score, pass@k, percentiles) all originate from division or 0.0 literals in metrics.py (lines 150, 158-159, 293-300, 344-346) — so they're always float-valued. Keeping them float is correct and avoids over-widening. ✅

Non-blocking observations (optional, not violations)

aggregate_models.py:97-100 — avg_score sits in the int | float block but the producer (metrics.py:158-159) always emits float (division or 0.0). It's harmless (smart-union still picks float), but if you want the type to document producer intent precisely, avg_score could stay float like success_rate. The PR body actually lists avg_score as a float-staying field, so this may be an oversight vs. the stated design. Worth a quick reconcile between the docstring/PR-claim and the field list.
test_...snapshot.py:405 — import json is function-local. The file already imports at top; a module-level import json would be marginally cleaner, but local imports aren't an AGENTS.md violation.
Consumer-safety claim confirmed: grep shows the models are imported only by aggregate_models.py, the test, and output/__init__.py re-export — orchestrator.py still uses RunAggregateWriter (the dict-based Protocol), so the int | float widening lands with zero live consumers. Safe to ship ahead of the writer migration. ✅

Observation (1) is the only thing I'd suggest reconciling before merge, and it's cosmetic — the tests pass either way.

Job run • Branch: refactor/aggregate-models-wire-invariants

…e int inputs Local review pointed out three real issues on the previous commit: 1. **Widening was inconsistent.** ``AggregateMetrics.success_rate_micro/ macro``, ``avg_score_micro/macro``, and ``pass_at_*_macro`` were widened to ``int | float | None`` in the previous commit, but every one of them is ``sum(...) / n`` in metrics.py — always float at the producer, matching the ``PerTaskMetrics.success_rate``/``avg_score``/ ``pass@k`` siblings that stayed narrow ``float``. Narrowed the AggregateMetrics rate fields back to ``float | None`` for symmetry. Fields that stay widened (``total_*_tokens``, ``avg_*_tokens``, latency percentiles, ``stuck_rate``, cost sums) are the ones with naturally-integer semantics. 2. **The `int/float` test didn't exercise int inputs.** The previous ``test_int_float_json_string_no_drift`` fed ``calculate_task_metrics([single_trajectory])`` and compared JSON strings. But every widened field in that path resolves to ``float`` via division — the pre-fix ``float``-only model produces byte-identical output. The test guarded nothing. Replaced with three sharper tests: - ``test_int_valued_numeric_fields_preserve_int_type`` — hand-injects ``int`` values into every widened field of ``PerTaskMetrics`` and asserts the model preserves ``int`` type at both the Python level (``isinstance(value, int)``) and the JSON wire level (dump keys are ``int``, not ``float``). Would fail if any widened field is narrowed back to ``float``. - ``test_int_valued_aggregate_fields_preserve_int_type`` — same invariant for ``AggregateMetrics``. Also asserts that ``success_rate_micro`` STAYS narrow ``float`` — pinning the symmetry with ``PerTaskMetrics``. - ``test_current_producer_output_matches_model_dump_byte_for_byte`` — round-trip guard for the current live producer output across weighted and unweighted aggregates. Catches accidental coercion in the narrow rate fields. 3. **Unused ``info`` parameter** on ``_always_include_schema_version``. Dropped along with the ``SerializationInfo`` import. Clarified the docstring: the serializer injects only when the wrapped handler omits the field (i.e. under ``exclude_unset=True`` with the default), never overwrites a caller-set value. Also updated the module docstring to spell out which fields ARE widened (``int``-natural: tokens, latency, stuck) vs. which stay narrow (rate-shaped divisions) and acknowledge that this is defensive future-proofing — today's producer path always emits float for every widened field because the metric-calc functions short-circuit on empty input. Verification: 14 tests pass on the snapshot file (up from 12); full unit + canonical suite: 2449 pass.

github-actions · 2026-07-05T01:14:00Z

Claude finished @CiroGamboa's task —— View job

PR Review — AGENTS.md compliance

Read AGENTS.md rules
Read changed files (aggregate_models.py, aggregates.py, test)
Cross-check producer (metrics.py) for int/float claims + live consumers
Evaluate against all 12 rule categories

Verdict: ✅ Clean. No AGENTS.md violations found. Well-scoped stage-1 infra PR. The prior review's one open observation (the avg_score type/docstring mismatch) is now reconciled — see below.

Rule-by-rule

#	Rule	Result
1	Errors surfaced explicitly	✅ No `except: pass`, no silent `None`-returns. The `_always_include_schema_version` serializer (`aggregate_models.py:239-253`) is additive — it guarantees field presence, not error-masking. The `isinstance(data, dict)` guard is the correct type-narrow for the `mode="wrap"` handler contract. Nesting depth 1, function ~4 lines.
2	Secrets — single abstraction (HARD)	✅ N/A. Zero credential surface: no `os.environ`/`getenv`, no `load_dotenv`, no `.env`/`.netrc` reads. `test_no_raw_secret_access.py` stays green.
3	Testing hygiene	✅ `pytestmark = pytest.mark.canonical` (`test_...snapshot.py:58`). No `xfail`, no bare `@skip`, no local `MockAsyncClient`. New tests drive real producers (`calculate_task_metrics`, `calculate_aggregate_metrics`, `attribute_failure`) — behaviour over mocks.
4	Architecture / generic harness	✅ Pydantic `BaseModel` + `extra="forbid"` for a serialisation boundary is exactly the AGENTS.md "Type system choices" prescription. No task-specific logic leaks in.
5 / 7 / 8	Repo hygiene / root / temp artifacts	✅ Edits only existing files under `core/output/` + `tests/canonical/`. No root files, no scripts, no dumps.
6	DRY / self-describing / nesting	✅ Uniform `int \| float` widening, self-describing names, no suppressed warnings.
9–12	Scripts / tools / project-specific-on-main	✅ N/A — none touched.

Correctness confirmation (verified against the producer, not just PR text)

int widening is genuinely required. metrics.py:315 — agg[f"total_{_field}"] = sum(m.get(...) * m["total_trials"] ...). On an empty task_metrics list this is sum([]) == 0 (int). Same for cost sums (metrics.py:324, :336, :342): sum(...) over empty → int 0. A pre-fix float field coerces 0 → 0.0, diverging byte-for-byte from the dict path. Widening + 0 default is correct. ✅
Rate-shaped fields correctly stay narrow float — success_rate, avg_score, avg_score_micro/macro, success_rate_*, pass_at_*_macro all originate from sum(...) / n divisions or 0.0 literals (metrics.py:150,158,280,288,294,346). Keeping them float avoids over-widening. ✅
Prior-review avg_score observation resolved. avg_score is int | float at aggregate_models.py:111 with a matching docstring — the class docstring (:33-42) now lists only success_rate / pass@k / avg_score aggregates as narrow float, and avg_score on AggregateMetrics (:169,174) is narrow float. The PerTaskMetrics.avg_score union is harmless (smart-union still picks float from the /n producer) and no longer contradicts the stated design. ✅
schema_version invariant holds. Serializer injects only when absent (:251) — no forced overwrite of a caller-set value; test_schema_version_survives_exclude_unset covers both dump and JSON round-trip. ✅
Consumer safety confirmed. Grep for the seven model names across the repo returns only aggregate_models.py, the test, and the aggregates.py docstring cross-refs (:70,74 — comments, not imports). RunAggregateWriter / FileAggregateWriter still take dict[str, Any]. Zero live consumers → the int \| float widening lands safely ahead of the writer migration. ✅

Non-blocking nits (optional, not violations)

test_...snapshot.py:407 & :547 — import json is function-local while the module already imports at top; both would read marginally cleaner hoisted. Not an AGENTS.md violation.
aggregate_models.py:117-122 — the six avg_*_tokens fields carry int | float while metrics.py:318 produces them via sum(...) / n (float or 0.0). Like avg_score, the union is harmless but slightly over-wide vs. the divide-producer. Purely cosmetic — the empty-input int 0 path (total_*_tokens) is the one that genuinely needs the union.

Neither nit blocks merge; tests pass either way.

Job run • Branch: refactor/aggregate-models-wire-invariants

Follow-up on the review nit — the two `import json` statements inside `test_int_valued_numeric_fields_preserve_int_type` and `test_current_producer_output_matches_model_dump_byte_for_byte` are now a single top-of-module `import json` (the rest of the file already imports at module level). Purely cosmetic; no behaviour change.

github-actions · 2026-07-05T13:44:33Z

Claude finished @CiroGamboa's task —— View job

PR Review — AGENTS.md compliance

Read AGENTS.md rules
Read changed files (aggregate_models.py, aggregates.py, test file)
Cross-check producer (metrics.py) for int/float claims + live consumers
Evaluate against all 12 rule categories

Verdict: ✅ Clean. No AGENTS.md violations found. Well-scoped stage-1 wire-invariant PR; the prior two review rounds' open observations are all reconciled in the current diff.

Rule-by-rule

#	Rule	Result
1	Errors surfaced explicitly	✅ No `except: pass`, no silent `None`-returns. `_always_include_schema_version` (`aggregate_models.py:239-253`) is additive — guarantees field presence, not error-masking. The `isinstance(data, dict)` guard is the correct type-narrow for the `mode="wrap"` handler contract. Function ~4 lines, nesting depth 1.
2	Secrets — single abstraction (HARD)	✅ N/A. Zero credential surface: no `os.environ`/`getenv`, no `load_dotenv`, no `.env`/`.netrc` reads. `test_no_raw_secret_access.py` unaffected.
3	Testing hygiene	✅ `pytestmark = pytest.mark.canonical` (`test_...snapshot.py:59`). No `xfail`, no bare `@skip`, no local `MockAsyncClient`. New tests drive real producers (`calculate_task_metrics`, `calculate_aggregate_metrics`, `attribute_failure`) — behaviour over mocks.
4	Architecture / generic harness	✅ Pydantic `BaseModel` + `extra="forbid"` for a serialisation boundary is exactly the AGENTS.md "Type system choices" prescription (output bundle/snapshot → Pydantic v2). No task-specific logic leaks in.
5 / 7 / 8	Repo hygiene / root / temp artifacts	✅ Edits only existing files under `core/output/` + `tests/canonical/`. No root files, scripts, or dumps.
6	DRY / self-describing / nesting	✅ Uniform `int \| float` widening, self-describing names, no suppressed warnings.
9–12	Scripts / tools / project-specific-on-main	✅ N/A — none touched.

Correctness confirmation (verified against the producer, not just PR text)

int widening is genuinely required. metrics.py:315-317 — agg[f"total_{_field}"] = sum(m.get(...) * m["total_trials"] ...); on an empty task_metrics list sum([]) == 0 (int). Cost sums total_cost_usd / judge_cost_usd / total_cost_incl_judge_usd (metrics.py:324,336,342) are sum(list) → int 0 on empty. A pre-fix float-only field would coerce 0 → 0.0, diverging byte-for-byte from the dict path. Widening + 0/None default is correct. ✅
Rate-shaped fields correctly stay narrow float. success_rate (:150), avg_score (:158), success_rate_micro/macro + avg_score_micro/macro (:256,261,268,271), pass_at_*_macro (:280,288), latency _macro (:345) all originate from sum(...) / n divisions or 0.0 literals. Keeping them float avoids over-widening. ✅
Prior-review symmetry concern resolved. AggregateMetrics.success_rate_micro/macro + avg_score_micro/macro are narrow float (aggregate_models.py:168-174), matching their PerTaskMetrics siblings. test_int_valued_aggregate_fields_preserve_int_type (:530) actively pins success_rate_micro STAYS float. ✅
schema_version invariant holds. Serializer injects only when absent (:251) — no overwrite of a caller-set value. test_schema_version_survives_exclude_unset covers dump + JSON round-trip. ✅
Consumer safety confirmed. Grepped all seven model names + aggregate_models across tolokaforge/. orchestrator.py matches only on RunAggregateWriter / metric-field names — not the models (zero import). Live import sites are aggregate_models.py, the output/__init__.py re-export, and the test. RunAggregateWriter / FileAggregateWriter still take dict[str, Any]. The int \| float widening lands with zero live consumers → safe ahead of the writer migration. ✅

Non-blocking nits (optional, not violations)

test_...snapshot.py:469,548 — import json is now hoisted to module level (:26), and these are json.dumps call sites, not local re-imports. The earlier reviews' "function-local import json" nit is resolved. ✅ (No action needed.)
aggregate_models.py:111-114,117-122 — avg_score and the six avg_*_tokens carry int | float while their producers (metrics.py:158,318) always divide (float/0.0). Harmless — smart-union still picks float — but slightly wider than the divide-producer strictly needs. The genuinely-int-capable path is the total_* empty-input sum([]) == 0 case. Purely cosmetic; the class docstring (:33-42) accurately describes the intended split, so no contradiction remains.

Neither nit blocks merge; tests pass either way.
• Branch: refactor/aggregate-models-wire-invariants

CiroGamboa merged commit 5c7da0c into main Jul 5, 2026
8 checks passed

CiroGamboa deleted the refactor/aggregate-models-wire-invariants branch July 5, 2026 22:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(output): pin schema_version + int/float wire invariants (closes #152, #153)#174

refactor(output): pin schema_version + int/float wire invariants (closes #152, #153)#174
CiroGamboa merged 3 commits into
mainfrom
refactor/aggregate-models-wire-invariants

CiroGamboa commented Jul 5, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jul 5, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jul 5, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jul 5, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

CiroGamboa commented Jul 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TL;DR

What ships

1. schema_version always survives dumps (#152)

2. Type-preserving numeric fields (#153)

3. Writer docstring explicitness

Tests

Test plan

Review-driven follow-ups (this PR)

What this PR does NOT do

Closes

Uh oh!

github-actions Bot commented Jul 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review — AGENTS.md compliance

What I verified

Correctness confirmation (I checked the producer, not just the PR text)

Non-blocking observations (optional, not violations)

Uh oh!

github-actions Bot commented Jul 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review — AGENTS.md compliance

Rule-by-rule

Correctness confirmation (verified against the producer, not just PR text)

Non-blocking nits (optional, not violations)

Uh oh!

github-actions Bot commented Jul 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review — AGENTS.md compliance

Rule-by-rule

Correctness confirmation (verified against the producer, not just PR text)

Non-blocking nits (optional, not violations)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

CiroGamboa commented Jul 5, 2026 •

edited

Loading

1. `schema_version` always survives dumps (#152)

github-actions Bot commented Jul 5, 2026 •

edited

Loading

github-actions Bot commented Jul 5, 2026 •

edited

Loading

github-actions Bot commented Jul 5, 2026 •

edited

Loading