feat(common): hash module foundation + e1 historical epoch #352

Closed
mpecan wants to merge 1 commit into main from
feat/350-canonical-hash-e1
Conversation

@mpecan
Owner

@mpecan mpecan commented Apr 28, 2026

Long-term architectural fix for the hash drift behind #350. Lays the foundation for versioned canonical hashes; ships only e1 and the dispatch infrastructure so the pattern can be reviewed before the rest of the historical chain (e2..e11) is filled in mechanically.

Summary

  • Splits tokf-common::hash into a directory module:
    • hash::current — verbatim move of the existing canonical_hash(&FilterConfig). Every existing caller (tokf-cli's install_cmd, resolve, show_cmd, backfill_cmd, config/cache, publish_shared; tokf-server's routes/filters/publish) is byte-for-byte unaffected.
    • hash::epochs::e1 — frozen byte-for-byte snapshot of FilterConfig at commit `5abfaf8` (canonical_hash introduction; `GroupConfig.labels` was `BTreeMap`). Reproduces what a binary at that commit would have hashed for the same TOML.
    • hash::HashVersion / KNOWN_VERSIONS / compute_all / matches_any — public dispatch API. `HashVersion` carries a stable id (e.g. `"e1"`) and a hasher fn; clients can iterate `KNOWN_VERSIONS` to find which epoch reproduces a stored hash.
  • hash::HashError upgraded from a `serde_json::Error` newtype to a 2-variant enum (`Parse(String)`, `Serialize(String)`) so it can also carry `toml::de::Error`. Source-compatible — every caller uses `?` / `Display` / `.map_err(|e| ...)`; none destructure.
  • toml is promoted from optional (gated under `validation`) to a regular dep on `tokf-common`. All workspace crates already pull `toml` directly, so no new transitive surface.
  • Frozen-corpus CI: `crates/tokf-common/tests/hash_corpus.rs` walks `tests/hash_corpus/<id>/` for every registered version and asserts that each `<n>.toml`'s hash matches its `<n>.expected`. Silently modifying an `.expected` value or leaving an orphan `.expected` (TOML deleted, expected kept) both fail the test.
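
As a rough illustration of the dispatch API described above, the sketch below shows one way `HashVersion`, `KNOWN_VERSIONS`, `compute_all`, and `matches_any` could fit together. Only the names and the `fn(&str) -> Result<String, HashError>` hasher shape come from this PR; the return types of `compute_all` and `matches_any` are assumptions, and `placeholder_e1` is a hypothetical stand-in that does not reproduce the real e1 hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Two-variant error, mirroring the variants named in the PR.
#[derive(Debug, PartialEq)]
pub enum HashError {
    Parse(String),
    Serialize(String),
}

/// A stable epoch id paired with the function that hashes TOML under that epoch.
pub struct HashVersion {
    pub id: &'static str,
    pub hasher: fn(&str) -> Result<String, HashError>,
}

/// Hypothetical stand-in hasher. The real e1 parses TOML into a frozen schema
/// and hashes its canonical serialization; this only hashes the trimmed text.
fn placeholder_e1(toml: &str) -> Result<String, HashError> {
    if toml.trim().is_empty() {
        return Err(HashError::Parse("empty input".to_string()));
    }
    let mut h = DefaultHasher::new();
    toml.trim().hash(&mut h);
    Ok(format!("{:016x}", h.finish()))
}

/// Append-only registry: new epochs are added at the end, old ones never change.
pub static KNOWN_VERSIONS: &[HashVersion] = &[HashVersion {
    id: "e1",
    hasher: placeholder_e1,
}];

/// Runs every registered hasher over the same TOML input (assumed signature).
pub fn compute_all(toml: &str) -> Vec<(&'static str, Result<String, HashError>)> {
    KNOWN_VERSIONS
        .iter()
        .map(|v| (v.id, (v.hasher)(toml)))
        .collect()
}

/// Returns the id of the first epoch whose hash equals `stored` (assumed signature).
pub fn matches_any(toml: &str, stored: &str) -> Option<&'static str> {
    KNOWN_VERSIONS
        .iter()
        .find(|v| {
            (v.hasher)(toml)
                .map(|h| h.as_str() == stored)
                .unwrap_or(false)
        })
        .map(|v| v.id)
}

fn main() {
    let toml = "command = \"git push\"";
    let stored = placeholder_e1(toml).expect("non-empty input");
    println!("matched epoch: {:?}", matches_any(toml, &stored));
    println!("epochs tried: {}", compute_all(toml).len());
}
```

With a registry shaped like this, verifying a stored hash is a linear scan over the known epochs, and adding an epoch only touches the slice.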

How to add the rest of the historical chain

Each subsequent epoch follows the same template:

  1. `git show <commit>:crates/tokf-common/src/config/types.rs` → drop verbatim into `hash/epochs/eN.rs` inside a private `mod schema`.
  2. Append a `HashVersion` entry to the `KNOWN_VERSIONS` slice in `hash/mod.rs`.
  3. Add corpus fixtures under `tests/hash_corpus/eN/` (the test loop auto-asserts a directory exists per registered version).
  4. Never modify an existing epoch — that silently invalidates every `eN:…` hash already in the wild.

The agent archeology that informed this PR identified the full chain (in chronological order):

| epoch         | commit    | trigger                                                          |
|---------------|-----------|------------------------------------------------------------------|
| e1 (this PR)  | `5abfaf8` | canonical_hash introduced; `GroupConfig.labels` -> `BTreeMap`    |
| e2            | `9eca37c` | `+show_history_hint: bool`                                       |
| e3            | `87557f5` | `+chunk: Vec`, `+aggregates: Vec` in OutputBranch                |
| e4            | `dd2759b` | `+json: Option`                                                  |
| e5            | `2fa1e50` | `+inject_path: bool`                                             |
| e6            | `36d43d0` | `+passthrough_args: Vec`                                         |
| e7            | `4418619` | `+description, +truncate_lines_at, +on_empty, +tail`             |
| e8            | `494e770` | RTK aliases; `MatchOutputRule.contains: String -> Option`        |
| e9            | `3f44787` | `ReplaceRule { +replace_all }`                                   |
| e10           | `322e133` | `+tree: Option`                                                  |
| e11           | `19c4d0e` | `VariantDetect { +args_pattern }`                                |

Test plan

  • `cargo fmt --all -- --check` clean
  • `cargo clippy --workspace --all-targets -- -D warnings` clean
  • `cargo test --workspace` — 2186 passed (10 new tests across hash/mod, hash/epochs/e1, and the corpus harness)
  • Tested e1 against the bug-report filter `0585b874…` from #350 ("[BUG] Community filters cannot be installed"): e1 produces `9977a297…`, which does NOT match because the filter was published at a later epoch. This is expected; `e2..e11` will reproduce it in follow-up PRs.
  • Frozen reference inline test pins `e1:2c7b698282f042f3e391f54743c292357a679019220a31ff763d81150f21798d` for `command = "git push"`.
  • Confirmed that mutating an `.expected` file in place fails CI.

Closes / refs

Refs #350 (PR #351 already shipped the immediate stopgap). Does not close — `e2..e11` and the eventual schema-independent `v1` canonical-TOML hash remain.

🤖 Generated with Claude Code

Lays the groundwork for versioned canonical hashes (architectural fix
for #350). Splits `hash.rs` into a directory module:

- `hash::current` — the existing schema-tied `canonical_hash` (verbatim
  move; behaviour byte-for-byte unchanged for all 14 callers in
  `tokf-cli` and `tokf-server`).
- `hash::epochs::e1` — frozen byte-for-byte snapshot of `FilterConfig`
  at commit 5abfaf8 (when `canonical_hash` was first introduced and
  `GroupConfig.labels` switched to `BTreeMap`). Reproduces what a
  pre-`show_history_hint`/`chunk`/`json`/`inject_path` binary would
  have hashed for the same TOML input. The schema is wrapped in a
  private `mod schema` so its types don't leak; the public API is
  `hash::epochs::e1::hash(toml: &str) -> Result<String, HashError>`.
- `hash::HashVersion` / `KNOWN_VERSIONS` / `compute_all` /
  `matches_any` — public dispatch API. Clients try every known epoch
  to verify a stored hash; the wiring into `verify_and_resolve_hash`
  lands in a follow-up once PR #351 merges.
- `hash::HashError` — promoted from a `serde_json::Error` newtype to a
  2-variant enum (`Parse(String)`, `Serialize(String)`) so it can also
  carry TOML parse errors. Source-compatible: every existing call
  site uses `?` / `Display` / `.map_err`, none destructure.
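
To illustrate why the enum change is source-compatible for callers that only use `?`, `Display`, or `.map_err`, here is a minimal sketch. The variant names come from this PR; the `Display` wording, `parse_stage`, and `caller` are hypothetical, and an integer parse stands in for the real TOML parse so the example needs only the standard library.

```rust
use std::error::Error;
use std::fmt;

/// Two-variant error carrying stringified causes, so it can wrap both
/// parse-side and serialize-side failures without exposing their types.
#[derive(Debug)]
pub enum HashError {
    Parse(String),
    Serialize(String),
}

impl fmt::Display for HashError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Message wording is an assumption, not taken from the PR.
        match self {
            HashError::Parse(msg) => write!(f, "parse error: {msg}"),
            HashError::Serialize(msg) => write!(f, "serialize error: {msg}"),
        }
    }
}

impl Error for HashError {}

/// Stand-in for the TOML parse step: any error type with `to_string()`
/// can be folded into `HashError::Parse` via `.map_err`.
fn parse_stage(input: &str) -> Result<u32, HashError> {
    input
        .trim()
        .parse::<u32>()
        .map_err(|e| HashError::Parse(e.to_string()))
}

/// A caller that never destructures the error: it only propagates with `?`.
fn caller(input: &str) -> Result<String, Box<dyn Error>> {
    let n = parse_stage(input)?;
    Ok(format!("{n:08x}"))
}

fn main() {
    println!("{:?}", caller("42"));
    if let Err(e) = caller("not a number") {
        // Only `Display` is used, so the internal shape of HashError is invisible here.
        println!("{e}");
    }
}
```

Callers written this way keep compiling whether the error is a newtype or an enum; only code that pattern-matches on the inner value would notice the change.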

Frozen-corpus CI test (`tests/hash_corpus.rs`) walks
`tests/hash_corpus/<id>/` for every registered version and asserts
each `<n>.toml`'s hash matches its `<n>.expected`. Modifying an
expected value (or leaving an orphan `.expected`) fails the test.
Four `e1` fixtures cover minimal, BTreeMap labels, Lua scripts, and a
kitchen-sink filter exercising most e1 fields.
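
The corpus check described above might look roughly like the following sketch. It is not the actual `tests/hash_corpus.rs`: `check_corpus_dir` and `len_hasher` are hypothetical names, the hasher is a trivial placeholder, and treating a `.toml` with no `.expected` as a failure is an added assumption.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::path::Path;

/// Placeholder hasher for the sketch; a real harness would call the epoch's hasher.
fn len_hasher(toml: &str) -> Result<String, String> {
    Ok(format!("len:{}", toml.trim().len()))
}

/// Checks one version directory and returns human-readable failures.
pub fn check_corpus_dir(dir: &Path, hasher: fn(&str) -> Result<String, String>) -> Vec<String> {
    let entries = match fs::read_dir(dir) {
        Ok(e) => e,
        Err(e) => return vec![format!("missing corpus dir {}: {e}", dir.display())],
    };
    let mut tomls = BTreeSet::new();
    let mut expecteds = BTreeSet::new();
    for entry in entries.flatten() {
        let path = entry.path();
        let stem = match path.file_stem().and_then(|s| s.to_str()) {
            Some(s) => s.to_string(),
            None => continue,
        };
        match path.extension().and_then(|e| e.to_str()) {
            Some("toml") => { tomls.insert(stem); }
            Some("expected") => { expecteds.insert(stem); }
            _ => {}
        }
    }
    let mut failures = Vec::new();
    // An `.expected` whose `.toml` was deleted is an orphan and must fail.
    for stem in expecteds.difference(&tomls) {
        failures.push(format!("orphan {stem}.expected"));
    }
    for stem in &tomls {
        let toml = fs::read_to_string(dir.join(format!("{stem}.toml"))).unwrap_or_default();
        let expected = match fs::read_to_string(dir.join(format!("{stem}.expected"))) {
            Ok(s) => s,
            Err(_) => {
                failures.push(format!("{stem}.toml has no .expected"));
                continue;
            }
        };
        match hasher(&toml) {
            Ok(h) if h == expected.trim() => {}
            Ok(h) => failures.push(format!("{stem}: got {h}, expected {}", expected.trim())),
            Err(e) => failures.push(format!("{stem}: hasher error: {e}")),
        }
    }
    failures
}

fn main() {
    // Build a throwaway corpus directory to exercise the check.
    let dir = std::env::temp_dir().join(format!("hash_corpus_sketch_{}", std::process::id()));
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(&dir).unwrap();
    fs::write(dir.join("minimal.toml"), "command = \"git push\"\n").unwrap();
    fs::write(dir.join("minimal.expected"), "len:20\n").unwrap();
    fs::write(dir.join("stale.expected"), "len:1\n").unwrap();
    for failure in check_corpus_dir(&dir, len_hasher) {
        println!("FAIL: {failure}");
    }
    let _ = fs::remove_dir_all(&dir);
}
```

Collecting failures into a list instead of panicking on the first mismatch lets a single test run report every drifted fixture at once.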

`toml` is promoted from optional (under `validation`) to a regular dep
on `tokf-common` because epoch parsers need it; all workspace crates
already depend on `toml` directly so no new transitive surface.

Bug-report filter `0585b874...` is NOT reproduced by `e1`
(`9977a297...` ≠ `0585b874...`); the filter was published at a later
schema epoch. `e2..e11` will land in follow-up PRs:

| epoch | commit  | trigger                                     |
|-------|---------|---------------------------------------------|
| e2    | 9eca37c | `+show_history_hint`                        |
| e3    | 87557f5 | `+chunk`, `+aggregates` in OutputBranch     |
| e4    | dd2759b | `+json`                                     |
| e5    | 2fa1e50 | `+inject_path`                              |
| e6    | 36d43d0 | `+passthrough_args`                         |
| e7    | 4418619 | `+description, +truncate_lines_at, +on_empty, +tail` |
| e8    | 494e770 | RTK aliases; `MatchOutputRule.contains` -> Option |
| e9    | 3f44787 | `ReplaceRule { +replace_all }`              |
| e10   | 322e133 | `+tree`                                     |
| e11   | 19c4d0e | `VariantDetect { +args_pattern }`           |

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@repository-butler
Contributor

Filter Verification Report

Changed Filters

No filter files changed in this PR.

All Filters Summary

✅ 143/143 test cases passed across 51 filters


Generated by tokf verify

@mpecan mpecan closed this Apr 28, 2026