fix(server,cli): server recomputes content_hash on download to absorb FilterConfig schema drift #351

Open
mpecan wants to merge 1 commit into main from feat/350-install-hash-back-compat


@mpecan mpecan commented Apr 27, 2026

Closes #350.

Summary

  • Server: GET /api/filters/<hash>/download now parses the stored TOML and recomputes canonical_hash with the current binary, returning it as content_hash in the response.
  • Client: trusts the server-provided content_hash as the authoritative identity, while still hashing the wire bytes to detect tampering between server and client. When the URL hash differs from the recomputed content_hash (i.e. the filter was published under an older schema), a one-line stderr note explains the drift and the install proceeds under the new identity.
  • Old servers that don't yet emit content_hash trigger the historical URL-hash check on the client (graceful degradation; no regression).

Why this approach

The bug-report filters (0585b874…, d2a19dc4…) cannot be repaired client-only — the URL hash was produced by an older FilterConfig schema whose exact field set we can't reconstruct from the current binary. I exhausted client-side reconstruction strategies (strip type-defaults, strip-since-initial, canonical TOML via toml::to_string on toml::Value, raw-byte hash) and none reproduce the stored hash.

Treating the server as the trust authority works because:

  • The server already has the canonical bytes in R2.
  • Recomputing on every download is cheap (one TOML parse + one SHA-256, dwarfed by R2 latency).
  • Wire-tampering detection survives: the client hashes the bytes it received and asserts they hash to the value the server claims.
  • Future schema additions stop being silent identity-breakers — the recomputed hash always matches what the current code believes.

The URL hash effectively becomes a stable lookup key; the recomputed hash is the content identity under the current schema. The architecture stays simple, no DB migration is needed, and old clients keep working unchanged against new servers.

Long-term direction (not in this PR)

canonical_hash is fundamentally fragile because it's tied to the in-memory shape of FilterConfig. The proper fix is an explicit, version-tagged canonical-TOML hash decoupled from struct evolution (e.g. v1:<sha256> over a deterministic TOML emission with sorted keys / stripped comments). Tracking as a follow-up issue.
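As a rough illustration of that direction, here is a minimal sketch of a version-tagged hash over a deterministic emission. Everything in it is an assumption made for the example: the helper names `canonical_toml`, `fnv1a64` and `versioned_hash` are hypothetical, the canonicaliser only handles flat `key = value` lines with full-line comments, and FNV-1a stands in for SHA-256 so the block needs no external crates.

```rust
use std::collections::BTreeMap;

/// Canonicalise a flat `key = value` document: drop blank lines and
/// full-line comments, trim whitespace, and emit the pairs sorted by key.
/// Assumption: no tables, multi-line values, or trailing comments.
fn canonical_toml(src: &str) -> String {
    let mut entries = BTreeMap::new();
    for line in src.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            entries.insert(key.trim().to_string(), value.trim().to_string());
        }
    }
    // BTreeMap iterates in key order, so the emission is deterministic.
    entries
        .iter()
        .map(|(key, value)| format!("{key} = {value}\n"))
        .collect()
}

/// FNV-1a 64-bit. This is only a stand-in digest for the sketch;
/// the proposal in the text is SHA-256.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Version-tagged identity: the `v1:` prefix names the canonicalisation
/// rules, so a future rule change can ship as `v2:` without colliding.
fn versioned_hash(src: &str) -> String {
    format!("v1:{:016x}", fnv1a64(canonical_toml(src).as_bytes()))
}
```

Because the hash is computed over the emitted text rather than over a deserialized struct, adding a field to `FilterConfig` would not change the identity of filters that never mention that field.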

Test plan

  • cargo fmt -- --check clean
  • cargo clippy --workspace --all-targets -- -D warnings clean
  • cargo test --workspace — 2184 passing (10 new: 3 server unit, 2 client deserialize, 5 client verify_and_resolve_hash branches)
  • Existing server integration test download_returns_toml_content extended to assert content_hash round-trips
  • Pre-commit hook (file-size limits, README regeneration) passed

Branch coverage of verify_and_resolve_hash

  • server hash present + matches client + URL matches server hash → silent success
  • server hash present + matches client + URL ≠ server hash → success with stderr note (this is the #350 happy path)
  • server hash present + ≠ client → wire-tamper error (assert both hashes appear in message)
  • server hash absent + URL == client → success (old-server happy path)
  • server hash absent + URL ≠ client → error referencing #350 (old-server, no help possible)
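
The five branches above can be sketched as a single match. This is an illustrative reconstruction, not the code in this PR: only the name `verify_and_resolve_hash` comes from the description, while the signature, the `Resolved` type, and the message wording are assumptions.

```rust
/// Outcome of a successful check. `Resolved` is a hypothetical type
/// introduced for this sketch.
#[derive(Debug, PartialEq)]
pub struct Resolved {
    /// Identity the filter is installed under.
    pub identity: String,
    /// One-line note for stderr when the URL hash and the recomputed hash differ.
    pub note: Option<String>,
}

/// Assumed signature:
/// - `url_hash`: the hash the user requested
/// - `client_hash`: the hash the client computed from the received bytes
/// - `server_hash`: `content_hash` from the response, `None` for old servers
pub fn verify_and_resolve_hash(
    url_hash: &str,
    client_hash: &str,
    server_hash: Option<&str>,
) -> Result<Resolved, String> {
    match server_hash {
        // Received bytes do not hash to what the server claims: wire tampering.
        Some(server) if server != client_hash => Err(format!(
            "filter hash mismatch: server claims {server}, received bytes hash to {client_hash}"
        )),
        // All three hashes agree: silent success.
        Some(server) if server == url_hash => Ok(Resolved {
            identity: server.to_string(),
            note: None,
        }),
        // Server and client agree, URL differs: schema drift, install under the new identity.
        Some(server) => Ok(Resolved {
            identity: server.to_string(),
            note: Some(format!(
                "note: filter {url_hash} was published under an older schema; installing as {server}"
            )),
        }),
        // Old server, historical URL-hash check passes.
        None if url_hash == client_hash => Ok(Resolved {
            identity: client_hash.to_string(),
            note: None,
        }),
        // Old server, historical check fails: nothing the client can do.
        None => Err(format!(
            "filter hash mismatch: requested {url_hash}, received bytes hash to {client_hash} (see #350)"
        )),
    }
}
```

The tamper check comes first so that a mismatch between the server's claim and the received bytes is never masked by the drift path.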

Deployment ordering note

This is a coordinated fix:

  • New client + new server → bug-report filters install successfully.
  • Old client + new server → unchanged (old client ignores the new field).
  • New client + old server → falls back to URL hash check; affected filters still fail (no regression).
  • Old client + old server → status quo.

The fix lands the moment tokf.net is updated to serve content_hash. No client republish needed.

🤖 Generated with Claude Code

… FilterConfig schema drift

Closes #350.

`tokf install <hash>` was failing with "filter hash mismatch — the server
may have returned tampered content" for filters published before recent
`FilterConfig` schema additions (e.g. `inject_path`, added 2026-03-07 in
2fa1e50). Each new field with `#[serde(default)]` silently changes the
output of `canonical_hash` for every same-TOML filter that doesn't
reference the new field, breaking the hash that was stored at publish
time.
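
A toy illustration of the drift described above, under loud assumptions: the two struct versions, the `command` field, the `bool` type of `inject_path`, and the hand-written `canonical` emission are all hypothetical, and `DefaultHasher` stands in for SHA-256 so the block needs no external crates. It only shows why hashing the serialized struct changes when a defaulted field is added.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for SHA-256 (assumption made to keep the sketch std-only).
fn stand_in_hash(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical schema at publish time.
struct FilterConfigV1 {
    command: String,
}

/// Hypothetical schema after a defaulted field was added.
struct FilterConfigV2 {
    command: String,
    inject_path: bool,
}

impl FilterConfigV1 {
    /// Hand-written JSON-like emission of every field in the struct.
    fn canonical(&self) -> String {
        format!("{{\"command\":{:?}}}", self.command)
    }
}

impl FilterConfigV2 {
    /// Mimics what a defaulted field does when parsing an old document:
    /// the missing field silently takes its default value.
    fn from_old(old: &FilterConfigV1) -> Self {
        FilterConfigV2 {
            command: old.command.clone(),
            inject_path: false,
        }
    }

    /// Same emission rule, but the struct now has one more field.
    fn canonical(&self) -> String {
        format!(
            "{{\"command\":{:?},\"inject_path\":{}}}",
            self.command, self.inject_path
        )
    }
}
```

Hashing `FilterConfigV1::canonical()` at publish time and `FilterConfigV2::canonical()` at install time gives different values for the same source text, which is the mismatch the client reported.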

Investigation ruled out client-side reconstruction strategies: stripping
type-default fields from the JSON, stripping known-since-initial fields,
emitting canonical TOML via `toml::to_string(toml::Value)` — none
reproduce the URL hash for the two reported filters. The original
shape can't be recovered from the current binary.

The server is the trust authority: on `GET /api/filters/<hash>/download`,
it now parses the stored TOML and recomputes `canonical_hash` with the
current binary, returning it as `content_hash` in the response. The URL
hash becomes a stable lookup key; the recomputed `content_hash` is the
authoritative identity under the current schema.

The client trusts the server's `content_hash`, but still hashes the wire
bytes and asserts they match — preserving wire-tampering detection
between server and client. When the user-requested URL hash differs
from the recomputed `content_hash`, the client emits a one-line stderr
note explaining the schema drift; the install proceeds under the
recomputed identity.

Old servers that don't yet emit `content_hash` fall through to the
historical URL-hash check (graceful degradation), so behaviour against
them is unchanged from today.

Long-term: define an explicit, version-tagged canonical-TOML hash so
future schema additions don't silently invalidate stored identities.
Tracked separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@repository-butler

Filter Verification Report

Changed Filters

No filter files changed in this PR.

All Filters Summary

✅ 143/143 test cases passed across 51 filters


Generated by tokf verify
