feat: Add instrumentation hook + perf-regression canary spec#77
Merged
Conversation
A catastrophic-regression canary that catches eager-recursion class bugs
deterministically — no wall-clock assertions, no CI flake.
## What's new
**Opt-in instrumentation on `Valimock`.** Pass `instrument: true` to the
constructor to enable per-call counters exposed via
`valimock.instrumentation`:
```ts
const m = new Valimock({ instrument: true });
m.mock(schema);
console.log(m.instrumentation); // { mockCalls: 87, maxDepth: 6 }
m.resetInstrumentation();
```
- `mockCalls` — total `#mock` invocations since the last reset.
- `maxDepth` — peak recursion depth.
Off by default; production users pay no overhead. Adds one branch + two
increments per `#mock` call when on. New types `ValimockInstrumentation`
exported.
**Realistic-shape fixture library** in `src/__benchmarks__/fixtures/`.
Each fixture models a recurring shape from the discordkit codebase, the
library that drove the original v1.5.0 regression investigation:
- `messageLikeSchema` — recursive lazy through nullish (the v1.5.0
bug's worst case)
- `channelLikeSchema` — intersect-with-variant discriminated unions
- `applicationLikeSchema` — wide-flat object, no recursion (control)
- `interactionLikeSchema` — multi-lazy union with cross-references
Not exported from the public API.
**Regression spec** `src/__tests__/perfRegression.spec.ts` asserts call-
count and depth ceilings on each fixture. Averaged over 20 trials to
smooth random-roll variance.
## Why call counts, not wall-clock
The v1.5.0 wrapper-recursion bug went out because nothing in the test
suite *failed* when `mock(messageSchema)` went from 5.6ms to 26 seconds.
Property tests still passed — they just got slow.
Wall-clock budgets would have caught it but flake on CI runners under
load. Call counts are deterministic properties of the algorithm and
shift by orders of magnitude when the bug fires.
Locally verified the canary catches the bug at the simulation level:
| Workload | Healthy | Simulated eager bug |
| ------------------------- | ------: | ------------------: |
| `messageLikeSchema` calls | 84 | 177,252 |
| `messageLikeSchema` depth | 6 | 4,200 |
Spec ceiling `maxCalls < 2000` triggers at ~2,100x explosion. Comfortable
detection bandwidth.
## What's deferred
Two follow-ups from the architecture discussion:
- **Vitest `bench()` files** for iteration-time perf comparison
(concern #2: "is this change actually faster?"). The fixture library
lands now so the bench files can reuse it directly when added.
- **`scripts/profile.mjs` + `vp run profile`** convenience wrapper for
CPU profiling under `node --cpu-prof` (concern #3: "where should I
look to make this faster?"). Optional QoL; skip unless wanted.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Review or Edit in CodeSandboxOpen the branch in Web Editor • VS Code • Insiders |
Contributor
|
The changes in this PR will be included in the next version bump.
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.


Summary
A catastrophic-regression canary that catches eager-recursion class bugs deterministically — no wall-clock assertions, no CI flake. This is concern #1 from the three-part perf-testing architecture discussion: catch the bugs that should never ship.
The v1.5.0 wrapper-recursion bug went out because nothing in the test suite failed when
mock(messageSchema)went from 5.6ms to 26 seconds. Property tests still passed — they just got slow. This PR makes that mode of regression impossible to ship undetected.messageLikeSchemacallsmessageLikeSchemadepthWhat's new
Opt-in instrumentation on
ValimockmockCalls— total#mockinvocations since the last reset.maxDepth— peak recursion depth.Off by default; production users pay zero overhead. Adds one branch + two increments per
#mockcall when on. New types:ValimockInstrumentationexported alongside.Realistic-shape fixture library
src/__benchmarks__/fixtures/— four schemas modeling recurring patterns from discordkit's codebase, the library that drove the original v1.5.0 regression investigation:messageLikeSchema— recursive lazy through nullish (the v1.5.0 bug's worst case; 26s per mock at v1.5.2)channelLikeSchema—intersect([common, variant(...)])discriminated unions (the v1.5.1 deepMerge discriminator-drop pattern)applicationLikeSchema— wide-flat object with many string fields, no recursion (control case — explosions here would point at the string pipeline, not a wrapper)interactionLikeSchema— multi-v.lazy()references with cross-schema dependenciesNot exported from the public API — internal to the test/bench surface. Designed to be reused by both this PR's regression specs and the future Vitest
bench()files (PR #2 of this trilogy).Regression spec
src/__tests__/perfRegression.spec.tsasserts call-count and depth ceilings on each fixture, averaging over 20 trials to smooth the random-roll variance from wrapper handlers (a single roll can vary by ~3× depending on how manynullishbranches happen to pick the wrapped option).Each fixture has documented baseline measurements and a ceiling at ~4-5× the max baseline. That bandwidth tolerates natural variance while catching catastrophic regressions by orders of magnitude.
Why call counts, not wall-clock
Wall-clock budgets would have caught the v1.5.0 bug but flake on CI runners under load. Call counts are deterministic properties of the algorithm and shift by orders of magnitude when the bug fires. The regression class we're protecting against is catastrophic by nature — the test doesn't need to be precise, it needs to be reliable.
Verification that the canary works
Locally simulated the v1.5.0 wrapper-recursion bug by registering
customMocks.nullishwith the eager evaluation pattern (arrayElement([this.mock(wrapped), null, undefined])— wrapped is always evaluated). Results:The spec's
expect(stats.maxCalls).toBeLessThan(2000)triggers at a ~2,100× explosion in call counts. Comfortable detection bandwidth — the bug v1.5.0 shipped would have been caught at the PR stage.What's deferred (to follow-up PRs)
Two follow-ups from the architecture discussion that share the fixture library:
bench()files (concern chore(deps-dev): Bump valibot from 0.19.0 to 0.20.0 #2: is this change actually faster?). The fixtures land in this PR so the bench files can reuse them directly. Starting work on this immediately after this PR opens.scripts/profile.mjs+vp run profileconvenience wrapper for CPU profiling undernode --cpu-prof(concern chore(deps-dev): Bump eslint-plugin-vitest from 0.3.2 to 0.3.8 #3: where should I look to make this faster?). Optional QoL.Verification
vp check --fixclean across 67 filesBump
minor— adds public API surface (instrumentoption,instrumentationaccessor,resetInstrumentation()method,ValimockInstrumentationtype export). Bumpy file included.🤖 Generated with Claude Code