perf: Cache findFakerForKeyName auto-discovery results#75
Merged
Conversation
The auto-discovery walk in `findFakerForKeyName` was 21.4% of CPU self-time per the VALIMOCK_PERF_REPORT — it scanned every section of faker and invoked each candidate method to type-check the return value on every call, with no caching. For schemas with many fields whose `keyName` doesn't match a `keyNameGenerators` entry, this dominated string-heavy mock cost. Cache the result (both positive and negative) in a module-level `WeakMap<Faker, Map<string, DiscoveredFn | null>>`. The keying: - **WeakMap by Faker** so different faker instances don't share cache state, and so entries become GC-eligible when a faker is unreachable. Module-level (not per-Valimock-instance) so the common case of callers reusing `faker` from `@faker-js/faker` shares the cache across `Valimock` construction. - **Inner Map by lowercased keyName** because lookups are case- insensitive — `firstName` and `FIRSTNAME` resolve to the same fn. - **Two-state values (`fn | null`)** to support negative caching. A keyName like `totallyRandomXyzKey` that doesn't resolve must not re-scan the entire faker tree on every subsequent string field with the same name. `Map.has()` distinguishes the cached miss (`null`) from "not yet scanned" (key absent). The deprecated `mockeryMapper` path bypasses the cache: it emits a one-time-per-call deprecation warning that callers should still see, and its output depends on user-supplied state rather than a deterministic property of `faker`. Tests use a Proxy-instrumented faker stub to count `Object.keys(faker)` walks across repeated calls, so we assert on cache behavior directly rather than via flaky timing. Bench against published 1.5.3 on representative workloads: - 20-field schema hitting auto-discovery: 1.87x faster - 10-field schema with unknown keys (full scan + negative cache): 2.10x faster Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Review or Edit in CodeSandboxOpen the branch in Web Editor • VS Code • Insiders |
Contributor
|
The changes in this PR will be included in the next version bump.
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.


Summary
Closes the largest remaining hotspot from the perf report —
findFakerForKeyName, which the report measured as 21.4% of CPU self-time and the single biggest contributor to the 1.4→1.5 string-mock regression (32× slower per the report's comparative profile). Pure-refactor optimization with no observable behavior change.What was slow
findFakerForKeyNameis the last-resort discovery walk: when a string field'skeyNamedoesn't match a directkeyNameGeneratorsentry, the function walks every section offaker, walks every method in every section, invokes each candidate to type-check its return value, and returns the first match. The report measured 13µs per call; with ~50 unmatched string keys permock(messageSchema)× up to 16 retry-loop attempts in constrained-string cases, that's roughly 10 ms of pure scan time per top-level mock.No caching meant every call paid the full cost — even for the same keyName, on the same faker instance, in the same Valimock instance. And there was no negative caching, so unknown keynames re-scanned the entire faker tree on every appearance.
What the fix does
A module-level
WeakMap<Faker, Map<string, fn | null>>caches the discovery result. Both positive results (the discovered function) and negative results (no match, cached asnull) are stored.Cache design decisions
fakerimport across many Valimock constructions; module-level lets that cache survive.firstNameandFIRSTNAMEhit the same entry.nullfor negative cache +Map.has()to distinguish "cached miss" from "not yet scanned."mockeryMapperpath bypasses the cache — the deprecated extension point emits a per-call deprecation warning that callers should still see, and its output depends on user-supplied state.Regression test
src/__tests__/findFakerForKeyNameCache.spec.tsuses a Proxy-instrumented faker stub to countObject.keys(faker)walks across repeated calls. Six tests pin:mockeryMapperpath bypasses cache (deprecation warning fires every call)Asserts on observed scan counts, not timing — robust against slow CI runners.
Verification
vp check --fixclean across 61 filesStack relationship to #74
Both PRs branched off
mainand touch different concerns (#74 is dispatch + lookup tables; this PR is the auto-discovery cache). They'll merge in any order without conflict.What's left from the report
This closes the largest remaining hotspot. One semantic item still deferred:
safeParseshort-circuit for leaf-primitive options (~3% per the report). Worth its own focused PR because it's a behavior question, not a pure refactor — starting that work now.Bump
patch— Bumpy file included.🤖 Generated with Claude Code