fix(cache): cache repeated evals by repeat index#8480
Conversation
…-cache-namespaces-1431
There was a problem hiding this comment.
👍 All Clear
I reviewed the PR changes related to cache namespacing and repeat-aware caching in evaluator and cache utilities. The updates introduce AsyncLocalStorage-backed namespaces and adjust when caching is disabled, but do not change prompt construction, tool capabilities, or execution paths. Based on tracing and analysis, no LLM-security vulnerabilities were identified in this PR.
Minimum severity threshold: 🟡 Medium | To re-scan after changes, comment @promptfoo-scanner
Learn more
There was a problem hiding this comment.
Pull request overview
This PR enables caching when evaluateOptions.repeat > 1 by scoping cache access to a repeat-specific namespace, so each repeat index gets its own cached entries while preserving provider cache behavior.
Changes:
- Stop globally disabling cache when
repeat > 1(CLI + programmatic entrypoints). - Introduce cache namespacing (AsyncLocalStorage-backed) and apply it to evaluator execution per repeat index.
- Add regression tests and documentation for repeat-aware caching behavior.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
src/cache.ts |
Adds cache namespace support and scopes fetch-cache keys/in-flight dedupe by namespace. |
src/evaluator.ts |
Runs eval and extension hooks inside a repeat-index cache namespace and removes repeat-based cache busting. |
src/index.ts |
Keeps cache enabled when repeat > 1 unless explicitly disabled. |
src/commands/eval.ts |
Keeps cache enabled when repeat > 1 unless explicitly disabled. |
test/cache.test.ts |
Expands cache mocks and adds tests for namespace isolation (direct/bulk/clear/in-flight). |
test/evaluator.test.ts |
Updates repeat behavior expectation and adds repeat-aware cache isolation regressions. |
test/evaluator.integration.transforms.test.ts |
Updates cache mock to include withCacheNamespace. |
test/commands/eval/evaluateOptions.test.ts |
Updates expectation/comment to reflect repeat no longer disabling cache. |
site/docs/guides/evaluate-llm-temperature.md |
Documents repeat-index-specific caching and how to disable it. |
site/docs/configuration/caching.md |
Documents repeat-index cache namespaces and guidance for --no-cache with --repeat. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
test/cache.test.ts
Outdated
| }), | ||
| reset: vi.fn(), | ||
| ttl: vi.fn(), | ||
| ttl: vi.fn().mockImplementation((key: string) => Promise.resolve(expiresAt.get(key))), |
There was a problem hiding this comment.
The cache-manager mock’s ttl() implementation returns the absolute expiration timestamp (expiresAt.get(key)), but cache-manager’s ttl() API is intended to return the remaining TTL duration (or undefined/null when not set). This mismatch can make TTL-related tests pass while diverging from real behavior; consider returning Math.max(0, expiresAt.get(key) - Date.now()) (and undefined when absent).
| ttl: vi.fn().mockImplementation((key: string) => Promise.resolve(expiresAt.get(key))), | |
| ttl: vi.fn().mockImplementation((key: string) => { | |
| const expiry = expiresAt.get(key); | |
| if (expiry === undefined) { | |
| return Promise.resolve(undefined); | |
| } | |
| return Promise.resolve(Math.max(0, expiry - Date.now())); | |
| }), |
📝 WalkthroughWalkthroughThe pull request implements cache namespacing via AsyncLocalStorage to enable caching when Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes DetailsThe changes introduce new async context management patterns ( 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/cache.ts (1)
391-392: Minor:getCacheInstance()is synchronous but awaited.
getCacheInstance()returnsCachedirectly (not a Promise), but it's being awaited on Line 392. This works but is unnecessary.♻️ Suggested fix
const cacheKey = getScopedCacheKey(`fetch:v2:${url}:${JSON.stringify(copy)}`); - const cache = await getCacheInstance(); + const cache = getCacheInstance();🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cache.ts` around lines 391 - 392, The code awaits getCacheInstance() even though it returns a Cache synchronously; remove the unnecessary await so that cache is assigned directly from getCacheInstance() (update the line referencing getCacheInstance and any nearby usage of cacheKey/getScopedCacheKey if needed). Ensure no other call sites rely on it being a Promise and run tests to confirm behavior.src/evaluator.ts (2)
1649-1657: Keep the repeat cache boundary in one place.
processEvalStep()already enters the per-repeat namespace, so callingrunEval()here re-enters the same scope. CallingrunEvalInternal(evalStep)from this path would make the active namespace easier to reason about and avoid future drift if the namespace helper changes.♻️ Possible simplification
- const rows = await runEval(evalStep); + const rows = await runEvalInternal(evalStep);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/evaluator.ts` around lines 1649 - 1657, processEvalStep currently already enters the per-repeat cache namespace via withCacheNamespace(getRepeatCacheNamespace(...)), so inside that block you should call runEvalInternal(evalStep) instead of runEval(evalStep) to avoid re-entering the same namespace; update the call in the withCacheNamespace callback (replace runEval with runEvalInternal) and ensure runEvalInternal is visible here (import/adjust export if necessary) so the active namespace remains consistent for evalStep.repeatIndex/evaluateOptions.
1715-1734: Use structured logger context for these new error paths.Both error logs serialize details into the message string, which skips the logger's structured, auto-sanitized context path. Keep the message static and attach the dynamic fields as context instead.
As per coding guidelines, "Use the logger with object context (auto-sanitized) for logging: `logger.debug('[Component] Message', { headers, body, config })`".🪵 Suggested logging shape
- logger.error(`Error saving result: ${error} ${safeJsonStringify(resultSummary)}`); + logger.error('[Evaluator] Error saving result', { + error, + resultSummary, + }); @@ - logger.error( - `Target returned HTTP ${httpStatus}. Aborting scan - this error will not resolve on retry.`, - ); + logger.error('[Evaluator] Target unavailable, aborting scan', { + httpStatus, + });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/evaluator.ts` around lines 1715 - 1734, The error logs are embedding dynamic details into the message string instead of using structured logger context; update the two logger.error calls so messages stay static and pass dynamic fields in the context object: for the catch around this.evalRecord.addResult(), keep a static message like "Error saving result" and attach { error, resultSummary: summarizeEvaluateResultForLogging(row) } (or the precomputed resultSummary) as the second argument to logger.error; for the HTTP-status branch, keep the static message "Target returned non-transient HTTP status. Aborting scan." and pass { httpStatus, rowMetadata: row.response?.metadata } (or similar relevant fields) as structured context; ensure references to this.evalRecord.addResult, summarizeEvaluateResultForLogging, logger.error, httpStatus, and isNonTransientHttpStatus are used to locate where to change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/evaluator.test.ts`:
- Around line 4000-4141: The tests "isolates manual provider cache entries by
repeat index" and "isolates beforeEach extension cache entries by repeat index"
mutate shared cache and only call clearCache() at start; add teardown cleanup by
calling clearCache() after each test (e.g., in a finally block at the end of
each it() or by adding an afterEach(async () => await clearCache()) at the top
of the spec) so that the cache is cleared regardless of test outcome; update the
tests referencing clearCache(), evaluate(), Eval.create(), and the mocked
runExtensionHook to ensure clearCache() runs after each test to prevent
cross-test leakage.
---
Nitpick comments:
In `@src/cache.ts`:
- Around line 391-392: The code awaits getCacheInstance() even though it returns
a Cache synchronously; remove the unnecessary await so that cache is assigned
directly from getCacheInstance() (update the line referencing getCacheInstance
and any nearby usage of cacheKey/getScopedCacheKey if needed). Ensure no other
call sites rely on it being a Promise and run tests to confirm behavior.
In `@src/evaluator.ts`:
- Around line 1649-1657: processEvalStep currently already enters the per-repeat
cache namespace via withCacheNamespace(getRepeatCacheNamespace(...)), so inside
that block you should call runEvalInternal(evalStep) instead of
runEval(evalStep) to avoid re-entering the same namespace; update the call in
the withCacheNamespace callback (replace runEval with runEvalInternal) and
ensure runEvalInternal is visible here (import/adjust export if necessary) so
the active namespace remains consistent for
evalStep.repeatIndex/evaluateOptions.
- Around line 1715-1734: The error logs are embedding dynamic details into the
message string instead of using structured logger context; update the two
logger.error calls so messages stay static and pass dynamic fields in the
context object: for the catch around this.evalRecord.addResult(), keep a static
message like "Error saving result" and attach { error, resultSummary:
summarizeEvaluateResultForLogging(row) } (or the precomputed resultSummary) as
the second argument to logger.error; for the HTTP-status branch, keep the static
message "Target returned non-transient HTTP status. Aborting scan." and pass {
httpStatus, rowMetadata: row.response?.metadata } (or similar relevant fields)
as structured context; ensure references to this.evalRecord.addResult,
summarizeEvaluateResultForLogging, logger.error, httpStatus, and
isNonTransientHttpStatus are used to locate where to change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: e9430241-632d-4748-bba6-84198d593640
📒 Files selected for processing (10)
site/docs/configuration/caching.mdsite/docs/guides/evaluate-llm-temperature.mdsrc/cache.tssrc/commands/eval.tssrc/evaluator.tssrc/index.tstest/cache.test.tstest/commands/eval/evaluateOptions.test.tstest/evaluator.integration.transforms.test.tstest/evaluator.test.ts
| it('isolates manual provider cache entries by repeat index', async () => { | ||
| await clearCache(); | ||
|
|
||
| let cacheMissCount = 0; | ||
| const provider: ApiProvider = { | ||
| id: () => 'mock-provider', | ||
| callApi: vi | ||
| .fn() | ||
| .mockImplementation(async (_prompt: string, context?: Record<string, any>) => { | ||
| const cache = await context?.getCache(); | ||
| const cachedResponse = await cache?.get('manual-provider-key'); | ||
| if (cachedResponse) { | ||
| return { | ||
| ...(cachedResponse as ProviderResponse), | ||
| cached: true, | ||
| }; | ||
| } | ||
|
|
||
| cacheMissCount += 1; | ||
| const response = { | ||
| cached: false, | ||
| output: `result-repeat-${context?.repeatIndex}-miss-${cacheMissCount}`, | ||
| tokenUsage: createEmptyTokenUsage(), | ||
| }; | ||
| await cache?.set('manual-provider-key', response); | ||
| return response; | ||
| }), | ||
| }; | ||
|
|
||
| const testSuite: TestSuite = { | ||
| providers: [provider], | ||
| prompts: [toPrompt('Test prompt')], | ||
| tests: [{}], | ||
| }; | ||
|
|
||
| const firstEval = await Eval.create({}, testSuite.prompts, { id: randomUUID() }); | ||
| await evaluate(testSuite, firstEval, { maxConcurrency: 1, repeat: 2 }); | ||
| const firstSummary = await firstEval.toEvaluateSummary(); | ||
|
|
||
| expect(cacheMissCount).toBe(2); | ||
| expect(firstSummary.results.map((result) => result.response?.output)).toEqual([ | ||
| 'result-repeat-0-miss-1', | ||
| 'result-repeat-1-miss-2', | ||
| ]); | ||
| expect(firstSummary.results.map((result) => result.response?.cached)).toEqual([false, false]); | ||
|
|
||
| const secondEval = await Eval.create({}, testSuite.prompts, { id: randomUUID() }); | ||
| await evaluate(testSuite, secondEval, { maxConcurrency: 1, repeat: 2 }); | ||
| const secondSummary = await secondEval.toEvaluateSummary(); | ||
|
|
||
| expect(cacheMissCount).toBe(2); | ||
| expect(secondSummary.results.map((result) => result.response?.output)).toEqual([ | ||
| 'result-repeat-0-miss-1', | ||
| 'result-repeat-1-miss-2', | ||
| ]); | ||
| expect(secondSummary.results.map((result) => result.response?.cached)).toEqual([true, true]); | ||
| }); | ||
|
|
||
| it('isolates beforeEach extension cache entries by repeat index', async () => { | ||
| await clearCache(); | ||
|
|
||
| let extensionCacheMissCount = 0; | ||
| vi.mocked(runExtensionHook).mockImplementation(async (_extensions, hookName, context) => { | ||
| if (hookName !== 'beforeEach' || !('test' in context)) { | ||
| return context; | ||
| } | ||
|
|
||
| const hookContext = context as typeof context & { | ||
| test: { | ||
| vars?: Record<string, unknown>; | ||
| }; | ||
| }; | ||
|
|
||
| const cache = getCache(); | ||
| const cachedHookValue = await cache.get<string>('extension-hook-key'); | ||
| if (cachedHookValue) { | ||
| return { | ||
| ...hookContext, | ||
| test: { | ||
| ...hookContext.test, | ||
| vars: { | ||
| ...hookContext.test.vars, | ||
| hookValue: cachedHookValue, | ||
| }, | ||
| }, | ||
| }; | ||
| } | ||
|
|
||
| extensionCacheMissCount += 1; | ||
| const hookValue = `hook-repeat-${extensionCacheMissCount}`; | ||
| await cache.set('extension-hook-key', hookValue); | ||
|
|
||
| return { | ||
| ...hookContext, | ||
| test: { | ||
| ...hookContext.test, | ||
| vars: { | ||
| ...hookContext.test.vars, | ||
| hookValue, | ||
| }, | ||
| }, | ||
| }; | ||
| }); | ||
|
|
||
| const provider: ApiProvider = { | ||
| id: () => 'mock-provider', | ||
| callApi: vi | ||
| .fn() | ||
| .mockImplementation(async (_prompt: string, context?: Record<string, any>) => ({ | ||
| cached: false, | ||
| output: context?.vars?.hookValue, | ||
| tokenUsage: createEmptyTokenUsage(), | ||
| })), | ||
| }; | ||
|
|
||
| const testSuite: TestSuite = { | ||
| providers: [provider], | ||
| prompts: [toPrompt('Test prompt')], | ||
| tests: [{ vars: {} }], | ||
| extensions: ['file://hook.js'], | ||
| }; | ||
|
|
||
| const firstEval = await Eval.create({}, testSuite.prompts, { id: randomUUID() }); | ||
| await evaluate(testSuite, firstEval, { maxConcurrency: 1, repeat: 2 }); | ||
| const firstSummary = await firstEval.toEvaluateSummary(); | ||
|
|
||
| expect(extensionCacheMissCount).toBe(2); | ||
| expect(firstSummary.results.map((result) => result.response?.output)).toEqual([ | ||
| 'hook-repeat-1', | ||
| 'hook-repeat-2', | ||
| ]); | ||
|
|
||
| const secondEval = await Eval.create({}, testSuite.prompts, { id: randomUUID() }); | ||
| await evaluate(testSuite, secondEval, { maxConcurrency: 1, repeat: 2 }); | ||
| const secondSummary = await secondEval.toEvaluateSummary(); | ||
|
|
||
| expect(extensionCacheMissCount).toBe(2); | ||
| expect(secondSummary.results.map((result) => result.response?.output)).toEqual([ | ||
| 'hook-repeat-1', | ||
| 'hook-repeat-2', | ||
| ]); | ||
| }); |
There was a problem hiding this comment.
Clear cache after these tests to prevent cross-test leakage.
Both tests mutate shared cache state and only call clearCache() at setup time. Add teardown cleanup so later tests are not affected when execution order is randomized.
🧹 Suggested hardening
it('isolates manual provider cache entries by repeat index', async () => {
await clearCache();
+ try {
// ... existing test body ...
- });
+ } finally {
+ await clearCache();
+ }
+ });
it('isolates beforeEach extension cache entries by repeat index', async () => {
await clearCache();
+ try {
// ... existing test body ...
- });
+ } finally {
+ await clearCache();
+ }
+ });Based on learnings: Tests must be independent and can run in any order (configured to run in random order by default in vitest.config.ts).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/evaluator.test.ts` around lines 4000 - 4141, The tests "isolates manual
provider cache entries by repeat index" and "isolates beforeEach extension cache
entries by repeat index" mutate shared cache and only call clearCache() at
start; add teardown cleanup by calling clearCache() after each test (e.g., in a
finally block at the end of each it() or by adding an afterEach(async () =>
await clearCache()) at the top of the spec) so that the cache is cleared
regardless of test outcome; update the tests referencing clearCache(),
evaluate(), Eval.create(), and the mocked runExtensionHook to ensure
clearCache() runs after each test to prevent cross-test leakage.
…ntext - Clear `namespacedCacheInstances` map in `clearCache()` to prevent unbounded memory growth in long-running processes - Wrap `clearNamespacedCache` delete operations with try-catch that surfaces the namespace and key count in the error message Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…spaces-1431' into mdangelo/codex/repeat-cache-namespaces-1431
mldangelo
left a comment
There was a problem hiding this comment.
Elegant fix replacing the blunt 'disable cache for repeat > 1' with per-repeat-index cache namespaces via AsyncLocalStorage. The withCacheNamespace pattern is clean and composable, supporting nested scoping. Backwards compatible (repeat index 0 uses no namespace). All 184 tests pass (29 cache + 155 evaluator). Good docs additions explaining the behavior.
…-cache-namespaces-1431 # Conflicts: # src/evaluator.ts # test/evaluator.test.ts
…-cache-namespaces-1431
There was a problem hiding this comment.
💡 Codex Review
Lines 3012 to 3015 in 928e748
Wrap afterEach extension execution in the same repeat namespace used for beforeEach/runEvalInternal. processEvalStep uses withCacheNamespace(...), but processEvalRows calls runExtensionHook(..., 'afterEach', ...) outside that scope. Any getCache/fetchWithCache in an afterEach hook will share global keys across repeats, causing cross-repeat cache contamination.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
mldangelo
left a comment
There was a problem hiding this comment.
Thoroughly reviewed: namespace-based cache isolation for repeat indices is correct and well-tested. 226 tests pass, TSC clean, real evals verified with --repeat 2 (cold and warm runs). AsyncResource.bind correctly preserves ALS context through deferred grading queue. No bugs found.

Summary
repeat > 1and isolate cache entries by repeat indexQA
npm run tscnpm test(635 files passed, 15298 tests passed, 22 skipped)npx vitest run test/cache.test.ts test/evaluator.test.ts test/commands/eval/evaluateOptions.test.ts--repeat 2and--max-concurrency 2: cold run returned distinct uncached repeat rows; warm rerun replayed the same rows withcached=truemget/mset/mdel/ttl/clearand concurrent repeat executionCloses #1431