Fix flaky boxel-cli push --delete: guard #sourceCache against set-after-invalidate#5123
Merged
habdelra merged 1 commit intoJun 5, 2026
Merged
Conversation
getSourceOrRedirect's #sourceCache had no protection against a write landing after a concurrent invalidate, unlike #transpiledModuleCache. A source read that read bytes from disk and then had its path invalidated mid-flight — e.g. a DELETE removing the file while a worker's indexing fetch of the same source was still in flight — would re-populate the cache with the now-deleted bytes. A subsequent GET then served a 200 for a file already gone from disk, which is the flaky boxel-cli `push --delete` integration failure (the post-delete source GET returned a stale 200 cache hit instead of 404). Mirror the per-path + global generation guard #transpiledModuleCache already uses onto #sourceCache: snapshot the generation before the first await in getSourceOrRedirect and drop the set when the path's generation moved during the read; invalidate and the bulk clears bump the generation. Log every dropped stale set so a future recurrence through a path the snapshot didn't cover still leaves a signal. Add a deterministic regression test that parks a source read at the post-read/pre-set point via a test-only gate and races invalidate / delete / clear against it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes an intermittent stale-read race in the realm server’s source-serving path by hardening Realm.#sourceCache against “set-after-invalidate” when a source read is in-flight during a concurrent invalidation (notably DELETE). The change mirrors the existing generation-guard strategy already used for #transpiledModuleCache, preventing deleted or outdated bytes from being re-cached and subsequently served as a fast cache hit.
Changes:
- Add per-path and global generation counters for
#sourceCache, and drop cache writes when an invalidate/clear occurs during an in-flight read. - Route all single-path and bulk source-cache invalidations through generation-bumping helpers.
- Add deterministic realm-server regression tests using a test-only gate to park a request between “read bytes” and “cache set”, reproducing the race without timing flakiness.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| packages/runtime-common/realm.ts | Adds generation guards + helper invalidation methods for #sourceCache, and a test-only delay hook to deterministically reproduce the race window. |
| packages/realm-server/tests/module-cache-race-test.ts | Adds a new test module that deterministically verifies #sourceCache does not re-populate with stale bytes after concurrent invalidate/delete/clear. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Contributor
jurgenwerk
approved these changes
Jun 5, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The
boxel-clipush --deleteintegration test (removes remote-only files) fails intermittently in CI: after the delete-push returns204and the file is gone from disk, the test's immediate sourceGETfor the removed file returns200instead of404.Root cause
The realm server's
getSourceOrRedirectserves.gts/.jsonsource from an in-memory#sourceCache. It reads the bytes from disk under anawait(getFileWithFallbacks+materializeFileRef) and only then calls#sourceCache.set. IfinvalidateCache(path)fires inside that window it clears the slot synchronously — but the in-flight read'ssetre-populates it afterward with the now-stale bytes.#sourceCachehad no guard against this, even though#transpiledModuleCachewas already hardened against the identical set-after-invalidate race.The failing run's realm-server log shows the exact interleave: the prior push's
waitForIndex:falseindexing job for the same file was still in flight (its source fetch had read the bytes) when the delete-push'sDELETElanded. TheDELETEremoved the file and invalidated the cache, then the indexing fetch'ssetre-cached the just-deleted bytes — and the next sourceGETcame back a200cache hit (dur=2ms) for a file already gone from disk.Fix
Give
#sourceCachethe same per-path + global generation guard#transpiledModuleCachealready uses:getSourceOrRedirectsnapshots the source-cache generation before its firstawaitand drops itssetwhen the resolved path's generation moved during the read (keyed on the canonical path, so extensionless-alias requests are covered too).invalidateCache, the bypass-cache early invalidate, and the bulk clears bump the generation.The in-flight reader still serves the bytes it read at request time (consistent with its happens-before ordering); only the cache write is discarded.
Tests
A new deterministic module in
module-cache-race-test.tsuses a test-only gate that parks a source read at the post-read / pre-set point and races concurrent work against it:GETis a cache miss (set discarded)GETis404, not a stale200— the realm-server-level reproduction of the flake__testOnlyClearCachesduring the read → cache miss (the global generation catches what the path counter alone would miss)eslint + prettier clean and the type declarations build; the realm-server test shard exercises the new tests in CI.
🤖 Generated with Claude Code