Fix concurrent TreeArtifact prefetches#29803
Conversation
Remote prefetches download into temporary files, but finalization and output-permission restoration can race across calls targeting the same TreeArtifact. One call can make the tree read-only while another is moving a child into it. Serialize finalization and permission restoration by resolved TreeArtifact root. Reopen every ancestor explicitly because ActionOutputDirectoryHelper may cache directory creation after output permissions are restored. Fixes bazelbuild#29801. This change was implemented with OpenAI Codex (GPT-5.5) and reviewed by me.
| } | ||
|
|
||
| @Test | ||
| public void prefetchFiles_treeFiles_concurrentFinalizations_doNotRacePermissions() |
There was a problem hiding this comment.
Could you also add a stress test that launches many concurrent operations, some of which overlap? That would give us additional confidence in the new synchronization scheme.
There was a problem hiding this comment.
Added in 8c5acce. The stress test starts 128 requests with 32 concurrent callers across four TreeArtifacts. Adjacent requests overlap one child, and the request pattern repeats to cover partially overlapping and fully deduplicated downloads.
|
|
||
| // TreeArtifact downloads use temporary paths. Serialize only permission changes and final file | ||
| // moves, keyed by resolved tree root, so unrelated trees can still finalize concurrently. | ||
| private final Striped<Lock> treeArtifactLocks = Striped.lock(64); |
There was a problem hiding this comment.
Is 64 a good choice here? We could also use weakly referenced locks that scale as high as needed.
There was a problem hiding this comment.
Done in 8c5acce. This now uses Striped.lazyWeakReadWriteLock(Integer.MAX_VALUE), so unused locks can be collected and distinct roots only contend on a hash collision rather than a fixed 64-stripe pool.
| // read-only while another call moves a child into the tree. | ||
| for (var entry : directoriesByTreeRoot.asMap().entrySet()) { | ||
| Lock lock = treeArtifactLocks.get(entry.getKey().asFragment()); | ||
| lock.lock(); |
There was a problem hiding this comment.
Would it make sense to make this a rw lock? Moves could happen concurrently (reads), only permission changes must be exclusive (write).
There was a problem hiding this comment.
Done in 8c5acce. Finalization holds a read lock from reopening ancestors through the move, while call-scoped permission restoration takes the write lock. Widening permissions is safe under read locks because it is monotonic during finalization and DirectoryTracker.compute serializes each directory transition. The deterministic test also verifies that one move completes while another is paused, but restoration cannot finish.
Allow child downloads beneath the same TreeArtifact to finalize concurrently while retaining exclusive permission restoration. Use weakly referenced locks across the full hash space to avoid fixed-stripe contention and unbounded lock retention. Add stress coverage for overlapping prefetch calls across multiple TreeArtifacts. This change was implemented with OpenAI Codex (GPT-5.5) and reviewed by me.
| } | ||
|
|
||
| @Test | ||
| public void prefetchFiles_treeFiles_manyConcurrentOverlappingCalls() throws Exception { |
There was a problem hiding this comment.
Does this test always fail without the fix or is it just flaky?
There was a problem hiding this comment.
With only the production file restored to d9f0fb4, this stress test failed 20/20 locally, each time because a tree remained writable. It is still schedule-dependent in principle: the latch starts callers together but does not force the failing interleaving, so unlike the two targeted tests I would not call it guaranteed.
| } | ||
|
|
||
| @Test | ||
| public void prefetchFiles_treeFiles_sequentialCalls_reopenAncestors() throws Exception { |
There was a problem hiding this comment.
Against d9f0fb4 with only the production file restored, the two targeted tests fail deterministically: prefetchFiles_treeFiles_sequentialCalls_reopenAncestors leaves the tree writable after the second prefetch, while prefetchFiles_treeFiles_concurrentFinalizations_doNotRacePermissions lets the first prefetch complete while the second move is paused. With the production fix restored, the full RemoteActionInputFetcherTest class passes.
| } | ||
|
|
||
| @Test | ||
| public void prefetchFiles_treeFiles_manyConcurrentOverlappingCalls() throws Exception { |
There was a problem hiding this comment.
With only the production file restored to d9f0fb4, this stress test failed 20/20 locally, each time because a tree remained writable. It is still schedule-dependent in principle: the latch starts callers together but does not force the failing interleaving, so unlike the two targeted tests I would not call it guaranteed.
Remote prefetches download into temporary files, but finalization and
output-permission restoration can race across calls targeting the same
TreeArtifact. One call can make the tree read-only while another is
moving a child into it.
Coordinate these operations by resolved TreeArtifact root. Allow child
finalizations to proceed concurrently, but restore permissions only
after all in-progress finalizations have completed. Weakly retain the
per-root locks so a long-lived prefetcher does not retain every tree it
touches.
Reopen every ancestor before moving a child because
ActionOutputDirectoryHelper caches directory creation after output
permissions are restored.
Fixes #29801.