Add copy fallback when hardlinking hits the filesystem link limit #237
Open
ysmaoui wants to merge 1 commit into
When bb_worker uses a native build directory, cached files are hardlinked from the cache into each action's input root. Filesystems cap the number of hard links per file (NTFS allows 1023). A file that is shared by a large number of input roots can exceed this limit, after which hardlinking fails with ERROR_TOO_MANY_LINKS on Windows (or EMLINK on POSIX) and the action fails. When the link limit is reached, copy the cached file into the input root instead of hardlinking it. The copy is an independent file with its own link count, so materialization succeeds. Only the few heavily shared files that exceed the limit pay the cost of a copy.
d349698 to 6557444
Fixes #236.
Problem
When bb_worker uses a native build directory, hardlinkingFileFetcher hardlinks cached CAS blobs from the local cache into each action's input root. Filesystems cap the number of hard links per file; NTFS allows 1023. A file shared by a large number of input roots (e.g. a common toolchain input on a high-fan-out build) can hit the cap, at which point the hardlink fails with ERROR_TOO_MANY_LINKS on Windows (EMLINK on POSIX). This is wrapped as codes.Internal and the action aborts.
On Windows the virtual (WinFSP) build directory avoids hardlinks, but it needs the WinFSP kernel driver installed on the host, which is not possible on managed Kubernetes (e.g. AKS Windows node pools), so native is the only option there and the link cap is a hard wall.
Change
When the hardlink fails because the cached file has reached the filesystem's maximum link count, fall back to copying the file into the input root instead of failing. The copy is an independent inode with its own link count, so materialization succeeds. Only the few over-limit hot blobs pay the copy cost; correctness is unchanged because the input root only needs the file's contents, not a shared inode.
The fix is contained to pkg/cas (no bb-storage change). isHardlinkLimitReached is platform-split: ERROR_TOO_MANY_LINKS on Windows, EMLINK elsewhere.
Testing
TestHardlinkingFileFetcherCopyFallbackOnLinkLimit exercises the fallback (via EMLINK); bazel test //pkg/cas:cas_test passes. The windows_amd64 cross-build of //cmd/bb_worker passes.