Add copy fallback when hardlinking hits the filesystem link limit #237

Open
ysmaoui wants to merge 1 commit into buildbarn:main from ysmaoui:copy-fallback-hardlink-limit

@ysmaoui ysmaoui commented Jun 23, 2026

Fixes #236.

Problem

When bb_worker uses a native build directory, hardlinkingFileFetcher hardlinks cached CAS blobs from the local cache into each action's input root. Filesystems cap the number of hard links per file; NTFS allows 1023. A file shared by many input roots (e.g. a common toolchain input on a high-fan-out build) eventually reaches that cap, and the next hardlink fails with ERROR_TOO_MANY_LINKS on Windows (EMLINK on POSIX). The error is wrapped as codes.Internal and the action aborts:

Failed to create hardlink to cached file "...":
An attempt was made to create more links on a file than the file system supports.

On Windows, the virtual (WinFSP) build directory avoids hardlinks, but it requires the WinFSP kernel driver to be installed on the host. That is not possible on managed Kubernetes (e.g. AKS Windows node pools), so the native build directory is the only option there and the link cap is a hard wall.

Change

When the hardlink fails because the cached file has reached the filesystem's maximum link count, fall back to copying the file into the input root instead of failing. The copy is an independent inode with its own link count, so materialization succeeds. Only the few over-limit hot blobs pay the copy cost; correctness is unchanged because the input root only needs the file's contents, not a shared inode.

The fix is contained to pkg/cas (no bb-storage change). isHardlinkLimitReached is platform-split: ERROR_TOO_MANY_LINKS on Windows, EMLINK elsewhere.

Testing

  • New unit test TestHardlinkingFileFetcherCopyFallbackOnLinkLimit exercises the fallback (via EMLINK); bazel test //pkg/cas:cas_test passes.
  • windows_amd64 cross-build of //cmd/bb_worker passes.
  • golint and the reformat/gofmt gate are clean.

When bb_worker uses a native build directory, cached files are hardlinked
from the cache into each action's input root. Filesystems cap the number of
hard links per file (NTFS allows 1023). A file that is shared by a large
number of input roots can exceed this limit, after which hardlinking fails
with ERROR_TOO_MANY_LINKS on Windows (or EMLINK on POSIX) and the action
fails.

When the link limit is reached, copy the cached file into the input root
instead of hardlinking it. The copy is an independent file with its own link
count, so materialization succeeds. Only the few heavily shared files that
exceed the limit pay the cost of a copy.
@ysmaoui ysmaoui force-pushed the copy-fallback-hardlink-limit branch from d349698 to 6557444 June 23, 2026 13:05

Development

Successfully merging this pull request may close these issues.

native build directory: copy fallback when hardlink hits the filesystem link limit (NTFS 1023) on Windows