ci(docker): retry vcpkg install on transient download failures #5985

Open
Fedr wants to merge 1 commit into master from ci/vcpkg-install-retry

Fedr commented Apr 25, 2026

Summary

Wrap ./vcpkg install in both docker/rockylinux8-vcpkgDockerfile and docker/rockylinux9-vcpkgDockerfile with a 3-attempt retry loop (30 s backoff). Catches transient HTTP 502s and other one-shot network blips from github.com's source-archive endpoint when vcpkg downloads port tarballs at image-build time.

Motivation

Run 24927294301 job 72999298143 on PR #5959 failed with:

error: curl operation failed with response code 502.
error: curl operation failed with response code 502.
error: Reached maximum number of attempts, won't retry download from
   https://github.com/malaterre/GDCM/archive/v3.2.2.tar.gz.
CMake Error at scripts/cmake/vcpkg_download_distfile.cmake:136 (message):
  Download failed, halting portfile.
error: building gdcm:x64-linux-meshlib failed with: BUILD_FAILED

That's a transient response from GitHub's auto-generated /archive/<ref>.tar.gz endpoint — github.com served two 502s in a row, vcpkg's hardcoded internal retry limit of 3 attempts was reached, and the entire image build failed. Same kind of one-shot CDN flake that #5966's wretry.action wrapper for gha-setup-ninja solved on the Windows side.

Why a workflow-level retry rather than vcpkg-level

vcpkg's vcpkg_download_distfile retry count is hardcoded at 3 inside the compiled vcpkg binary — there's no env var, no CLI flag, no cmake variable that exposes it. (Long-standing GitHub issue, no fix planned.)

A retry loop at the Dockerfile RUN level is the simplest and most effective workaround:

  • Cheap on retry. vcpkg's per-port build cache means that on a retry, only the failed-to-download port is actually re-fetched. Every port that succeeded in the previous attempt is reused from /root/vcpkg/installed/... immediately. We pay roughly the time of one download attempt per retry, not a full clean rebuild.
  • No new infrastructure. No mirror buckets to provision, no overlay ports to maintain, no asset-cache plumbing to wire up. (Asset-cache via X_VCPKG_ASSET_SOURCES is a more strategic fix but requires bucket setup and is a much larger change. This retry loop is the targeted fix.)
  • Symmetric with what we already do elsewhere. PR #5966 (ci(windows): add DLL-not-found diagnostic steps) wraps seanmiddleditch/gha-setup-ninja in Wandalen/wretry.action for the same reason: a download endpoint that's mostly fine but occasionally serves a 5xx.

Implementation

RUN source /opt/rh/gcc-toolset-11/enable && \
    source /opt/autoconf27/enable && \
    source /root/venv/bin/activate && \
    for attempt in 1 2 3 ; do \
        if ./vcpkg install --host-triplet=${VCPKG_TRIPLET} --overlay-triplets=MeshLib/triplets --overlay-ports=MeshLib/ports $(cat MeshLib/requirements.txt | tr '\n' ' ') ; then \
            break ; \
        fi ; \
        if [ "$attempt" = "3" ] ; then \
            echo "vcpkg install failed after 3 attempts" >&2 ; exit 1 ; \
        fi ; \
        echo "vcpkg install failed (attempt $attempt/3); retrying in 30s..." >&2 ; \
        sleep 30 ; \
    done

Same shape applied to rockylinux9-vcpkgDockerfile (different source lines for autoconf271, but the loop body and packages list expression are identical).
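
For anyone who wants to exercise the loop's control flow locally without building an image, the same logic can be sketched as a standalone shell function. This is an illustration only and is not part of this PR: the function name `retry` and the `MAX_ATTEMPTS` / `RETRY_DELAY` variables are hypothetical, introduced here so the delay can be shortened in a test.

```shell
#!/bin/sh
# Hypothetical helper mirroring the Dockerfile loop (not part of this PR).
# Runs the given command up to MAX_ATTEMPTS times (default 3), sleeping
# RETRY_DELAY seconds (default 30) after each failed attempt except the last.
retry() {
    max_attempts=${MAX_ATTEMPTS:-3}
    delay=${RETRY_DELAY:-30}
    attempt=1
    while :; do
        # Success on any attempt ends the loop immediately (happy path).
        if "$@"; then
            return 0
        fi
        # Last attempt failed: report and give up with a non-zero status.
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "command failed after $max_attempts attempts" >&2
            return 1
        fi
        echo "command failed (attempt $attempt/$max_attempts); retrying in ${delay}s..." >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

With the function defined, `retry ./vcpkg install ...` would behave like the inline loop. In a Dockerfile the definition would have to live in the same RUN step (or in a script copied into the image), which is one reason the PR keeps the loop inline.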

Scope

Two files, +28 / −2. No source code changes; no behavioural change on the happy path. New behaviour appears only when a vcpkg install attempt exits non-zero: a retry after the first or second failure, and an explicit "failed after 3 attempts" error after the third.

Future work

A separate follow-up should plumb the same retry loop into MeshInspectorCode's docker/rockylinux8-vcpkgDockerfile and docker/rockylinux9-vcpkgDockerfile, which have the same exposure. Out of scope here so this PR can land focused.

🤖 Generated with Claude Code

Wrap the vcpkg install RUN in both rockylinux8-vcpkg and
rockylinux9-vcpkg Dockerfiles with a 3-attempt for-loop and 30 s
backoff. Catches transient HTTP 502s (and similar one-shot network
blips) from github.com's source-archive endpoint when vcpkg downloads
port tarballs.

Concretely motivated by run 24927294301 / job 72999298143 on
feat/zlib-compress-stream-zlib-ng, which failed with:

    error: curl operation failed with response code 502.
    error: curl operation failed with response code 502.
    error: Reached maximum number of attempts, won't retry download
           from https://github.com/malaterre/GDCM/archive/v3.2.2.tar.gz
    CMake Error at scripts/cmake/vcpkg_download_distfile.cmake:136:
        Download failed, halting portfile.
    error: building gdcm:x64-linux-meshlib failed with: BUILD_FAILED

vcpkg's own download retry count is hardcoded at 3 inside the vcpkg
binary and is not exposed via env var or CLI flag. Wrapping at the
RUN-level retries the whole `./vcpkg install` invocation, which is
near-free because vcpkg's per-port build cache short-circuits any
port already installed in a previous attempt; only the failed-to-
download port is actually re-fetched on retry.

Same change applied to both rockylinux8 and rockylinux9 dockerfiles
so the two stay in lockstep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>