ci(ubuntu22): retry the apt.kitware.com CMake install on transient failure#5965
Closed
apt.kitware.com has been unreliable in ways our image builds keep hitting. Most recent example: the ubuntu22-arm64 image build on PR #5959 (run 24800811051) died at this step with Could not connect to apt.kitware.com:443 (66.194.253.25) - connect (111: Connection refused) ... E: Unable to locate package kitware-archive-keyring while the x64 leg in the same run succeeded at the exact same command. Dec 2024 also had a full-day outage caused by a mis-issued SSL cert (https://discourse.cmake.org/t/kitware-apt-repo-down/13184).

Keep using apt.kitware.com (it delivers a newer CMake than Ubuntu 22.04's apt, which MeshLib's MRCuda needs for CUDA20 dialect support against NVCC 12.6) but wrap the whole block in a retry loop:

- 5 attempts total
- Backoff between retries: 15s, 30s, 45s, 60s (total potential wait before giving up: ~2m30s)
- Log each retry with the attempt number and the delay so the cause is visible in the image-build log
- Fail hard after attempt 5 with a clear message

All apt / apt-add-repository / apt-key / wget commands inside the block are idempotent, so re-running the block from scratch after a partial failure is safe (apt remove cmake is a no-op if cmake was already removed on the previous attempt, apt-add-repository skips adding a duplicate deb line, etc.).

Only changes:
- rm /etc/apt/trusted.gpg.d/kitware.gpg -> rm -f (in case a previous failed attempt already removed it)

Otherwise the commands and their order are unchanged.
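A minimal sketch of the retry shape described in the commit message above, assuming the Kitware block is passed in as one string and each attempt is run by a fresh `sh -ec`. `retry_kitware_install`, `MAX_ATTEMPTS` and `BASE_DELAY` are hypothetical names, not the PR's actual code. Using a separate `sh -ec` process per attempt is a deliberate choice: POSIX shells ignore `set -e` inside a `( ... )` subshell whose status is being tested by `if`, `&&` or `||`, so an `if ( set -e; ... )` wrapper would not stop at the first failing command.

```sh
#!/bin/sh
# Sketch of the retry loop described above; not the PR's actual diff.
# Hypothetical knobs: MAX_ATTEMPTS (total tries) and BASE_DELAY (seconds).
MAX_ATTEMPTS=${MAX_ATTEMPTS:-5}
BASE_DELAY=${BASE_DELAY:-15}

# Runs the shell snippet in $1 until it succeeds, with linear backoff.
# With the defaults: waits of 15s, 30s, 45s, 60s (about 2m30s in total).
retry_kitware_install() {
  attempt=1
  while :; do
    # A fresh `sh -e` per attempt: the first failing command ends the
    # attempt with a non-zero status. (-e would be ignored inside a
    # `( ... )` subshell used directly as the `if` condition.)
    if sh -ec "$1"; then
      return 0
    fi
    if [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
      echo "ERROR: Kitware CMake install failed after $MAX_ATTEMPTS attempts" >&2
      return 1
    fi
    delay=$((BASE_DELAY * attempt))
    echo "Attempt $attempt/$MAX_ATTEMPTS failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

In a Dockerfile the existing Kitware commands would be passed, unchanged, as the single argument; the retry messages go to stderr, so they show up in the image-build log.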
Closing as superseded. #5963 merged the stronger fix: drop the apt.kitware.com step entirely and fall back to …
Summary
Keep using `apt.kitware.com` as the CMake source in `docker/ubuntu22Dockerfile`, but wrap the install block in a retry loop so a single refused connection doesn't fail the whole image build.
Why keep Kitware's repo rather than swap it
Ubuntu 22.04's apt-supplied CMake (3.22) is too old for MRCuda's `CUDA_STANDARD 20` against NVCC 12.6: CMake's table of NVCC compile flags for that combination only started covering recent CUDA toolkits in 3.25+. We had a failing run on the "drop the upgrade entirely" experiment (#5963, closed):

So we do need a newer CMake than Ubuntu jammy ships. Kitware's apt repo is the existing path, and this PR makes it durable against the transient outages we keep hitting.
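Since the argument rests on the CMake version, a post-install guard can make the requirement explicit in the image build. This is a sketch only: `MIN_CMAKE`, `version_at_least` and `check_cmake` are hypothetical names (not in the PR), `MIN_CMAKE=3.25` mirrors the threshold stated above, and the comparison assumes GNU `sort -V`.

```sh
#!/bin/sh
# Hypothetical guard, not part of the PR: fail early if the CMake on PATH
# is older than the minimum the CUDA 20 / NVCC 12.6 combination needs.
MIN_CMAKE=3.25

# Succeeds if version $1 >= version $2 under GNU version-sort ordering:
# the smaller of the two sorts first, so $2 must come out on top.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Reads the version from the first line of `cmake --version`,
# e.g. "cmake version 3.22.1" -> "3.22.1".
check_cmake() {
  installed=$(cmake --version | head -n 1 | awk '{print $3}')
  if version_at_least "$installed" "$MIN_CMAKE"; then
    echo "CMake $installed OK (>= $MIN_CMAKE)"
  else
    echo "CMake $installed is older than $MIN_CMAKE" >&2
    return 1
  fi
}
```

Running `check_cmake` right after the Kitware block would stop the build on jammy's stock 3.22 with an explicit message, rather than leaving the failure to surface later in MRCuda's configure step.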
Why retry
The `ubuntu22-arm64` image build on #5959 ("MRZlib: route compress and decompress streams through zlib-ng (native mode)"), run 24800811051, failed at this step with a connection-refused error from `apt.kitware.com`, while the x64 leg of the same run succeeded.
… `vtk.org`, not `apt.kitware.com`).
Connection-refused outages are typically minutes long; a short retry loop covers them without requiring us to mirror the repo ourselves or swap providers.
The change
Retry shape
`set -e` around the install block means any single command failure aborts the subshell and triggers a retry; we don't continue past a failed `apt update` only to hit a cascading error later.
… `--quiet`-style silent suppression).
Idempotency of the retried block
All commands inside the subshell are safe to re-run on a partial-failure retry:
- `apt remove cmake`: no-op if cmake was already removed on the previous attempt.
- `apt-add-repository "deb ..."`: idempotent; skips adding a duplicate source line.
- `apt-key adv --recv-keys ...`: idempotent; skips if the key is already in the keyring.
- `apt install -y X`: no-op if X is already installed and up to date.
- `rm /etc/apt/trusted.gpg.d/kitware.gpg` → `rm -f` (the only substantive change beyond the wrapping): a failed attempt may already have removed the file, and the next attempt's `rm` without `-f` would then abort the subshell under `set -e`.

No other command changes; order and flags are identical to the current recipe.
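A minimal sketch of why the `rm` → `rm -f` change matters on a retry. It uses a hypothetical temp file `KEYRING` in place of `/etc/apt/trusted.gpg.d/kitware.gpg` so it runs without root; each function stands in for one attempt of the retried block under `sh -e`.

```sh
#!/bin/sh
# KEYRING is a hypothetical stand-in for /etc/apt/trusted.gpg.d/kitware.gpg.
KEYRING=$(mktemp)
export KEYRING

# One simulated attempt each, run in a fresh `sh -e` like the retried block.
# Plain `rm` exits non-zero once the file is gone, so -e aborts the attempt.
attempt_with_rm() { sh -ec 'rm "$KEYRING"; echo "rest of block runs"'; }

# `rm -f` exits 0 for a missing file, so the rest of the block still runs.
attempt_with_rm_f() { sh -ec 'rm -f "$KEYRING"; echo "rest of block runs"'; }
```

Calling `attempt_with_rm` twice reproduces what a retry would hit with plain `rm`: the first call removes the file, the second exits non-zero before reaching the rest of the block. `attempt_with_rm_f` succeeds on every call.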
Scope
Only `docker/ubuntu22Dockerfile`. `docker/ubuntu24Dockerfile` doesn't upgrade CMake (Ubuntu 24.04 noble's apt ships CMake 3.28, which does have the NVCC 12.6 × C++20 flag-table entry) and has no `apt.kitware.com` recipe to retry.
Test plan
- `prepare-image / linux-image-build-upload (ubuntu22, x64)` succeeds
- `prepare-image / linux-image-build-upload (ubuntu22, arm64)` succeeds (same recipe, same path)
- … `set -u`/`set -e` costs nothing

🤖 Generated with Claude Code