apt: harden sandbox bootstrap against transient archive.ubuntu.com flakes #1284

Merged
rasdani merged 2 commits into main from daniel/apt-acquire-retries on May 9, 2026

rasdani commented May 4, 2026

Failure mode

archive.ubuntu.com / security.ubuntu.com resolve to a CDN whose edges propagate new InRelease manifests and Packages.gz files asynchronously when Canonical pushes new package indexes. A fresh sandbox can fetch InRelease from one already-synced edge and Packages.gz from another not-yet-synced edge, and apt aborts:

    E: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/noble-updates/main/binary-amd64/Packages.gz
    File has unexpected size (2399568 != 2399874). Mirror sync in progress? [IP: 91.189.91.81 80]

apt's default Acquire::Retries is 0, so a single bad fetch fails the rollout. With BS=256 × R=8 sandboxes hitting the CDN simultaneously, we reliably hit this failure during Canonical sync windows. It took out a recent RL run mid-rollout.
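A back-of-the-envelope sketch of why the fan-out matters. The per-attempt failure rate below is an assumed illustrative value, not a measurement, and the model treats attempts as independent, which is optimistic because retries inside the same sync window may hit the same stale edge.

```python
def p_sandbox_fails(p: float, retries: int) -> float:
    """Probability one sandbox exhausts all attempts (1 initial + `retries`)."""
    return p ** (retries + 1)


def p_any_fails(p: float, retries: int, sandboxes: int) -> float:
    """Probability that at least one of `sandboxes` independent bootstraps fails."""
    return 1.0 - (1.0 - p_sandbox_fails(p, retries)) ** sandboxes


SANDBOXES = 256 * 8   # BS=256 x R=8, as stated in the description
P_BAD_FETCH = 0.01    # ASSUMPTION: per-attempt failure rate during a sync window

if __name__ == "__main__":
    for retries in (0, 3):
        chance = p_any_fails(P_BAD_FETCH, retries, SANDBOXES)
        print(f"Acquire::Retries={retries}: P(at least one failure) = {chance:.3g}")
```

Under these assumptions, the chance of at least one failed bootstrap is close to 1 with no retries and drops to roughly 2 in 100,000 with three retries.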

Upstream tracking: Launchpad #1876035. The standard mitigation in CI guides, and the one used by Docker, CircleCI, moby, and vscode, is apt-get -o Acquire::Retries=3.

What this changes

Add -o Acquire::Retries=3 to every apt-get update / apt-get install invoked in sandbox bootstrap / image-setup paths (rollout hot path). The flag is also applied to follow-up apt-get install calls that race-fetch debs.
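For illustration only, a minimal sketch of how such command strings could be built in one place. The PR itself edits the strings inline in each file; `apt_get` and `bootstrap_script` are hypothetical helpers that do not exist in the repository.

```python
import shlex

# The option named in this PR, kept as separate argv tokens.
APT_RETRY_OPTION = ("-o", "Acquire::Retries=3")


def apt_get(subcommand: str, *args: str) -> str:
    """Return a shell-safe apt-get command line with the retry option set."""
    return shlex.join(["apt-get", *APT_RETRY_OPTION, subcommand, *args])


def bootstrap_script(packages: list[str]) -> str:
    """Compose an update-then-install bootstrap line (hypothetical helper)."""
    return " && ".join([apt_get("update"), apt_get("install", "-y", *packages)])


if __name__ == "__main__":
    print(bootstrap_script(["curl", "git"]))
```

Centralizing the flag this way would keep a typo from silently breaking a single bootstrap path, at the cost of touching more call sites than the inline change does.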

Files touched

  • verifiers/envs/experimental/composable/harnesses/opencode.py — opencode harness install script (the canonical SWE rollout install path; this is the file that broke the recent run).
  • verifiers/envs/experimental/composable/harnesses/mini_swe_agent.py — mini-SWE-agent install script.
  • verifiers/envs/experimental/composable/tasksets/swe/multi_swe.py — multi-SWE per-rollout apt-get install patch.
  • verifiers/envs/experimental/opencode_env.py: DEFAULT_RUN_COMMAND_TEMPLATE (per-rollout sandbox bootstrap).
  • verifiers/envs/experimental/opencode_rlm_env.py: RLM_RUN_COMMAND_TEMPLATE (per-rollout sandbox bootstrap).
  • environments/terminus_harbor/terminus_harbor.py: post_sandbox_setup apt call.
  • environments/hello_mcp_harbor/hello_mcp_harbor.py — sandbox run command.
  • environments/opencode_harbor/opencode_harbor.py — sandbox run command.
  • assets/templates/browserbase/cua/setup.sh, setup-binary.sh, Dockerfile.runtime — CUA browserbase sandbox setup templates.
  • environments/openenv_echo/proj/server/Dockerfile, environments/openenv_textarena/proj/server/Dockerfile — env server image builds.
  • tests/test_opencode_rlm_env.py — updated assertion to match new install string.

Skipped (intentionally)

  • environments/*/tasks/*/tests/test.sh, environments/*/tasks/*/solution/solve.sh — task-level evaluation/solution scripts. apt failures here should be loud; they don't gate training.
  • scripts/install.sh — host dev/setup script for human use.
  • verifiers/envs/experimental/composable/tasksets/swe/swe_lego.py — only contains a comment string mentioning apt-get update, not an actual call.

Validation

  • uv run ruff check . clean.
  • uv run ruff format --check clean on all touched files.
  • uv run pre-commit run --files <touched> passes.
  • uv run pytest tests/test_opencode_rlm_env.py tests/test_rlm_composable_env.py tests/test_envs.py tests/test_opencode_harbor.py tests/test_build_script.py — all apt-related tests pass; the 2 failures in test_envs.py are pre-existing OPENAI_API_KEY-not-set smoke tests unrelated to this change.
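Since a typo in any of these strings could break environment startup, a heuristic check along the following lines could complement the string assertions. This is a sketch, not part of the PR: `unhardened_apt_calls` is a hypothetical helper, and the regex splits only on simple shell separators, so it would also flag a comment that merely mentions apt-get update.

```python
import re

# An apt-get invocation, up to the next shell separator or newline.
APT_CALL = re.compile(r"\bapt-get\b[^;&|\n]*")
RETRY_FLAG = re.compile(r"-o\s*Acquire::Retries=\d+")


def unhardened_apt_calls(script: str) -> list[str]:
    """Return apt-get update/install invocations that lack the retry option."""
    missing = []
    for match in APT_CALL.finditer(script):
        call = match.group(0).strip()
        if not {"update", "install"} & set(call.split()):
            continue  # other subcommands (e.g. clean) are outside this PR's scope
        if not RETRY_FLAG.search(call):
            missing.append(call)
    return missing


if __name__ == "__main__":
    sample = "apt-get -o Acquire::Retries=3 update && apt-get install -y curl"
    print(unhardened_apt_calls(sample))
```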

🤖 Generated with Claude Code


Note

Low Risk
Low-risk change that only hardens apt-get update/install invocations with retries. It touches many sandbox/image bootstrap paths, though, so any typo could impact environment startup.

Overview
Reduces rollout flakiness from transient archive.ubuntu.com mirror/CDN sync issues by adding apt-get -o Acquire::Retries=3 to sandbox bootstrap apt-get update/install steps across harness install scripts, environment setup commands, and Docker image builds.

Updates related tests and assertions to match the new apt-get command strings, ensuring v1 harness programs (OpenCode, Pi, MiniSWEAgent, RLM) and experimental env templates expect the hardened setup behavior.

Reviewed by Cursor Bugbot for commit cef1182. Bugbot is set up for automated code reviews on this repo.

…akes

archive.ubuntu.com / security.ubuntu.com resolve to a CDN whose edges
propagate new InRelease manifests and Packages.gz files asynchronously
during Canonical's index pushes. A fresh sandbox can fetch InRelease
from one already-synced edge and Packages.gz from another not-yet-synced
edge, producing errors like:

    E: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/noble-updates/main/binary-amd64/Packages.gz
    File has unexpected size (2399568 != 2399874). Mirror sync in progress?

apt's default Acquire::Retries=0, so a single bad fetch fails the
rollout. With BS=256 x R=8 hitting the CDN simultaneously, this is
hit reliably during Canonical sync windows.

Add Acquire::Retries=3 to apt-get update and apt-get install calls in
sandbox bootstrap / image-setup paths (rollout hot path), plus the
matching install calls that race-fetch debs. Task-level test scripts
and one-shot solve.sh helpers are intentionally left unchanged: those
are evaluation artifacts where a loud apt failure is the correct
behavior.

See launchpad bug #1876035 and ubuntu discourse threads for the
upstream issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rasdani requested a review from willccbb on May 7, 2026, 20:14
rasdani commented May 9, 2026

@codex resolve the merge conflict

@chatgpt-codex-connector

I checked the branch tied to the PR head SHA (5c4c63196eaceff328254c46605a8f663db4b871) and there are no active merge conflicts to resolve right now.

What I ran

  • git status --short --branch (clean branch, no conflicted files)
  • rg -n "<<<<<<<|=======|>>>>>>>" (no real conflict markers found; only separator comments)
  • git merge 877e58296aa00214361460a8760d8a0d942d729e (reports “Already up to date.”)
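The rg pattern above matches the marker strings anywhere in a line, which is why separator comments show up as hits. A stricter check, sketched here under the assumption of Git's default 7-character markers at column 0, could look like this; `conflict_marker_lines` is a hypothetical helper, and diff3-style `|||||||` markers are not covered.

```python
import re

# Git's default markers: "<<<<<<< ours", "=======", ">>>>>>> theirs".
CONFLICT_MARKER = re.compile(r"^(<{7} .*|={7}|>{7} .*)$")


def conflict_marker_lines(text: str) -> list[int]:
    """Return 1-based line numbers that look like real conflict markers."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if CONFLICT_MARKER.match(line)
    ]


if __name__ == "__main__":
    sample = "# ======= section =======\n<<<<<<< HEAD\na\n=======\nb\n>>>>>>> main\n"
    print(conflict_marker_lines(sample))
```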

Because there were no conflicts and no required code edits, I did not create a commit and therefore did not open a follow-up PR.

rasdani commented May 9, 2026

@cursoragent resolve merge conflicts

cursor Bot commented May 9, 2026

Resolved the merge conflict and pushed the branch.

Summary:

  • Merged origin/main into daniel/apt-acquire-retries.
  • Resolved conflict in environments/opencode_harbor/opencode_harbor.py by keeping the new v1 vf.OpenCode/vf.HarborTaskset refactor.
  • Preserved the PR’s apt retry behavior in the new v1 bootstrap paths:
    • verifiers/v1/packages/harnesses/opencode.py
    • mini_swe_agent.py
    • pi.py
    • rlm.py
    • verifiers/v1/utils/sandbox_utils.py
  • Added/updated assertions for the v1 harness setup strings.
  • Committed and pushed: cef11828 Resolve main merge conflicts
  • Updated the existing PR branch metadata.

Validation:

  • python3 -m py_compile ... on touched Python files passed.
  • Full uv/pytest/ruff/pre-commit validation could not run because this environment lacks uv, pytest, ruff, and pre-commit.
  • GitHub PR state now reports UNSTABLE instead of DIRTY, indicating the merge conflict is resolved and remaining status is check-related.

Co-authored-by: rasdani <rasdani@users.noreply.github.com>
rasdani merged commit 291fd4a into main on May 9, 2026
8 checks passed