
test(e2e): reproduce py_binary execve failure under bazel run (runfiles discovery)#1113

Open
gregmagolan wants to merge 1 commit into main from test/py-binary-bazel-run-runfiles-discovery

Conversation


gregmagolan commented Jun 16, 2026

Member

Summary

A py_binary cannot be run via bazel run (or as a deployed/direct-exec binary) when RUNFILES_DIR is not already set: it aborts with

ERROR: execve failed with errno 2 (return code 1)

bazel test masks the failure because it sets RUNFILES_DIR, so the existing e2e coverage doesn't catch it; bazel run and deployed binaries do hit it.

Root cause (in the launcher, not this repo)

bazel run execs the binary with a relative argv[0] (bazel-bin/<name>), cwd set inside the runfiles tree (<x>.runfiles/_main), and no RUNFILES_DIR. The hermetic_launcher stub then computes <argv[0]>.runfiles relative to cwd — which points nowhere — so runfiles discovery fails, the embedded venv-python rlocation is left unresolved, and execve fails with errno 2.
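The path geometry described above can be reproduced with a small Python sketch. It is a simplified model, not the launcher's actual code: `naive_runfiles_dir`, the `app` target name, and the temporary directory layout are all hypothetical stand-ins for a Bazel output tree.

```python
import os
import tempfile


def naive_runfiles_dir(argv0: str, cwd: str) -> str:
    """Model resolving `<argv[0]>.runfiles` against the current directory."""
    return os.path.join(cwd, argv0 + ".runfiles")


# Hypothetical layout: <root>/bazel-bin/app.runfiles/_main
root = os.path.realpath(tempfile.mkdtemp())
real_runfiles = os.path.join(root, "bazel-bin", "app.runfiles")
run_cwd = os.path.join(real_runfiles, "_main")  # cwd inside the runfiles tree
os.makedirs(run_cwd)

# Relative argv[0] as described for `bazel run`, resolved from that cwd.
guess = naive_runfiles_dir("bazel-bin/app", run_cwd)

print(guess)
print(os.path.isdir(guess))          # the guessed directory does not exist
print(os.path.isdir(real_runfiles))  # the real runfiles directory does
```

The guessed path ends up nested under `_main/bazel-bin/`, which is why discovery finds nothing unless RUNFILES_DIR is supplied from outside.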

Verified this reproduces on hermetic_launcher main (not just the pinned 0.0.9), and that bumping to the latest release 0.0.10 does not fix it. The launcher fix is here: hermeticbuild/hermetic-launcher#44 (resolve the real executable path via _NSGetExecutablePath / readlinkat(/proc/self/exe) when argv[0] is relative).

What changed in this PR

Adds an e2e regression case cases/run-runfiles-discovery:

  • a trivial py_binary,
  • an sh_test that unsets RUNFILES_DIR / RUNFILES_MANIFEST_FILE / JAVA_RUNFILES before invoking the binary, forcing the launcher's self-location path.
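The idea behind the test wrapper, removing the runfiles variables before invoking the binary, can be sketched in Python. This is an illustration rather than the sh_test added in this PR: `env_without_runfiles` and `run_without_runfiles_env` are hypothetical helpers, and the child process is a stand-in probe instead of the real py_binary.

```python
import os
import subprocess
import sys

# Variables the test case is described as unsetting.
RUNFILES_VARS = ("RUNFILES_DIR", "RUNFILES_MANIFEST_FILE", "JAVA_RUNFILES")


def env_without_runfiles(env):
    """Return a copy of env with the runfiles discovery variables removed."""
    return {k: v for k, v in env.items() if k not in RUNFILES_VARS}


def run_without_runfiles_env(cmd, base_env):
    """Run cmd with the runfiles variables stripped from base_env."""
    return subprocess.run(
        cmd,
        env=env_without_runfiles(base_env),
        capture_output=True,
        text=True,
    )


# Simulate the environment `bazel test` provides, then strip it again.
test_env = dict(os.environ, RUNFILES_DIR="/tmp/example.runfiles")
probe = [sys.executable, "-c", "import os; print('RUNFILES_DIR' in os.environ)"]
result = run_without_runfiles_env(probe, test_env)
print(result.stdout.strip())  # prints "False"
```

With the variables gone, the child has to locate its runfiles on its own, which is the code path the regression case is meant to exercise.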

Tagged manual so it stays out of the default //... gate until the launcher fix is released and bumped here.

Remedy / follow-up

  1. Land and release hermeticbuild/hermetic-launcher#44, "fix(runfiles): resolve real exe path when argv[0] is relative (fixes bazel run)", as a new hermetic_launcher version.
  2. Bump bazel_dep(name = "hermetic_launcher", ...) in MODULE.bazel to that version.
  3. Drop the manual tag on cases/run-runfiles-discovery so it gates bazel run.

(Bumping to 0.0.10 was tried and verified NOT to fix it, so no bump is included here yet — it would need the released fix from #44.)

Test plan

🤖 Generated with Claude Code

…es discovery)

Add a regression case showing a py_binary cannot be run via `bazel run` (or
direct exec) when RUNFILES_DIR is not pre-set: the launcher
(hermetic_launcher 0.0.9) fails to self-locate its `.runfiles` dir from argv[0]
and aborts with "execve failed with errno 2 (return code 1)".

`bazel test` masks the bug because it sets RUNFILES_DIR for the test process;
the case unsets the runfiles env vars before invoking the binary to force the
launcher's self-location path, matching what `bazel run` and a deployed binary
hit. Confirmed: passes with RUNFILES_DIR set, fails without.

Tagged `manual` so it stays out of the default `//...` gate until the launcher
fix lands.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

aspect-workflows Bot commented Jun 16, 2026


✨ Aspect Workflows Tasks

📅 Tue Jun 16 08:39:18 UTC 2026

✅ 7 successful tasks

  • ✅ buildifier · ⏱ 19.4s · 🐙 GitHub Actions · ☑️ Check
    💬 Format complete (clean)
  • ✅ gazelle · ⏱ 22.2s · 🐙 GitHub Actions · ☑️ Check
    💬 Gazelle complete (clean)
  • ✅ test (test-e2e-bazel-8) · ⏱ 2m 10s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel test complete (138/138 passed)
  • ✅ test (test-e2e-bazel-9) · ⏱ 3m 37s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel test complete (133/133 passed)
  • ✅ test (test-examples-uv_pip_compile-bazel-8) · ⏱ 26s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel test complete (1/1 passed · 1 cached)
  • ✅ test (test-root-bazel-8) · ⏱ 2m 8s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel test complete (188/188 passed)
  • ✅ test (test-root-bazel-9) · ⏱ 2m 4s · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel test complete (187/187 passed)

⏱ Last updated Tue Jun 16 08:43:07 UTC 2026 · 📊 GitHub API quota 281/15,000 (2% used, resets in 1m)
🚀 Powered by Aspect CLI (v2026.24.11)

gregmagolan added a commit to gregmagolan/hermetic-launcher that referenced this pull request Jun 16, 2026
…] is relative

When the launcher is exec'd with a relative argv[0] from a cwd unrelated to the
binary (notably `bazel run`, which sets cwd inside `<x>.runfiles/_main`, passes
argv[0]="bazel-bin/<name>", and does NOT set RUNFILES_DIR), `<argv[0]>.runfiles`
resolved against cwd points nowhere. Discovery then fails, the embedded program
rlocation is left unresolved, and execve aborts with errno 2 — observed
downstream as aspect-build/rules_py#1113 ("execve failed with errno 2") when
running a py_binary via `bazel run`.

Fix: when argv[0] is not absolute, resolve the real executable path and use it
for `<exe>.runfiles` discovery:
- macOS: _NSGetExecutablePath
- Linux (x86_64/aarch64): readlinkat(/proc/self/exe)
- other (s390x/Windows): unchanged (return None -> argv[0] fallback)

RUNFILES_DIR / RUNFILES_MANIFEST_FILE still take precedence when set.

Adds an integration case `relative_argv0_discovery` reproducing `bazel run`'s
geometry exactly (relative argv[0] via arg0 override + cwd inside runfiles + no
runfiles env), plus `symlinked_program_relative_argv0` for the venv-shaped
relative-symlink program target. Both fail before this change and pass after.

NB: the Linux readlinkat paths could not be run locally (macOS host); the macOS
path is verified end-to-end. s390x readlinkat is intentionally left unwired.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
malt3 added a commit to hermeticbuild/hermetic-launcher that referenced this pull request Jun 16, 2026
`bazel run` and directly-executed deployed binaries exec the launcher
with a relative argv[0] (e.g. `bazel-bin/foo`) from a working directory
inside the runfiles tree and without RUNFILES_DIR set, so computing
`<argv[0]>.runfiles` resolved against the wrong directory and runfiles
discovery failed with "Failed to initialize runfiles" / execve errno 2.
`bazel test` masked this by pre-setting RUNFILES_DIR.
See aspect-build/rules_py#1113.

Derive the executable path from a safe OS launch-path API instead of
argv[0], make it absolute (joining the cwd when relative), and do not
resolve symlinks:
  - Linux (all arches incl. s390x): AT_EXECFN from the ELF auxv
  - macOS: _NSGetExecutablePath
  - Windows: QueryFullProcessImageNameW

`program_path()` (argv[0]) is removed; `Runfiles::create` now takes
`&RuntimeArgs` and calls the new `executable_path()` seam, with a shared
`common::absolutize` + per-backend `current_dir` (getcwd).

Add an integration regression test (`run_runfiles_discovery`) that execs
the stub with a relative argv[0] from a cwd inside the runfiles tree with
the runfiles env vars unset; it fails before this change and passes after.
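
The "make it absolute, do not resolve symlinks" rule from the commit message above can be illustrated with a short Python sketch. It is not the launcher's implementation: the function names only echo `absolutize` from the commit message, the launch path is passed in directly rather than read from an OS API, and the directory layout is hypothetical. The commit message does not say why symlinks are left unresolved; the sketch assumes the runfiles tree sits next to the launch path rather than next to the symlink target. It relies on `os.symlink`, so it assumes a POSIX host.

```python
import os
import tempfile


def absolutize(path: str, cwd: str) -> str:
    """Join cwd onto a relative path; symlinks are never followed."""
    return path if os.path.isabs(path) else os.path.join(cwd, path)


def runfiles_dir_for(launch_path: str, cwd: str) -> str:
    """Derive `<exe>.runfiles` from the launch path reported by the OS."""
    return absolutize(launch_path, cwd) + ".runfiles"


# Hypothetical layout: out/app is a symlink to store/app, and the
# runfiles tree lives next to the symlink, not next to its target.
root = os.path.realpath(tempfile.mkdtemp())
store = os.path.join(root, "store")
out = os.path.join(root, "out")
os.makedirs(store)
os.makedirs(os.path.join(out, "app.runfiles"))
target = os.path.join(store, "app")
open(target, "w").close()
link = os.path.join(out, "app")
os.symlink(target, link)

found = runfiles_dir_for("app", out)             # unresolved launch path
resolved = os.path.realpath(link) + ".runfiles"  # what resolving would give

print(os.path.isdir(found))     # True
print(os.path.isdir(resolved))  # False
```

Under that assumption, following the symlink would point discovery at `store/app.runfiles`, which does not exist.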

malt3 commented Jun 16, 2026

Contributor

Can you test this again, but briefly add a git_override for the hermetic launcher? I just updated the digests of the prebuilt binaries on main to hopefully resolve this issue.
If this works correctly with a git_override, I'll cut a new release for you that you can upgrade to!

chpock added a commit to chpock/rules_py that referenced this pull request Jun 19, 2026