Test harness must pass -B to guest Python until rename hangs are fixed
Summary
The Nanvix test harness (.nanvix/run-tests.py) currently passes -B to
the guest Python in both the standalone and hosted/direct branches.
This is a process-wide mitigation that must remain in place until the
two outstanding rename hangs — NSKIP021 (FAT VFS, standalone) and
NSKIP055 (linuxd RPC, hosted) — are both fixed upstream. This issue
tracks the harness-level requirement so it is not silently dropped in a
future harness rewrite, and so the conditions for removing it are
written down in one place.
Background
CPython's import system writes __pycache__/*.pyc files via an atomic
os.replace(tmp, dest) (see Lib/importlib/_bootstrap_external.py,
_write_atomic). This fires whenever a module's cached pyc is missing or
stale, i.e. the source mtime (or size) no longer matches what the pyc
header records. That includes:
- any source edited after the install-time compileall snapshot (i.e.
  every developer iteration),
- any /tmp/*.py synthesized by a test fixture and then imported (the
  dominant pattern in test_importlib's metadata and finder/loader
  suites),
- any module loaded during interpreter startup or regrtest discovery
  whose pyc is stale or missing.
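The trigger can be observed on an ordinary host CPython, where the rename succeeds. The following is a minimal sketch that assumes a host CPython 3.7+ with bytecode writes enabled. Do not run it in a Nanvix guest without -B, because that is exactly the hang. It imports a freshly synthesized module, which writes the pyc. It then pushes the source mtime forward and reloads, so the stale pyc is rewritten. The module name fixture_mod is hypothetical.

```python
# Host-side sketch (NOT for a Nanvix guest without -B): shows when CPython's
# import system writes a .pyc, i.e. the implicit os.replace described above.
import importlib
import importlib.util
import os
import sys
import tempfile


def pyc_source_mtime(pyc_path):
    """Return the source mtime recorded in a timestamp-based pyc header (PEP 552)."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    # Layout: magic (4 bytes), flags (4), source mtime (4), source size (4).
    return int.from_bytes(header[8:12], "little")


def demo():
    assert not sys.dont_write_bytecode, "run without -B to observe the write"
    d = tempfile.mkdtemp()
    src = os.path.join(d, "fixture_mod.py")  # hypothetical fixture module
    with open(src, "w") as f:
        f.write("x = 1\n")
    sys.path.insert(0, d)
    try:
        # First import: no pyc exists yet, so one is compiled and written.
        mod = importlib.import_module("fixture_mod")
        pyc = importlib.util.cache_from_source(src)
        first = pyc_source_mtime(pyc)

        # Simulate a developer edit: move the source mtime forward so the
        # recorded mtime no longer matches and the pyc is considered stale.
        st = os.stat(src)
        os.utime(src, (st.st_atime, st.st_mtime + 10))
        importlib.reload(mod)  # stale pyc: recompiled and rewritten
        second = pyc_source_mtime(pyc)
        return os.path.exists(pyc), first, second
    finally:
        sys.path.remove(d)
        sys.modules.pop("fixture_mod", None)


if __name__ == "__main__":
    exists, first, second = demo()
    print("pyc written:", exists, "| recorded mtime changed:", first != second)
```

Each of the two pyc writes in this sketch goes through _write_atomic, so each would hit the hanging rename path on Nanvix.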
On Nanvix both rename code paths currently hang:
- Standalone: os.rename/os.replace through nanvix-kernel → rust-fatfs →
  FAT image hangs the kernel for most probe shapes (sibling rename in a
  subdir, replace with existing destination, rename inside __pycache__,
  …). Tracked as NSKIP021.
- Hosted (single-process and multi-process): os.rename/os.replace
  through nanvix-kernel → linuxd RPC → host ext4 hangs the guest kernel
  for every probe shape we have tried (six variants, including the /tmp
  root sibling rename that standalone lets through). Tracked as NSKIP055.
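The probe shapes named above can be written as a small script. This is a hypothetical sketch and not the project's actual probe. It covers only the four shapes this issue names; the remaining shapes of the six-shape probe are not listed here, so they are omitted. On a healthy filesystem every probe returns True. On an affected Nanvix configuration, the first hanging shape never returns. The temp file names in the __pycache__ probe only approximate _write_atomic's temp-then-replace pattern.

```python
# Hypothetical rename probe covering the four shapes named in this issue.
# On an affected guest configuration the first hanging shape never returns.
import os
import tempfile
import uuid


def _touch(path, data=b"probe\n"):
    with open(path, "wb") as f:
        f.write(data)


def sibling_rename_in_subdir(base):
    sub = os.path.join(base, "sub")
    os.mkdir(sub)
    src, dst = os.path.join(sub, "a"), os.path.join(sub, "b")
    _touch(src)
    os.rename(src, dst)
    return os.path.exists(dst) and not os.path.exists(src)


def replace_existing_destination(base):
    src, dst = os.path.join(base, "new"), os.path.join(base, "old")
    _touch(src, b"new\n")
    _touch(dst, b"old\n")
    os.replace(src, dst)  # destination exists and must be overwritten
    with open(dst, "rb") as f:
        return f.read() == b"new\n" and not os.path.exists(src)


def rename_inside_pycache(base):
    # Approximates the import system's write-temp-then-replace pyc pattern.
    cache = os.path.join(base, "__pycache__")
    os.mkdir(cache)
    tmp, final = os.path.join(cache, "m.pyc.tmp"), os.path.join(cache, "m.pyc")
    _touch(tmp)
    os.replace(tmp, final)
    return os.path.exists(final) and not os.path.exists(tmp)


def tmp_root_sibling_rename(_base):
    # Renames directly in the temp root (/tmp in the guest), not a subdir.
    root = tempfile.gettempdir()
    src = os.path.join(root, f"probe-{uuid.uuid4().hex}")
    dst = src + ".renamed"
    _touch(src)
    try:
        os.rename(src, dst)
        return os.path.exists(dst) and not os.path.exists(src)
    finally:
        for p in (src, dst):
            if os.path.exists(p):
                os.remove(p)


PROBES = [sibling_rename_in_subdir, replace_existing_destination,
          rename_inside_pycache, tmp_root_sibling_rename]


def run_probes():
    results = {}
    for probe in PROBES:
        with tempfile.TemporaryDirectory() as base:
            results[probe.__name__] = probe(base)  # a hang here never returns
    return results


if __name__ == "__main__":
    for name, ok in run_probes().items():
        print(f"{name}: {'ok' if ok else 'FAILED'}")
```

Each shape runs in its own fresh temporary directory. That way a leftover file from one probe cannot mask or cause a failure in the next.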
When the hang fires, the syscall never returns, the VM becomes
unresponsive, and the entire batch consumes its 600 s wall-clock
budget without producing results.
Why per-test NSKIP guards are not sufficient on their own
NSKIP021 and NSKIP055 are applied at test-body scope with
@unittest.skipIf(...). They cover tests whose bodies call
os.rename / os.replace / shutil.move / dbm / py_compile /
etc. They cannot cover the implicit pyc-write rename because:
- It happens at import time, before any test body runs. The
interpreter performs _write_atomic deep inside the import
machinery; a skipIf decorator on a TestCase method is never
reached if the import that pulls in the test module (or one of its
fixtures, or one of regrtest's own imports) has already locked up
the VM.
- The trigger surface is the entire enabled test list, not just
rename-using tests. Any stale-mtime source anywhere in the run
can fire it. After a single source edit, the install-time
compileall snapshot goes stale and the next run hangs on a module
that has nothing to do with NSKIP021/055.
- One stuck import sinks the whole batch. Tests are batched and
share a wall-clock budget; a single hung import wastes the budget
for every other module in that batch.
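The first point can be checked on a host CPython. A test module's module-level imports run to completion before the skipIf condition on any of its methods is even evaluated. So a guard cannot prevent a hang that happens inside one of those imports. The following is a minimal sketch; the synthesized module names (events_log, fixture_mod, test_probe) are hypothetical.

```python
# Host-side sketch: records the order in which a fixture import and a
# method-level skipIf guard run when unittest loads a test module.
import importlib
import os
import sys
import tempfile
import unittest

FILES = {
    # Shared log that the other synthesized modules append to.
    "events_log.py": "EVENTS = []\n",
    # Stands in for a fixture whose import would trigger the implicit pyc write.
    "fixture_mod.py": "import events_log\nevents_log.EVENTS.append('fixture imported')\n",
    # Test module: imports the fixture at module level and guards its only test.
    "test_probe.py": (
        "import unittest, events_log, fixture_mod\n"
        "def guard():\n"
        "    events_log.EVENTS.append('skipIf evaluated')\n"
        "    return True\n"
        "class T(unittest.TestCase):\n"
        "    @unittest.skipIf(guard(), 'NSKIP-style guard')\n"
        "    def test_rename(self):\n"
        "        events_log.EVENTS.append('test body ran')\n"
    ),
}


def demo():
    with tempfile.TemporaryDirectory() as d:
        for name, text in FILES.items():
            with open(os.path.join(d, name), "w") as f:
                f.write(text)
        sys.path.insert(0, d)
        try:
            suite = unittest.defaultTestLoader.loadTestsFromName("test_probe")
            with open(os.devnull, "w") as sink:
                unittest.TextTestRunner(stream=sink).run(suite)
            return list(importlib.import_module("events_log").EVENTS)
        finally:
            sys.path.remove(d)
            for mod_name in ("events_log", "fixture_mod", "test_probe"):
                sys.modules.pop(mod_name, None)


if __name__ == "__main__":
    print(demo())
```

The fixture import is logged before the guard. If that import hung, as the implicit pyc rename does on Nanvix, the guard would never be evaluated.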
-B (equivalently PYTHONDONTWRITEBYTECODE=1) suppresses the
implicit pyc write at its source — the import system simply does not
call _write_atomic, so the rename hang is unreachable from that
path. Explicit os.rename/os.replace calls in test bodies are
unaffected and remain the responsibility of NSKIP021/NSKIP055
@skipIf guards. The two mitigations are orthogonal and both are
required.
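A harness rewrite needs to keep this mitigation in every mode. One way to do that is to build every guest Python command through a single helper that applies both knobs. The following is a hypothetical sketch: guest_python_cmd, the mode names, and the argument layout are illustrative assumptions and not the actual structure of .nanvix/run-tests.py. The host-side check only confirms that -B and PYTHONDONTWRITEBYTECODE=1 set sys.dont_write_bytecode as expected.

```python
# Hypothetical helper; names and modes are illustrative, not run-tests.py's API.
import os
import subprocess
import sys

GUEST_MODES = ("standalone", "hosted-single", "hosted-multi")  # assumed names


def guest_python_cmd(mode, python, args, base_env=None):
    """Return (argv, env) for launching guest Python with bytecode writes off.

    Both -B and PYTHONDONTWRITEBYTECODE=1 are set, so dropping either one in
    a later edit does not silently re-enable the implicit pyc rename
    (NSKIP021 on standalone, NSKIP055 on hosted).
    """
    if mode not in GUEST_MODES:
        raise ValueError(f"unknown guest mode: {mode!r}")
    argv = [python, "-B", *args]
    env = dict(base_env if base_env is not None else os.environ)
    env["PYTHONDONTWRITEBYTECODE"] = "1"
    return argv, env


def bytecode_writes_disabled_on_host():
    """Sanity check on the host: the resulting command really disables writes."""
    argv, env = guest_python_cmd(
        "standalone", sys.executable,
        ["-c", "import sys; print(sys.dont_write_bytecode)"])
    out = subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
    return out.stdout.strip() == "True"
```

For example, guest_python_cmd("hosted-multi", "python3", ["-m", "test", "test_importlib"]) returns an argv that starts with python3 -B and an environment with PYTHONDONTWRITEBYTECODE=1. Routing all modes through one function means a future rewrite has only one place where the mitigation could be dropped.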
Current state
- Standalone branch: passes -B and sets PYTHONDONTWRITEBYTECODE=1 for the
  guest.
- Hosted/direct branch: passes -B to the guest, with an inline comment
  explaining the rationale and citing NSKIP055 (and NSKIP021 by analogy).
- Prior to PR #551 ("tests: enable test_importlib subpackage (#491)"),
  the mitigation was implicit in the standalone branch only.
Exit criteria
This issue can be closed and -B removed from the harness when both of
the following are true:
- NSKIP021 (FAT VFS rename hang on standalone) is resolved upstream and
  the full six-shape rename probe passes on standalone.
- NSKIP055 (linuxd rename hang on hosted) is resolved upstream and the
  full six-shape rename probe passes on hosted single-process and
  multi-process.
Until then, any harness rewrite must preserve -B (or
PYTHONDONTWRITEBYTECODE=1) on every mode that runs guest Python.
Reproducing the implicit-rename hang (without -B)
```python
# Run in a Nanvix guest (hosted or standalone) WITHOUT -B and without
# PYTHONDONTWRITEBYTECODE=1, so that bytecode writes stay enabled.
import os
import sys
import tempfile

assert not sys.dont_write_bytecode, "bytecode writes are disabled; nothing to reproduce"

d = tempfile.mkdtemp()
with open(os.path.join(d, "m.py"), "w") as f:
    f.write("x = 1\n")
sys.path.insert(0, d)

import m  # First import writes __pycache__/m.cpython-3XX.pyc via
          # os.replace(tmp, dest), which hangs the guest kernel.
```
Equivalent shapes: editing any installed Lib/... source so its mtime
exceeds its pyc, then importing it; or running regrtest after a source
edit with bytecode writes enabled.
Related
- #501 (NSKIP021): FAT VFS rename hang on standalone.
- #552 (NSKIP055): linuxd rename hang on hosted.
- #371: umbrella NSKIP tracker.
- #551: adds -B to the hosted/direct branch and documents the rationale
  inline in .nanvix/run-tests.py.