Test harness must pass -B to guest Python until rename hangs are fixed #559

@ada-x64

Summary

The Nanvix test harness (.nanvix/run-tests.py) currently passes -B to
the guest Python in both the standalone and hosted/direct branches.
This is a process-wide mitigation that must remain in place until the
two outstanding rename hangs — NSKIP021 (FAT VFS, standalone) and
NSKIP055 (linuxd RPC, hosted) — are both fixed upstream. This issue
tracks the harness-level requirement so it is not silently dropped in a
future harness rewrite, and so the conditions for removing it are
written down in one place.

Background

CPython's import system writes __pycache__/*.pyc files via an atomic
os.replace(tmp, dest) (see Lib/importlib/_bootstrap_external.py,
_write_atomic). This fires whenever a module is compiled from source
because its cached pyc is missing or stale (the source mtime or size
recorded in the pyc no longer matches the source), including:

  • any source edited after the install-time compileall snapshot (i.e.
    every developer iteration),
  • any /tmp/*.py synthesized by a test fixture and then imported (the
    dominant pattern in test_importlib's metadata and finder/loader
    suites),
  • any module loaded during interpreter startup or regrtest discovery
    whose pyc is stale or missing.
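
To make the trigger condition concrete, the following is a minimal sketch of a host-side check that predicts whether importing a given source file would cause CPython to rewrite its pyc. The helper name pyc_would_be_rewritten is hypothetical and is not part of the harness. It assumes the default timestamp-based pyc layout (magic, flags, mtime, size in the first 16 bytes) and reports hash-based pycs conservatively as rewrites, since it does not compute source hashes.

```python
import importlib.util
import os
import struct


def pyc_would_be_rewritten(source_path):
    """Return True if importing source_path would likely write a new pyc.

    Hypothetical helper for illustration only. It mirrors the check the
    import system applies to timestamp-based pycs: magic number, then
    recorded source mtime and size.
    """
    pyc_path = importlib.util.cache_from_source(source_path)
    try:
        with open(pyc_path, "rb") as f:
            header = f.read(16)
    except FileNotFoundError:
        return True  # no cached pyc at all: the import compiles and writes one

    if len(header) < 16 or header[:4] != importlib.util.MAGIC_NUMBER:
        return True  # truncated header or pyc from a different interpreter

    (flags,) = struct.unpack("<I", header[4:8])
    if flags != 0:
        # Hash-based pyc: validity depends on a source hash that this
        # sketch does not compute, so report conservatively.
        return True

    recorded_mtime, recorded_size = struct.unpack("<II", header[8:16])
    st = os.stat(source_path)
    return (
        recorded_mtime != (int(st.st_mtime) & 0xFFFFFFFF)
        or recorded_size != (st.st_size & 0xFFFFFFFF)
    )
```

Running this over the installed Lib tree after a source edit would list the modules whose next import reaches _write_atomic when bytecode writes are enabled.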

On Nanvix both rename code paths currently hang:

  • Standalone: os.rename/os.replace through nanvix-kernel →
    rust-fatfs → FAT image hangs the kernel for most probe shapes
    (sibling rename in a subdir, replace with existing destination,
    rename inside __pycache__, …). Tracked as NSKIP021.
  • Hosted (single-process and multi-process): os.rename/
    os.replace through nanvix-kernel → linuxd RPC → host ext4 hangs
    the guest kernel for every probe shape we have tried (six
    variants, including /tmp root sibling rename which standalone lets
    through). Tracked as NSKIP055.

When the hang fires, the syscall never returns, the VM becomes
unresponsive, and the entire batch consumes its 600 s wall-clock
budget without producing results.

Why per-test NSKIP guards are not sufficient on their own

NSKIP021 and NSKIP055 are applied at test-body scope with
@unittest.skipIf(...). They cover tests whose bodies call
os.rename / os.replace / shutil.move / dbm / py_compile /
etc. They cannot cover the implicit pyc-write rename because:

  1. It happens at import time, before any test body runs. The
    interpreter performs _write_atomic deep inside the import
    machinery; a skipIf decorator on a TestCase method is never
    reached if the import that pulls in the test module (or one of its
    fixtures, or one of regrtest's own imports) has already locked up
    the VM.
  2. The trigger surface is the entire enabled test list, not just
    rename-using tests. Any stale-mtime source anywhere in the run
    can fire it. After a single source edit, the install-time
    compileall snapshot goes stale and the next run hangs on a module
    that has nothing to do with NSKIP021/055.
  3. One stuck import sinks the whole batch. Tests are batched and
    share a wall-clock budget; a single hung import wastes the budget
    for every other module in that batch.

-B (equivalently PYTHONDONTWRITEBYTECODE=1) suppresses the
implicit pyc write at its source — the import system simply does not
call _write_atomic, so the rename hang is unreachable from that
path. Explicit os.rename/os.replace calls in test bodies are
unaffected and remain the responsibility of NSKIP021/NSKIP055
@skipIf guards. The two mitigations are orthogonal and both are
required.
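
As an illustration of the harness-level requirement, here is a minimal sketch of building the guest Python invocation with bytecode writes disabled. It is not the code in .nanvix/run-tests.py: the function name guest_python_command and its parameters are hypothetical. Setting PYTHONDONTWRITEBYTECODE alongside -B is a belt-and-braces assumption, and whether the environment reaches the guest at all depends on the launch mode.

```python
import os


def guest_python_command(python_exe, script_args, env=None):
    """Build (argv, env) for guest Python with bytecode writes disabled.

    Hypothetical helper for illustration; the real harness may assemble
    its command line differently.
    """
    # -B goes before any script or -m argument so the interpreter, not
    # the test program, consumes it.
    argv = [python_exe, "-B", *script_args]

    # Copy the environment so the caller's mapping is not mutated.
    guest_env = dict(os.environ if env is None else env)
    guest_env["PYTHONDONTWRITEBYTECODE"] = "1"
    return argv, guest_env
```

For example, guest_python_command("python3", ["-m", "test", "test_importlib"]) yields an argv whose second element is -B. Routing every mode that launches guest Python through one such helper makes it harder for a rewrite to drop the flag from a single branch.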

Current state

Exit criteria

This issue can be closed and -B removed from the harness when both
of the following are true:

  • NSKIP021 (FAT VFS rename hang on standalone) is resolved upstream and
    the full six-shape rename probe passes on standalone.
  • NSKIP055 (linuxd rename hang on hosted) is resolved upstream and the
    full six-shape rename probe passes on hosted single-process and
    multi-process.

Until then, any harness rewrite must preserve -B (or
PYTHONDONTWRITEBYTECODE=1) on every mode that runs guest Python.
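
One way to keep this requirement from being dropped silently is a guest-side preflight check that fails fast when bytecode writes are enabled. The sketch below is an assumption about how such a guard could look, not something the harness currently does. The function name is hypothetical, and it relies only on sys.dont_write_bytecode, which CPython sets to True under -B or PYTHONDONTWRITEBYTECODE.

```python
import sys


def require_bytecode_writes_disabled():
    """Raise if this interpreter would write pyc files on import.

    Hypothetical preflight guard: intended to run in the guest before
    any test module is imported, so a missing -B surfaces as an
    immediate error rather than a hung rename.
    """
    if not sys.dont_write_bytecode:
        raise RuntimeError(
            "guest Python started without -B / PYTHONDONTWRITEBYTECODE=1; "
            "implicit pyc writes can hang on rename (NSKIP021, NSKIP055)"
        )
```

Calling require_bytecode_writes_disabled() at the top of the guest entry point would turn a dropped flag into a clear failure instead of a 600 s timeout. The check itself must live in a module that is already cached or imported with writes disabled, otherwise importing it could trigger the hang it is meant to prevent.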

Reproducing the implicit-rename hang (without -B)

# In a Nanvix guest, hosted or standalone, with bytecode writes enabled
# (no -B, PYTHONDONTWRITEBYTECODE unset):
import os, sys, tempfile

d = tempfile.mkdtemp()
src = os.path.join(d, "m.py")
with open(src, "w") as f:   # close the file before importing it
    f.write("x = 1\n")
sys.path.insert(0, d)

import m            # first import: writes __pycache__/m.cpython-3XX.pyc
                    # via os.replace -> hangs the kernel.

Equivalent shapes: editing any installed Lib/... source so its mtime
exceeds its pyc, then importing it; running regrtest after a source
edit with bytecode writes enabled.

Related

  • #501 — NSKIP021: FAT
    VFS rename hang on standalone.
  • #552 — NSKIP055:
    linuxd rename hang on hosted.
  • #371 — umbrella NSKIP
    tracker.
  • #551 — adds -B to the
    hosted/direct branch and documents the rationale inline in
    .nanvix/run-tests.py.
