Include CPU ISA hash in Warp kernel cache key #2452

Open
adenzler-nvidia wants to merge 2 commits into newton-physics:main from adenzler-nvidia:adenzler/fix-warp-cache-cpu-isa

@adenzler-nvidia
Member

Summary

  • Warp 1.13+ compiles CPU kernels with -march=native, which emits instructions
    specific to the compiling CPU. GitHub Actions runners vary in CPU model (Intel
    Ice Lake, AMD EPYC, etc.), so restoring a kernel cache built on one CPU onto a
    runner with a different ISA causes illegal-instruction crashes.
  • Add scripts/ci/cpu_isa_hash.py that detects the host CPU's ISA feature set
    and prints a stable 16-char hex hash. Include this hash in the CI cache key so
    kernels are only reused on runners with matching instruction sets.
  • Supports x86_64 (Linux/macOS/Windows via system C compiler), AArch64 Linux
    (via /proc/cpuinfo), and AArch64 macOS (via sysctl).
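
The hashing step described in the second bullet could look roughly like the sketch below. This is not the contents of scripts/ci/cpu_isa_hash.py: the function name `isa_hash`, the use of SHA-256, and the whitespace-separated feature string are assumptions made for illustration. Only the 16-character hex output comes from the description above.

```python
import hashlib


def isa_hash(feature_string: str) -> str:
    """Return a 16-char hex digest that is stable for a given feature set.

    Hypothetical helper: the real script may normalize features differently.
    """
    # Sort and de-duplicate so the order in which the OS or compiler
    # reports flags does not change the resulting hash.
    features = sorted(set(feature_string.lower().split()))
    canonical = " ".join(features).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:16]


if __name__ == "__main__":
    # Illustrative feature list, not taken from a real runner.
    print(isa_hash("sse4_2 avx2 avx512f"))
```

Normalizing before hashing means two runners that report the same features in a different order still share a cache entry, while any added or missing feature produces a different key.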

Context

After the Warp 1.13 dev nightly bump (#2427), CI started hitting Fatal Python error: Illegal instruction / 0xc000001d on both Windows and Ubuntu runners.
The root cause: Warp 1.13 added a cpu_compiler_flags option that defaults to
-march=native, causing its bundled LLVM to emit CPU-specific instructions.
When the GH Actions cache restores kernel objects compiled on e.g. an Intel Ice
Lake runner (with AVX-512) onto an AMD EPYC runner (without AVX-512), the
process crashes on the first kernel invocation.

The previous cache key (warp-kernels-OS-ARCH-<code-hash>) did not account for
CPU differences, and the restore-keys prefix fallback made cross-CPU cache
reuse likely.
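
To illustrate why the prefix fallback matters, the sketch below models restore-keys as plain prefix matching. It is a simplified illustration under stated assumptions: `build_key`, `restore`, and the placeholder hash values are hypothetical, the real cache service also ranks candidates by recency (ignored here), and placing the CPU hash in the fallback prefix is an assumption about how the workflow is configured. The key layout follows the warp-kernels-OS-ARCH-... pattern described above.

```python
def build_key(os_name, arch, code_hash, cpu_hash=None):
    """Compose a cache key; cpu_hash is inserted before the code hash when given."""
    parts = ["warp-kernels", os_name, arch]
    if cpu_hash is not None:
        parts.append(cpu_hash)
    parts.append(code_hash)
    return "-".join(parts)


def restore(stored_keys, key, restore_prefixes):
    """Exact match first, then the first stored key matching a fallback prefix."""
    if key in stored_keys:
        return key
    for prefix in restore_prefixes:
        for stored in stored_keys:
            if stored.startswith(prefix):
                return stored
    return None


# Old scheme: a cache saved on one CPU is picked up by any Linux-X64 runner
# through the prefix fallback, even after the code hash has changed.
old_cache = [build_key("Linux", "X64", "code111")]
old_hit = restore(
    old_cache,
    build_key("Linux", "X64", "code222"),
    ["warp-kernels-Linux-X64-"],
)

# New scheme (assumed): the CPU hash is part of both the key and the prefix,
# so a runner with a different ISA hash gets a cache miss and recompiles.
new_cache = [build_key("Linux", "X64", "code111", cpu_hash="aaaaaaaaaaaaaaaa")]
new_hit = restore(
    new_cache,
    build_key("Linux", "X64", "code222", cpu_hash="bbbbbbbbbbbbbbbb"),
    ["warp-kernels-Linux-X64-bbbbbbbbbbbbbbbb-"],
)

print(old_hit)  # a stale cross-CPU entry is restored
print(new_hit)  # None: no entry for this CPU yet
```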

Test plan

  • Verify the cpu-id step runs and prints a hash on all four matrix runners
    (ubuntu-latest, ubuntu-24.04-arm, windows-latest, macos-latest)
  • Verify the cache key in the logs includes the CPU hash
    (e.g. warp-kernels-Linux-X64-0723c9b174ec6c08-...)
  • Verify no more illegal-instruction crashes on subsequent runs

@codecov

codecov bot commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

@adenzler-nvidia
Member Author

adenzler-nvidia commented Apr 15, 2026

Marking as draft — waiting on the upstream Warp fix that adds CPU ISA-aware module hashing and load-time feature validation directly in Warp. Once that ships, we should rework this PR to use Warp's own CPU feature detection for the cache key instead of rolling our own detection script.

@adenzler-nvidia
Member Author

@shi-eric we won't need this anymore with your upstream changes, right?

@shi-eric
Member

> @shi-eric we won't need this anymore with your upstream changes, right?

In principle we shouldn’t. Can you test a few times by repeatedly triggering a PR that updates Warp?

Note that you can’t update to the bleeding-edge nightly due to the issue I mentioned on Slack. You have to choose a nightly from a few days ago.

except FileNotFoundError:
    pass

# macOS: sysctl exposes CPU features.
Member

🟡 machdep.cpu.features only exists on Intel Macs, so on Apple Silicon runners the sysctl -n machdep.cpu.features call below raises CalledProcessError, _aarch64_features() returns "", and main() falls back to platform.processor() (which is "arm" on Apple Silicon). On macos-latest the cache key ends up not being derived from ISA features at all, so that runner effectively keeps the pre-change behavior and one of the four CI targets does not benefit from this change.

The fallback itself is safe (no crash, stable hash), but the module docstring on line 15 advertises "AArch64 macOS (via sysctl)" support that is not actually working on Apple Silicon.

On Apple Silicon, sysctl hw.optional enumerates per-feature flags such as hw.optional.neon and hw.optional.armv8_2_sha3. Parsing and sorting those keys gives a real ISA fingerprint.

Example macOS ARM branch
# macOS Apple Silicon: hw.optional.* enumerates ISA features.
try:
    out = subprocess.check_output(
        ["sysctl", "-a"],
        text=True,
        stderr=subprocess.DEVNULL,
    )
    features = sorted(
        line.split(":", 1)[0].strip()
        for line in out.splitlines()
        if line.startswith("hw.optional.") and line.rstrip().endswith(": 1")
    )
    if features:
        return " ".join(features)
except (FileNotFoundError, subprocess.CalledProcessError):
    pass

Worth updating the docstring once the macOS ARM path actually contributes ISA features to the hash.
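
If it helps with testing, the parsing half of the example above can be split from the subprocess call so it runs on any platform. A minimal sketch, assuming a hypothetical helper name `parse_hw_optional`; the sample text is illustrative rather than captured from a real runner, and `hw.optional.example_disabled` is a made-up key used only to show that zero-valued flags are skipped.

```python
def parse_hw_optional(sysctl_output: str) -> str:
    """Return the sorted, space-joined hw.optional.* keys whose value is 1."""
    features = sorted(
        line.split(":", 1)[0].strip()
        for line in sysctl_output.splitlines()
        if line.startswith("hw.optional.") and line.rstrip().endswith(": 1")
    )
    return " ".join(features)


# Illustrative sysctl-style output (not from a real machine).
SAMPLE = """\
hw.ncpu: 8
hw.optional.neon: 1
hw.optional.armv8_2_sha3: 1
hw.optional.example_disabled: 0
"""

if __name__ == "__main__":
    print(parse_hw_optional(SAMPLE))
```

The function returns an empty string when no enabled hw.optional keys are found, so a caller can still fall through to the existing platform.processor() fallback in that case.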
