feat: add nvidia-tegra-nvgpu package for Jetson Orin (GA10B) by schwankner · Pull Request #1518 · siderolabs/pkgs

schwankner · 2026-04-22T08:44:39Z

Summary

Adds the OE4T-patched GPU driver stack for NVIDIA Jetson Orin NX (Tegra234 / GA10B) as a Talos kernel package.

Hardware: Jetson Orin NX 16GB (Tegra234 / GA10B)
Tested with: Talos v1.13, kernel 6.18, CUDA 12.6

Modules

Module	Source	Purpose
`host1x.ko`	OE4T linux-nv-oot (wip-r36.5-take-2)	Syncpoint API with GA10B support + syncpt[0] reservation fix
`host1x-fence.ko`	OE4T linux-nv-oot	DMA fence bridge for syncpoints
`nvmap.ko`	OE4T linux-nv-oot	GPU memory allocator
`mc-utils.ko`	OE4T linux-nv-oot	EMC frequency management
`nvhost-ctrl-shim.ko`	embedded `.c` (this pkg)	`/dev/nvhost-ctrl` for JetPack 6 CUDA runtime
`nvgpu.ko`	OE4T linux-nvgpu	Main GA10B GPU driver

nvhost-ctrl-shim

The JetPack 6 CUDA runtime (libnvrm_host1x.so) requires /dev/nvhost-ctrl for cudaStreamSynchronize. Without it, CUDA falls back to CPU semaphore polling (~7 tok/s vs ~60 tok/s for LLM inference on Orin NX 16GB).

The shim bridges 8 ioctls (GET_VERSION, SYNCPT_READ, SYNCPT_READ_MAX, SYNCPT_WAITMEX, SYNC_FENCE_CREATE, GET_CHARACTERISTICS, POLL_FD_CREATE, SYNC_FILE_EXTRACT) to the OE4T host1x syncpoint API using dma_fence for interrupt-driven waits.

The source (nvhost_ctrl_shim.c) is embedded directly in the package directory and made available at /pkg/ by bldr at build time.
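The dependency on /dev/nvhost-ctrl described above can be checked from userspace before a CUDA workload starts. The following is a minimal Python sketch and not part of this PR. The helper name `is_char_device` is hypothetical, and the check only confirms that the path exists as a character device; it says nothing about whether the shim's ioctls behave correctly.

```python
import os
import stat


def is_char_device(path: str) -> bool:
    """Return True if `path` exists and is a character device node."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing path, permission problem, etc.: treat as "not available".
        return False
    return stat.S_ISCHR(mode)
```

On a node where the shim module is not loaded, `is_char_device("/dev/nvhost-ctrl")` would be expected to return `False`, which matches the situation in which the runtime falls back to CPU semaphore polling.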

host1x shadow path

OE4T host1x.ko is installed at kernel/drivers/gpu/host1x/ to shadow the in-tree module. The squashfs overlay replaces the built-in host1x so nvgpu's CONFIG_TEGRA_GK20A_NVHOST=y path gets the OE4T version with HOST1X_SYNCPT_GPU support and the GA10B ERRATA_SYNCPT_INVALID_ID_0 fix.

Build notes

- Built with Clang (LLVM=1), matching the Talos toolchain
- OE4T linux-nv-oot branch wip-r36.5-take-2 adds kernel 6.18 compatibility fixes (__assign_str, f_ref, __alloc_pages_bulk 5-arg)
- Module signing uses the kernel-build stage's auto-generated certs/signing_key.pem
- CONFIG_GCC_PLUGIN_LATENT_ENTROPY stripped from auto.conf / autoconf.h before OOT builds (Clang compat)
- CI-validated with ARM64 build (~90 min cold / ~20 min warm kernel cache), all 6 .ko files confirmed present

Result

Full CUDA inference in Kubernetes pods on Talos Linux without privileged: true, using CDI device injection. Verified ~60 tok/s on qwen2.5:0.5b (Ollama) on Jetson Orin NX 16GB.

References

Continues: (WIP)feat: add l4t package #1166 (WIP l4t pkg — established that modules need to be a pkg first)
Discussion: feat: add support for Jetson Orin Nano and AGX Orin SBCs sbc-jetson#23 (frezbo/smira confirmed shim belongs in pkgs)
Extension PR (follow-up, once this merges): (WIP)feat: add extension for jetson orin SBC extensions#624

Adds OE4T-patched GPU driver stack for NVIDIA Jetson Orin NX (Tegra234 / GA10B):
- OE4T host1x + host1x-fence: GA10B syncpoint support with ERRATA_SYNCPT_INVALID_ID_0 fix
- nvmap, mc-utils, governor_pod_scaling: standard Tegra support modules
- nvhost-ctrl-shim: /dev/nvhost-ctrl userspace interface for JetPack 6 CUDA runtime
- nvgpu: main GA10B GPU driver (OE4T patches, Clang build, kernel 6.18 compat)
The nvhost-ctrl-shim provides hardware syncpoint interrupt support for cudaStreamSynchronize via NVHOST_IOCTL_CTRL_SYNC_FENCE_CREATE + SYNC_FILE_EXTRACT, enabling full CUDA throughput instead of CPU semaphore polling.
Built with Clang (LLVM=1), requires OE4T linux-nv-oot (wip-r36.5-take-2) for kernel 6.18 compatibility. CONFIG_TEGRA_GK20A_NVHOST=y uses OE4T host1x with HOST1X_SYNCPT_GPU support.
Tested: ~60 tok/s qwen2.5:0.5b on Jetson Orin NX 16GB with Talos Linux v1.13.
Continues: siderolabs#1166
Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>

Per review feedback: extract the OOT module Makefile patching block from pkg.yaml inline shell into nvidia-tegra-nvgpu/scripts/fixup.sh. Bldr makes the package directory available at /pkg/ during build, so the script is invoked as /pkg/scripts/fixup.sh. Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>

…ld error
- Move base64-encoded nvhost_host1x.c patch to scripts/patch_nvhost_host1x.py for readability (per review feedback)
- governor_pod_scaling build failures now exit 1 instead of continuing silently (per review feedback)
Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>

frezbo · 2026-04-22T10:52:00Z

@claude review

Add renovate datasource annotations for the three OE4T git repositories so dependency updates can be tracked automatically:
- OE4T/linux-nvgpu (GA10B GPU driver)
- OE4T/linux-nv-oot (NVIDIA out-of-tree modules)
- OE4T/linux-hwpm (hardware performance monitor)
Note: sha256/sha512 checksums must be updated manually alongside the commit hash when renovate proposes an update.
Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>
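Because the checksums have to be refreshed by hand whenever a commit pin changes, a small helper can compute both digests for a downloaded source archive. This is a sketch under assumptions: the function name `file_digests` is hypothetical, and how the archive is obtained and where the values go in the Pkgfile are left to the maintainer.

```python
import hashlib


def file_digests(path: str, chunk_size: int = 1 << 20) -> dict:
    """Compute sha256 and sha512 hex digests of a file in one streaming pass."""
    sha256 = hashlib.sha256()
    sha512 = hashlib.sha512()
    with open(path, "rb") as fh:
        # Read in chunks so large source tarballs do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha256.update(chunk)
            sha512.update(chunk)
    return {"sha256": sha256.hexdigest(), "sha512": sha512.hexdigest()}
```

Calling `file_digests("linux-nv-oot.tar.gz")` (a hypothetical file name) returns a dictionary with both hex strings, which can then be pasted next to the updated commit hash.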

Copilot

Pull request overview

Adds a new Talos kernel package (nvidia-tegra-nvgpu) to build and ship the OE4T-patched NVIDIA Tegra (Jetson Orin / GA10B) GPU driver stack, including a nvhost-ctrl shim module needed for JetPack 6 CUDA runtime behavior.

Changes:

- Introduces the nvidia-tegra-nvgpu package build/install pipeline for multiple OE4T OOT modules + nvgpu.ko, including patching steps for kernel 6.18 compatibility.
- Adds an in-tree nvhost_ctrl_shim.c module implementing /dev/nvhost-ctrl and bridging NVHOST ioctls to host1x syncpoints/fences.
- Adds OE4T source pins (commit + hashes) to Pkgfile to fetch/build the required upstream driver sources reproducibly.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

File	Description
`nvidia-tegra-nvgpu/pkg.yaml`	New package recipe: fetch OE4T sources, apply fixups/patches, build modules with clang wrapper, install + sign modules, add modprobe softdeps.
`nvidia-tegra-nvgpu/nvhost_ctrl_shim.c`	New kernel module implementing `/dev/nvhost-ctrl` and required ioctls for CUDA runtime syncpoint waits and fence interop.
`nvidia-tegra-nvgpu/scripts/fixup.sh`	Build-time patch script to adjust OE4T OOT module Makefiles and force kernel-compat conftest paths for 6.18.
`nvidia-tegra-nvgpu/scripts/patch_nvhost_host1x.py`	Build-time source patch to add a retry loop in `nvhost_host1x.c` for GA10B syncpt allocation failures.
`Pkgfile`	Adds pinned OE4T source revisions and checksums used by the new package.

…SIG_ALL=y
Replace manual sign-file loop with standard KBUILD modules_install for each OOT module directory. Signing and debug-info stripping are handled natively by the kernel build system (same pattern as gasket-driver, zfs, hailort, etc.).
host1x uses INSTALL_MOD_DIR=kernel/drivers/gpu/host1x to shadow the in-tree module; all other modules install to INSTALL_MOD_DIR=extra/nvidia-tegra.
Add test: section with module-signature and fhs-validator checks.
Fix finalize to: /rootfs → to: / (matches pkgs convention).
Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>

nvhost_ctrl_shim.c:
- get_host1x(): call put_device(&pdev->dev) after platform_get_drvdata() to release the reference acquired by of_find_device_by_node(); fixes a device reference leak on every /dev/nvhost-ctrl open
- nvhost_ctrl_devnode(): guard mode pointer before dereferencing (mode may be NULL per devnode callback contract); add comment explaining 0666 choice
pkg.yaml:
- remove duplicate clang-oot wrapper creation from the nvgpu build block; the wrapper is installed once in the preceding OOT build step and persists on disk across shell blocks
Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>

Copilot

Pull request overview

Adds a new Talos kernel module package providing the OE4T-patched NVIDIA Tegra GA10B GPU driver stack (Jetson Orin NX / Tegra234), including a /dev/nvhost-ctrl shim required by the JetPack 6 CUDA runtime.

Changes:

- Introduces a new nvidia-tegra-nvgpu package that builds/installs OE4T host1x, host1x-fence, nvmap, mc-utils, devfreq governor, nvgpu, plus an in-package nvhost-ctrl ioctl shim module.
- Adds build-time fixup/patch scripts to force kernel-6.18 conftest paths and apply GA10B syncpoint allocation workarounds.
- Adds OE4T source pin variables (commit + checksums) to Pkgfile.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

File	Description
nvidia-tegra-nvgpu/pkg.yaml	New package build/install/test pipeline for OE4T modules + nvhost-ctrl shim; applies multiple source patches.
nvidia-tegra-nvgpu/nvhost_ctrl_shim.c	New kernel module implementing `/dev/nvhost-ctrl` NVHOST ioctls backed by host1x syncpoints/dma_fence.
nvidia-tegra-nvgpu/scripts/fixup.sh	Build-time Makefile/source patching for OE4T OOT modules to compile against kernel 6.18 with Clang.
nvidia-tegra-nvgpu/scripts/patch_nvhost_host1x.py	Build-time Python patcher adding a retry loop to `nvhost_host1x.c` syncpt allocation.
Pkgfile	Adds pinned OE4T commits and checksums for linux-nvgpu, linux-nv-oot, and linux-hwpm.

The awk command that inserts the ERRATA_SYNCPT_INVALID_ID_0 id=0 skip block had no failure check: if the upstream file changes and the pattern is not matched, awk silently writes an unpatched file. Add an explicit grep post-check for the inserted marker; exit 1 if the marker is absent so the build fails loudly instead of shipping a silently broken nvgpu. Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Alexander Schwankner <1496765+schwankner@users.noreply.github.com>
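The "patch, then verify a marker" pattern from this commit can be illustrated in Python. This is a minimal sketch, not the script in the PR (which uses awk and grep); the names `insert_after_first_match` and `PatchError` and the marker string used in the usage note are hypothetical.

```python
import re


class PatchError(RuntimeError):
    """Raised when a source patch could not be applied or verified."""


def insert_after_first_match(source: str, anchor_regex: str, block: str, marker: str) -> str:
    """Insert `block` after the first line matching `anchor_regex`, then verify `marker`."""
    pattern = re.compile(anchor_regex)
    out = []
    inserted = False
    for line in source.splitlines(keepends=True):
        out.append(line)
        if not inserted and pattern.search(line):
            if not line.endswith("\n"):
                out.append("\n")  # anchor was the last line without a newline
            out.append(block if block.endswith("\n") else block + "\n")
            inserted = True
    patched = "".join(out)
    # Post-check, analogous to the grep in the commit: if the anchor never
    # matched, the marker is absent and the build should fail loudly.
    if marker not in patched:
        raise PatchError(f"marker {marker!r} not found after patching")
    return patched
```

With a block containing a comment such as `/* SYNCPT_ID0_SKIP */` (a made-up marker), an upstream change that removes the anchor line raises `PatchError` instead of silently producing an unpatched file.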

fixup.sh already strips CONFIG_GCC_PLUGIN_LATENT_ENTROPY from /src/include/config/auto.conf and /src/include/generated/autoconf.h in an earlier build step. The identical sed commands in the nvgpu block are a no-op; remove them to keep the fix single-sourced. Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>
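For illustration, the effect of stripping one option from both generated files can be sketched in Python. This is not the sed invocation from fixup.sh; the function name `strip_config_option` is hypothetical, and the sketch assumes the usual generated forms `CONFIG_X=y` in auto.conf and `#define CONFIG_X 1` in autoconf.h.

```python
import re


def strip_config_option(text: str, option: str) -> str:
    """Drop every line that sets `option` in auto.conf or autoconf.h style."""
    # Match "CONFIG_X=..." or "#define CONFIG_X ..." but not "CONFIG_X_SUFFIX".
    pattern = re.compile(r"^(?:#define\s+)?" + re.escape(option) + r"(?:=|\s|$)")
    kept = [line for line in text.splitlines(keepends=True) if not pattern.match(line)]
    return "".join(kept)
```

Applying the function a second time to already stripped text returns it unchanged, which is why a repeated strip step is a no-op.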

CONFTEST_OUT was defined but never referenced in the script. The conftest path is injected via the srctree.nvconftest make variable, not through this shell variable. Remove it to comply with the set -euo pipefail hygiene. Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>

All kernel-dependent packages use the -pkg suffix in their pkg.yaml name field (gasket-driver-pkg, hailort-pkg, zfs-pkg, etc.) and are listed in .kres.yaml so bldr picks them up in CI. Rename nvidia-tegra-nvgpu to nvidia-tegra-nvgpu-pkg and add the entry to .kres.yaml between nvidia-open-gpu-kernel-modules-production-pkg and px-fuse-pkg. Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>

All other kernel module packages (gasket-driver, zfs, hailort, xdma-driver, kmod-nvidia) copy modules.order, modules.builtin, and modules.builtin.modinfo from /src into the rootfs before running modules_install. Add the same block so downstream tooling has the expected kernel module metadata for the target release. Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>

frezbo · 2026-04-23T16:59:35Z

@@ -0,0 +1,126 @@
+#!/bin/bash


Do you think OOT accepts these as patches upstream?

Yes, ideally this should be handled on their side, rather than patching files that have been generated from Kconfig.

The kref_init(&syncpt[0].ref) fix in fixup.sh is already present in newer OE4T commits (e.g., 6e071c0 in linux-nv-oot) — bumping the oe4t_nv_oot_commit pin would make that fixup redundant.

The nvhost_host1x.c retry loop addresses a real hardware behavior (NVGPU_ERRATA_SYNCPT_INVALID_ID_0, defined in the nvgpu errata table). I haven't submitted it upstream yet.

The nvhost-ctrl-shim module would require NVIDIA or OE4T to accept a /dev/nvhost-ctrl ABI compatibility layer for mainline-based kernels.

Agreed in principle. The CONFIG_GCC_PLUGIN_LATENT_ENTROPY removal is needed because the Talos kernel config (built for GCC) includes this option, but OOT modules are compiled with Clang which does not implement the GCC plugin — causing a compile error via linux/random.h. The correct fix is upstream: either in OE4T's build system or in Talos's kernel config for Clang builds. I'll track this as a follow-up.

Talos kernel is built with Clang now, and therefore modules are also. But perhaps autoconfiguration scripts in OE4T do not have proper support yet, right. Patching them to honor LLVM=1 (or passing the right parameters, maybe CC) would be a more robust solution

Agreed. The two categories in fixup.sh have different upstream paths:

CONFIG_GCC_PLUGIN_LATENT_ENTROPY strip — this is a Talos kernel config artifact: CONFIG_GCC_PLUGIN_LATENT_ENTROPY=y was set when config-arm64 was generated with GCC. Since Talos now builds with Clang, removing it from config-arm64 is the correct fix. I can open a separate PR in this repo for that.

Conftest macro overrides (NV_IOMMU_MAP_HAS_GFP_ARG, NV_MM_STRUCT_..., etc.) — these compensate for OE4T's conftest infrastructure not yet probing kernel 6.18 APIs correctly when building with Clang/LLVM=1. The proper fix lives in OE4T/linux-nv-oot: the conftest scripts need kernel 6.18 API detections. I'll track this in an OE4T meta-tegra issue (since linux-nv-oot has issues disabled). Once upstream conftest detections are fixed, these overrides can be removed.

dsseng · 2026-04-23T17:30:10Z

Where do these revisions come from? They do not seem to be tags in the repos, so are they perhaps referenced by meta-tegra or whatever?

Please let us know how to recognize proper, released versions of these drivers and not WIP snapshots which may be unstable

Valid concern. The three commits come from:

oe4t_nvgpu_commit (d530a48): OE4T linux-nvgpu branch patches-rel-36 — the project's default stable branch for the r36.x series

oe4t_nv_oot_commit (ccf7646): OE4T linux-nv-oot branch wip-r36.5-take-2 — admittedly a WIP branch. A more stable alternative is patches-r36.5 (last updated 2026-03-05 with kernel 6.18 build fixes). I can switch to that.

oe4t_hwpm_commit (4d8a699): OE4T linux-hwpm default branch

None have release tags — that is the current state of OE4T for GA10B on kernel 6.x.

dsseng · 2026-04-23T17:42:04Z

If this module is used by CUDA libraries, why is this feature not supported in the driver itself? Do you plan to request reviews and upstream it?

/dev/nvhost-ctrl is part of NVIDIA's traditional downstream Tegra kernel ABI (nvidia-t23x-kernel). On a standard JetPack Linux install, this device is provided by NVIDIA's downstream kernel. The OOT nvgpu driver provides GPU compute on mainline kernels but does not include the nvhost control interface.

On Talos Linux (mainline kernel + OOT modules), libnvrm_host1x.so cannot find /dev/nvhost-ctrl and falls back to CPU semaphore polling — which reduces LLM inference throughput from ~60 tok/s to ~7 tok/s on an Orin NX 16GB.

The long-term fix would be for NVIDIA to include nvhost-ctrl in the OOT driver or to update libnvrm_host1x.so to use a different interface. Until then, this shim is the only known way to enable hardware-interrupt-driven sync on mainline kernels.

Thank you for the explanation. I'm not very familiar with the ecosystem, but perhaps this solution is acceptable (with corresponding extension tier for a downstream driver).

Some public interaction with OE4T would be nice to get this driver reviewed by professionals of this platform, however

Good point. I'll open an issue in OE4T/meta-tegra to start the conversation — the key question for them is whether /dev/nvhost-ctrl should be integrated into the OOT nvgpu package for mainline deployments, or whether there's a different intended path for JetPack 6 CUDA on non-JetPack kernels. Will link the issue here once open.

OE4T issue opened: OE4T/meta-tegra#2196

Since we're early in the 1.14 work, I would wait a while and see what the OE4T folks have to say on this issue. Let's wait a week or two before proceeding; does that sound good? Meanwhile we can have draft PRs ready for extensions/overlays.

Hi @frezbo, I got the OE4T feedback from the discussion.

Short summary: The maintainers' position is that /dev/nvhost-ctrl was an older interface and they can't give a definitive answer on whether the shim approach is correct, since it involves NVIDIA's closed-source libraries. They recommended asking the NVIDIA developer forum.

While waiting, I did some deeper analysis on my running system. Key finding: the CUDA process holds both /dev/nvgpu/igpu0/ctrl (channel allocation) and /dev/nvhost-ctrl (syncpoint wait) open simultaneously during inference, not as a fallback sequence. And no module in the OOT stack creates the global /dev/nvhost-ctrl device. host1x-nvhost.ko only creates per-engine devices like nvhost-ctrl-host1x. So the shim fills a structural gap.
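The comment does not say how the open device nodes were observed. One way such an observation could be reproduced on Linux is by reading the symlinks under /proc/<pid>/fd, as in the following Python sketch. The function name `open_device_paths` and the `proc_root` parameter are hypothetical, and reading another process's fd directory typically requires sufficient privileges.

```python
import os


def open_device_paths(pid: int, proc_root: str = "/proc") -> list:
    """Return the sorted set of /dev/ paths that process `pid` holds open."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    paths = set()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor may have been closed between listdir and readlink.
            continue
        if target.startswith("/dev/"):
            paths.add(target)
    return sorted(paths)
```

If both `/dev/nvgpu/igpu0/ctrl` and `/dev/nvhost-ctrl` appear in the result for a running inference process, the two devices are held open at the same time rather than tried one after the other.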

I will ask in the NVIDIA dev forum and then come back here.

I will wait to start on the extension/overlay draft PR until I have an answer there.

Great news. I also quickly glanced over the comments, and I think some of them make sense: sometimes, if both are present, it might try to keep the old one too. LLMs might not catch that unless you prompt them to say it's old, so that might be worth a try. I would suggest piping the conversation from OE4T into an LLM to see if it rethinks.

Anyways we still have some time to polish this up before 1.14

Replace wip-r36.5-take-2 snapshot (ccf7646) with the patches-r36.5 branch HEAD (ea32e7f, 2026-03-05) which is a more stable reference with the same kernel 6.18 compatibility fixes. Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>

Move the inline printf'd clang wrapper script into a dedicated scripts/clang-oot file and install it via cp, making it easier to review and maintain. Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>

Module load ordering is handled by depmod in the Talos imager; the modprobe.d/nvidia-tegra.conf file is not needed here. Also remove the debug find /rootfs output. Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>

Convert the Python/awk-based nvgpu source patches to standard unified diff files applied with the patch tool:
- 0001 (nvhost_host1x.c): retry loop + syncpt id=0 guard for GA10B ERRATA_SYNCPT_INVALID_ID_0 (combines sed/python/awk into one clean patch, adds #include <linux/delay.h>)
- 0002 (netlist_priv.h): flexible array fix (regions[1] → regions[])
Remove scripts/patch_nvhost_host1x.py (superseded by 0001 patch).
Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>
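To show what a unified diff for a change like the flexible array fix looks like, the sketch below generates one with Python's standard `difflib`. It is illustrative only: the struct in the usage note is invented, the real contents of netlist_priv.h are not reproduced here, and the function name `unified_diff_text` is hypothetical.

```python
import difflib


def unified_diff_text(old: str, new: str, path: str) -> str:
    """Render a unified diff between two versions of one file."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",  # a/ and b/ prefixes as in git-style patches
            tofile=f"b/{path}",
        )
    )
```

For an invented struct whose last member changes from `int regions[1];` to `int regions[];`, the result contains one removed and one added line under `---`/`+++` headers, which is the form the patch tool consumes.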

schwankner changed the title ~~feat: add nvidia-tegra-nvgpu package for Jetson Orin NX (GA10B)~~ feat: add nvidia-tegra-nvgpu package for Jetson Orin (GA10B) Apr 22, 2026

schwankner force-pushed the feat/nvidia-tegra-jetson-orin-main branch from c9f7779 to ab7c963 Compare April 22, 2026 08:48

schwankner mentioned this pull request Apr 22, 2026

feat: add support for Jetson Orin Nano and AGX Orin SBCs siderolabs/sbc-jetson#23

Open