feat: add nvidia-tegra-nvgpu package for Jetson Orin (GA10B) #1518
schwankner wants to merge 17 commits into siderolabs:main
Adds OE4T-patched GPU driver stack for NVIDIA Jetson Orin NX (Tegra234 / GA10B):
- OE4T host1x + host1x-fence: GA10B syncpoint support with ERRATA_SYNCPT_INVALID_ID_0 fix
- nvmap, mc-utils, governor_pod_scaling: standard Tegra support modules
- nvhost-ctrl-shim: /dev/nvhost-ctrl userspace interface for JetPack 6 CUDA runtime
- nvgpu: main GA10B GPU driver (OE4T patches, Clang build, kernel 6.18 compat)

The nvhost-ctrl-shim provides hardware syncpoint interrupt support for cudaStreamSynchronize via NVHOST_IOCTL_CTRL_SYNC_FENCE_CREATE + SYNC_FILE_EXTRACT, enabling full CUDA throughput instead of CPU semaphore polling.

Built with Clang (LLVM=1), requires OE4T linux-nv-oot (wip-r36.5-take-2) for kernel 6.18 compatibility. CONFIG_TEGRA_GK20A_NVHOST=y uses OE4T host1x with HOST1X_SYNCPT_GPU support.

Tested: ~60 tok/s qwen2.5:0.5b on Jetson Orin NX 16GB with Talos Linux v1.13.

Continues: siderolabs#1166

Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>
force-pushed from c9f7779 to ab7c963
Per review feedback: extract the OOT module Makefile patching block from pkg.yaml inline shell into nvidia-tegra-nvgpu/scripts/fixup.sh. Bldr makes the package directory available at /pkg/ during build, so the script is invoked as /pkg/scripts/fixup.sh. Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>
force-pushed from 040d726 to 8805349
…ld error
- Move base64-encoded nvhost_host1x.c patch to scripts/patch_nvhost_host1x.py for readability (per review feedback)
- governor_pod_scaling build failures now exit 1 instead of continuing silently (per review feedback)

Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>
@claude review
Add renovate datasource annotations for the three OE4T git repositories so dependency updates can be tracked automatically:
- OE4T/linux-nvgpu (GA10B GPU driver)
- OE4T/linux-nv-oot (NVIDIA out-of-tree modules)
- OE4T/linux-hwpm (hardware performance monitor)

Note: sha256/sha512 checksums must be updated manually alongside the commit hash when renovate proposes an update.

Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>
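The manual checksum step in the note above could be sketched as a small shell helper. This is a minimal sketch, assuming each OE4T source is fetched as a single archive and that the Pkgfile records one sha256 and one sha512 value per source; the function name `print_pins` and the output labels are hypothetical and not taken from this repository.

```shell
# Hypothetical helper: print both digests for a downloaded source archive
# so they can be pasted next to the updated commit hash in Pkgfile.
print_pins() {
    archive=$1
    # sha256sum/sha512sum print "<digest>  <file>"; keep only the digest.
    printf 'sha256: %s\n' "$(sha256sum "$archive" | cut -d' ' -f1)"
    printf 'sha512: %s\n' "$(sha512sum "$archive" | cut -d' ' -f1)"
}
```

Running it once per archive after renovate bumps a commit would give the two values to update by hand.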
Pull request overview
Adds a new Talos kernel package (nvidia-tegra-nvgpu) to build and ship the OE4T-patched NVIDIA Tegra (Jetson Orin / GA10B) GPU driver stack, including a nvhost-ctrl shim module needed for JetPack 6 CUDA runtime behavior.
Changes:
- Introduces the `nvidia-tegra-nvgpu` package build/install pipeline for multiple OE4T OOT modules + `nvgpu.ko`, including patching steps for kernel 6.18 compatibility.
- Adds an in-tree `nvhost_ctrl_shim.c` module implementing `/dev/nvhost-ctrl` and bridging NVHOST ioctls to host1x syncpoints/fences.
- Adds OE4T source pins (commit + hashes) to `Pkgfile` to fetch/build the required upstream driver sources reproducibly.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| nvidia-tegra-nvgpu/pkg.yaml | New package recipe: fetch OE4T sources, apply fixups/patches, build modules with clang wrapper, install + sign modules, add modprobe softdeps. |
| nvidia-tegra-nvgpu/nvhost_ctrl_shim.c | New kernel module implementing /dev/nvhost-ctrl and required ioctls for CUDA runtime syncpoint waits and fence interop. |
| nvidia-tegra-nvgpu/scripts/fixup.sh | Build-time patch script to adjust OE4T OOT module Makefiles and force kernel-compat conftest paths for 6.18. |
| nvidia-tegra-nvgpu/scripts/patch_nvhost_host1x.py | Build-time source patch to add a retry loop in nvhost_host1x.c for GA10B syncpt allocation failures. |
| Pkgfile | Adds pinned OE4T source revisions and checksums used by the new package. |
…SIG_ALL=y Replace manual sign-file loop with standard KBUILD modules_install for each OOT module directory. Signing and debug-info stripping are handled natively by the kernel build system (same pattern as gasket-driver, zfs, hailort, etc.). host1x uses INSTALL_MOD_DIR=kernel/drivers/gpu/host1x to shadow the in-tree module; all other modules install to INSTALL_MOD_DIR=extra/nvidia-tegra. Add test: section with module-signature and fhs-validator checks. Fix finalize to: /rootfs → to: / (matches pkgs convention). Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>
nvhost_ctrl_shim.c:
- get_host1x(): call put_device(&pdev->dev) after platform_get_drvdata() to release the reference acquired by of_find_device_by_node(); fixes a device reference leak on every /dev/nvhost-ctrl open
- nvhost_ctrl_devnode(): guard mode pointer before dereferencing (mode may be NULL per devnode callback contract); add comment explaining 0666 choice

pkg.yaml:
- remove duplicate clang-oot wrapper creation from the nvgpu build block; the wrapper is installed once in the preceding OOT build step and persists on disk across shell blocks

Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>
force-pushed from 37340bf to fd4bb90
Pull request overview
Adds a new Talos kernel module package providing the OE4T-patched NVIDIA Tegra GA10B GPU driver stack (Jetson Orin NX / Tegra234), including a /dev/nvhost-ctrl shim required by the JetPack 6 CUDA runtime.
Changes:
- Introduces a new `nvidia-tegra-nvgpu` package that builds/installs OE4T `host1x`, `host1x-fence`, `nvmap`, `mc-utils`, `devfreq` governor, `nvgpu`, plus an in-package `nvhost-ctrl` ioctl shim module.
- Adds build-time fixup/patch scripts to force kernel-6.18 conftest paths and apply GA10B syncpoint allocation workarounds.
- Adds OE4T source pin variables (commit + checksums) to `Pkgfile`.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| nvidia-tegra-nvgpu/pkg.yaml | New package build/install/test pipeline for OE4T modules + nvhost-ctrl shim; applies multiple source patches. |
| nvidia-tegra-nvgpu/nvhost_ctrl_shim.c | New kernel module implementing /dev/nvhost-ctrl NVHOST ioctls backed by host1x syncpoints/dma_fence. |
| nvidia-tegra-nvgpu/scripts/fixup.sh | Build-time Makefile/source patching for OE4T OOT modules to compile against kernel 6.18 with Clang. |
| nvidia-tegra-nvgpu/scripts/patch_nvhost_host1x.py | Build-time Python patcher adding a retry loop to nvhost_host1x.c syncpt allocation. |
| Pkgfile | Adds pinned OE4T commits and checksums for linux-nvgpu, linux-nv-oot, and linux-hwpm. |
The awk command that inserts the ERRATA_SYNCPT_INVALID_ID_0 id=0 skip block had no failure check: if the upstream file changes and the pattern is not matched, awk silently writes an unpatched file. Add an explicit grep post-check for the inserted marker; exit 1 if the marker is absent so the build fails loudly instead of shipping a silently broken nvgpu. Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Alexander Schwankner <1496765+schwankner@users.noreply.github.com>
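The patch-then-verify pattern described in this commit could look roughly like the following. It is a sketch only, not the script from this PR: the helper name `insert_checked`, its arguments, and the marker text are all hypothetical, and it inserts a single marker line rather than the real id=0 skip block.

```shell
# Sketch: insert a marker line after the first line containing a pattern,
# then verify the marker is really there. Not the PR's actual awk script.
insert_checked() {
    file=$1 pattern=$2 marker=$3
    tmp=$(mktemp) || return 1
    # awk exits 0 even when the pattern never matches, so the output
    # may silently be an unpatched copy of the input.
    awk -v pat="$pattern" -v ins="$marker" '
        { print }
        !inserted && index($0, pat) { print ins; inserted = 1 }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    mv "$tmp" "$file"
    # Post-check: fail loudly if the marker did not land in the file.
    if ! grep -qF -- "$marker" "$file"; then
        echo "ERROR: marker not inserted into $file" >&2
        return 1
    fi
}
```

In a build step, a non-zero return from such a helper would be turned into `exit 1` so that an upstream change to the patched file stops the build.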
fixup.sh already strips CONFIG_GCC_PLUGIN_LATENT_ENTROPY from /src/include/config/auto.conf and /src/include/generated/autoconf.h in an earlier build step. The identical sed commands in the nvgpu block are a no-op; remove them to keep the fix single-sourced. Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>
CONFTEST_OUT was defined but never referenced in the script. The conftest path is injected via the srctree.nvconftest make variable, not through this shell variable. Remove it to comply with the set -euo pipefail hygiene. Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>
All kernel-dependent packages use the -pkg suffix in their pkg.yaml name field (gasket-driver-pkg, hailort-pkg, zfs-pkg, etc.) and are listed in .kres.yaml so bldr picks them up in CI. Rename nvidia-tegra-nvgpu to nvidia-tegra-nvgpu-pkg and add the entry to .kres.yaml between nvidia-open-gpu-kernel-modules-production-pkg and px-fuse-pkg. Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>
All other kernel module packages (gasket-driver, zfs, hailort, xdma-driver, kmod-nvidia) copy modules.order, modules.builtin, and modules.builtin.modinfo from /src into the rootfs before running modules_install. Add the same block so downstream tooling has the expected kernel module metadata for the target release. Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>
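A minimal sketch of the metadata copy described here, assuming the kernel build tree lives at `/src`, the staging root is `/rootfs`, and modules are installed under `usr/lib/modules/<release>`; the function name `copy_module_metadata` and the exact destination layout are assumptions, not read from pkg.yaml.

```shell
# Sketch: copy kernel module metadata next to where the modules will be
# installed. The directory layout is an assumption for illustration.
copy_module_metadata() {
    src=$1      # kernel build tree, e.g. /src
    rootfs=$2   # staging root, e.g. /rootfs
    # include/config/kernel.release holds the release string of the built kernel.
    krel=$(cat "$src/include/config/kernel.release") || return 1
    dest="$rootfs/usr/lib/modules/$krel"
    mkdir -p "$dest" || return 1
    for f in modules.order modules.builtin modules.builtin.modinfo; do
        cp "$src/$f" "$dest/" || return 1
    done
}
```

It would be called once, for example as `copy_module_metadata /src /rootfs`, before the `modules_install` steps.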
@@ -0,0 +1,126 @@
#!/bin/bash
do you think OOT accepts these as patches upstream?
Yes, ideally this should be handled on their side, rather than patching files that have been generated from Kconfig.
The kref_init(&syncpt[0].ref) fix in fixup.sh is already present in newer OE4T commits (e.g., 6e071c0 in linux-nv-oot) — bumping the oe4t_nv_oot_commit pin would make that fixup redundant.
The nvhost_host1x.c retry loop addresses a real hardware behavior (NVGPU_ERRATA_SYNCPT_INVALID_ID_0, defined in the nvgpu errata table). I haven't submitted it upstream yet.
The nvhost-ctrl-shim module would require NVIDIA or OE4T to accept a /dev/nvhost-ctrl ABI compatibility layer for mainline-based kernels.
Agreed in principle. The CONFIG_GCC_PLUGIN_LATENT_ENTROPY removal is needed because the Talos kernel config (built for GCC) includes this option, but OOT modules are compiled with Clang which does not implement the GCC plugin — causing a compile error via linux/random.h. The correct fix is upstream: either in OE4T's build system or in Talos's kernel config for Clang builds. I'll track this as a follow-up.
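For illustration, stripping one Kconfig symbol from the generated config files might look like the sketch below. It assumes the usual Kbuild formats (`CONFIG_X=y` lines in `auto.conf`, `#define CONFIG_X 1` lines in `autoconf.h`); the function name `strip_kconfig_symbol` is hypothetical and this is not the code in fixup.sh.

```shell
# Sketch: drop one Kconfig symbol from the generated config files.
# Assumed formats: "CONFIG_X=y" in auto.conf, "#define CONFIG_X 1" in autoconf.h.
strip_kconfig_symbol() {
    sym=$1 auto_conf=$2 autoconf_h=$3
    tmp=$(mktemp) || return 1
    # Match the exact symbol followed by "=" so longer names are untouched.
    sed "/^${sym}=/d" "$auto_conf" > "$tmp" && cat "$tmp" > "$auto_conf" || return 1
    # Match the exact symbol followed by a space in the C header.
    sed "/^#define ${sym} /d" "$autoconf_h" > "$tmp" && cat "$tmp" > "$autoconf_h" || return 1
    rm -f "$tmp"
}
```

A call such as `strip_kconfig_symbol CONFIG_GCC_PLUGIN_LATENT_ENTROPY /src/include/config/auto.conf /src/include/generated/autoconf.h` would cover the two files named in this thread.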
The Talos kernel is built with Clang now, and therefore so are the modules. But perhaps the autoconfiguration scripts in OE4T do not have proper support for that yet. Patching them to honor LLVM=1 (or passing the right parameters, maybe CC) would be a more robust solution.
Agreed. The two categories in fixup.sh have different upstream paths:
- `CONFIG_GCC_PLUGIN_LATENT_ENTROPY` strip: this is a Talos kernel config artifact. `CONFIG_GCC_PLUGIN_LATENT_ENTROPY=y` was set when `config-arm64` was generated with GCC. Since Talos now builds with Clang, removing it from `config-arm64` is the correct fix. I can open a separate PR in this repo for that.
- Conftest macro overrides (`NV_IOMMU_MAP_HAS_GFP_ARG`, `NV_MM_STRUCT_...`, etc.): these compensate for OE4T's conftest infrastructure not yet probing kernel 6.18 APIs correctly when building with Clang/`LLVM=1`. The proper fix lives in `OE4T/linux-nv-oot`: the conftest scripts need kernel 6.18 API detections. I'll track this in an OE4T meta-tegra issue (since `linux-nv-oot` has issues disabled). Once upstream conftest detections are fixed, these overrides can be removed.
Where do these revisions come from? They do not seem to be tags in the repos, so are they perhaps referenced by meta-tegra or whatever?
Please let us know how to recognize proper, released versions of these drivers and not WIP snapshots which may be unstable
Valid concern. The three commits come from:
- `oe4t_nvgpu_commit` (d530a48): OE4T `linux-nvgpu` branch `patches-rel-36`, the project's default stable branch for the r36.x series
- `oe4t_nv_oot_commit` (ccf7646): OE4T `linux-nv-oot` branch `wip-r36.5-take-2`, admittedly a WIP branch. A more stable alternative is `patches-r36.5` (last updated 2026-03-05 with kernel 6.18 build fixes). I can switch to that.
- `oe4t_hwpm_commit` (4d8a699): OE4T `linux-hwpm` default branch
None have release tags — that is the current state of OE4T for GA10B on kernel 6.x.
If this module is used by CUDA libraries, why is this feature not supported in the driver itself? Do you plan to request reviews and upstream it?
/dev/nvhost-ctrl is part of NVIDIA's traditional downstream Tegra kernel ABI (nvidia-t23x-kernel). On a standard JetPack Linux install, this device is provided by NVIDIA's downstream kernel. The OOT nvgpu driver provides GPU compute on mainline kernels but does not include the nvhost control interface.
On Talos Linux (mainline kernel + OOT modules), libnvrm_host1x.so cannot find /dev/nvhost-ctrl and falls back to CPU semaphore polling — which reduces LLM inference throughput from ~60 tok/s to ~7 tok/s on an Orin NX 16GB.
The long-term fix would be for NVIDIA to include nvhost-ctrl in the OOT driver or to update libnvrm_host1x.so to use a different interface. Until then, this shim is the only known way to enable hardware-interrupt-driven sync on mainline kernels.
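As a rough illustration of the fallback described in this comment, a node-level check for the control device could be sketched as follows. The function name `sync_path_hint` and its output strings are hypothetical, and the presence of the device node is only a hint: it does not prove that the CUDA runtime actually uses interrupt-driven waits.

```shell
# Sketch: report which sync path is likely, based only on whether the
# control device exists as a character device. This is a hint, not proof.
sync_path_hint() {
    dev=${1:-/dev/nvhost-ctrl}
    if [ -c "$dev" ]; then
        echo "syncpoint-interrupts"
    else
        echo "cpu-polling-fallback"
    fi
}
```

On a node without the shim loaded, `sync_path_hint` with no argument would be expected to print `cpu-polling-fallback`.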
Thank you for the explanation. I'm not very familiar with the ecosystem, but perhaps this solution is acceptable (with corresponding extension tier for a downstream driver).
Some public interaction with OE4T would be nice to get this driver reviewed by professionals of this platform, however
Good point. I'll open an issue in OE4T/meta-tegra to start the conversation — the key question for them is whether /dev/nvhost-ctrl should be integrated into the OOT nvgpu package for mainline deployments, or whether there's a different intended path for JetPack 6 CUDA on non-JetPack kernels. Will link the issue here once open.
Since we're early in the 1.14 work, I would wait a while and see what the OE4T folks have to say on this issue. Let's wait a week or two before proceeding; does that sound good? Meanwhile, we can have draft PRs ready for extensions/overlays.
Hi @frezbo, I got the OE4T feedback from the discussion.
Short summary: The maintainers' position is that /dev/nvhost-ctrl was an older interface and they can't give a definitive answer on whether the shim approach is correct, since it involves NVIDIA's closed-source libraries. They recommended asking the NVIDIA developer forum.
While waiting, I did some deeper analysis on my running system. Key finding: the CUDA process holds both /dev/nvgpu/igpu0/ctrl (channel allocation) and /dev/nvhost-ctrl (syncpoint wait) open simultaneously during inference, not as a fallback sequence. And no module in the OOT stack creates the global /dev/nvhost-ctrl device. host1x-nvhost.ko only creates per-engine devices like nvhost-ctrl-host1x. So the shim fills a structural gap.
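One way to check which device nodes a running process holds open is to walk its `/proc/<pid>/fd` symlinks. The sketch below assumes a Linux `/proc` filesystem and permission to read the target process's fd directory; the helper name `holds_open` is hypothetical, and this is not necessarily how the analysis above was done.

```shell
# Sketch: succeed if process <pid> has <target> open, by resolving the
# symlinks under /proc/<pid>/fd. Requires Linux procfs and read access.
holds_open() {
    pid=$1 target=$2
    for fd in /proc/"$pid"/fd/*; do
        [ "$(readlink "$fd" 2>/dev/null)" = "$target" ] && return 0
    done
    return 1
}
```

For a given inference process ID, checking both `/dev/nvgpu/igpu0/ctrl` and `/dev/nvhost-ctrl` in the same run would show whether the two are held at the same time.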
I will ask in the NVIDIA dev forum and then come back here.
I will wait to start on the extension/overlay draft PR until I have an answer there.
Great news. I also quickly glanced over the comments, and I think some of them make sense: sometimes, if both are present, it might try to keep the old one too. LLMs might not catch that unless you prompt them to say it's old, so that might be worth a try. I would suggest piping the conversation from OE4T into an LLM to see if it rethinks.
Anyway, we still have some time to polish this up before 1.14.
Replace wip-r36.5-take-2 snapshot (ccf7646) with the patches-r36.5 branch HEAD (ea32e7f, 2026-03-05) which is a more stable reference with the same kernel 6.18 compatibility fixes. Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>
Move the inline printf'd clang wrapper script into a dedicated scripts/clang-oot file and install it via cp, making it easier to review and maintain. Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>
Module load ordering is handled by depmod in the Talos imager; the modprobe.d/nvidia-tegra.conf file is not needed here. Also remove the debug find /rootfs output. Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>
force-pushed from cf1392c to 3624f1f
Convert the Python/awk-based nvgpu source patches to standard unified diff files applied with the patch tool:
- 0001 (nvhost_host1x.c): retry loop + syncpt id=0 guard for GA10B ERRATA_SYNCPT_INVALID_ID_0 (combines sed/python/awk into one clean patch, adds #include <linux/delay.h>)
- 0002 (netlist_priv.h): flexible array fix (regions[1] → regions[])

Remove scripts/patch_nvhost_host1x.py (superseded by 0001 patch).

Signed-off-by: Alexander Schwankner <mrmoor4@googlemail.com>
Summary
Adds the OE4T-patched GPU driver stack for NVIDIA Jetson Orin NX (Tegra234 / GA10B) as a Talos kernel package.
Hardware: Jetson Orin NX 16GB (Tegra234 / GA10B)
Tested with: Talos v1.13, kernel 6.18, CUDA 12.6
Modules
- `host1x.ko`
- `host1x-fence.ko`
- `nvmap.ko`
- `mc-utils.ko`
- `nvhost-ctrl-shim.ko`: `.c` (this pkg), `/dev/nvhost-ctrl` for JetPack 6 CUDA runtime
- `nvgpu.ko`

nvhost-ctrl-shim
The JetPack 6 CUDA runtime (`libnvrm_host1x.so`) requires `/dev/nvhost-ctrl` for `cudaStreamSynchronize`. Without it, CUDA falls back to CPU semaphore polling (~7 tok/s vs ~60 tok/s for LLM inference on Orin NX 16GB).

The shim bridges 8 ioctls (`GET_VERSION`, `SYNCPT_READ`, `SYNCPT_READ_MAX`, `SYNCPT_WAITMEX`, `SYNC_FENCE_CREATE`, `GET_CHARACTERISTICS`, `POLL_FD_CREATE`, `SYNC_FILE_EXTRACT`) to the OE4T host1x syncpoint API using `dma_fence` for interrupt-driven waits.

The source (`nvhost_ctrl_shim.c`) is embedded directly in the package directory and made available at `/pkg/` by bldr at build time.

host1x shadow path
OE4T `host1x.ko` is installed at `kernel/drivers/gpu/host1x/` to shadow the in-tree module. The squashfs overlay replaces the built-in host1x so nvgpu's `CONFIG_TEGRA_GK20A_NVHOST=y` path gets the OE4T version with `HOST1X_SYNCPT_GPU` support and the GA10B ERRATA_SYNCPT_INVALID_ID_0 fix.

Build notes
- Built with Clang (`LLVM=1`), matching the Talos toolchain
- `wip-r36.5-take-2` adds kernel 6.18 compatibility fixes (`__assign_str`, `f_ref`, `__alloc_pages_bulk` 5-arg)
- Modules signed with `certs/signing_key.pem`
- `CONFIG_GCC_PLUGIN_LATENT_ENTROPY` stripped from `auto.conf` / `autoconf.h` before OOT builds (Clang compat)
- `.ko` files confirmed present

Result
Full CUDA inference in Kubernetes pods on Talos Linux without `privileged: true`, using CDI device injection. Verified ~60 tok/s on `qwen2.5:0.5b` (Ollama) on Jetson Orin NX 16GB.

References