
feat: add support for Jetson Orin Nano and AGX Orin SBCs#23

Open
gooosetavo wants to merge 1 commit into siderolabs:main from gooosetavo:feat/jetson-orin-nano-support

Conversation

@gooosetavo

@gooosetavo gooosetavo commented Feb 2, 2026

Goal: add support for Jetson Orin Nano and AGX Orin SBCs

I added two new installers, jetson_orin_nano and jetson_agx_orin, with their own model-specific configs.
The big change under the hood is switching from the OE4T U-Boot fork to U-Boot v2026.01. This gives the project better upstream support going forward, since OE4T hadn't been updated in some time. Mainline U-Boot doesn't yet have the Tegra234 configs needed for the Orin boards, so I'm pulling those binaries from NVIDIA's L4T r36.4.4 BSP. Not ideal long-term, but it works and uses NVIDIA's validated device trees. (The original Jetson Nano, by contrast, does have its configs defined in the U-Boot project.)

Summary:

  1. Project has three installers total now: original Jetson Nano plus two new Orin variants
  2. Build system handles multiple targets - common U-Boot build gets shared across models
  3. Some efficiency could probably be gained with a multi-stage Dockerfile, or by sharing files between stages, but I'm not sure how to do that with the Sidero build system (I couldn't find documentation for it). Hope someone can help clean that up.

I'll give this a spin on my Orin Nano (p3766) board later this week and let you guys know how that goes!

Additional references:

@github-project-automation github-project-automation Bot moved this to To Do in Planning Feb 2, 2026
@talos-bot talos-bot moved this from To Do to In Review in Planning Feb 2, 2026
@gooosetavo gooosetavo force-pushed the feat/jetson-orin-nano-support branch 3 times, most recently from 7d19a67 to 0662fc2 on February 2, 2026 03:13
Add two new installers for Jetson Orin Nano and AGX Orin alongside
existing Jetson Nano support. This enables multi-model architecture
without breaking existing deployments.

Key changes:
- Added dedicated installers for jetson_orin_nano and jetson_agx_orin
- Upgraded from OE4T fork to mainline U-Boot v2026.01 for better upstream support
- Integrated NVIDIA L4T r36.4.4 BSP for accessing prebuilt device trees
- Enhanced build system to handle multiple installer targets
- Updated documentation with comprehensive project overview

Technical details:
- Mainline U-Boot lacks Tegra234 configs for Orin boards
- Using NVIDIA validated device trees from L4T BSP as interim solution
- Build system shares common U-Boot build across models
- Each model has dedicated installer logic with shared base components

Signed-off-by: Gustavo Argote <10593140+gooosetavo@users.noreply.github.com>
@gooosetavo gooosetavo force-pushed the feat/jetson-orin-nano-support branch from ddbbb38 to 02708a0 on February 2, 2026 03:26
@smira
Member

smira commented Feb 2, 2026

I'll give this a spin on my Orin Nano (p3766) board later this week and let you guys know how that goes!

thank you, please let us know how it goes!

@frezbo
Member

frezbo commented Feb 2, 2026

@gooosetavo don't these boards support native UEFI boot? If so, I'm inclined to go in that direction

@gooosetavo
Author

@gooosetavo don't these boards support native UEFI boot? If so, I'm inclined to go in that direction

I'm not sure, but I'll give that a shot!

@gooosetavo
Author

I finally got a chance to verify and yeah, like @frezbo said, these new boards support UEFI.

I was able to flash an SD card with a vanilla Talos image (see link for the factory config used) and it booted right up, allowing me to connect w/ talosctl. Awesome!

[image]

I haven't completed the bootstrapping, but looking good so far.

@frezbo frezbo moved this from In Review to On Hold in Planning Feb 18, 2026
@github-actions

github-actions Bot commented Apr 3, 2026

This PR is stale because it has been open 45 days with no activity.

@github-actions github-actions Bot added the Stale label Apr 3, 2026
@mmalyska

mmalyska commented Apr 4, 2026

@gooosetavo any success? I was trying to make this work for a long time. I've already tried the UEFI option, but the NVIDIA extensions don't support the Jetson modules because of differences in the kernel modules (the NVIDIA modules for Jetson are built as OOT modules, with manual changes to the Linux kernel to make them work).
Here are some PRs from my trial and error, siderolabs/pkgs#1166 and siderolabs/extensions#624, but in the end none of them worked for me due to limited time.

@github-actions github-actions Bot removed the Stale label Apr 5, 2026
@schwankner

Hi all,

Confirming that UEFI boot works great on the Orin NX 16 GB (same T234 SoC).

On the GPU/nvidia extension side — I have a working solution to the OOT module problem described in this thread. The blocker is that CONFIG_TEGRA_GK20A_NVHOST=y in the mainline kernel can't be overridden by an extension. I solved it with a small shim module (nvhost-ctrl-shim) that provides /dev/nvhost-ctrl without any host1x dependency, so nvgpu builds cleanly with CONFIG_TEGRA_GK20A_NVHOST=n.

Full setup running on Talos v1.12.6 with CUDA inference in Kubernetes pods, no privileged: true:
https://github.com/schwankner/talos-jetson-orin

Just posted details on pkgs#1166 as well — happy to help get this unblocked.

@frezbo
Member

frezbo commented Apr 20, 2026

@schwankner great work. Talos 1.13 moved to using CDI and the gpu-operator. Some questions, I'm trying to understand:

  • what if CONFIG_TEGRA_GK20A_NVHOST is set to m (since y means it's hard to disable) and loading of the in-kernel one is disabled via a modules.load.d denylist?
  • are there any docs on host1x? I'm trying to understand what it provides

I'm also surprised that L4T now builds mostly fine with the LTS kernel 😅

@mmalyska
Copy link
Copy Markdown

@frezbo unfortunately NVIDIA Jetson is not supported by the gpu-operator: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/platform-support.html#supported-arm-based-platforms (see the note there).
I'm working on enabling Jetson using CSV, since CDI is not supported by default by the NVIDIA toolkit for the Tegra iGPU.

@schwankner That's a nice achievement. I'll take a look at your solution :)

@frezbo
Member

frezbo commented Apr 20, 2026

@frezbo unfortunately NVIDIA Jetson is not supported by the gpu-operator: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/platform-support.html#supported-arm-based-platforms (see the note there). I'm working on enabling Jetson using CSV, since CDI is not supported by default by the NVIDIA toolkit for the Tegra iGPU.

@schwankner That's a nice achievement. I'll take a look at your solution :)

got it, I'm looking forward to getting this merged

@schwankner

@frezbo Thanks for the questions — and to correct my earlier post which mentioned Talos v1.12.6, I'm already running v1.13.0-rc.0.

CONFIG_TEGRA_GK20A_NVHOST=m + denylist

CONFIG_TEGRA_GK20A_NVHOST is part of nvgpu's own OOT Kbuild, not a standard kernel Kconfig. Since nvgpu itself is CONFIG_GK20A=m (an OOT module), NVHOST=y/n just controls what gets compiled into nvgpu.ko — there is no =m that would produce a separately denylistable .ko.

The deeper issue: the mainline drivers/gpu/host1x/ driver does not have the HOST1X_SYNCPT_GPU allocation flag that nvgpu needs when it allocates syncpoints for GPU channels (the NVHOST=y path). The OE4T linux-nv-oot tree provides a patched host1x that adds this.

The approach that works:

  1. Build nvgpu with CONFIG_TEGRA_GK20A_NVHOST=y against the OE4T host1x
  2. Install the OE4T host1x.ko at kernel/drivers/gpu/host1x/ in the extension rootfs — Talos's squashfs overlay loads this instead of the in-tree module at boot
  3. The OE4T host1x exports the kernel-internal API nvgpu needs: host1x_syncpt_alloc (with HOST1X_SYNCPT_GPU), host1x_fence_create, etc.
  4. nvhost-ctrl-shim separately provides /dev/nvhost-ctrl — the userspace-facing character device that libnvrm_host1x.so (JetPack 6 CUDA runtime) calls directly

The separation matters: nvgpu uses the OE4T host1x internally for GPU syncpoint operations. The shim provides the /dev/nvhost-ctrl ioctl interface that CUDA userspace calls independently for stream synchronization.
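To make step 3 above concrete, here is a minimal sketch (not the actual nvgpu code) of what the NVHOST=y syncpoint-allocation path looks like when built against the patched host1x. host1x_syncpt_alloc() is the real kernel export; HOST1X_SYNCPT_GPU is the OE4T-only flag discussed above, and the wrapper name is hypothetical.

```c
/* Sketch only: assumes the OE4T host1x headers, where HOST1X_SYNCPT_GPU
 * exists. Mainline <linux/host1x.h> only defines flags such as
 * HOST1X_SYNCPT_CLIENT_MANAGED, which is why nvgpu can't use the in-tree
 * module for this path. */
#include <linux/host1x.h>

static struct host1x_syncpt *gpu_channel_syncpt_alloc(struct host1x *host,
						       const char *chan_name)
{
	/* One 32-bit hardware counter per GPU channel; the GPU increments it
	 * on completion and the CPU waits on it via interrupt. */
	return host1x_syncpt_alloc(host, HOST1X_SYNCPT_GPU, chan_name);
}
```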

What host1x provides (and where the shim fits)

host1x is Tegra's syncpoint controller — hardware 32-bit counters the GPU increments when it finishes a kernel, which the CPU can wait on via interrupt. In L4T, the full nvhost companion (drivers/video/tegra/host/) exposes these syncpoints to userspace via /dev/nvhost-ctrl. That framework is proprietary and not in the OE4T tree.

nvhost-ctrl-shim fills exactly that gap: a standalone cdev that registers on module_init (no nvgpu dependency) and implements the NVHOST ioctl interface on top of the OE4T host1x kernel API. The actual call sequence from CUDA 12.6 (libnvrm_host1x.so):

  1. GET_CHARACTERISTICS (nr=14) — once at init, discovers num_syncpts=704 for Orin
  2. SYNCPT_WAITMEX (nr=9) — per token, interrupt-driven wait via dma_fence_wait_timeout() — this is what replaces CPU polling
  3. POLL_FD_CREATE (nr=16) — once during GPU frequency-scaling init (gk20a_scale_init)

The shim implements 8 ioctls total; the three above are what CUDA 12.6 actually calls.
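For readers unfamiliar with the pattern, a minimal sketch of such a shim as a standalone misc character device follows. The device name and the three ioctl nr values (14, 9, 16) come from the comment above; the ioctl handling and everything else here are hypothetical placeholders, not the actual nvhost-ctrl-shim code.

```c
/* Minimal sketch of a standalone /dev/nvhost-ctrl provider. Placeholder
 * only: the real shim wires these ioctls to the OE4T host1x kernel API. */
#include <linux/module.h>
#include <linux/miscdevice.h>
#include <linux/fs.h>
#include <linux/ioctl.h>

#define NVHOST_CTRL_GET_CHARACTERISTICS	14	/* nr=14: once at init, reports num_syncpts */
#define NVHOST_CTRL_SYNCPT_WAITMEX	 9	/* nr=9: interrupt-driven syncpoint wait */
#define NVHOST_CTRL_POLL_FD_CREATE	16	/* nr=16: used by gk20a_scale_init */

static long nvhost_ctrl_ioctl(struct file *file, unsigned int cmd,
			      unsigned long arg)
{
	switch (_IOC_NR(cmd)) {
	case NVHOST_CTRL_GET_CHARACTERISTICS:
		/* fill in the syncpoint count (704 on Orin) for libnvrm_host1x.so */
		return 0;
	case NVHOST_CTRL_SYNCPT_WAITMEX:
		/* wait for a syncpoint threshold, e.g. via a host1x fence and
		 * dma_fence_wait_timeout(), instead of CPU polling */
		return 0;
	case NVHOST_CTRL_POLL_FD_CREATE:
		/* hand back an fd userspace can poll() on */
		return 0;
	default:
		return -ENOTTY;
	}
}

static const struct file_operations nvhost_ctrl_fops = {
	.owner		= THIS_MODULE,
	.unlocked_ioctl	= nvhost_ctrl_ioctl,
};

static struct miscdevice nvhost_ctrl_dev = {
	.minor	= MISC_DYNAMIC_MINOR,
	.name	= "nvhost-ctrl",	/* shows up as /dev/nvhost-ctrl */
	.fops	= &nvhost_ctrl_fops,
};

module_misc_device(nvhost_ctrl_dev);	/* registers the cdev on module_init */
MODULE_LICENSE("GPL");
```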

Talos 1.13 / CDI

After posting that initial comment I put together a test branch targeting Talos 1.13.0-rc.0 to verify everything end-to-end and clean up what's no longer needed. The upgrade went through without issues.

The biggest simplification: CDI is enabled by default in containerd in Talos 1.13 (enable_cdi = true, cdi_spec_dirs = ["/run/cdi"]), so the containerd.toml override I had before is gone entirely. machine-patch-cdi.yaml now only sets node labels — CDI itself needs no configuration. The nvidia-cdi-setup DaemonSet writes /run/cdi/nvidia-jetson.yaml at boot and that's all containerd needs.

No gpu-operator involved — as @mmalyska noted, Jetson isn't supported there. The CDI path works directly with containerd's built-in support.

@frezbo
Copy link
Copy Markdown
Member

frezbo commented Apr 22, 2026

yup let's work on getting this merged

@schwankner if you're on the community slack, mind sending me a hi?

@schwankner
Copy link
Copy Markdown

I created the PR siderolabs/pkgs#1518.

