
feat(linux/wlr): SHM capture fallback + resilient GBM allocation for headless NVIDIA #4946

Open
atassis wants to merge 3 commits into LizardByte:master from atassis:feat/shm-capture-fallback


atassis (Contributor) commented Apr 4, 2026

Description

Two improvements for Wayland capture on headless NVIDIA setups (no physical display connected):

1. SHM capture fallback

On headless NVIDIA, GBM buffer allocation can fail because the driver cannot create GBM buffers without an active DRM output. The existing code logged "SHM capture not implemented" and gave up.

This PR implements the SHM fallback:

  • Binds wl_shm interface from the Wayland registry
  • When DMA-BUF fails, allocates a memfd-backed wl_shm_pool and copies frames via SHM
  • Handles 4bpp (XRGB8888) and 3bpp (BGR888) pixel formats with correct channel conversion
  • Caches a GBM failure in a sticky gbm_failed flag to avoid retrying every frame
  • Makes EGL initialization non-fatal so wlr_ram_t can operate in SHM-only mode
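A minimal sketch of the memfd-backed allocation, assuming a Linux/glibc build where memfd_create is available (glibc 2.27 or later). The class name shm_buffer_t is hypothetical, and the wl_shm_create_pool() / wl_shm_pool_create_buffer() calls that would consume fd() and size() are left out so the sketch compiles without Wayland headers:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <sys/mman.h>  // memfd_create, mmap, munmap
#include <unistd.h>    // ftruncate, close

// Hypothetical RAII wrapper: fd() and size() are what a real implementation
// would pass to wl_shm_create_pool(); that call is omitted here.
class shm_buffer_t {
public:
  shm_buffer_t(int stride, int height)
      : size_(static_cast<std::size_t>(stride) * static_cast<std::size_t>(height)) {
    // MFD_CLOEXEC keeps the fd from leaking into child processes.
    fd_ = memfd_create("shm-capture", MFD_CLOEXEC);
    if (fd_ < 0) {
      throw std::runtime_error("memfd_create failed");
    }
    // Size the anonymous file before mapping it.
    if (ftruncate(fd_, static_cast<off_t>(size_)) < 0) {
      close(fd_);
      throw std::runtime_error("ftruncate failed");
    }
    // MAP_SHARED so pixels written by the compositor (through the same file)
    // are visible in this mapping.
    void *p = mmap(nullptr, size_, PROT_READ | PROT_WRITE, MAP_SHARED, fd_, 0);
    if (p == MAP_FAILED) {
      close(fd_);
      throw std::runtime_error("mmap failed");
    }
    data_ = static_cast<std::uint8_t *>(p);
  }

  ~shm_buffer_t() {
    munmap(data_, size_);
    close(fd_);
  }

  shm_buffer_t(const shm_buffer_t &) = delete;
  shm_buffer_t &operator=(const shm_buffer_t &) = delete;

  int fd() const { return fd_; }
  std::size_t size() const { return size_; }
  std::uint8_t *data() const { return data_; }

private:
  std::size_t size_;
  int fd_ = -1;
  std::uint8_t *data_ = nullptr;
};
```

After the frame's ready event, data() is where the copied pixels could be read by the encoder path.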

2. Resilient GBM allocation for headless NVIDIA

Even on headless NVIDIA, GBM can work if we relax the usage flags. This change:

  • Tries progressively relaxed GBM flags: RENDERING|LINEAR → RENDERING → LINEAR → none
  • Prefers DRM render nodes (renderD*) over primary nodes (card*) — primary nodes require DRM master which is unavailable headless
  • Logs which flag combination succeeded (once, at init)
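The flag fallback can be sketched as a loop over candidate usage masks. This sketch assumes the order RENDERING|LINEAR, RENDERING, LINEAR, none. The try_alloc callback is a hypothetical stand-in for a gbm_bo_create() call, and the two constants mirror GBM_BO_USE_RENDERING and GBM_BO_USE_LINEAR from gbm.h so it builds without libgbm:

```cpp
#include <cstdint>
#include <functional>
#include <optional>

// Mirrors of GBM_BO_USE_RENDERING (1 << 2) and GBM_BO_USE_LINEAR (1 << 4)
// from <gbm.h>, redefined so this sketch builds without libgbm.
constexpr std::uint32_t kUseRendering = 1u << 2;
constexpr std::uint32_t kUseLinear = 1u << 4;

// Returns the first usage mask for which `try_alloc` succeeds, or nullopt if
// every candidate fails (the caller would then use the SHM path).
std::optional<std::uint32_t> pick_gbm_flags(const std::function<bool(std::uint32_t)> &try_alloc) {
  // Most to least restrictive (assumed order: RENDERING|LINEAR, RENDERING, LINEAR, none).
  constexpr std::uint32_t candidates[] = {
    kUseRendering | kUseLinear,
    kUseRendering,
    kUseLinear,
    0u,  // no usage hints at all
  };
  for (std::uint32_t flags : candidates) {
    if (try_alloc(flags)) {
      return flags;  // the caller can log this once at init
    }
  }
  return std::nullopt;
}
```

Returning the winning mask, rather than a buffer, keeps the "log once at init" step with the caller.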

With both changes, the capture priority is:

  1. DMA-BUF with relaxed GBM flags (zero-copy GPU path): works on most headless NVIDIA setups
  2. SHM fallback (CPU memcpy path): works everywhere, but slower

The DMA-BUF path is completely unchanged when GBM succeeds with default flags.
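The priority order above, combined with the sticky gbm_failed flag, could look roughly like this. The callbacks are hypothetical stand-ins for create_and_copy_dmabuf() and create_and_copy_shm(). For simplicity, this sketch treats any DMA-BUF failure as a GBM failure:

```cpp
#include <functional>

enum class capture_path_t { dmabuf, shm, none };

struct capture_state_t {
  // Sticky: once set, DMA-BUF is not attempted again for this session.
  bool gbm_failed = false;

  capture_path_t capture_frame(const std::function<bool()> &copy_dmabuf,
                               const std::function<bool()> &copy_shm) {
    if (!gbm_failed) {
      if (copy_dmabuf()) {
        return capture_path_t::dmabuf;  // zero-copy GPU path
      }
      gbm_failed = true;  // pay for the failure once, not every frame
    }
    // CPU memcpy path; `none` means both paths failed for this frame.
    return copy_shm() ? capture_path_t::shm : capture_path_t::none;
  }
};
```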

Tested on

  • RTX 4060 Ti + RTX 5060 Ti headless (TrueNAS, no monitor connected)
  • labwc compositor with WLR_BACKENDS=headless
  • GBM succeeds with the RENDERING flag alone, and the VRAM capture path works end-to-end
  • SHM fallback verified by disabling GBM

Screenshot

Issues Fixed or Closed

Roadmap Issues

Type of Change

  • feat: New feature (non-breaking change which adds functionality)
  • fix: Bug fix (non-breaking change which fixes an issue)
  • docs: Documentation only changes
  • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semicolons, etc.)
  • refactor: Code change that neither fixes a bug nor adds a feature
  • perf: Code change that improves performance
  • test: Adding missing tests or correcting existing tests
  • build: Changes that affect the build system or external dependencies
  • ci: Changes to CI configuration files and scripts
  • chore: Other changes that don't modify src or test files
  • revert: Reverts a previous commit
  • BREAKING CHANGE: Introduces a breaking change (can be combined with any type above)

Checklist

  • Code follows the style guidelines of this project
  • Code has been self-reviewed
  • Code has been commented, particularly in hard-to-understand areas
  • Code docstring/documentation-blocks for new or existing methods/components have been added or updated
  • Unit tests have been added or updated for any new or modified functionality

AI Usage

  • None: No AI tools were used in creating this PR
  • Light: AI provided minor assistance (formatting, simple suggestions)
  • Moderate: AI helped with code generation or debugging specific parts
  • Heavy: AI generated most or all of the code changes

When DMA-BUF capture fails (e.g. GBM cannot allocate buffers on
headless NVIDIA without an active DRM output), fall back to
shared-memory (SHM) capture via wl_shm.

The SHM path creates a memfd-backed wl_shm_pool, receives pixel
data from the compositor via wlr-screencopy, and feeds it to the
encoder through the existing wlr_ram_t CPU path.

Supported SHM formats:
- 4 bpp (XRGB8888/ARGB8888): direct memcpy
- 3 bpp (BGR888): pixel conversion to BGRA8888
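For the 4 bpp case, a "direct memcpy" still has to account for the stride the compositor reports, which may be larger than width * 4. Here is a minimal sketch, assuming a tightly packed destination. copy_4bpp_frame is a hypothetical name:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy a 4 bpp SHM frame (XRGB8888/ARGB8888) into a tightly packed buffer.
void copy_4bpp_frame(const std::uint8_t *src, std::size_t src_stride,
                     std::uint8_t *dst, std::size_t width, std::size_t height) {
  const std::size_t row_bytes = width * 4;
  if (src_stride == row_bytes) {
    // No row padding: one bulk copy covers the whole frame.
    std::memcpy(dst, src, row_bytes * height);
    return;
  }
  // Padded rows: copy only the visible pixels of each row.
  for (std::size_t y = 0; y < height; ++y) {
    std::memcpy(dst + y * row_bytes, src + y * src_stride, row_bytes);
  }
}
```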

Key changes:
- Bind wl_shm interface in screencopy path
- Add create_and_copy_shm() with memfd + mmap allocation
- Refactor create_and_copy_dmabuf() to return bool for fallback
- Cache GBM failure to avoid per-frame retry
- Handle SHM frames in wlr_ram_t::snapshot() via memcpy
- Make EGL init non-fatal (SHM path does not require EGL)
- Force wlr_ram_t on reinit when SHM fallback is active
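The two reinit-related changes (non-fatal EGL, and forcing wlr_ram_t while the SHM fallback is active) amount to a small decision rule. This sketch is illustrative only. wlr_backend_t, wlr_init_result_t, and choose_backend are hypothetical names, and it assumes the VRAM path needs EGL to import DMA-BUF frames:

```cpp
enum class wlr_backend_t { vram, ram };

// Hypothetical summary of display init: an EGL failure is recorded here
// instead of aborting initialization.
struct wlr_init_result_t {
  bool egl_ok;  // EGL is available for importing DMA-BUF frames
};

// Pick the backend to (re)initialize. SHM frames already live in CPU memory,
// and without EGL there is no GPU import path, so both cases force RAM.
wlr_backend_t choose_backend(wlr_backend_t requested, const wlr_init_result_t &init,
                             bool shm_fallback_active) {
  if (shm_fallback_active || !init.egl_ok) {
    return wlr_backend_t::ram;
  }
  return requested;
}
```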

Tested on headless NVIDIA RTX 5060 Ti with labwc compositor,
NVENC HEVC encoding, streaming to Moonlight client.
atassis force-pushed the feat/shm-capture-fallback branch from 0262c5d to 2bcea4f on April 4, 2026 05:03
BGR888 (0x34324742) stores bytes in R,G,B memory order.
Previous code copied them straight into BGRA8888, causing
red/blue inversion (orange appeared bluish in Moonlight).

Swap src positions 0↔2 so B and R land in correct BGRA slots.
Also deduplicate SHM format log to fire only once.
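A small sketch of the corrected conversion. Like this commit message, it treats BGRA8888 as B, G, R, A byte order. bgr888_row_to_bgra is a hypothetical helper. The static_assert checks that 0x34324742 is the little-endian packing of the fourcc "BG24":

```cpp
#include <cstddef>
#include <cstdint>

// DRM-style fourcc: four ASCII characters packed little-endian.
constexpr std::uint32_t fourcc(char a, char b, char c, char d) {
  return static_cast<std::uint32_t>(static_cast<unsigned char>(a)) |
         (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8) |
         (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16) |
         (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}
static_assert(fourcc('B', 'G', '2', '4') == 0x34324742, "BGR888 fourcc");

// Convert one row of BGR888 (bytes R, G, B in memory) to BGRA8888 (bytes B, G, R, A).
void bgr888_row_to_bgra(const std::uint8_t *src, std::uint8_t *dst, std::size_t width) {
  for (std::size_t x = 0; x < width; ++x) {
    const std::uint8_t *s = src + x * 3;
    std::uint8_t *d = dst + x * 4;
    d[0] = s[2];  // B was the last source byte
    d[1] = s[1];  // G stays in the middle
    d[2] = s[0];  // R was the first source byte
    d[3] = 0xFF;  // opaque alpha
  }
}
```

With the swap, an orange pixel (R=0xFF, G=0x80, B=0x00) lands as bytes 0x00, 0x80, 0xFF, 0xFF instead of being rendered bluish.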
atassis (Contributor, Author) commented Apr 6, 2026

I also have a follow-up fix that makes GBM allocation work on headless NVIDIA by trying relaxed usage flags (removing GBM_BO_USE_LINEAR). With this fix, zero-copy DMA-BUF capture works on headless setups — the SHM fallback is only needed if all GBM flag combinations fail.

The change is small (two files):

  • wayland.cpp: try GBM_BO_USE_RENDERING alone when RENDERING|LINEAR fails
  • cuda.cpp: prefer DRM render nodes (renderD*) over primary nodes (card*) — primary nodes require DRM master which is unavailable headless
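The node preference can be sketched as a stable partition over candidate paths. order_drm_nodes is a hypothetical helper, and it assumes the caller has already enumerated /dev/dri:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Order DRM device paths so render nodes (renderD*) come before primary
// nodes (card*), which require DRM master.
std::vector<std::string> order_drm_nodes(std::vector<std::string> paths) {
  auto is_render_node = [](const std::string &p) {
    const auto slash = p.find_last_of('/');
    const std::string name = slash == std::string::npos ? p : p.substr(slash + 1);
    return name.rfind("renderD", 0) == 0;  // name starts with "renderD"
  };
  // stable_partition keeps the original relative order within each group.
  std::stable_partition(paths.begin(), paths.end(), is_render_node);
  return paths;
}
```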

Tested on RTX 4060 Ti headless (TrueNAS, no monitor) — GBM succeeds with RENDERING flag, VRAM capture path works end-to-end.

Should I add this to this PR or open a separate one?

ReenigneArcher (Member) commented

Should I add this to this PR or open a separate one?

I think it can be included here if we make the PR title more generic.

GBM buffer allocation with GBM_BO_USE_RENDERING|GBM_BO_USE_LINEAR
fails on headless NVIDIA render nodes. Try progressively relaxed
flag combinations before falling back to SHM.

Also prefer DRM render nodes over primary nodes in CUDA device
lookup — primary nodes require DRM master which is unavailable
on headless setups.

Tested on RTX 4060 Ti headless (TrueNAS, no monitor):
- GBM succeeds with GBM_BO_USE_RENDERING flag alone
- VRAM capture path works (zero-copy DMA-BUF → EGL → CUDA → NVENC)
- SHM fallback still catches cases where all GBM combos fail
atassis changed the title from "feat(linux/wlr): implement SHM capture fallback for headless setups" to "feat(linux/wlr): SHM capture fallback + resilient GBM allocation for headless NVIDIA" on Apr 7, 2026
atassis (Contributor, Author) commented Apr 7, 2026

Added the GBM resilience fix in e54a35d. Updated the PR title and description to cover both changes.
With both patches, DMA-BUF capture now works on headless NVIDIA (tested on RTX 4060 Ti + RTX 5060 Ti, no monitors): GBM succeeds with the RENDERING flag alone. The SHM fallback is only needed if all flag combinations fail.

sonarqubecloud bot commented Apr 7, 2026

Quality Gate failed

Failed conditions
3 New issues
3 New Code Smells (required ≤ 0)
2 Duplicated Blocks on New Code (required ≤ 0)

See analysis details on SonarQube Cloud



Development

Successfully merging this pull request may close these issues.

2 participants