Metal: Implement Draw*IndirectCount emulation via CPU based loop#1595

Draft
JonahGoldsmith wants to merge 2 commits into turanszkij:master from JonahGoldsmith:drawindirect

@JonahGoldsmith JonahGoldsmith commented Apr 2, 2026

Implements DrawInstancedIndirectCount and DrawIndexedInstancedIndirectCount for Metal by emulating ExecuteIndirectCount behavior.
Adds a helper compute sanitize pass that copies indirect records to scratch, clamps active count, and zeroes instanceCount for inactive records.
Adds minimal render-pass snapshot/split/resume plumbing required to run sanitize compute between render draws without mutating source argument buffers.
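
As a rough illustration of what the sanitize pass described above computes, here is a CPU-side model in plain C++. It is only a sketch: the actual pass in this PR runs as a compute shader, and `DrawArgs` and `SanitizeIndirectArgs` are hypothetical names invented for this example. The struct mirrors the four-field layout of Metal's `MTLDrawPrimitivesIndirectArguments`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical struct; mirrors the field order of
// MTLDrawPrimitivesIndirectArguments (four 32-bit unsigned values).
struct DrawArgs {
    uint32_t vertexCount;
    uint32_t instanceCount;
    uint32_t vertexStart;
    uint32_t baseInstance;
};

// Hypothetical CPU model of the sanitize step:
// 1. copy the first max_count records into a scratch buffer,
// 2. clamp the requested count to max_count,
// 3. zero instanceCount for every record past the active count.
// The source buffer is taken by const reference and never modified.
std::vector<DrawArgs> SanitizeIndirectArgs(const std::vector<DrawArgs>& source,
                                           uint32_t requested_count,
                                           uint32_t max_count)
{
    assert(source.size() >= max_count);

    const uint32_t active = std::min(requested_count, max_count);
    std::vector<DrawArgs> scratch(source.begin(), source.begin() + max_count);

    for (uint32_t i = active; i < max_count; ++i) {
        // A draw with zero instances renders nothing, so the record is a no-op.
        scratch[i].instanceCount = 0;
    }
    return scratch;
}
```

With three source records and a requested count of 2, the scratch copy keeps the first two records unchanged and only the third has `instanceCount` set to 0; the source records stay as they were.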

Rationale:

Metal render encoders do not provide a native indirect-count draw equivalent to DX12/Vulkan ExecuteIndirectCount semantics.
This patch restores cross-backend behavior by emulating count-based multi-draw safely and deterministically.

Known Limitations:
Emulation iterates max_count records on CPU-side command submission; inactive records are made no-op by zeroing instanceCount.
Requires an active render-pass snapshot (captured in RenderPassBegin) to split for compute sanitize and resume correctly.

(The sanitization version caused rendering artifacts and flickering, so I switched to the CPU loop described below.)

@JonahGoldsmith JonahGoldsmith marked this pull request as draft April 2, 2026 21:52
@JonahGoldsmith JonahGoldsmith changed the title Metal: Implement Draw*IndirectCount emulation via sanitize+resume path Metal: Implement Draw*IndirectCount emulation via CPU based loop Apr 3, 2026
@JonahGoldsmith
Author

I have changed the implementation to use a CPU loop on Metal. Using a full Indirect Command Buffer would require changing a lot of the Metal internals to add scenario-specific tracking for when an indirect buffer is filled from the GPU rather than the CPU. When the buffer is filled from the CPU, this implementation works correctly.
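
For readers following along, the CPU loop idea can be sketched roughly like this. This is not the code from the PR: `MockEncoder` and `DrawIndirectCountCPU` are hypothetical names, and the sketch assumes the draw count is readable on the CPU at encode time. In a real Metal backend, each iteration would presumably map to one `drawPrimitives:indirectBuffer:indirectBufferOffset:` call on the render encoder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a render command encoder. It only records
// the byte offset passed to each indirect draw so the loop can be inspected.
struct MockEncoder {
    std::vector<uint64_t> issued_offsets;

    void DrawIndirect(uint64_t args_offset)
    {
        issued_offsets.push_back(args_offset);
    }
};

// Hypothetical CPU-side emulation of a count-based multi-draw:
// clamp the count to max_count, then issue one indirect draw per record.
// Assumption: cpu_visible_count was written by the CPU (or read back),
// so its value is known when the commands are encoded.
void DrawIndirectCountCPU(MockEncoder& encoder,
                          uint64_t args_offset,
                          uint32_t stride,
                          uint32_t cpu_visible_count,
                          uint32_t max_count)
{
    assert(stride > 0);

    const uint32_t count = std::min(cpu_visible_count, max_count);
    for (uint32_t i = 0; i < count; ++i) {
        // Widen before multiplying so large buffers do not overflow 32 bits.
        encoder.DrawIndirect(args_offset + static_cast<uint64_t>(i) * stride);
    }
}
```

The draw arguments themselves can still live in a GPU buffer in this scheme; only the number of draws has to be known on the CPU.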

Owner

turanszkij commented Apr 4, 2026

It kind of defeats the purpose if we have to set the count from the CPU; at that point it's the same as a regular DrawIndirect. I think it would be better to have a Metal-specific implementation in the render code (for a specific effect) to see how it works first. But we don't use DrawIndirectCount anywhere right now, so there is nothing to test it with.

The other thing you mentioned, pausing the encoder and switching to another one in the middle, is also not great. That is another reason to have some kind of Metal-specific pass before the render encoder is started.


2 participants