
Add torch.compile support for actor and critic models#198

Draft
adenzler-nvidia wants to merge 8 commits into leggedrobotics:main from adenzler-nvidia:feature/torch-compile

Conversation

@adenzler-nvidia

Summary

  • Add opt-in torch.compile support for actor and critic models in OnPolicyRunner, controlled via torch_compile_mode config key (default: "default", set to null to disable)
  • ~1.2-1.4x total iteration speedup on CNN-based policies (Kuka-Allegro dexsuite with CNNModel); MLP-only policies see no benefit as models are too small for compilation overhead to pay off
  • Graceful fallback to eager mode if torch.compile fails
  • Clean serialization via _unwrap_compiled() helper that detects compiled models by state_dict key prefixes — no reliance on private class names
  • CUDA-graph-based compile modes (reduce-overhead, max-autotune) are blocked as they're incompatible with the two-model actor/critic pattern
  • Support CNNModel with zero 1D observation groups (perception-only configurations)

Closes #196

Test plan

  • Verify torch.compile speedup on CNN-based policy (dexsuite single_camera)
  • Verify MLP-only policy is unaffected
  • Verify save/load round-trip with compiled models produces clean state_dict keys
  • Verify JIT and ONNX export work with compiled models
  • Verify torch_compile_mode: null disables compilation
  • Verify graceful fallback when compile fails

🤖 Generated with Claude Code

adenzler-nvidia and others added 7 commits April 2, 2026 15:19
Wrap the actor with torch.compile in OnPolicyRunner, defaulting to
mode="default" for automatic Triton kernel fusion. Export methods
(JIT, ONNX, get_inference_policy) unwrap the compiled model so
scripting and tracing work unchanged. A one-time message after the
first iteration reminds users that compile overhead is amortized.
- Compile both actor and critic (was actor-only)
- Reorder act() to consume all actor distribution state before critic call
- Add cudagraph_mark_step_begin() in learning loop for future CUDA graph compat
- Swap uncompiled models for save/load to keep clean state_dict keys
- Reject CUDA-graph modes (reduce-overhead, max-autotune) with clear error:
  critic's graph replay invalidates actor graph output buffers
- Supported modes: default, max-autotune-no-cudagraphs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove cudagraph_mark_step_begin() call and duplicate comments from
the CUDA graph investigation — not needed since those modes are rejected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reset ppo.py to main and re-apply only the torch-compile changes:
- Reorder act() to consume actor distribution state before critic
- Reorder learning loop similarly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The actor/critic call reorder is not required for the supported compile
modes (default, max-autotune-no-cudagraphs). Remove to minimize diff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lback

Replace manual _uncompiled_actor/_uncompiled_critic reference tracking
with a stateless _unwrap_compiled() helper that detects compiled models
via state_dict key prefixes and unwraps them for serialization/export.

- Add _unwrap_compiled() that detects _orig_mod. prefix in state_dict
  keys and returns the inner module, with a clear error if PyTorch
  changes the internal API
- Remove _uncompiled_actor/_uncompiled_critic mutable state
- Add try/except around torch.compile with fallback to eager mode
- Simplify save/load/export to use _unwrap_compiled() uniformly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allow CNNModel to work when only 2D (image) observation groups are
provided, with no 1D (state) groups. This enables perception-only
actor/critic configurations.

- CNNModel.get_latent: skip 1D concatenation when self.obs_groups is empty
- MLPModel.update_normalization: skip when self.obs_groups is empty
- _TorchCNNModel/_OnnxCNNModel: same fix for JIT/ONNX export paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
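The perception-only path in this commit amounts to skipping the 1D concatenation when no state observation groups exist. A minimal sketch, with assumed names (`build_latent`, `obs_groups_1d`) that stand in for the repo's actual `CNNModel.get_latent` internals:

```python
import torch


def build_latent(cnn_features: torch.Tensor,
                 obs: dict[str, torch.Tensor],
                 obs_groups_1d: list[str]) -> torch.Tensor:
    """Combine CNN features with 1D state observations, if any exist.

    With an empty `obs_groups_1d` (perception-only config), the 1D
    concatenation is skipped and the CNN features pass through directly.
    """
    if not obs_groups_1d:
        return cnn_features
    state = torch.cat([obs[g] for g in obs_groups_1d], dim=-1)
    return torch.cat([cnn_features, state], dim=-1)
```

The same guard applies to the normalization update and to the `_TorchCNNModel`/`_OnnxCNNModel` export wrappers, since `torch.cat` over an empty list of 1D groups would otherwise raise.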