diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2839905a65..0433106257 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,7 @@
 
 ### Added
 
+- Add composable actuator subsystem with pluggable `Controller` (`ControllerPD`, `ControllerPID`, `ControllerNetMLP`, `ControllerNetLSTM`), `Clamping` (`ClampingMaxForce`, `ClampingDCMotor`, `ClampingPositionBased`), and `Delay` components; supports per-DOF delays, CUDA graph capture, and masked environment reset
 - Add 3D texture-based SDF, replacing NanoVDB volumes in the mesh-mesh collision pipeline for improved performance and CPU compatibility.
 - Add `--benchmark [SECONDS]` flag to examples for headless FPS measurement with warmup
 - Interactive example browser in the GL viewer with tree-view navigation and switch/reset support
@@ -55,6 +56,8 @@
 - Add optional `state` parameter to `SolverBase.update_contacts()` to align the base-class signature with Kamino and MuJoCo solvers
 - Use `Literal` types for `SolverImplicitMPM.Config` string fields with fixed option sets (`solver`, `warmstart_mode`, `collider_velocity_mode`, `grid_type`, `transfer_scheme`, `integration_scheme`)
 - Migrate `wp.array(dtype=X)` type annotations to `wp.array[X]` bracket syntax (Warp 1.12+).
+- Replace `ModelBuilder.add_actuator(actuator_class, input_indices=..., output_indices=..., **kwargs)` with `ModelBuilder.add_actuator(controller_class, index=..., clamping=[...], delay=..., pos_index=..., **ctrl_kwargs)` where each call registers a single DOF
+- Change `ArticulationView.get_actuator_parameter(actuator, name)` and `set_actuator_parameter(actuator, name, values)` to require a `component` argument identifying the owning `Controller`, `Clamping`, or `Delay` instance: `get_actuator_parameter(actuator, actuator.controller, "kp")`
 
 ### Deprecated
 
@@ -74,6 +77,7 @@
 
 ### Removed
 
+- Remove `newton-actuators` package dependency; all actuator functionality is now built into Newton directly
 - Remove `Heightfield.finalize()` and stop storing raw pointers for heightfields in `Model.shape_source_ptr`; heightfield collision data is accessed via `Model.shape_heightfield_index` / `Model.heightfield_data` / `Model.heightfield_elevations`
 - Remove `robot_humanoid` example in favor of `basic_plotting` which uses the same humanoid model with diagnostics visualization
 
diff --git a/docs/api/newton.rst b/docs/api/newton.rst
index 88d2c194ea..52fb7a7390 100644
--- a/docs/api/newton.rst
+++ b/docs/api/newton.rst
@@ -10,6 +10,7 @@ newton
 .. toctree::
    :hidden:
 
+   newton_actuators
    newton_geometry
    newton_ik
    newton_math
@@ -22,6 +23,7 @@ newton
 
 .. rubric:: Submodules
 
+- :doc:`newton.actuators <newton_actuators>`
 - :doc:`newton.geometry <newton_geometry>`
 - :doc:`newton.ik <newton_ik>`
 - :doc:`newton.math <newton_math>`
diff --git a/docs/api/newton_actuators.rst b/docs/api/newton_actuators.rst
new file mode 100644
index 0000000000..6f8e3bf40f
--- /dev/null
+++ b/docs/api/newton_actuators.rst
@@ -0,0 +1,43 @@
+.. SPDX-FileCopyrightText: Copyright (c) 2025 The Newton Developers
+.. SPDX-License-Identifier: CC-BY-4.0
+
+newton.actuators
+================
+
+GPU-accelerated actuator models for physics simulations.
+
+This module provides a modular library of actuator components — controllers,
+clamping, and delay — that compute joint forces from simulation state and
+control targets. Components are composed into an :class:`Actuator` instance
+and registered with :meth:`~newton.ModelBuilder.add_actuator` during model
+construction.
+
+.. py:module:: newton.actuators
+.. currentmodule:: newton.actuators
+
+.. rubric:: Classes
+
+.. autosummary::
+   :toctree: _generated
+   :nosignatures:
+
+   Actuator
+   ActuatorParsed
+   Clamping
+   ClampingDCMotor
+   ClampingMaxForce
+   ClampingPositionBased
+   Controller
+   ControllerNetLSTM
+   ControllerNetMLP
+   ControllerPD
+   ControllerPID
+   Delay
+
+.. rubric:: Functions
+
+.. autosummary::
+   :toctree: _generated
+   :signatures: long
+
+   parse_actuator_prim
diff --git a/docs/concepts/actuators.rst b/docs/concepts/actuators.rst
new file mode 100644
index 0000000000..7d3347e0b5
--- /dev/null
+++ b/docs/concepts/actuators.rst
@@ -0,0 +1,252 @@
+.. SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers
+.. SPDX-License-Identifier: CC-BY-4.0
+
+.. currentmodule:: newton.actuators
+
+Actuators
+=========
+
+Actuators provide composable implementations that read physics simulation
+state, compute forces, and write the forces back to control arrays for
+application to the simulation.  The simulator does not need to be part of
+Newton: actuators are designed to be reusable anywhere the caller can provide
+state arrays and consume forces.
+
+Each :class:`Actuator` instance is **vectorized**: a single actuator object
+operates on a batch of DOF indices in global state and control arrays, allowing
+efficient integration into RL workflows with many parallel environments.
+
+The goal is to provide canonical actuator models with support for
+**differentiability** and **graphable execution** where the underlying
+controller implementation supports it.  Actuators are designed to be easy to
+customize and extend for specific actuator models.
+
+Architecture
+------------
+
+An actuator is composed from three building blocks, applied in this order:
+
+.. code-block:: text
+
+   Actuator
+   ├── Delay       (optional: delays control targets by N timesteps)
+   ├── Controller  (control law that computes raw forces)
+   └── Clamping[]  (clamps raw forces based on motor-limit modeling)
+       ├── ClampingMaxForce        (±max_force box clamp)
+       ├── ClampingDCMotor         (velocity-dependent saturation)
+       └── ClampingPositionBased   (angle-dependent lookup table)
+
+**Delay**
+   Optionally delays the control targets (e.g. position or velocity) by *N*
+   timesteps before they reach the controller, allowing the actuator to model
+   communication or processing latency.  The delay always produces output;
+   when the buffer is still filling, the lag is clamped to the available
+   history so the most recent data is returned.
+
+**Controller**
+   Computes raw forces or torques from the current simulator state and control
+   targets.  This is the actuator's control law — for example PD, PID, or
+   neural-network-based control.  See the individual controller class
+   documentation for the control-law equations.
+
+**Clamping**
+   Clamps raw forces based on motor-limit modeling.  This applies
+   post-controller output limits to the computed forces or torques to model
+   motor limits such as saturation, back-EMF losses, performance envelopes, or
+   angle-dependent torque limits.  Multiple clamping stages can be combined on
+   a single actuator.
+
+The per-step pipeline is:
+
+.. code-block:: text
+
+   Delay read → Controller → Clamping → Scatter-add → State updates (controller + delay write)
+
+Controllers and clamping objects are pluggable: implement the
+:class:`Controller` or :class:`Clamping` base class to add new models.
+
+.. note::
+
+   **Current limitations:** the first version does not include a transmission
+   model (gear ratios / linkage transforms), supports only single-input
+   single-output (SISO) actuators (one DOF per actuator), and does not model
+   actuator dynamics (inertia, friction, thermal effects).
+
+Usage
+-----
+
+Actuators are registered during model construction with
+:meth:`~newton.ModelBuilder.add_actuator` and are instantiated automatically
+when the model is finalized:
+
+.. testsetup:: actuator-usage
+
+   import warp as wp
+   import newton
+   from newton.actuators import (
+       Actuator, ClampingMaxForce, ControllerPD, Delay,
+   )
+
+   builder = newton.ModelBuilder()
+   link = builder.add_link()
+   joint = builder.add_joint_revolute(parent=-1, child=link, axis=newton.Axis.Z)
+   builder.add_articulation([joint])
+   dof_index = builder.joint_qd_start[joint]
+
+.. testcode:: actuator-usage
+
+   builder.add_actuator(
+       ControllerPD,
+       index=dof_index,
+       kp=100.0,
+       kd=10.0,
+       delay=5,
+       clamping=[(ClampingMaxForce, {"max_force": 50.0})],
+   )
+
+   model = builder.finalize()
+
+For manual construction (outside of :class:`~newton.ModelBuilder`), compose the
+components directly:
+
+.. testcode:: actuator-usage
+
+   indices = wp.array([0], dtype=wp.uint32)
+   kp = wp.array([100.0], dtype=wp.float32)
+   kd = wp.array([10.0], dtype=wp.float32)
+   max_f = wp.array([50.0], dtype=wp.float32)
+
+   actuator = Actuator(
+       indices,
+       controller=ControllerPD(kp=kp, kd=kd),
+       delay=Delay(delay=wp.array([5], dtype=wp.int32), max_delay=5),
+       clamping=[ClampingMaxForce(max_force=max_f)],
+   )
+
+
+Stateful Actuators
+------------------
+
+Controllers that maintain internal state (e.g. :class:`ControllerPID` with an
+integral accumulator, or :class:`ControllerNetLSTM` with hidden/cell state) and
+actuators with a :class:`Delay` require explicit double-buffered state
+management.  Create two state objects with :meth:`Actuator.state` and swap them
+after each step:
+
+.. testcode:: actuator-usage
+
+   state_0 = model.actuators[0].state()
+   state_1 = model.actuators[0].state()
+   state = model.state()
+   control = model.control()
+
+   for step in range(3):
+       model.actuators[0].step(state, control, state_0, state_1, dt=0.01)
+       state_0, state_1 = state_1, state_0
+
+Stateless actuators (e.g. a plain PD controller without delay) do not require
+state objects — simply omit them:
+
+.. testcode:: actuator-usage
+
+   # Build a stateless actuator (no delay, stateless controller)
+   b2 = newton.ModelBuilder()
+   lk = b2.add_link()
+   jt = b2.add_joint_revolute(parent=-1, child=lk, axis=newton.Axis.Z)
+   b2.add_articulation([jt])
+   b2.add_actuator(ControllerPD, index=b2.joint_qd_start[jt], kp=50.0)
+   m2 = b2.finalize()
+
+   m2.actuators[0].step(m2.state(), m2.control())
+
+Differentiability and Graph Capture
+-----------------------------------
+
+Whether an actuator supports differentiability and CUDA graph capture depends on
+its controller.  :class:`ControllerPD` and :class:`ControllerPID` are fully
+graphable.  Neural-network controllers (:class:`ControllerNetMLP`,
+:class:`ControllerNetLSTM`) require PyTorch and are not graphable due to
+framework interop overhead.
+
+:meth:`Actuator.is_graphable` returns ``True`` when all components can be
+captured in a CUDA graph.
+
+Available Components
+--------------------
+
+Delay
+^^^^^
+
+* :class:`Delay` — circular-buffer delay for control targets (stateful).
+
+Controllers
+^^^^^^^^^^^
+
+* :class:`ControllerPD` — proportional-derivative control law (stateless).
+* :class:`ControllerPID` — proportional-integral-derivative control law
+  (stateful: integral accumulator with anti-windup clamp).
+* :class:`ControllerNetMLP` — MLP neural-network controller (requires
+  PyTorch, stateful: position/velocity history buffers).
+* :class:`ControllerNetLSTM` — LSTM neural-network controller (requires
+  PyTorch, stateful: hidden/cell state).
+
+See the API documentation for each controller's control-law equations.
+
+Clamping
+^^^^^^^^
+
+* :class:`ClampingMaxForce` — symmetric box clamp to ±max_force per actuator.
+* :class:`ClampingDCMotor` — velocity-dependent torque saturation using the DC
+  motor torque-speed characteristic.
+* :class:`ClampingPositionBased` — angle-dependent torque limits via
+  interpolated lookup table (e.g. for linkage-driven joints).
+
+Multiple clamping objects can be stacked on a single actuator; they are applied
+in sequence.
+
+Customization
+-------------
+
+Any actuator can be assembled from the existing building blocks — mix and
+match controllers, clamping stages, and delay to fit a specific use case.
+When the built-in components are not sufficient, implement new ones by
+subclassing :class:`Controller` or :class:`Clamping`.
+
+For example, a custom controller needs to implement
+:meth:`~Controller.compute` and :meth:`~Controller.resolve_arguments`:
+
+.. code-block:: python
+   :caption: Skeleton — the ``compute`` body is omitted; see existing
+             controllers for complete examples.
+
+   import warp as wp
+   from newton.actuators import Controller
+
+   class MyController(Controller):
+       @classmethod
+       def resolve_arguments(cls, args):
+           return {"gain": args.get("gain", 1.0)}
+
+       def __init__(self, gain: wp.array):
+           self.gain = gain
+
+       def compute(self, positions, velocities, target_pos, target_vel,
+                   feedforward, pos_indices, vel_indices,
+                   target_pos_indices, target_vel_indices,
+                   forces, state, dt, device=None):
+           # Launch a Warp kernel that writes into `forces`
+           ...
+
+``resolve_arguments`` maps user-provided keyword arguments (from
+:meth:`~newton.ModelBuilder.add_actuator` or USD schemas) to constructor
+parameters, filling in defaults where needed.
+
+Similarly, a custom clamping stage subclasses :class:`Clamping` and implements
+:meth:`~Clamping.modify_forces`.
+
+See Also
+--------
+
+* :mod:`newton.actuators` — full API reference
+* :meth:`newton.ModelBuilder.add_actuator` — registering actuators during
+  model construction
diff --git a/docs/generate_api.py b/docs/generate_api.py
index 06f7c666b4..b5805f6c25 100644
--- a/docs/generate_api.py
+++ b/docs/generate_api.py
@@ -41,6 +41,7 @@
 # Modules for which we want API pages.  Feel free to modify.
 MODULES: list[str] = [
     "newton",
+    "newton.actuators",
     "newton.geometry",
     "newton.ik",
     "newton.math",
diff --git a/docs/guide/release.rst b/docs/guide/release.rst
index c47c9eae89..b06337395d 100644
--- a/docs/guide/release.rst
+++ b/docs/guide/release.rst
@@ -64,7 +64,7 @@ Pre-release planning
      - Determine target version (``X.Y.Z``).
    * - ☐
      - Confirm dependency versions and availability: warp-lang, mujoco,
-       mujoco-warp, newton-usd-schemas, newton-actuators.
+       mujoco-warp, newton-usd-schemas.
    * - ☐
      - Set timeline: code freeze → RC1 → testing window → GA.
    * - ☐
diff --git a/docs/index.rst b/docs/index.rst
index 513cdede3d..7e3591a0d6 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -35,6 +35,7 @@ Newton Physics
    Custom Attributes <concepts/custom_attributes>
    Extended Attributes <concepts/extended_attributes>
    Collisions and Contacts <concepts/collisions>
+   Actuators <concepts/actuators>
    
 .. toctree::
    :maxdepth: 1
diff --git a/newton/__init__.py b/newton/__init__.py
index 6b1517e1b1..d17ba7dac3 100644
--- a/newton/__init__.py
+++ b/newton/__init__.py
@@ -83,9 +83,10 @@
 # ==================================================================================
 # submodule APIs
 # ==================================================================================
-from . import geometry, ik, math, selection, sensors, solvers, usd, utils, viewer  # noqa: E402
+from . import actuators, geometry, ik, math, selection, sensors, solvers, usd, utils, viewer  # noqa: E402
 
 __all__ += [
+    "actuators",
     "geometry",
     "ik",
     "math",
diff --git a/newton/_src/actuators/__init__.py b/newton/_src/actuators/__init__.py
new file mode 100644
index 0000000000..439a1a100d
--- /dev/null
+++ b/newton/_src/actuators/__init__.py
@@ -0,0 +1,24 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers
+# SPDX-License-Identifier: Apache-2.0
+
+from .actuator import Actuator
+from .clamping import Clamping, ClampingDCMotor, ClampingMaxForce, ClampingPositionBased
+from .controllers import Controller, ControllerNetLSTM, ControllerNetMLP, ControllerPD, ControllerPID
+from .delay import Delay
+from .usd_parser import ActuatorParsed, parse_actuator_prim
+
+__all__ = [
+    "Actuator",
+    "ActuatorParsed",
+    "Clamping",
+    "ClampingDCMotor",
+    "ClampingMaxForce",
+    "ClampingPositionBased",
+    "Controller",
+    "ControllerNetLSTM",
+    "ControllerNetMLP",
+    "ControllerPD",
+    "ControllerPID",
+    "Delay",
+    "parse_actuator_prim",
+]
diff --git a/newton/_src/actuators/actuator.py b/newton/_src/actuators/actuator.py
new file mode 100644
index 0000000000..8034ca4287
--- /dev/null
+++ b/newton/_src/actuators/actuator.py
@@ -0,0 +1,303 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers
+# SPDX-License-Identifier: Apache-2.0
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Any
+
+import numpy as np
+import warp as wp
+
+from .clamping.base import Clamping
+from .controllers.base import Controller
+from .delay import Delay
+
+
+# TODO: replace with a Transmission class that does J multiplication before accumulating into the output array.
+@wp.kernel
+def _scatter_add_kernel(
+    forces: wp.array[float],
+    computed_forces: wp.array[float],
+    indices: wp.array[wp.uint32],
+    output: wp.array[float],
+    computed_output: wp.array[float],
+):
+    """Scatter-add forces into output; optionally scatter computed forces too."""
+    i = wp.tid()
+    idx = indices[i]
+    output[idx] = output[idx] + forces[i]
+    if computed_output:
+        computed_output[idx] = computed_output[idx] + computed_forces[i]
+
+
+class Actuator:
+    """Composed actuator: delay → controller → clamping.
+
+    An actuator reads from simulation state/control arrays, optionally
+    delays the control targets, computes forces via a controller, applies
+    clamping (force limits, saturation, etc.), and writes the result to
+    the output array.
+
+    Usage::
+
+        actuator = Actuator(
+            indices=indices,
+            controller=ControllerPD(kp=kp, kd=kd),
+            delay=Delay(delay=wp.array([5, 5], dtype=wp.int32), max_delay=5),
+            clamping=[ClampingMaxForce(max_force=max_f)],
+        )
+
+        # Simulation loop
+        actuator.step(sim_state, sim_control, state_a, state_b, dt=0.01)
+    """
+
+    @dataclass
+    class State:
+        """Composed state for an :class:`Actuator`.
+
+        Holds the delay state (if a delay is present) and the controller
+        state. Clamping objects are stateless.
+        """
+
+        delay_state: Delay.State | None = None
+        """Delay buffer state, or ``None`` if no delay is used."""
+        controller_state: Controller.State | None = None
+        """Controller-specific state, or ``None`` if stateless."""
+
+        def reset(self, mask: wp.array[wp.bool] | None = None) -> None:
+            """Reset composed state.
+
+            Args:
+                mask: Boolean mask of length N. ``True`` entries are reset.
+                    ``None`` resets all.
+            """
+            if self.delay_state is not None:
+                self.delay_state.reset(mask)
+            if self.controller_state is not None:
+                self.controller_state.reset(mask)
+
+    def __init__(
+        self,
+        indices: wp.array[wp.uint32],
+        controller: Controller,
+        delay: Delay | None = None,
+        clamping: list[Clamping] | None = None,
+        pos_indices: wp.array[wp.uint32] | None = None,
+        frc_indices: wp.array[wp.uint32] | None = None,
+        state_pos_attr: str = "joint_q",
+        state_vel_attr: str = "joint_qd",
+        control_target_pos_attr: str = "joint_target_pos",
+        control_target_vel_attr: str = "joint_target_vel",
+        control_feedforward_attr: str | None = "joint_act",
+        control_output_attr: str = "joint_f",
+        control_computed_output_attr: str | None = None,
+        requires_grad: bool = False,
+    ):
+        """Initialize actuator.
+
+        Args:
+            indices: DOF indices into ``joint_qd``-shaped arrays (velocities,
+                velocity targets, feedforward). Shape ``(N,)``.
+            controller: Controller that computes raw forces.
+            delay: Optional Delay instance for input delay.
+            clamping: List of Clamping objects (post-controller force bounds).
+            pos_indices: DOF indices into ``joint_q``-shaped arrays (positions,
+                position targets). Defaults to *indices*. Differs from
+                *indices* for floating-base or ball-joint articulations
+                where ``joint_q`` and ``joint_qd`` have different layouts.
+            frc_indices: DOF indices into ``joint_f``-shaped arrays (output
+                forces). Defaults to *indices*. Differs from *indices*
+                for coupled transmissions or tendon-driven joints.
+            state_pos_attr: Attribute on sim_state for positions.
+            state_vel_attr: Attribute on sim_state for velocities.
+            control_target_pos_attr: Attribute on sim_control for target positions.
+            control_target_vel_attr: Attribute on sim_control for target velocities.
+            control_feedforward_attr: Attribute on sim_control for control input. None to skip.
+            control_output_attr: Attribute on sim_control for clamped output forces.
+            control_computed_output_attr: Attribute on sim_control for raw (pre-clamp)
+                forces. None to skip writing computed forces.
+            requires_grad: Allocate intermediate arrays with gradient support
+                for differentiable simulation.
+        """
+        self.indices = indices
+        self.pos_indices = pos_indices if pos_indices is not None else indices
+        self.frc_indices = frc_indices if frc_indices is not None else indices
+        self.controller = controller
+        self.delay = delay
+        self.clamping = clamping or []
+        self.num_actuators = len(indices)
+
+        self.state_pos_attr = state_pos_attr
+        self.state_vel_attr = state_vel_attr
+        self.control_target_pos_attr = control_target_pos_attr
+        self.control_target_vel_attr = control_target_vel_attr
+        self.control_feedforward_attr = control_feedforward_attr
+        self.control_output_attr = control_output_attr
+        self.control_computed_output_attr = control_computed_output_attr
+
+        self.device = indices.device
+        self.requires_grad = requires_grad
+        self._sequential_indices = wp.array(np.arange(self.num_actuators, dtype=np.uint32), device=self.device)
+        self._computed_forces = wp.zeros(
+            self.num_actuators, dtype=wp.float32, device=self.device, requires_grad=requires_grad
+        )
+        self._applied_forces = (
+            wp.zeros(self.num_actuators, dtype=wp.float32, device=self.device, requires_grad=requires_grad)
+            if self.clamping
+            else None
+        )
+
+        controller.finalize(self.device, self.num_actuators)
+        if delay is not None:
+            delay.finalize(self.device, self.num_actuators, requires_grad=requires_grad)
+        for clamp in self.clamping:
+            clamp.finalize(self.device, self.num_actuators)
+
+    def is_stateful(self) -> bool:
+        """Return True if delay or controller maintains internal state."""
+        return self.delay is not None or self.controller.is_stateful()
+
+    def is_graphable(self) -> bool:
+        """Return True if all components can be captured in a CUDA graph."""
+        return self.controller.is_graphable()
+
+    def state(self) -> Actuator.State | None:
+        """Return a new composed state, or None if fully stateless."""
+        if not self.is_stateful():
+            return None
+        return Actuator.State(
+            delay_state=(self.delay.state(self.num_actuators, self.device) if self.delay is not None else None),
+            controller_state=(
+                self.controller.state(self.num_actuators, self.device) if self.controller.is_stateful() else None
+            ),
+        )
+
+    def step(
+        self,
+        sim_state: Any,
+        sim_control: Any,
+        current_act_state: Actuator.State | None = None,
+        next_act_state: Actuator.State | None = None,
+        dt: float | None = None,
+    ) -> None:
+        """Execute one control step.
+
+        1. **Delay read** — read per-DOF delayed targets from
+           ``current_state`` (falls back to current targets when
+           the buffer is empty).
+        2. **Controller** — compute raw forces into ``_computed_forces``.
+        3. **Clamping** — clamp forces from computed → ``_applied_forces``.
+        4. **Scatter-add** — *accumulate* applied (and optionally computed)
+           forces into the output array.  The caller must zero the output
+           (e.g. ``control.joint_f.zero_()``) before looping over actuators.
+        5. **State updates** — controller state update, then delay
+           buffer write (push current targets into ``next_state``).
+
+        Args:
+            sim_state: Simulation state with position/velocity arrays.
+            sim_control: Control structure with target/output arrays.
+            current_act_state: Current composed state (None if stateless).
+            next_act_state: Next composed state (None if stateless).
+            dt: Timestep [s].
+        """
+        if self.is_stateful() and (current_act_state is None or next_act_state is None):
+            raise ValueError(
+                "Stateful actuator requires both current_act_state and next_act_state; create them via actuator.state()"
+            )
+
+        positions = getattr(sim_state, self.state_pos_attr)
+        velocities = getattr(sim_state, self.state_vel_attr)
+
+        orig_target_pos = getattr(sim_control, self.control_target_pos_attr)
+        orig_target_vel = getattr(sim_control, self.control_target_vel_attr)
+        orig_feedforward = None
+        if self.control_feedforward_attr is not None:
+            orig_feedforward = getattr(sim_control, self.control_feedforward_attr, None)
+
+        target_pos = orig_target_pos
+        target_vel = orig_target_vel
+        feedforward = orig_feedforward
+        target_pos_indices = self.pos_indices
+        target_vel_indices = self.indices
+
+        # --- 1. Delay read (from current_state) ---
+        if self.delay is not None:
+            target_pos, target_vel, feedforward = self.delay.get_delayed_targets(
+                orig_target_pos,
+                orig_target_vel,
+                orig_feedforward,
+                self.pos_indices,
+                self.indices,
+                current_act_state.delay_state,
+            )
+            target_pos_indices = self._sequential_indices
+            target_vel_indices = self._sequential_indices
+
+        # --- 2. Controller: compute raw forces ---
+        ctrl_state = current_act_state.controller_state if current_act_state else None
+        self.controller.compute(
+            positions,
+            velocities,
+            target_pos,
+            target_vel,
+            feedforward,
+            self.pos_indices,
+            self.indices,
+            target_pos_indices,
+            target_vel_indices,
+            self._computed_forces,
+            ctrl_state,
+            dt,
+            device=self.device,
+        )
+
+        # --- 3. Clamping: computed → applied (fused copy+clamp) ---
+        if self.clamping:
+            src = self._computed_forces
+            for clamp in self.clamping:
+                clamp.modify_forces(
+                    src,
+                    self._applied_forces,
+                    positions,
+                    velocities,
+                    self.pos_indices,
+                    self.indices,
+                    device=self.device,
+                )
+                src = self._applied_forces
+            output_forces = self._applied_forces
+        else:
+            output_forces = self._computed_forces
+
+        # --- 4. Scatter-add to output ---
+        applied_output = getattr(sim_control, self.control_output_attr)
+        computed_output = (
+            getattr(sim_control, self.control_computed_output_attr)
+            if self.control_computed_output_attr is not None
+            else None
+        )
+        wp.launch(
+            kernel=_scatter_add_kernel,
+            dim=self.num_actuators,
+            inputs=[output_forces, self._computed_forces, self.frc_indices],
+            outputs=[applied_output, computed_output],
+            device=self.device,
+        )
+
+        # --- 5. State updates (write to next_state) ---
+        if self.controller.is_stateful():
+            self.controller.update_state(
+                current_act_state.controller_state,
+                next_act_state.controller_state,
+            )
+        if self.delay is not None:
+            self.delay.update_state(
+                orig_target_pos,
+                orig_target_vel,
+                orig_feedforward,
+                self.pos_indices,
+                self.indices,
+                current_act_state.delay_state,
+                next_act_state.delay_state,
+            )
diff --git a/newton/_src/actuators/clamping/__init__.py b/newton/_src/actuators/clamping/__init__.py
new file mode 100644
index 0000000000..abfd6d445b
--- /dev/null
+++ b/newton/_src/actuators/clamping/__init__.py
@@ -0,0 +1,14 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers
+# SPDX-License-Identifier: Apache-2.0
+
+from .base import Clamping
+from .clamping_dc_motor import ClampingDCMotor
+from .clamping_max_force import ClampingMaxForce
+from .clamping_position_based import ClampingPositionBased
+
+__all__ = [
+    "Clamping",
+    "ClampingDCMotor",
+    "ClampingMaxForce",
+    "ClampingPositionBased",
+]
diff --git a/newton/_src/actuators/clamping/base.py b/newton/_src/actuators/clamping/base.py
new file mode 100644
index 0000000000..7d5741954c
--- /dev/null
+++ b/newton/_src/actuators/clamping/base.py
@@ -0,0 +1,75 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers
+# SPDX-License-Identifier: Apache-2.0
+
+from __future__ import annotations
+
+from typing import Any, ClassVar
+
+import warp as wp
+
+
+class Clamping:
+    """Base class for post-controller force clamping.
+
+    Clamping objects are stacked on top of a controller to bound
+    clamp forces — symmetric limits, velocity-dependent saturation,
+    angle-dependent torque curves, etc. They read from a source force
+    buffer and write bounded values to a destination buffer.
+
+
+    Class Attributes:
+        SHARED_PARAMS: Parameter names that are instance-level (shared across
+            all DOFs). Different values require separate actuator instances.
+    """
+
+    SHARED_PARAMS: ClassVar[set[str]] = set()
+
+    @classmethod
+    def resolve_arguments(cls, args: dict[str, Any]) -> dict[str, Any]:
+        """Resolve user-provided arguments with defaults.
+
+        Args:
+            args: User-provided arguments.
+
+        Returns:
+            Complete arguments with defaults filled in.
+        """
+        raise NotImplementedError(f"{cls.__name__} must implement resolve_arguments")
+
+    def finalize(self, device: wp.Device, num_actuators: int) -> None:
+        """Called by :class:`~newton.actuators.Actuator` after construction to set up device-specific resources.
+
+        Override in subclasses that need to move arrays to a specific device.
+
+        Args:
+            device: Warp device to use.
+            num_actuators: Number of actuators (DOFs) this clamping manages.
+        """
+
+    def modify_forces(
+        self,
+        src_forces: wp.array[float],
+        dst_forces: wp.array[float],
+        positions: wp.array[float],
+        velocities: wp.array[float],
+        pos_indices: wp.array[wp.uint32],
+        vel_indices: wp.array[wp.uint32],
+        device: wp.Device | None = None,
+    ) -> None:
+        """Read forces from src, apply clamping, write to dst.
+
+        When src and dst are the same array, this is an in-place update.
+        The Actuator uses different arrays for the first clamping
+        (to preserve the raw controller output) and the same array
+        for subsequent clampings.
+
+        Args:
+            src_forces: Input force buffer [N or N·m] to read. Shape (N,).
+            dst_forces: Output force buffer [N or N·m] to write. Shape (N,).
+            positions: Joint positions [m or rad] (global ``joint_q`` array).
+            velocities: Joint velocities [m/s or rad/s] (global ``joint_qd`` array).
+            pos_indices: Indices into *positions* (``joint_q`` layout).
+            vel_indices: Indices into *velocities* (``joint_qd`` layout).
+            device: Warp device for kernel launches.
+        """
+        raise NotImplementedError(f"{type(self).__name__} must implement modify_forces")
diff --git a/newton/_src/actuators/clamping/clamping_dc_motor.py b/newton/_src/actuators/clamping/clamping_dc_motor.py
new file mode 100644
index 0000000000..cdf2459572
--- /dev/null
+++ b/newton/_src/actuators/clamping/clamping_dc_motor.py
@@ -0,0 +1,120 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers
+# SPDX-License-Identifier: Apache-2.0
+
+from __future__ import annotations
+
+import math
+from typing import Any
+
+import warp as wp
+
+from .base import Clamping
+
+
+@wp.kernel
+def _clamp_dc_motor_kernel(
+    current_vel: wp.array[float],
+    state_indices: wp.array[wp.uint32],
+    saturation_effort: wp.array[float],
+    velocity_limit: wp.array[float],
+    max_force: wp.array[float],
+    src: wp.array[float],
+    dst: wp.array[float],
+):
+    """DC motor velocity-dependent saturation: read src, write to dst.
+
+    τ_max(v) = clamp(τ_sat*(1 - v/v_max),  0,  max_force)
+    τ_min(v) = clamp(τ_sat*(-1 - v/v_max), -max_force, 0)
+    """
+    i = wp.tid()
+    state_idx = state_indices[i]
+    vel = current_vel[state_idx]
+    sat = saturation_effort[i]
+    vel_lim = velocity_limit[i]
+    max_f = max_force[i]
+
+    max_torque = wp.clamp(sat * (1.0 - vel / vel_lim), 0.0, max_f)
+    min_torque = wp.clamp(sat * (-1.0 - vel / vel_lim), -max_f, 0.0)
+    dst[i] = wp.clamp(src[i], min_torque, max_torque)
+
+
+class ClampingDCMotor(Clamping):
+    """DC motor velocity-dependent torque saturation.
+
+    Clips controller output using the torque-speed characteristic:
+        τ_max(v) = clamp(τ_sat*(1 - v/v_max),  0,  effort_limit)
+        τ_min(v) = clamp(τ_sat*(-1 - v/v_max), -effort_limit, 0)
+
+    At zero velocity the motor can produce up to ±τ_sat (capped by
+    effort_limit). As velocity approaches v_max, available torque in
+    the direction of motion drops to zero.
+    """
+
+    @classmethod
+    def resolve_arguments(cls, args: dict[str, Any]) -> dict[str, Any]:
+        if "velocity_limit" not in args:
+            raise ValueError("ClampingDCMotor requires 'velocity_limit'")
+        vel_lim = args["velocity_limit"]
+        if vel_lim <= 0.0:
+            raise ValueError(f"ClampingDCMotor: velocity_limit must be > 0, got {vel_lim}")
+        if "saturation_effort" not in args:
+            raise ValueError("ClampingDCMotor requires 'saturation_effort'")
+        sat = args["saturation_effort"]
+        if sat <= 0.0:
+            raise ValueError(f"ClampingDCMotor: saturation_effort must be > 0, got {sat}")
+        return {
+            "saturation_effort": sat,
+            "velocity_limit": vel_lim,
+            "max_force": args.get("max_force", math.inf),
+        }
+
+    def __init__(
+        self,
+        saturation_effort: wp.array[float],
+        velocity_limit: wp.array[float],
+        max_force: wp.array[float],
+    ):
+        """Initialize DC motor saturation.
+
+        Args:
+            saturation_effort: Peak motor torque at stall [N·m]. Shape (N,).
+            velocity_limit: Maximum joint velocity [rad/s] for the torque-speed curve. Shape (N,).
+            max_force: Absolute effort limits [N or N·m] (continuous-rated). Shape (N,).
+        """
+        if saturation_effort.shape != velocity_limit.shape:
+            raise ValueError(
+                f"saturation_effort shape {saturation_effort.shape} must match "
+                f"velocity_limit shape {velocity_limit.shape}"
+            )
+        if saturation_effort.shape != max_force.shape:
+            raise ValueError(
+                f"saturation_effort shape {saturation_effort.shape} must match max_force shape {max_force.shape}"
+            )
+        self.saturation_effort = saturation_effort
+        self.velocity_limit = velocity_limit
+        self.max_force = max_force
+
+    def modify_forces(
+        self,
+        src_forces: wp.array[float],
+        dst_forces: wp.array[float],
+        positions: wp.array[float],
+        velocities: wp.array[float],
+        pos_indices: wp.array[wp.uint32],
+        vel_indices: wp.array[wp.uint32],
+        device: wp.Device | None = None,
+    ) -> None:
+        wp.launch(
+            kernel=_clamp_dc_motor_kernel,
+            dim=len(src_forces),
+            inputs=[
+                velocities,
+                vel_indices,
+                self.saturation_effort,
+                self.velocity_limit,
+                self.max_force,
+                src_forces,
+            ],
+            outputs=[dst_forces],
+            device=device,
+        )
diff --git a/newton/_src/actuators/clamping/clamping_max_force.py b/newton/_src/actuators/clamping/clamping_max_force.py
new file mode 100644
index 0000000000..582436336a
--- /dev/null
+++ b/newton/_src/actuators/clamping/clamping_max_force.py
@@ -0,0 +1,59 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers
+# SPDX-License-Identifier: Apache-2.0
+
+from __future__ import annotations
+
+import math
+from typing import Any
+
+import warp as wp
+
+from .base import Clamping
+
+
+@wp.kernel
+def _box_clamp_kernel(
+    max_force: wp.array[float],
+    src: wp.array[float],
+    dst: wp.array[float],
+):
+    """Clamp src forces to ±max_force, write to dst."""
+    i = wp.tid()
+    dst[i] = wp.clamp(src[i], -max_force[i], max_force[i])
+
+
+class ClampingMaxForce(Clamping):
+    """Box-clamp forces to ±max_force per actuator."""
+
+    @classmethod
+    def resolve_arguments(cls, args: dict[str, Any]) -> dict[str, Any]:
+        max_force = args.get("max_force", math.inf)
+        if max_force < 0:
+            raise ValueError(f"max_force must be non-negative, got {max_force}")
+        return {"max_force": max_force}
+
+    def __init__(self, max_force: wp.array[float]):
+        """Initialize max-force clamp.
+
+        Args:
+            max_force: Per-actuator force limits [N or N·m]. Shape (N,).
+        """
+        self.max_force = max_force
+
+    def modify_forces(
+        self,
+        src_forces: wp.array[float],
+        dst_forces: wp.array[float],
+        positions: wp.array[float],
+        velocities: wp.array[float],
+        pos_indices: wp.array[wp.uint32],
+        vel_indices: wp.array[wp.uint32],
+        device: wp.Device | None = None,
+    ) -> None:
+        wp.launch(
+            kernel=_box_clamp_kernel,
+            dim=len(src_forces),
+            inputs=[self.max_force, src_forces],
+            outputs=[dst_forces],
+            device=device,
+        )
diff --git a/newton/_src/actuators/clamping/clamping_position_based.py b/newton/_src/actuators/clamping/clamping_position_based.py
new file mode 100644
index 0000000000..7f74ad33b1
--- /dev/null
+++ b/newton/_src/actuators/clamping/clamping_position_based.py
@@ -0,0 +1,227 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers
+# SPDX-License-Identifier: Apache-2.0
+
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Any, ClassVar
+
+import numpy as np
+import warp as wp
+
+from .base import Clamping
+
+
+@wp.func
+def _interp_1d(
+    x: float,
+    xs: wp.array[float],
+    ys: wp.array[float],
+    n: int,
+) -> float:
+    """Linearly interpolate (x -> y) from sorted sample arrays, clamping at boundaries."""
+    if n <= 0:
+        return 0.0
+    if x <= xs[0]:
+        return ys[0]
+    if x >= xs[n - 1]:
+        return ys[n - 1]
+    for k in range(n - 1):
+        if xs[k + 1] >= x:
+            dx = xs[k + 1] - xs[k]
+            if dx == 0.0:
+                return ys[k]
+            t = (x - xs[k]) / dx
+            return ys[k] + t * (ys[k + 1] - ys[k])
+    return ys[n - 1]
+
+
+@wp.kernel
+def _position_based_clamp_kernel(
+    current_pos: wp.array[float],
+    state_indices: wp.array[wp.uint32],
+    lookup_angles: wp.array[float],
+    lookup_torques: wp.array[float],
+    lookup_size: int,
+    src: wp.array[float],
+    dst: wp.array[float],
+):
+    """Angle-dependent clamping via interpolated lookup table: read src, write dst."""
+    i = wp.tid()
+    state_idx = state_indices[i]
+    limit = _interp_1d(current_pos[state_idx], lookup_angles, lookup_torques, lookup_size)
+    dst[i] = wp.clamp(src[i], -limit, limit)
+
+
+class ClampingPositionBased(Clamping):
+    """Angle-dependent torque clamping via lookup table.
+
+    Replaces a fixed ±max_force box clamp with angle-dependent torque
+    limits interpolated from a lookup table.  Models actuators where
+    the transmission ratio and thus maximum output torque vary with
+    joint angle (e.g., linkage-driven joints).
+
+    The lookup table can be provided either as a file path
+    (``lookup_table_path``) or as direct value lists
+    (``lookup_angles`` + ``lookup_torques``).  When a path is given,
+    the file is read in :meth:`finalize`, mirroring the pattern used
+    by the neural-network controllers.
+
+    The lookup table is a shared parameter: all DOFs within one
+    :class:`~newton.actuators.Actuator` group share the same table.
+
+    Class Attributes:
+        SHARED_PARAMS: Parameter names that are instance-level (shared across
+            all DOFs). Different values require separate actuator instances.
+    """
+
+    SHARED_PARAMS: ClassVar[set[str]] = {"lookup_table_path", "lookup_angles", "lookup_torques"}
+
+    @classmethod
+    def resolve_arguments(cls, args: dict[str, Any]) -> dict[str, Any]:
+        """Resolve user-provided arguments with defaults.
+
+        Accepts either ``lookup_table_path`` (file) or
+        ``lookup_angles`` + ``lookup_torques`` (direct values).
+
+        Args:
+            args: User-provided arguments.
+
+        Returns:
+            Complete arguments with defaults filled in.
+        """
+        has_path = "lookup_table_path" in args
+        has_direct = "lookup_angles" in args or "lookup_torques" in args
+
+        if has_path and has_direct:
+            raise ValueError("Provide either 'lookup_table_path' or 'lookup_angles'+'lookup_torques', not both")
+        if not has_path and not has_direct:
+            raise ValueError("ClampingPositionBased requires 'lookup_table_path' or 'lookup_angles'+'lookup_torques'")
+
+        if has_path:
+            return {"lookup_table_path": args["lookup_table_path"]}
+
+        if "lookup_angles" not in args or "lookup_torques" not in args:
+            raise ValueError("Both 'lookup_angles' and 'lookup_torques' are required")
+        angles = tuple(args["lookup_angles"])
+        torques = tuple(args["lookup_torques"])
+        if len(angles) != len(torques):
+            raise ValueError(f"lookup_angles length ({len(angles)}) must match lookup_torques length ({len(torques)})")
+        if any(v < 0 for v in torques):
+            raise ValueError("lookup_torques must contain non-negative values for symmetric clamping")
+        if not all(angles[i] <= angles[i + 1] for i in range(len(angles) - 1)):
+            raise ValueError("lookup_angles must be monotonically non-decreasing for interpolation")
+        return {"lookup_angles": angles, "lookup_torques": torques}
+
+    def __init__(
+        self,
+        lookup_table_path: str | None = None,
+        lookup_angles: tuple[float, ...] | None = None,
+        lookup_torques: tuple[float, ...] | None = None,
+    ):
+        """Initialize position-based clamp.
+
+        Provide *either* ``lookup_table_path`` *or* both
+        ``lookup_angles`` and ``lookup_torques``.
+
+        Args:
+            lookup_table_path: Path to a whitespace/comma-separated
+                text file with two columns (angle, torque).  Lines
+                starting with ``#`` are comments.  The file is read
+                in :meth:`finalize`.
+            lookup_angles: Sorted joint angles [rad] for the torque
+                lookup table.  Shape ``(K,)``.
+            lookup_torques: Max output torques [N·m] corresponding to
+                *lookup_angles*.  Shape ``(K,)``.
+        """
+        if lookup_table_path is None and (lookup_angles is None or lookup_torques is None):
+            raise ValueError("Provide either 'lookup_table_path' or both 'lookup_angles' and 'lookup_torques'")
+        if lookup_angles is not None and lookup_torques is not None:
+            if len(lookup_angles) != len(lookup_torques):
+                raise ValueError(
+                    f"lookup_angles length ({len(lookup_angles)}) must match "
+                    f"lookup_torques length ({len(lookup_torques)})"
+                )
+        self._lookup_table_path = lookup_table_path
+        self._angles_tuple = lookup_angles
+        self._torques_tuple = lookup_torques
+        self.lookup_size: int = 0
+        self.lookup_angles: wp.array[float] | None = None
+        self.lookup_torques: wp.array[float] | None = None
+
+    def _read_lookup_table(self, path: str) -> tuple[list[float], list[float]]:
+        """Parse a whitespace/comma-separated lookup table file.
+
+        Args:
+            path: File path to the lookup table.
+
+        Returns:
+            ``(angles, torques)`` as lists of floats.
+        """
+        table_path = Path(path)
+        if not table_path.is_file():
+            raise ValueError(f"Lookup table file not found: {path}")
+        angles: list[float] = []
+        torques: list[float] = []
+        for raw_line in table_path.read_text().splitlines():
+            stripped = raw_line.strip()
+            if not stripped or stripped.startswith("#"):
+                continue
+            parts = stripped.replace(",", " ").split()
+            angles.append(float(parts[0]))
+            torques.append(float(parts[1]))
+        if not angles:
+            raise ValueError(f"Lookup table file is empty: {path}")
+        if any(v < 0 for v in torques):
+            raise ValueError(f"Lookup table torques must be non-negative in: {path}")
+        if not all(angles[i] <= angles[i + 1] for i in range(len(angles) - 1)):
+            raise ValueError(f"Lookup table angles must be monotonically non-decreasing in: {path}")
+        return angles, torques
+
+    def finalize(self, device: wp.Device, num_actuators: int) -> None:
+        """Called by :class:`Actuator` after construction.
+
+        Reads the lookup table from file (if a path was given) and
+        allocates device arrays.
+
+        Args:
+            device: Warp device to use.
+            num_actuators: Number of actuators (DOFs).
+        """
+        if self._lookup_table_path is not None:
+            angles, torques = self._read_lookup_table(self._lookup_table_path)
+            self._lookup_table_path = None
+        else:
+            angles = list(self._angles_tuple)
+            torques = list(self._torques_tuple)
+            self._angles_tuple = None
+            self._torques_tuple = None
+
+        self.lookup_size = len(angles)
+        self.lookup_angles = wp.array(np.array(angles, dtype=np.float32), dtype=wp.float32, device=device)
+        self.lookup_torques = wp.array(np.array(torques, dtype=np.float32), dtype=wp.float32, device=device)
+
+    def modify_forces(
+        self,
+        src_forces: wp.array[float],
+        dst_forces: wp.array[float],
+        positions: wp.array[float],
+        velocities: wp.array[float],
+        pos_indices: wp.array[wp.uint32],
+        vel_indices: wp.array[wp.uint32],
+        device: wp.Device | None = None,
+    ) -> None:
+        wp.launch(
+            kernel=_position_based_clamp_kernel,
+            dim=len(src_forces),
+            inputs=[
+                positions,
+                pos_indices,
+                self.lookup_angles,
+                self.lookup_torques,
+                self.lookup_size,
+                src_forces,
+            ],
+            outputs=[dst_forces],
+            device=device,
+        )
diff --git a/newton/_src/actuators/controllers/__init__.py b/newton/_src/actuators/controllers/__init__.py
new file mode 100644
index 0000000000..8a64da28a3
--- /dev/null
+++ b/newton/_src/actuators/controllers/__init__.py
@@ -0,0 +1,16 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers
+# SPDX-License-Identifier: Apache-2.0
+
+from .base import Controller
+from .controller_net_lstm import ControllerNetLSTM
+from .controller_net_mlp import ControllerNetMLP
+from .controller_pd import ControllerPD
+from .controller_pid import ControllerPID
+
+__all__ = [
+    "Controller",
+    "ControllerNetLSTM",
+    "ControllerNetMLP",
+    "ControllerPD",
+    "ControllerPID",
+]
diff --git a/newton/_src/actuators/controllers/base.py b/newton/_src/actuators/controllers/base.py
new file mode 100644
index 0000000000..98227aa85f
--- /dev/null
+++ b/newton/_src/actuators/controllers/base.py
@@ -0,0 +1,134 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers
+# SPDX-License-Identifier: Apache-2.0
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Any, ClassVar
+
+import warp as wp
+
+
+@wp.kernel
+def _masked_zero_1d(data: wp.array[float], mask: wp.array[wp.bool]):
+    i = wp.tid()
+    if mask[i]:
+        data[i] = 0.0
+
+
+class Controller:
+    """Base class for controllers that compute raw forces from state error.
+
+    Controllers are the core computation component in an actuator. They read
+    positions, velocities, and targets, then write raw (unclamped) forces to
+    a scratch buffer. Clamping and other post-processing is handled by
+    Clamping objects composed on top.
+
+    Subclasses must override ``compute`` and ``resolve_arguments``.
+
+    Class Attributes:
+        SHARED_PARAMS: Parameter names that are instance-level (shared across
+            all DOFs). Different values require separate actuator instances.
+    """
+
+    @dataclass
+    class State:
+        """Base state for controllers.
+
+        Subclass this in concrete controllers that maintain internal
+        state (e.g. integral accumulators, history buffers).
+        """
+
+        def reset(self, mask: wp.array[wp.bool] | None = None) -> None:
+            """Reset state to initial values.
+
+            Args:
+                mask: Boolean mask of length N. ``True`` entries are reset.
+                    ``None`` resets all.
+            """
+
+    SHARED_PARAMS: ClassVar[set[str]] = set()
+
+    @classmethod
+    def resolve_arguments(cls, args: dict[str, Any]) -> dict[str, Any]:
+        """Resolve user-provided arguments with defaults.
+
+        Args:
+            args: User-provided arguments.
+
+        Returns:
+            Complete arguments with defaults filled in.
+        """
+        raise NotImplementedError(f"{cls.__name__} must implement resolve_arguments")
+
+    def finalize(self, device: wp.Device, num_actuators: int) -> None:
+        """Called by :class:`Actuator` after construction to set up device-specific resources.
+
+        Override in subclasses that need to place tensors or networks
+        on a specific device, or pre-compute index tensors.
+
+        Args:
+            device: Warp device to use.
+            num_actuators: Number of actuators (DOFs) this controller manages.
+        """
+        pass
+
+    def compute(
+        self,
+        positions: wp.array[float],
+        velocities: wp.array[float],
+        target_pos: wp.array[float],
+        target_vel: wp.array[float],
+        feedforward: wp.array[float] | None,
+        pos_indices: wp.array[wp.uint32],
+        vel_indices: wp.array[wp.uint32],
+        target_pos_indices: wp.array[wp.uint32],
+        target_vel_indices: wp.array[wp.uint32],
+        forces: wp.array[float],
+        state: Controller.State | None,
+        dt: float,
+        device: wp.Device | None = None,
+    ) -> None:
+        """Compute raw forces and write to ``forces[i]``.
+
+        Args:
+            positions: Joint positions [m or rad] (global ``joint_q`` array).
+            velocities: Joint velocities [m/s or rad/s] (global ``joint_qd`` array).
+            target_pos: Target positions [m or rad] (global or compact array).
+            target_vel: Target velocities [m/s or rad/s] (global or compact array).
+            feedforward: Feedforward control input [N or N·m] (may be None).
+            pos_indices: Indices into *positions* (``joint_q`` layout).
+            vel_indices: Indices into *velocities* (``joint_qd`` layout).
+            target_pos_indices: Indices into *target_pos*.
+            target_vel_indices: Indices into *target_vel* and *feedforward*.
+            forces: Scratch buffer to write forces [N or N·m] to. Shape (N,).
+            state: Controller state (None if stateless).
+            dt: Timestep [s].
+            device: Warp device for kernel launches.
+        """
+        raise NotImplementedError
+
+    def is_stateful(self) -> bool:
+        """Return True if this controller maintains internal state."""
+        return False
+
+    def is_graphable(self) -> bool:
+        """Return True if compute() can be captured in a CUDA graph."""
+        return True
+
+    def state(self, num_actuators: int, device: wp.Device) -> Controller.State | None:
+        """Create and return a new state object, or None if stateless."""
+        return None
+
+    def update_state(
+        self,
+        current_state: Controller.State,
+        next_state: Controller.State,
+    ) -> None:
+        """Advance internal state after a compute step.
+
+        Args:
+            current_state: Current controller state.
+            next_state: Next controller state to write.
+        """
+        pass
diff --git a/newton/_src/actuators/controllers/controller_net_lstm.py b/newton/_src/actuators/controllers/controller_net_lstm.py
new file mode 100644
index 0000000000..2adba68e00
--- /dev/null
+++ b/newton/_src/actuators/controllers/controller_net_lstm.py
@@ -0,0 +1,186 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers
+# SPDX-License-Identifier: Apache-2.0
+
+from __future__ import annotations
+
+import typing
+from dataclasses import dataclass
+from typing import Any, ClassVar
+
+import warp as wp
+
+from .base import Controller
+
+if typing.TYPE_CHECKING:
+    import torch
+
+
+class ControllerNetLSTM(Controller):
+    """LSTM-based neural network controller.
+
+    Uses a pre-trained LSTM network to compute joint torques from position
+    error and velocity. The network maintains hidden and cell state across
+    timesteps to capture temporal patterns.
+
+    The network must be callable as:
+        torques, (hidden_new, cell_new) = network(input, (hidden, cell))
+
+    where input has shape (batch, 1, 2) with features [pos_error, velocity],
+    and hidden/cell have shape (num_layers, batch, hidden_size).
+
+    The network is expected to have a ``lstm`` attribute (torch.nn.LSTM) so
+    that num_layers and hidden_size can be inferred automatically.
+    """
+
+    SHARED_PARAMS: ClassVar[set[str]] = {"network_path"}
+
+    @dataclass
+    class State(Controller.State):
+        """LSTM hidden and cell state."""
+
+        hidden: torch.Tensor | None = None
+        """LSTM hidden state, shape (num_layers, N, hidden_size)."""
+        cell: torch.Tensor | None = None
+        """LSTM cell state, shape (num_layers, N, hidden_size)."""
+
+        def reset(self, mask: wp.array[wp.bool] | None = None) -> None:
+            if mask is None:
+                self.hidden = self.hidden.new_zeros(self.hidden.shape)
+                self.cell = self.cell.new_zeros(self.cell.shape)
+            else:
+                t = wp.to_torch(mask).bool()
+                self.hidden[:, t, :] = 0.0
+                self.cell[:, t, :] = 0.0
+
+    @classmethod
+    def resolve_arguments(cls, args: dict[str, Any]) -> dict[str, Any]:
+        if "network_path" not in args:
+            raise ValueError("ControllerNetLSTM requires 'network_path' argument")
+        return {
+            "network_path": args["network_path"],
+        }
+
+    def __init__(
+        self,
+        network: torch.nn.Module | None = None,
+        network_path: str | None = None,
+    ):
+        """Initialize LSTM controller.
+
+        Args:
+            network: Pre-trained LSTM network.
+                If None, loaded from network_path.
+            network_path: Path to a TorchScript model file.
+        """
+        import torch
+
+        self.network_path = network_path
+
+        if network is not None:
+            params = list(network.parameters())
+            self._torch_device = params[0].device if params else torch.device("cpu")
+            self.network = network.eval()
+        elif network_path is not None:
+            self._torch_device = torch.device("cpu")
+            self.network = torch.jit.load(network_path, map_location="cpu").eval()
+        else:
+            raise ValueError("Either 'network' or 'network_path' must be provided")
+
+        if not hasattr(self.network, "lstm"):
+            raise ValueError("network must expose a 'lstm' attribute (torch.nn.LSTM)")
+        lstm = self.network.lstm
+        if not hasattr(lstm, "num_layers"):
+            raise ValueError("network.lstm must be a torch.nn.LSTM (missing num_layers)")
+        if not lstm.batch_first:
+            raise ValueError("network.lstm.batch_first must be True")
+        if lstm.input_size != 2:
+            raise ValueError(f"network.lstm.input_size must be 2 (pos_error, velocity); got {lstm.input_size}")
+        if lstm.bidirectional:
+            raise ValueError("network.lstm must not be bidirectional")
+        if getattr(lstm, "proj_size", 0) != 0:
+            raise ValueError(f"network.lstm.proj_size must be 0; got {lstm.proj_size}")
+
+        self._num_layers = lstm.num_layers
+        self._hidden_size = lstm.hidden_size
+
+        self._torch_input_indices: torch.Tensor | None = None
+        self._torch_sequential_indices: torch.Tensor | None = None
+        self._hidden: torch.Tensor | None = None
+        self._cell: torch.Tensor | None = None
+
+    def finalize(self, device: wp.Device, num_actuators: int) -> None:
+        import torch
+
+        self._torch_device = torch.device(f"cuda:{device.ordinal}" if device.is_cuda else "cpu")
+        self.network = self.network.to(self._torch_device)
+        self._torch_sequential_indices = torch.arange(num_actuators, dtype=torch.long, device=self._torch_device)
+
+    def is_stateful(self) -> bool:
+        return True
+
+    def is_graphable(self) -> bool:
+        return False
+
+    def state(self, num_actuators: int, device: wp.Device) -> ControllerNetLSTM.State:
+        import torch
+
+        return ControllerNetLSTM.State(
+            hidden=torch.zeros(self._num_layers, num_actuators, self._hidden_size, device=self._torch_device),
+            cell=torch.zeros(self._num_layers, num_actuators, self._hidden_size, device=self._torch_device),
+        )
+
+    def compute(
+        self,
+        positions: wp.array[float],
+        velocities: wp.array[float],
+        target_pos: wp.array[float],
+        target_vel: wp.array[float],
+        feedforward: wp.array[float] | None,
+        pos_indices: wp.array[wp.uint32],
+        vel_indices: wp.array[wp.uint32],
+        target_pos_indices: wp.array[wp.uint32],
+        target_vel_indices: wp.array[wp.uint32],
+        forces: wp.array[float],
+        state: ControllerNetLSTM.State,
+        dt: float,
+        device: wp.Device | None = None,
+    ) -> None:
+        import torch
+
+        if self._torch_input_indices is None:
+            self._torch_input_indices = torch.tensor(pos_indices.numpy(), dtype=torch.long, device=self._torch_device)
+            self._torch_vel_indices = torch.tensor(vel_indices.numpy(), dtype=torch.long, device=self._torch_device)
+
+        current_pos = wp.to_torch(positions)
+        current_vel = wp.to_torch(velocities)
+        target = wp.to_torch(target_pos)
+
+        torch_target_pos_idx = (
+            self._torch_input_indices if target_pos_indices is pos_indices else self._torch_sequential_indices
+        )
+
+        pos_error = target[torch_target_pos_idx] - current_pos[self._torch_input_indices]
+        vel = current_vel[self._torch_vel_indices]
+
+        # (N, 1, 2): seq_len=1, features=[pos_error, velocity]
+        net_input = torch.stack([pos_error, vel], dim=1).unsqueeze(1)
+
+        with torch.inference_mode():
+            torques, (self._hidden, self._cell) = self.network(
+                net_input,
+                (state.hidden, state.cell),
+            )
+
+        torques = torques.reshape(len(forces))
+        torques_wp = wp.from_torch(torques.contiguous(), dtype=wp.float32)
+        wp.copy(forces, torques_wp)
+
+    def update_state(
+        self,
+        current_state: ControllerNetLSTM.State,
+        next_state: ControllerNetLSTM.State,
+    ) -> None:
+        if next_state is None:
+            return
+        next_state.hidden = self._hidden
+        next_state.cell = self._cell
diff --git a/newton/_src/actuators/controllers/controller_net_mlp.py b/newton/_src/actuators/controllers/controller_net_mlp.py
new file mode 100644
index 0000000000..b186abd5bd
--- /dev/null
+++ b/newton/_src/actuators/controllers/controller_net_mlp.py
@@ -0,0 +1,186 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers
+# SPDX-License-Identifier: Apache-2.0
+
+from __future__ import annotations
+
+import typing
+from dataclasses import dataclass
+from typing import Any, ClassVar
+
+import warp as wp
+
+from .base import Controller
+
+if typing.TYPE_CHECKING:
+    import torch
+
+
+class ControllerNetMLP(Controller):
+    """MLP-based neural network controller.
+
+    Uses a pre-trained MLP to compute joint torques from position error
+    and velocity history.
+
+    The network receives concatenated, scaled position-error and velocity
+    history as input.  The output is multiplied by ``torque_scale`` to
+    convert from network units to physical torque.  All three scale
+    factors default to ``1.0`` (no scaling).
+    """
+
+    SHARED_PARAMS: ClassVar[set[str]] = {"network_path"}
+
+    @dataclass
+    class State(Controller.State):
+        """History buffers for MLP controller."""
+
+        pos_error_history: torch.Tensor | None = None
+        """Position error history, shape (history_length, N)."""
+        vel_history: torch.Tensor | None = None
+        """Velocity history, shape (history_length, N)."""
+
+        def reset(self, mask: wp.array[wp.bool] | None = None) -> None:
+            if mask is None:
+                self.pos_error_history.zero_()
+                self.vel_history.zero_()
+            else:
+                t = wp.to_torch(mask).bool()
+                self.pos_error_history[:, t] = 0.0
+                self.vel_history[:, t] = 0.0
+
+    @classmethod
+    def resolve_arguments(cls, args: dict[str, Any]) -> dict[str, Any]:
+        if "network_path" not in args:
+            raise ValueError("ControllerNetMLP requires 'network_path' argument")
+        return {"network_path": args["network_path"]}
+
+    def __init__(
+        self,
+        input_order: str = "pos_vel",
+        input_idx: list[int] | None = None,
+        pos_scale: float = 1.0,
+        vel_scale: float = 1.0,
+        torque_scale: float = 1.0,
+        network: torch.nn.Module | None = None,
+        network_path: str | None = None,
+    ):
+        """Initialize MLP controller.
+
+        Args:
+            input_order: Concatenation order, ``"pos_vel"`` or ``"vel_pos"``.
+            input_idx: History timestep indices to feed the network. ``0`` is
+                the current step, ``1`` one step ago, etc. Defaults to ``[0]``.
+            pos_scale: Scaling factor for position error inputs.
+            vel_scale: Scaling factor for velocity inputs.
+            torque_scale: Scaling factor for output torques.
+            network: Pre-trained network. If None, loaded from *network_path*.
+            network_path: Path to a TorchScript model file.
+        """
+        import torch
+
+        if input_order not in ("pos_vel", "vel_pos"):
+            raise ValueError(f"input_order must be 'pos_vel' or 'vel_pos'; got '{input_order}'")
+        self.input_order = input_order
+        self.input_idx = input_idx if input_idx else [0]
+        if any(i < 0 for i in self.input_idx):
+            raise ValueError(f"input_idx must contain non-negative integers; got {self.input_idx}")
+        self.history_length = max(self.input_idx) + 1
+        self.pos_scale = pos_scale
+        self.vel_scale = vel_scale
+        self.torque_scale = torque_scale
+
+        self.network_path = network_path
+
+        if network is not None:
+            params = list(network.parameters())
+            self._torch_device = params[0].device if params else torch.device("cpu")
+            self.network = network.eval()
+        elif network_path is not None:
+            self._torch_device = torch.device("cpu")
+            self.network = torch.jit.load(network_path, map_location="cpu").eval()
+        else:
+            raise ValueError("Either 'network' or 'network_path' must be provided")
+
+        self._torch_input_indices: torch.Tensor | None = None
+        self._torch_sequential_indices: torch.Tensor | None = None
+
+    def finalize(self, device: wp.Device, num_actuators: int) -> None:
+        import torch
+
+        self._torch_device = torch.device(f"cuda:{device.ordinal}" if device.is_cuda else "cpu")
+        self.network = self.network.to(self._torch_device)
+        self._torch_sequential_indices = torch.arange(num_actuators, dtype=torch.long, device=self._torch_device)
+
+    def is_stateful(self) -> bool:
+        return True
+
+    def is_graphable(self) -> bool:
+        return False
+
+    def state(self, num_actuators: int, device: wp.Device) -> ControllerNetMLP.State:
+        import torch
+
+        return ControllerNetMLP.State(
+            pos_error_history=torch.zeros(self.history_length, num_actuators, device=self._torch_device),
+            vel_history=torch.zeros(self.history_length, num_actuators, device=self._torch_device),
+        )
+
+    def compute(
+        self,
+        positions: wp.array[float],
+        velocities: wp.array[float],
+        target_pos: wp.array[float],
+        target_vel: wp.array[float],
+        feedforward: wp.array[float] | None,
+        pos_indices: wp.array[wp.uint32],
+        vel_indices: wp.array[wp.uint32],
+        target_pos_indices: wp.array[wp.uint32],
+        target_vel_indices: wp.array[wp.uint32],
+        forces: wp.array[float],
+        state: ControllerNetMLP.State,
+        dt: float,
+        device: wp.Device | None = None,
+    ) -> None:
+        import torch
+
+        if self._torch_input_indices is None:
+            self._torch_input_indices = torch.tensor(pos_indices.numpy(), dtype=torch.long, device=self._torch_device)
+            self._torch_vel_indices = torch.tensor(vel_indices.numpy(), dtype=torch.long, device=self._torch_device)
+
+        current_pos = wp.to_torch(positions)
+        current_vel = wp.to_torch(velocities)
+        target = wp.to_torch(target_pos)
+
+        torch_target_pos_idx = (
+            self._torch_input_indices if target_pos_indices is pos_indices else self._torch_sequential_indices
+        )
+
+        pos_error = target[torch_target_pos_idx] - current_pos[self._torch_input_indices]
+        vel = current_vel[self._torch_vel_indices]
+
+        state.pos_error_history[0] = pos_error
+        state.vel_history[0] = vel
+
+        pos_input = torch.stack([state.pos_error_history[i] for i in self.input_idx], dim=1)
+        vel_input = torch.stack([state.vel_history[i] for i in self.input_idx], dim=1)
+
+        if self.input_order == "pos_vel":
+            net_input = torch.cat([pos_input * self.pos_scale, vel_input * self.vel_scale], dim=1)
+        else:
+            net_input = torch.cat([vel_input * self.vel_scale, pos_input * self.pos_scale], dim=1)
+
+        with torch.inference_mode():
+            torques = self.network(net_input)
+
+        torques = torques.reshape(len(forces)) * self.torque_scale
+        torques_wp = wp.from_torch(torques.contiguous(), dtype=wp.float32)
+        wp.copy(forces, torques_wp)
+
+    def update_state(
+        self,
+        current_state: ControllerNetMLP.State,
+        next_state: ControllerNetMLP.State,
+    ) -> None:
+        if next_state is None:
+            return
+        next_state.pos_error_history = current_state.pos_error_history.roll(1, 0)
+        next_state.vel_history = current_state.vel_history.roll(1, 0)
diff --git a/newton/_src/actuators/controllers/controller_pd.py b/newton/_src/actuators/controllers/controller_pd.py
new file mode 100644
index 0000000000..daa643ff1c
--- /dev/null
+++ b/newton/_src/actuators/controllers/controller_pd.py
@@ -0,0 +1,118 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers
+# SPDX-License-Identifier: Apache-2.0
+
+from __future__ import annotations
+
+from typing import Any
+
+import warp as wp
+
+from .base import Controller
+
+
+@wp.kernel
+def _pd_force_kernel(
+    current_pos: wp.array[float],
+    current_vel: wp.array[float],
+    target_pos: wp.array[float],
+    target_vel: wp.array[float],
+    control_input: wp.array[float],
+    pos_indices: wp.array[wp.uint32],
+    vel_indices: wp.array[wp.uint32],
+    target_pos_indices: wp.array[wp.uint32],
+    target_vel_indices: wp.array[wp.uint32],
+    kp: wp.array[float],
+    kd: wp.array[float],
+    constant_force: wp.array[float],
+    forces: wp.array[float],
+):
+    """PD force: f = constant + act + kp*(target_pos - q) + kd*(target_vel - v)."""
+    i = wp.tid()
+    pos_idx = pos_indices[i]
+    vel_idx = vel_indices[i]
+    tgt_pos_idx = target_pos_indices[i]
+    tgt_vel_idx = target_vel_indices[i]
+
+    position_error = target_pos[tgt_pos_idx] - current_pos[pos_idx]
+    velocity_error = target_vel[tgt_vel_idx] - current_vel[vel_idx]
+
+    const_f = float(0.0)
+    if constant_force:
+        const_f = constant_force[i]
+
+    act = float(0.0)
+    if control_input:
+        act = control_input[tgt_vel_idx]
+
+    force = const_f + act + kp[i] * position_error + kd[i] * velocity_error
+    forces[i] = force
+
+
+class ControllerPD(Controller):
+    """Stateless PD controller.
+
+    Force law: f = constant + act + Kp*(target_pos - q) + Kd*(target_vel - v)
+
+    """
+
+    @classmethod
+    def resolve_arguments(cls, args: dict[str, Any]) -> dict[str, Any]:
+        return {
+            "kp": args.get("kp", 0.0),
+            "kd": args.get("kd", 0.0),
+            "constant_force": args.get("constant_force", 0.0),
+        }
+
+    def __init__(
+        self,
+        kp: wp.array[float],
+        kd: wp.array[float],
+        constant_force: wp.array[float] | None = None,
+    ):
+        """Initialize PD controller.
+
+        Args:
+            kp: Proportional gains. Shape (N,).
+            kd: Derivative gains. Shape (N,).
+            constant_force: Constant force offsets [N or N·m]. Shape (N,). None to skip.
+        """
+        self.kp = kp
+        self.kd = kd
+        self.constant_force = constant_force
+
+    def compute(
+        self,
+        positions: wp.array[float],
+        velocities: wp.array[float],
+        target_pos: wp.array[float],
+        target_vel: wp.array[float],
+        feedforward: wp.array[float] | None,
+        pos_indices: wp.array[wp.uint32],
+        vel_indices: wp.array[wp.uint32],
+        target_pos_indices: wp.array[wp.uint32],
+        target_vel_indices: wp.array[wp.uint32],
+        forces: wp.array[float],
+        state: Controller.State | None,
+        dt: float,
+        device: wp.Device | None = None,
+    ) -> None:
+        wp.launch(
+            kernel=_pd_force_kernel,
+            dim=len(forces),
+            inputs=[
+                positions,
+                velocities,
+                target_pos,
+                target_vel,
+                feedforward,
+                pos_indices,
+                vel_indices,
+                target_pos_indices,
+                target_vel_indices,
+                self.kp,
+                self.kd,
+                self.constant_force,
+            ],
+            outputs=[forces],
+            device=device,
+        )
diff --git a/newton/_src/actuators/controllers/controller_pid.py b/newton/_src/actuators/controllers/controller_pid.py
new file mode 100644
index 0000000000..e72cc4d249
--- /dev/null
+++ b/newton/_src/actuators/controllers/controller_pid.py
@@ -0,0 +1,177 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers
+# SPDX-License-Identifier: Apache-2.0
+
+from __future__ import annotations
+
+import math
+from dataclasses import dataclass
+from typing import Any
+
+import warp as wp
+
+from .base import Controller, _masked_zero_1d
+
+
+@wp.kernel
+def _pid_force_kernel(
+    current_pos: wp.array[float],
+    current_vel: wp.array[float],
+    target_pos: wp.array[float],
+    target_vel: wp.array[float],
+    control_input: wp.array[float],
+    pos_indices: wp.array[wp.uint32],
+    vel_indices: wp.array[wp.uint32],
+    target_pos_indices: wp.array[wp.uint32],
+    target_vel_indices: wp.array[wp.uint32],
+    kp: wp.array[float],
+    ki: wp.array[float],
+    kd: wp.array[float],
+    integral_max: wp.array[float],
+    constant_force: wp.array[float],
+    dt: float,
+    current_integral: wp.array[float],
+    forces: wp.array[float],
+    next_integral: wp.array[float],
+):
+    """PID force: f = constant + act + kp*e + ki*integral + kd*de."""
+    i = wp.tid()
+    pos_idx = pos_indices[i]
+    vel_idx = vel_indices[i]
+    tgt_pos_idx = target_pos_indices[i]
+    tgt_vel_idx = target_vel_indices[i]
+
+    position_error = target_pos[tgt_pos_idx] - current_pos[pos_idx]
+    velocity_error = target_vel[tgt_vel_idx] - current_vel[vel_idx]
+
+    integral = current_integral[i] + position_error * dt
+    integral = wp.clamp(integral, -integral_max[i], integral_max[i])
+
+    const_f = float(0.0)
+    if constant_force:
+        const_f = constant_force[i]
+
+    act = float(0.0)
+    if control_input:
+        act = control_input[tgt_vel_idx]
+
+    force = const_f + act + kp[i] * position_error + ki[i] * integral + kd[i] * velocity_error
+    forces[i] = force
+    next_integral[i] = integral
+
+
+class ControllerPID(Controller):
+    """Stateful PID controller.
+
+    Force law: f = constant + act + Kp*e + Ki*∫e·dt + Kd*de
+
+    Maintains an integral term with integral clamping.
+    """
+
+    @dataclass
+    class State(Controller.State):
+        """Integral state for PID controller."""
+
+        integral: wp.array[float] | None = None
+        """Accumulated integral of position error, shape (N,)."""
+
+        def reset(self, mask: wp.array[wp.bool] | None = None) -> None:
+            if mask is None:
+                self.integral.zero_()
+            else:
+                wp.launch(_masked_zero_1d, dim=len(mask), inputs=[self.integral, mask])
+
+    @classmethod
+    def resolve_arguments(cls, args: dict[str, Any]) -> dict[str, Any]:
+        integral_max = args.get("integral_max", math.inf)
+        if integral_max < 0:
+            raise ValueError(f"integral_max must be >= 0, got {integral_max}")
+        return {
+            "kp": args.get("kp", 0.0),
+            "ki": args.get("ki", 0.0),
+            "kd": args.get("kd", 0.0),
+            "integral_max": integral_max,
+            "constant_force": args.get("constant_force", 0.0),
+        }
+
+    def __init__(
+        self,
+        kp: wp.array[float],
+        ki: wp.array[float],
+        kd: wp.array[float],
+        integral_max: wp.array[float],
+        constant_force: wp.array[float] | None = None,
+    ):
+        """Initialize PID controller.
+
+        Args:
+            kp: Proportional gains. Shape (N,).
+            ki: Integral gains. Shape (N,).
+            kd: Derivative gains. Shape (N,).
+            integral_max: Anti-windup limits (>= 0). Shape (N,).
+            constant_force: Constant force offsets [N or N·m]. Shape (N,). None to skip.
+        """
+        self.kp = kp
+        self.ki = ki
+        self.kd = kd
+        self.integral_max = integral_max
+        self.constant_force = constant_force
+        self._next_integral: wp.array[float] | None = None
+
+    def finalize(self, device: wp.Device, num_actuators: int) -> None:
+        self._next_integral = wp.zeros(num_actuators, dtype=wp.float32, device=device)
+
+    def is_stateful(self) -> bool:
+        return True
+
+    def state(self, num_actuators: int, device: wp.Device) -> ControllerPID.State:
+        return ControllerPID.State(
+            integral=wp.zeros(num_actuators, dtype=wp.float32, device=device),
+        )
+
+    def compute(
+        self,
+        positions: wp.array[float],
+        velocities: wp.array[float],
+        target_pos: wp.array[float],
+        target_vel: wp.array[float],
+        feedforward: wp.array[float] | None,
+        pos_indices: wp.array[wp.uint32],
+        vel_indices: wp.array[wp.uint32],
+        target_pos_indices: wp.array[wp.uint32],
+        target_vel_indices: wp.array[wp.uint32],
+        forces: wp.array[float],
+        state: ControllerPID.State,
+        dt: float,
+        device: wp.Device | None = None,
+    ) -> None:
+        wp.launch(
+            kernel=_pid_force_kernel,
+            dim=len(forces),
+            inputs=[
+                positions,
+                velocities,
+                target_pos,
+                target_vel,
+                feedforward,
+                pos_indices,
+                vel_indices,
+                target_pos_indices,
+                target_vel_indices,
+                self.kp,
+                self.ki,
+                self.kd,
+                self.integral_max,
+                self.constant_force,
+                dt,
+                state.integral,
+            ],
+            outputs=[forces, self._next_integral],
+            device=device,
+        )
+
+    def update_state(
+        self,
+        current_state: ControllerPID.State,
+        next_state: ControllerPID.State,
+    ) -> None:
+        wp.copy(next_state.integral, self._next_integral)
diff --git a/newton/_src/actuators/delay.py b/newton/_src/actuators/delay.py
new file mode 100644
index 0000000000..5104afa21e
--- /dev/null
+++ b/newton/_src/actuators/delay.py
@@ -0,0 +1,337 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers
+# SPDX-License-Identifier: Apache-2.0
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Any
+
+import warp as wp
+
+
+@wp.kernel
+def _delay_buffer_state_kernel(
+    target_pos_global: wp.array[float],
+    target_vel_global: wp.array[float],
+    feedforward_global: wp.array[float],
+    pos_indices: wp.array[wp.uint32],
+    vel_indices: wp.array[wp.uint32],
+    copy_idx: int,
+    write_idx: int,
+    buf_depth: int,
+    current_buffer_pos: wp.array2d[float],
+    current_buffer_vel: wp.array2d[float],
+    current_buffer_act: wp.array2d[float],
+    current_num_pushes: wp.array[int],
+    next_buffer_pos: wp.array2d[float],
+    next_buffer_vel: wp.array2d[float],
+    next_buffer_act: wp.array2d[float],
+    next_num_pushes: wp.array[int],
+):
+    """Update delay circular buffer: copy previous entry, write new entry, increment push count."""
+    i = wp.tid()
+    pos_idx = pos_indices[i]
+    vel_idx = vel_indices[i]
+
+    next_buffer_pos[copy_idx, i] = current_buffer_pos[copy_idx, i]
+    next_buffer_vel[copy_idx, i] = current_buffer_vel[copy_idx, i]
+    next_buffer_act[copy_idx, i] = current_buffer_act[copy_idx, i]
+
+    next_buffer_pos[write_idx, i] = target_pos_global[pos_idx]
+    next_buffer_vel[write_idx, i] = target_vel_global[vel_idx]
+
+    act = float(0.0)
+    if feedforward_global:
+        act = feedforward_global[vel_idx]
+    next_buffer_act[write_idx, i] = act
+
+    next_num_pushes[i] = wp.min(current_num_pushes[i] + 1, buf_depth)
+
+
+@wp.kernel
+def _delay_read_kernel(
+    delays: wp.array[int],
+    num_pushes: wp.array[int],
+    write_idx: int,
+    buf_depth: int,
+    buffer_pos: wp.array2d[float],
+    buffer_vel: wp.array2d[float],
+    buffer_act: wp.array2d[float],
+    current_pos: wp.array[float],
+    current_vel: wp.array[float],
+    current_act: wp.array[float],
+    pos_indices: wp.array[wp.uint32],
+    vel_indices: wp.array[wp.uint32],
+    out_pos: wp.array[float],
+    out_vel: wp.array[float],
+    out_act: wp.array[float],
+):
+    """Read per-DOF delayed targets, falling back to current targets when buffer is empty."""
+    i = wp.tid()
+    n = num_pushes[i]
+    if n == 0:
+        pos_idx = pos_indices[i]
+        vel_idx = vel_indices[i]
+        out_pos[i] = current_pos[pos_idx]
+        out_vel[i] = current_vel[vel_idx]
+        act = float(0.0)
+        if current_act:
+            act = current_act[vel_idx]
+        out_act[i] = act
+    else:
+        lag = wp.min(delays[i] - 1, n - 1)
+        read_idx = (write_idx - lag + buf_depth) % buf_depth
+        out_pos[i] = buffer_pos[read_idx, i]
+        out_vel[i] = buffer_vel[read_idx, i]
+        out_act[i] = buffer_act[read_idx, i]
+
+
+@wp.kernel
+def _delay_masked_reset_kernel(
+    mask: wp.array[wp.bool],
+    rows: int,
+    buf_pos: wp.array2d[float],
+    buf_vel: wp.array2d[float],
+    buf_act: wp.array2d[float],
+    num_pushes: wp.array[int],
+):
+    """Zero all buffer columns and push count where mask is True."""
+    i = wp.tid()
+    if mask[i]:
+        for r in range(rows):
+            buf_pos[r, i] = 0.0
+            buf_vel[r, i] = 0.0
+            buf_act[r, i] = 0.0
+        num_pushes[i] = 0
+
+
+class Delay:
+    """Per-DOF input delay for actuator targets.
+
+    Delays targets using a circular buffer of depth
+    ``max_delay``.  Each DOF has its own lag stored in
+    :attr:`delays` (shape ``(N,)``).  The buffer is sized for the
+    maximum lag across all DOFs so that DOFs with different delays
+    can share the same actuator group.
+
+    The delay always produces output.  When the buffer is empty
+    (e.g. right after reset), the current targets are returned
+    directly.  When underfilled, the lag is clamped to the
+    available history so the oldest entry is returned.
+    """
+
+    @dataclass
+    class State:
+        """Circular buffer state for delayed targets."""
+
+        buffer_pos: wp.array2d[float] | None = None
+        """Delayed target positions [m or rad], shape (buf_depth, N)."""
+        buffer_vel: wp.array2d[float] | None = None
+        """Delayed target velocities [m/s or rad/s], shape (buf_depth, N)."""
+        buffer_act: wp.array2d[float] | None = None
+        """Delayed feedforward inputs [N or N·m], shape (buf_depth, N)."""
+        num_pushes: wp.array[int] | None = None
+        """Per-DOF count of writes since last reset, shape (N,)."""
+        write_idx: int = 0
+        """Current write position in the circular buffer."""
+
+        def reset(self, mask: wp.array[wp.bool] | None = None) -> None:
+            """Reset delay buffer state.
+
+            Args:
+                mask: Boolean mask of length N. ``True`` entries have
+                    their buffer columns zeroed and push count reset.
+                    ``None`` resets all.
+            """
+            if mask is None:
+                self.buffer_pos.zero_()
+                self.buffer_vel.zero_()
+                self.buffer_act.zero_()
+                self.num_pushes.zero_()
+                self.write_idx = self.buffer_pos.shape[0] - 1
+            else:
+                rows = self.buffer_pos.shape[0]
+                n = len(mask)
+                wp.launch(
+                    _delay_masked_reset_kernel,
+                    dim=n,
+                    inputs=[mask, rows, self.buffer_pos, self.buffer_vel, self.buffer_act, self.num_pushes],
+                )
+
+    @classmethod
+    def resolve_arguments(cls, args: dict[str, Any]) -> dict[str, Any]:
+        """Resolve user-provided arguments with defaults.
+
+        Args:
+            args: User-provided arguments.
+
+        Returns:
+            Complete arguments with defaults filled in.
+        """
+        if "delay" not in args:
+            raise ValueError("Delay requires 'delay' argument")
+        delay = args["delay"]
+        if delay < 1:
+            raise ValueError(f"delay must be >= 1, got {delay}")
+        return {"delay": delay}
+
+    def __init__(self, delay: wp.array[int], max_delay: int):
+        """Initialize delay.
+
+        Args:
+            delay: Per-DOF delay values [timesteps], shape ``(N,)``.
+            max_delay: Maximum delay across all DOFs.  Determines the
+                circular-buffer depth.
+
+        Raises:
+            ValueError: If *max_delay* < 1.
+        """
+        if max_delay < 1:
+            raise ValueError(f"max_delay must be >= 1, got {max_delay}")
+        self.buf_depth = max_delay
+        """Circular-buffer depth (equals ``max_delay``)."""
+        self.delays = delay
+        """Per-DOF delay values [timesteps], shape (N,)."""
+        self._num_actuators: int = 0
+        self._device: wp.Device | None = None
+        self._out_pos: wp.array[float] | None = None
+        self._out_vel: wp.array[float] | None = None
+        self._out_act: wp.array[float] | None = None
+
+    def finalize(self, device: wp.Device, num_actuators: int, requires_grad: bool = False) -> None:
+        """Called by :class:`Actuator` after construction.
+
+        Args:
+            device: Warp device to use.
+            num_actuators: Number of actuators (DOFs).
+            requires_grad: Allocate output arrays with gradient support.
+        """
+        self._device = device
+        self._num_actuators = num_actuators
+        self._requires_grad = requires_grad
+        self._out_pos = wp.zeros(num_actuators, dtype=wp.float32, device=self._device, requires_grad=requires_grad)
+        self._out_vel = wp.zeros(num_actuators, dtype=wp.float32, device=self._device, requires_grad=requires_grad)
+        self._out_act = wp.zeros(num_actuators, dtype=wp.float32, device=self._device, requires_grad=requires_grad)
+
+    def state(self, num_actuators: int, device: wp.Device) -> Delay.State:
+        """Create a new delay state with zeroed circular buffers.
+
+        Args:
+            num_actuators: Number of actuators (buffer width N).
+            device: Warp device for buffer allocation.
+
+        Returns:
+            Freshly allocated :class:`Delay.State`.
+        """
+        rg = self._requires_grad
+        return Delay.State(
+            buffer_pos=wp.zeros((self.buf_depth, num_actuators), dtype=wp.float32, device=device, requires_grad=rg),
+            buffer_vel=wp.zeros((self.buf_depth, num_actuators), dtype=wp.float32, device=device, requires_grad=rg),
+            buffer_act=wp.zeros((self.buf_depth, num_actuators), dtype=wp.float32, device=device, requires_grad=rg),
+            num_pushes=wp.zeros(num_actuators, dtype=int, device=device),
+            write_idx=self.buf_depth - 1,
+        )
+
+    def get_delayed_targets(
+        self,
+        target_pos: wp.array[float],
+        target_vel: wp.array[float],
+        feedforward: wp.array[float] | None,
+        pos_indices: wp.array[wp.uint32],
+        vel_indices: wp.array[wp.uint32],
+        current_state: Delay.State,
+    ) -> tuple[wp.array[float], wp.array[float], wp.array[float]]:
+        """Read per-DOF delayed targets from the circular buffer.
+
+        Each DOF reads from its own lag offset stored in :attr:`delays`,
+        clamped to available history (per-DOF ``num_pushes``).  When the
+        buffer is empty, falls back to the current targets; when
+        underfilled, the lag is clamped to the oldest available entry.
+
+        Args:
+            target_pos: Current target positions [m or rad] (global ``joint_target_pos``).
+            target_vel: Current target velocities [m/s or rad/s] (global ``joint_target_vel``).
+            feedforward: Feedforward control input [N or N·m] (global, may be None).
+            pos_indices: Indices into *target_pos* (``joint_q`` layout).
+            vel_indices: Indices into *target_vel* and *feedforward* (``joint_qd`` layout).
+            current_state: Delay state to read from.
+
+        Returns:
+            ``(delayed_pos, delayed_vel, delayed_feedforward)``.  When
+            *feedforward* is ``None``, *delayed_feedforward* is all zeros.
+        """
+        wp.launch(
+            kernel=_delay_read_kernel,
+            dim=self._num_actuators,
+            inputs=[
+                self.delays,
+                current_state.num_pushes,
+                current_state.write_idx,
+                self.buf_depth,
+                current_state.buffer_pos,
+                current_state.buffer_vel,
+                current_state.buffer_act,
+                target_pos,
+                target_vel,
+                feedforward,
+                pos_indices,
+                vel_indices,
+            ],
+            outputs=[self._out_pos, self._out_vel, self._out_act],
+            device=self._device,
+        )
+        return (self._out_pos, self._out_vel, self._out_act)
+
+    def update_state(
+        self,
+        target_pos: wp.array[float],
+        target_vel: wp.array[float],
+        feedforward: wp.array[float] | None,
+        pos_indices: wp.array[wp.uint32],
+        vel_indices: wp.array[wp.uint32],
+        current_state: Delay.State,
+        next_state: Delay.State,
+    ) -> None:
+        """Write current targets into the buffer and advance the write pointer.
+
+        Args:
+            target_pos: Current target positions [m or rad] (global ``joint_target_pos``).
+            target_vel: Current target velocities [m/s or rad/s] (global ``joint_target_vel``).
+            feedforward: Current feedforward input [N or N·m] (global, may be None).
+            pos_indices: Indices into *target_pos* (``joint_q`` layout).
+            vel_indices: Indices into *target_vel* and *feedforward* (``joint_qd`` layout).
+            current_state: Delay state to read from.
+            next_state: Delay state to write into.
+        """
+        if next_state is None:
+            return
+
+        copy_idx = current_state.write_idx
+        write_idx = (current_state.write_idx + 1) % self.buf_depth
+
+        wp.launch(
+            kernel=_delay_buffer_state_kernel,
+            dim=self._num_actuators,
+            inputs=[
+                target_pos,
+                target_vel,
+                feedforward,
+                pos_indices,
+                vel_indices,
+                copy_idx,
+                write_idx,
+                self.buf_depth,
+                current_state.buffer_pos,
+                current_state.buffer_vel,
+                current_state.buffer_act,
+                current_state.num_pushes,
+            ],
+            outputs=[
+                next_state.buffer_pos,
+                next_state.buffer_vel,
+                next_state.buffer_act,
+                next_state.num_pushes,
+            ],
+        )
+
+        next_state.write_idx = write_idx
diff --git a/newton/_src/actuators/usd_parser.py b/newton/_src/actuators/usd_parser.py
new file mode 100644
index 0000000000..dc983cfade
--- /dev/null
+++ b/newton/_src/actuators/usd_parser.py
@@ -0,0 +1,203 @@
+# SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers
+# SPDX-License-Identifier: Apache-2.0
+
+from __future__ import annotations
+
+import warnings
+from collections.abc import Callable
+from dataclasses import dataclass, field
+from typing import Any
+
+from .clamping import Clamping, ClampingDCMotor, ClampingMaxForce, ClampingPositionBased
+from .controllers import Controller, ControllerNetLSTM, ControllerNetMLP, ControllerPD, ControllerPID
+from .delay import Delay
+
+
+@dataclass
+class SchemaEntry:
+    """Maps an API schema to a component class and its USD→kwarg param names."""
+
+    component_class: type
+    param_map: dict[str, str]
+    is_controller: bool = False
+    validate: Callable[[dict[str, Any]], None] | None = None
+
+
+def _validate_clamp_velocity_based(kwargs: dict[str, Any]) -> None:
+    vel_lim = kwargs.get("velocity_limit")
+    if vel_lim is not None and vel_lim <= 0.0:
+        raise ValueError(
+            f"NewtonClampingDCMotorAPI requires velocity_limit > 0 (division by velocity_limit "
+            f"in torque-speed computation); got {vel_lim}"
+        )
+
+
+# Temporary registry until the actual USD schema is merged.
+SCHEMA_REGISTRY: dict[str, SchemaEntry] = {
+    "NewtonControllerPDAPI": SchemaEntry(
+        component_class=ControllerPD,
+        param_map={"kp": "kp", "kd": "kd", "constForce": "constant_force"},
+        is_controller=True,
+    ),
+    "NewtonControllerPIDAPI": SchemaEntry(
+        component_class=ControllerPID,
+        param_map={"kp": "kp", "ki": "ki", "kd": "kd", "integralMax": "integral_max", "constForce": "constant_force"},
+        is_controller=True,
+    ),
+    "NewtonClampingMaxForceAPI": SchemaEntry(
+        component_class=ClampingMaxForce,
+        param_map={"maxForce": "max_force"},
+    ),
+    "NewtonDelayAPI": SchemaEntry(
+        component_class=Delay,
+        param_map={"delay": "delay"},
+    ),
+    "NewtonClampingDCMotorAPI": SchemaEntry(
+        component_class=ClampingDCMotor,
+        param_map={"saturationEffort": "saturation_effort", "velocityLimit": "velocity_limit", "maxForce": "max_force"},
+        validate=_validate_clamp_velocity_based,
+    ),
+    # Position-based clamping passes the file path directly, mirroring the
+    # networkPath convention used by the neural-network controllers.
+    # The file is read in ClampingPositionBased.finalize().
+    "NewtonClampingPositionBasedAPI": SchemaEntry(
+        component_class=ClampingPositionBased,
+        param_map={"lookupTablePath": "lookup_table_path"},
+    ),
+    # Neural-network controllers
+    # input_order / input_idx (MLP) are intentionally left out of the schema;
+    # they are framework-specific and should be set programmatically.
+    "NewtonControllerNetMLPAPI": SchemaEntry(
+        component_class=ControllerNetMLP,
+        param_map={"networkPath": "network_path"},
+        is_controller=True,
+    ),
+    "NewtonControllerNetLSTMAPI": SchemaEntry(
+        component_class=ControllerNetLSTM,
+        param_map={"networkPath": "network_path"},
+        is_controller=True,
+    ),
+}
+
+
+@dataclass
+class ActuatorParsed:
+    """Result of parsing a USD actuator prim.
+
+    Each detected API schema produces a (class, kwargs) entry.
+    The controller is separated out; everything else goes into
+    component_specs (delay, clamping, etc.).
+    """
+
+    controller_class: type[Controller]
+    controller_kwargs: dict[str, Any] = field(default_factory=dict)
+    component_specs: list[tuple[type[Clamping | Delay], dict[str, Any]]] = field(default_factory=list)
+    target_path: str = ""
+    """Joint target path (USD prim path of the driven joint)."""
+
+
+def get_attribute(prim, name: str, default: Any = None) -> Any:
+    """Get attribute value from a USD prim, returning default if not found."""
+    attr = prim.GetAttribute(name)
+    if not attr or not attr.HasAuthoredValue():
+        return default
+    return attr.Get()
+
+
+def get_relationship_targets(prim, name: str) -> list[str]:
+    """Get relationship target paths from a USD prim."""
+    rel = prim.GetRelationship(name)
+    if not rel:
+        return []
+    return [str(t) for t in rel.GetTargets()]
+
+
+def get_schemas_from_prim(prim) -> list[str]:
+    """Get applied schemas that match the registry.
+
+    Uses prim metadata to find applied schema tokens, since our custom
+    schemas (e.g. ``NewtonControllerPDAPI``) are not registered USD schema types
+    and therefore are not returned by ``GetAppliedSchemas()``.
+    """
+    # GetAppliedSchemas() only returns registered USD schema types.
+    # Our custom schemas live in the apiSchemas metadata token list.
+    # TODO: replace with proper USD schema type checks once the Newton schema is merged.
+    meta = prim.GetMetadata("apiSchemas")
+    if meta is None:
+        return []
+    # SdfTokenListOp: use .GetAddedOrExplicitItems() or iterate directly
+    try:
+        tokens = list(meta.GetAddedOrExplicitItems())
+    except AttributeError:
+        tokens = list(meta)
+    return [s for s in tokens if s in SCHEMA_REGISTRY]
+
+
+def parse_actuator_prim(prim) -> ActuatorParsed | None:
+    """Parse a USD Actuator prim into a composed actuator specification.
+
+    Each detected schema directly maps to a component class with its
+    extracted params. Returns ``None`` if the prim is not a
+    ``NewtonActuator`` or has no target relationship (0 targets is
+    treated as disabled).  If the prim has multiple targets, a warning
+    is emitted and only the first target is used.
+
+    Raises:
+        ValueError: If the prim is a valid actuator but has malformed
+            schemas (e.g. multiple controllers or no controller schema).
+    """
+    if prim.GetTypeName() != "NewtonActuator":
+        return None
+
+    target_paths = get_relationship_targets(prim, "newton:actuator:targets")
+    if not target_paths:
+        return None
+    if len(target_paths) > 1:
+        warnings.warn(
+            f"Actuator prim {prim.GetPath()} has {len(target_paths)} targets; "
+            f"only the first is used, additional targets are ignored",
+            stacklevel=2,
+        )
+        target_paths = target_paths[:1]
+
+    schemas = get_schemas_from_prim(prim)
+    controller_class = None
+    controller_kwargs: dict[str, Any] = {}
+    component_specs: list[tuple[type, dict[str, Any]]] = []
+
+    for schema_name in schemas:
+        entry = SCHEMA_REGISTRY.get(schema_name)
+        if entry is None:
+            continue
+
+        kwargs: dict[str, Any] = {}
+        for usd_name, kwarg_name in entry.param_map.items():
+            value = get_attribute(prim, f"newton:actuator:{usd_name}")
+            if value is not None:
+                kwargs[kwarg_name] = value
+
+        if entry.validate is not None:
+            entry.validate(kwargs)
+
+        if entry.is_controller:
+            if controller_class is not None:
+                raise ValueError(
+                    f"Actuator prim has multiple controllers: "
+                    f"{controller_class.__name__} and {entry.component_class.__name__}"
+                )
+            controller_class = entry.component_class
+            controller_kwargs = kwargs
+        else:
+            component_specs.append((entry.component_class, kwargs))
+
+    if controller_class is None:
+        raise ValueError(
+            f"Actuator prim '{prim.GetPath()}' has no controller schema (matched schemas from metadata: {schemas})"
+        )
+
+    return ActuatorParsed(
+        controller_class=controller_class,
+        controller_kwargs=controller_kwargs,
+        component_specs=component_specs,
+        target_path=target_paths[0],
+    )
diff --git a/newton/_src/sim/builder.py b/newton/_src/sim/builder.py
index f150c4aae3..0389a914b5 100644
--- a/newton/_src/sim/builder.py
+++ b/newton/_src/sim/builder.py
@@ -13,11 +13,14 @@
 from collections import Counter, deque
 from collections.abc import Callable, Iterable, Sequence
 from dataclasses import dataclass, replace
-from typing import TYPE_CHECKING, Any, Literal
+from typing import TYPE_CHECKING, Any, ClassVar, Literal
 
 import numpy as np
 import warp as wp
 
+from ..actuators.clamping.clamping_max_force import ClampingMaxForce
+from ..actuators.controllers.controller_pd import ControllerPD
+from ..actuators.controllers.controller_pid import ControllerPID
 from ..core.types import (
     MAXVAL,
     Axis,
@@ -65,10 +68,16 @@
 )
 from .model import Model
 
+try:
+    from newton_actuators import Actuator as _LegacyActuator
+except ImportError:
+    _LegacyActuator = None
+
 if TYPE_CHECKING:
-    from newton_actuators import Actuator
     from pxr import Usd
 
+    from ..actuators.clamping.base import Clamping
+    from ..actuators.controllers.base import Controller
     from ..geometry.types import TetMesh
 
     UsdStage = Usd.Stage
@@ -193,16 +202,24 @@ def _shape_palette_color(index: int) -> tuple[float, float, float]:
 
     @dataclass
     class ActuatorEntry:
-        """Stores accumulated indices and arguments for one actuator type + scalar params combo.
+        """Stores accumulated specs for one group of compatible composed actuators.
 
-        Each element in input_indices/output_indices represents one actuator.
-        For single-input actuators: [[idx1], [idx2], ...] → flattened to 1D array
-        For multi-input actuators: [[idx1, idx2], [idx3, idx4], ...] → 2D array
+        Each element in ``indices`` is a single DOF index.  The entry key is
+        ``(controller_class, delay is not None, clamping_key, ctrl_shared_key)``
+        where shared params (e.g. ``network_path``, lookup tables) must
+        be identical across all actuators in a group.  Delay values
+        are per-DOF; the buffer is sized to ``max(delay_values) + 1``.
         """
 
-        input_indices: list[list[int]]  # Per-actuator input indices
-        output_indices: list[list[int]]  # Per-actuator output indices
-        args: list[dict[str, Any]]  # Per-actuator array params (scalar params in dict key)
+        controller_class: type  # Controller subclass (e.g. ControllerPD)
+        clamping_classes: tuple  # Tuple of Clamping subclass types (in order)
+        clamping_shared_kwargs: tuple  # Tuple of dicts: shared kwargs per clamping class
+        controller_shared_kwargs: dict  # Shared controller kwargs (e.g. network_path)
+        indices: list[int]  # Per-actuator DOF indices (joint_qd layout)
+        pos_indices: list[int]  # Per-actuator position indices (joint_q layout)
+        controller_args: list[dict[str, Any]]  # Per-actuator controller array params
+        delay_args: list[dict[str, Any]]  # Per-actuator delay params (empty if no delay)
+        clamping_args: list[list[dict[str, Any]]]  # Per-actuator per-clamping array params
 
     @dataclass
     class ShapeConfig:
@@ -1217,9 +1234,9 @@ def __init__(self, up_axis: AxisType = Axis.Z, gravity: float = -9.81):
         """Running counts for custom string frequencies used to size custom attribute arrays."""
 
         # Actuator entries (accumulated during add_actuator calls)
-        # Key is (actuator_class, scalar_params_tuple) to separate instances with different scalar params
-        self.actuator_entries: dict[tuple[type[Actuator], tuple[Any, ...]], ModelBuilder.ActuatorEntry] = {}
-        """Actuator entry groups accumulated from :meth:`add_actuator`, keyed by class and scalar parameters."""
+        # Key is (controller_class, delay is not None, clamping_key, ctrl_shared_key) to group compatible actuators
+        self.actuator_entries: dict[tuple, ModelBuilder.ActuatorEntry] = {}
+        """Actuator entry groups accumulated from :meth:`add_actuator`, keyed by controller class and shared params."""
 
     def add_shape_collision_filter_pair(self, shape_a: int, shape_b: int) -> None:
         """Add a collision filter pair in canonical order.
@@ -1674,69 +1691,209 @@ def apply_indexed_values(
                     f"Custom attribute '{attr_key}' has unsupported frequency {custom_attr.frequency} for joints"
                 )
 
-    def add_actuator(
+    # Maps legacy newton_actuators class names to (newton.actuators controller class name, has_delay).
+    _LEGACY_ACTUATOR_CLASS_MAP: ClassVar[dict[str, tuple[str, bool]]] = {
+        "ActuatorPD": ("ControllerPD", False),
+        "ActuatorPID": ("ControllerPID", False),
+        "ActuatorDelayedPD": ("ControllerPD", True),
+    }
+
+    def _add_actuator_legacy(
         self,
-        actuator_class: type[Actuator],
-        input_indices: list[int],
-        output_indices: list[int] | None = None,
+        actuator_class: type,
+        input_indices: list[int] | None,
+        output_indices: list[int] | None,
         **kwargs: Any,
     ) -> None:
-        """Add an external actuator, independent of any ``UsdPhysics`` joint drives.
+        """Translate a legacy ``newton_actuators``-style call to the new API.
 
-        External actuators (e.g. :class:`newton_actuators.ActuatorPD`) apply
-        forces computed outside the physics engine. Multiple calls with the same
-        *actuator_class* and identical scalar parameters are accumulated into one
-        entry; the actuator instance is created during :meth:`finalize <ModelBuilder.finalize>`.
+        Mirrors the old ``main``-branch signature::
 
-        Args:
-            actuator_class: The actuator class (must derive from
-                :class:`newton_actuators.Actuator`).
-            input_indices: DOF indices this actuator reads from. Length 1 for single-input,
-                length > 1 for multi-input actuators.
-            output_indices: DOF indices for writing output. Defaults to *input_indices*.
-            **kwargs: Actuator parameters (e.g., ``kp``, ``kd``, ``max_force``).
+            add_actuator(actuator_class, input_indices, output_indices=None, **kwargs)
+
+        *input_indices* / *output_indices* may arrive either as explicit
+        arguments or inside *kwargs* (keyword style).  All old per-DOF
+        params (``kp``, ``kd``, ``delay``, ``max_force``, ``gear``, …)
+        are expected in *kwargs*.
         """
-        if output_indices is None:
-            output_indices = input_indices.copy()
+        if input_indices is None:
+            raise TypeError("Legacy add_actuator() requires 'input_indices'")
+
+        # --- Map old class name → new controller class ---
+        class_name = actuator_class.__name__
+        mapping = self._LEGACY_ACTUATOR_CLASS_MAP.get(class_name)
+        if mapping is None:
+            raise TypeError(
+                f"Unknown legacy actuator class '{class_name}'. "
+                f"Supported: {', '.join(self._LEGACY_ACTUATOR_CLASS_MAP)}."
+            )
+
+        ctrl_name, class_has_delay = mapping
+        controller_class: type = {"ControllerPD": ControllerPD, "ControllerPID": ControllerPID}[ctrl_name]
+
+        delay_val: int | None = kwargs.pop("delay", None)
+        if class_has_delay and delay_val is None:
+            raise ValueError(f"{class_name} requires a 'delay' argument")
+
+        max_force_val = kwargs.pop("max_force", None)
+        gear_val = kwargs.pop("gear", None)
+
+        if gear_val is not None and gear_val != 1.0:
+            warnings.warn(
+                f"'gear={gear_val}' is not supported in the new actuator API and will be "
+                f"ignored. Fold the gear ratio into kp/kd/constant_force manually.",
+                DeprecationWarning,
+                stacklevel=3,
+            )
+
+        clamping: list[tuple[type, dict[str, Any]]] | None = None
+        if max_force_val is not None and max_force_val != math.inf:
+            clamping = [(ClampingMaxForce, {"max_force": max_force_val})]
 
-        resolved = actuator_class.resolve_arguments(kwargs)
+        if output_indices is not None and output_indices != input_indices:
+            warnings.warn(
+                "'output_indices' is not supported in the new actuator API and will be "
+                "ignored. Forces are written to the same DOF index by default.",
+                DeprecationWarning,
+                stacklevel=3,
+            )
+
+        for dof_idx in input_indices:
+            self.add_actuator(
+                controller_class,
+                index=dof_idx,
+                clamping=clamping,
+                delay=delay_val,
+                **kwargs,
+            )
+
+    def add_actuator(
+        self,
+        controller_class: type[Controller] | None = None,
+        index: int | None = None,
+        clamping: list[tuple[type[Clamping], dict[str, Any]]] | None = None,
+        delay: int | None = None,
+        pos_index: int | None = None,
+        **kwargs: Any,
+    ) -> None:
+        """Add an external actuator for a single DOF.
+
+        External actuators apply forces computed outside the physics engine.
+        Multiple calls with the same *controller_class*, *clamping*
+        types, and identical shared parameters are accumulated into one
+        :class:`~newton.actuators.Actuator` instance during
+        :meth:`finalize <ModelBuilder.finalize>`.  Different delay
+        values are supported within the same group; the buffer is
+        sized to ``max(delay_values)``.
 
-        # Extract scalar params to form the entry key
-        scalar_param_names = getattr(actuator_class, "SCALAR_PARAMS", set())
-        scalar_key = tuple(sorted((k, resolved[k]) for k in scalar_param_names if k in resolved))
+        .. deprecated::
+            The legacy ``newton_actuators`` signature is still accepted::
+
+                add_actuator(ActuatorPD, input_indices=[dof], kp=50.0)
+                add_actuator(ActuatorPD, [dof], kp=50.0)
+                add_actuator(actuator_class=ActuatorPD, input_indices=[dof], kp=50.0)
+
+            It will be removed in a future release.  Use the new per-DOF
+            signature instead.
+
+        Args:
+            controller_class: Controller class (e.g. :class:`~newton.actuators.ControllerPD`).
+                The deprecated keyword ``actuator_class`` is also accepted.
+            index: DOF index into ``joint_qd``-shaped arrays (velocities,
+                velocity targets, feedforward, forces).
+            clamping: Optional list of ``(ClampingClass, kwargs)`` tuples applied
+                post-controller. E.g. ``[(ClampingMaxForce, {'max_force': 50.0})]``.
+            delay: Optional integer number of timesteps to delay inputs.
+            pos_index: DOF index into ``joint_q``-shaped arrays (positions,
+                position targets). Defaults to *index*. Differs from
+                *index* for floating-base or ball-joint articulations
+                where ``joint_q`` and ``joint_qd`` have different layouts.
+            **kwargs: Per-DOF controller parameters (e.g. ``kp``, ``kd``).
+        """
+        legacy_cls = kwargs.pop("actuator_class", None)
+        if legacy_cls is None and _LegacyActuator is not None and issubclass(controller_class, _LegacyActuator):
+            legacy_cls = controller_class
+
+        if legacy_cls is not None:
+            warnings.warn(
+                "add_actuator() with a newton_actuators class is deprecated. "
+                "Use newton.actuators controller classes instead "
+                "(e.g. ControllerPD, ControllerPID).",
+                DeprecationWarning,
+                stacklevel=2,
+            )
+            input_indices = kwargs.pop("input_indices", index)
+            output_indices = kwargs.pop("output_indices", clamping)
+            if delay is not None:
+                kwargs["delay"] = delay
+            self._add_actuator_legacy(legacy_cls, input_indices, output_indices, **kwargs)
+            return
+
+        if controller_class is None:
+            raise TypeError("add_actuator() requires 'controller_class'")
+
+        if index is None:
+            raise TypeError("add_actuator() missing required argument: 'index'")
+
+        clamping = clamping or []
+
+        # --- Resolve controller kwargs and separate shared from per-DOF ---
+        resolved_ctrl = controller_class.resolve_arguments(kwargs)
+        ctrl_shared_names = getattr(controller_class, "SHARED_PARAMS", set())
+        ctrl_shared = {k: resolved_ctrl[k] for k in ctrl_shared_names if k in resolved_ctrl}
+        ctrl_array_params = {k: v for k, v in resolved_ctrl.items() if k not in ctrl_shared_names}
+
+        # --- Resolve per-clamping kwargs and separate shared from per-DOF ---
+        clamping_classes = tuple(cc for cc, _ in clamping)
+        clamping_shared_list = []
+        clamping_array_params_list = []
+        for comp_class, comp_kwargs in clamping:
+            resolved_comp = comp_class.resolve_arguments(comp_kwargs)
+            comp_shared_names = getattr(comp_class, "SHARED_PARAMS", set())
+            comp_shared = {k: resolved_comp[k] for k in comp_shared_names if k in resolved_comp}
+            comp_array = {k: v for k, v in resolved_comp.items() if k not in comp_shared_names}
+            clamping_shared_list.append(comp_shared)
+            clamping_array_params_list.append(comp_array)
+
+        clamping_shared_kwargs = tuple(clamping_shared_list)
+
+        # --- Build entry key: identifies a group of compatible actuators ---
+        # Groups differ when controller class, presence of delay, clamping
+        # types/shared-params, or controller shared params differ.
+        # Delay values are per-DOF; the buffer is sized to max(delays).
+        def _make_hashable(v: Any) -> Any:
+            if isinstance(v, list):
+                return tuple(v)
+            return v
+
+        ctrl_shared_key = tuple(sorted((k, _make_hashable(v)) for k, v in ctrl_shared.items()))
+        clamping_key = tuple(
+            (cc, tuple(sorted((k, _make_hashable(v)) for k, v in shared.items())))
+            for cc, shared in zip(clamping_classes, clamping_shared_list, strict=True)
+        )
+        entry_key = (controller_class, delay is not None, clamping_key, ctrl_shared_key)
 
-        # Key is (class, scalar_params) so different scalar values create separate entries
-        entry_key = (actuator_class, scalar_key)
         entry = self.actuator_entries.setdefault(
             entry_key,
-            ModelBuilder.ActuatorEntry(input_indices=[], output_indices=[], args=[]),
+            ModelBuilder.ActuatorEntry(
+                controller_class=controller_class,
+                clamping_classes=clamping_classes,
+                clamping_shared_kwargs=clamping_shared_kwargs,
+                controller_shared_kwargs=ctrl_shared,
+                indices=[],
+                pos_indices=[],
+                controller_args=[],
+                delay_args=[],
+                clamping_args=[],
+            ),
         )
 
-        # Filter out scalar params from args (they're already in the key)
-        array_params = {k: v for k, v in resolved.items() if k not in scalar_param_names}
-
-        # Validate dimension consistency before appending
-        if entry.input_indices:
-            expected_input_dim = len(entry.input_indices[0])
-            if len(input_indices) != expected_input_dim:
-                raise ValueError(
-                    f"Input indices dimension mismatch for {actuator_class.__name__}: "
-                    f"expected {expected_input_dim}, got {len(input_indices)}. "
-                    f"All actuators of the same type must have the same number of inputs."
-                )
-        if entry.output_indices:
-            expected_output_dim = len(entry.output_indices[0])
-            if len(output_indices) != expected_output_dim:
-                raise ValueError(
-                    f"Output indices dimension mismatch for {actuator_class.__name__}: "
-                    f"expected {expected_output_dim}, got {len(output_indices)}. "
-                    f"All actuators of the same type must have the same number of outputs."
-                )
-
-        # Each call adds one actuator with its input/output indices
-        entry.input_indices.append(input_indices)
-        entry.output_indices.append(output_indices)
-        entry.args.append(array_params)
+        entry.indices.append(index)
+        entry.pos_indices.append(pos_index if pos_index is not None else index)
+        entry.controller_args.append(ctrl_array_params)
+        if delay is not None:
+            entry.delay_args.append({"delay": delay})
+        entry.clamping_args.append(clamping_array_params_list)
 
     def _stack_args_to_arrays(
         self,
@@ -1760,44 +1917,34 @@ def _stack_args_to_arrays(
         result = {}
         for key in args_list[0].keys():
             values = [args[key] for args in args_list]
-            result[key] = wp.array(values, dtype=wp.float32, device=device, requires_grad=requires_grad)
+            for v in values:
+                if not isinstance(v, (int, float)):
+                    raise TypeError(
+                        f"add_actuator expects scalar per-DOF params, but "
+                        f"parameter '{key}' got {type(v).__name__}; pass one "
+                        f"scalar per add_actuator call"
+                    )
+            if all(isinstance(v, int) for v in values):
+                result[key] = wp.array(values, dtype=wp.int32, device=device)
+            else:
+                result[key] = wp.array(values, dtype=wp.float32, device=device, requires_grad=requires_grad)
 
         return result
 
     @staticmethod
-    def _build_index_array(indices: list[list[int]], device: Devicelike) -> wp.array:
-        """Build a warp array from nested index lists.
-
-        If all inner lists have length 1, creates a 1D array (N,).
-        Otherwise, creates a 2D array (N, M) where M is the inner list length.
+    def _build_index_array(indices: list[int], device: Devicelike) -> wp.array:
+        """Build a 1-D warp index array from a flat list of DOF indices.
 
         Args:
-            indices: Nested list of indices, one inner list per actuator.
+            indices: Flat list of DOF indices, one per actuator.
             device: Device for the warp array.
 
         Returns:
-            Array with shape ``(N,)`` or ``(N, M)``.
-
-        Raises:
-            ValueError: If inner lists have inconsistent lengths (for 2D case).
+            Array with shape ``(N,)``.
         """
         if not indices:
             return wp.array([], dtype=wp.uint32, device=device)
-
-        inner_lengths = [len(idx_list) for idx_list in indices]
-        max_len = max(inner_lengths)
-
-        if max_len == 1:
-            # All single-input: flatten to 1D
-            flat = [idx_list[0] for idx_list in indices]
-            return wp.array(flat, dtype=wp.uint32, device=device)
-        else:
-            # Multi-input: create 2D array
-            if not all(length == max_len for length in inner_lengths):
-                raise ValueError(
-                    f"All actuators must have the same number of inputs for 2D indexing. Got lengths: {inner_lengths}"
-                )
-            return wp.array(indices, dtype=wp.uint32, device=device)
+        return wp.array(indices, dtype=wp.uint32, device=device)
 
     @property
     def default_site_cfg(self) -> ShapeConfig:
@@ -3266,14 +3413,25 @@ def transform_value(v, offset=value_offset, replace_with_world=use_current_world
         for entry_key, sub_entry in builder.actuator_entries.items():
             entry = self.actuator_entries.setdefault(
                 entry_key,
-                ModelBuilder.ActuatorEntry(input_indices=[], output_indices=[], args=[]),
+                ModelBuilder.ActuatorEntry(
+                    controller_class=sub_entry.controller_class,
+                    clamping_classes=sub_entry.clamping_classes,
+                    clamping_shared_kwargs=sub_entry.clamping_shared_kwargs,
+                    controller_shared_kwargs=sub_entry.controller_shared_kwargs,
+                    indices=[],
+                    pos_indices=[],
+                    controller_args=[],
+                    delay_args=[],
+                    clamping_args=[],
+                ),
             )
-            # Offset indices by start_joint_dof_idx (each actuator's indices are a list)
-            for idx_list in sub_entry.input_indices:
-                entry.input_indices.append([idx + start_joint_dof_idx for idx in idx_list])
-            for idx_list in sub_entry.output_indices:
-                entry.output_indices.append([idx + start_joint_dof_idx for idx in idx_list])
-            entry.args.extend(sub_entry.args)
+            for idx in sub_entry.indices:
+                entry.indices.append(idx + start_joint_dof_idx)
+            for idx in sub_entry.pos_indices:
+                entry.pos_indices.append(idx + start_joint_coord_idx)
+            entry.controller_args.extend(sub_entry.controller_args)
+            entry.delay_args.extend(sub_entry.delay_args)
+            entry.clamping_args.extend(sub_entry.clamping_args)
 
     @staticmethod
     def _coerce_mat33(value: Any) -> wp.mat33:
@@ -10297,18 +10455,49 @@ def _to_wp_array(data, dtype, requires_grad):
             )
 
             # Create actuators from accumulated entries
+            from ..actuators.actuator import Actuator  # noqa: PLC0415
+            from ..actuators.delay import Delay  # noqa: PLC0415
+
             m.actuators = []
-            for (actuator_class, scalar_key), entry in self.actuator_entries.items():
-                input_indices = self._build_index_array(entry.input_indices, device)
-                output_indices = self._build_index_array(entry.output_indices, device)
-                param_arrays = self._stack_args_to_arrays(entry.args, device=device, requires_grad=requires_grad)
-                scalar_params = dict(scalar_key)
-                actuator = actuator_class(
-                    input_indices=input_indices,
-                    output_indices=output_indices,
-                    **param_arrays,
-                    **scalar_params,
+            for entry in self.actuator_entries.values():
+                indices = self._build_index_array(entry.indices, device)
+
+                pos_indices_arg = None
+                if entry.pos_indices != entry.indices:
+                    pos_indices_arg = self._build_index_array(entry.pos_indices, device)
+
+                # Build controller from stacked per-DOF arrays + shared kwargs
+                ctrl_arrays = self._stack_args_to_arrays(
+                    entry.controller_args, device=device, requires_grad=requires_grad
                 )
+                controller = entry.controller_class(**ctrl_arrays, **entry.controller_shared_kwargs)
+
+                delay_obj = None
+                if entry.delay_args:
+                    delay_arrays = self._stack_args_to_arrays(entry.delay_args, device=device)
+                    max_delay = max(d["delay"] for d in entry.delay_args)
+                    delay_obj = Delay(**delay_arrays, max_delay=max_delay)
+
+                # Build clamping objects from per-DOF arrays + shared kwargs
+                clamping_objs = []
+                for i, (comp_class, shared_kw) in enumerate(
+                    zip(entry.clamping_classes, entry.clamping_shared_kwargs, strict=True)
+                ):
+                    comp_args_per_actuator = [per_act[i] for per_act in entry.clamping_args]
+                    comp_arrays = self._stack_args_to_arrays(
+                        comp_args_per_actuator, device=device, requires_grad=requires_grad
+                    )
+                    clamping_objs.append(comp_class(**comp_arrays, **shared_kw))
+
+                actuator = Actuator(
+                    indices=indices,
+                    controller=controller,
+                    delay=delay_obj,
+                    clamping=clamping_objs if clamping_objs else None,
+                    pos_indices=pos_indices_arg,
+                    requires_grad=requires_grad,
+                )
+
                 m.actuators.append(actuator)
 
             # Add custom attributes onto the model (with lazy evaluation)
diff --git a/newton/_src/sim/model.py b/newton/_src/sim/model.py
index a7539c66fb..f03685a421 100644
--- a/newton/_src/sim/model.py
+++ b/newton/_src/sim/model.py
@@ -17,8 +17,7 @@
 from .state import State
 
 if TYPE_CHECKING:
-    from newton_actuators import Actuator
-
+    from ..actuators.actuator import Actuator
     from ..utils.heightfield import HeightfieldData
     from .collide import CollisionPipeline
 
diff --git a/newton/_src/solvers/kamino/README.md b/newton/_src/solvers/kamino/README.md
index e8cf18754d..9815daca01 100644
--- a/newton/_src/solvers/kamino/README.md
+++ b/newton/_src/solvers/kamino/README.md
@@ -139,11 +139,6 @@ cd newton-usd-schemas
 pip install -e .[dev,docs,notebook]
 ```
 ```bash
-git clone git@github.com:newton-physics/newton-actuators.git
-cd newton-actuators
-pip install -e .[dev,docs,notebook]
-```
-```bash
 git clone git@github.com:newton-physics/newton.git
 cd newton
 pip install -e .[dev,docs,notebook]
diff --git a/newton/_src/utils/import_usd.py b/newton/_src/utils/import_usd.py
index 2e0eae9355..4d401a7c29 100644
--- a/newton/_src/utils/import_usd.py
+++ b/newton/_src/utils/import_usd.py
@@ -2752,35 +2752,47 @@ def _get_collision_mass_information(collider_prim: Usd.Prim):
         )
 
     # Parse Newton actuator prims from the USD stage.
-    try:
-        from newton_actuators import parse_actuator_prim  # noqa: PLC0415
-    except ImportError:
-        parse_actuator_prim = None
+    from ..actuators.delay import Delay  # noqa: PLC0415
+    from ..actuators.usd_parser import parse_actuator_prim  # noqa: PLC0415
 
     actuator_count = 0
-    if parse_actuator_prim is not None:
-        path_to_dof = {
-            path: builder.joint_qd_start[idx]
-            for path, idx in path_joint_map.items()
-            if idx < len(builder.joint_qd_start)
-        }
-        for prim in Usd.PrimRange(stage.GetPrimAtPath(root_path)):
-            parsed = parse_actuator_prim(prim)
-            if parsed is None:
-                continue
-            dof_indices = [path_to_dof[p] for p in parsed.target_paths if p in path_to_dof]
-            if dof_indices:
-                builder.add_actuator(parsed.actuator_class, input_indices=dof_indices, **parsed.kwargs)
-                actuator_count += 1
-    else:
-        # TODO: Replace this string-based type name check with a proper schema query
-        # once the Newton actuator USD schema is merged
-        for prim in Usd.PrimRange(stage.GetPrimAtPath(root_path)):
-            if prim.GetTypeName() == "Actuator":
-                raise ImportError(
-                    f"USD stage contains actuator prims (e.g. {prim.GetPath()}) but newton-actuators is not installed. "
-                    "Install with: pip install newton[sim]"
-                )
+    path_to_dof = {
+        path: builder.joint_qd_start[idx] for path, idx in path_joint_map.items() if idx < len(builder.joint_qd_start)
+    }
+    path_to_coord = {
+        path: builder.joint_q_start[idx] for path, idx in path_joint_map.items() if idx < len(builder.joint_q_start)
+    }
+    for prim in Usd.PrimRange(stage.GetPrimAtPath(root_path)):
+        parsed = parse_actuator_prim(prim)
+        if parsed is None:
+            continue
+        target_path = parsed.target_path
+        if target_path not in path_to_dof:
+            raise ValueError(
+                f"Actuator prim {prim.GetPath()} targets '{target_path}' which does not resolve to a known joint DOF"
+            )
+        dof_index = path_to_dof[target_path]
+        coord_index = path_to_coord.get(target_path)
+        pos_index = coord_index if coord_index is not None and coord_index != dof_index else None
+
+        # Separate Delay from Clamping components in the parsed spec
+        delay_val = None
+        clamping_specs = []
+        for comp_class, comp_kwargs in parsed.component_specs:
+            if comp_class is Delay:
+                delay_val = comp_kwargs.get("delay")
+            else:
+                clamping_specs.append((comp_class, comp_kwargs))
+
+        builder.add_actuator(
+            parsed.controller_class,
+            index=dof_index,
+            clamping=clamping_specs if clamping_specs else None,
+            delay=delay_val,
+            pos_index=pos_index,
+            **parsed.controller_kwargs,
+        )
+        actuator_count += 1
     if verbose and actuator_count > 0:
         print(f"Added {actuator_count} actuator(s) from USD")
 
diff --git a/newton/_src/utils/selection.py b/newton/_src/utils/selection.py
index 7bc9716acf..0189a3bfb1 100644
--- a/newton/_src/utils/selection.py
+++ b/newton/_src/utils/selection.py
@@ -12,7 +12,7 @@
 from ..sim import Control, JointType, Model, State, eval_fk, eval_jacobian, eval_mass_matrix
 
 if TYPE_CHECKING:
-    from newton_actuators import Actuator
+    from ..actuators.actuator import Actuator
 
 AttributeFrequency = Model.AttributeFrequency
 
@@ -278,12 +278,12 @@ def build_actuator_dof_mapping_indices_kernel(
 
 
 @wp.kernel
-def gather_actuator_by_indices_kernel(
-    src: wp.array[float],
+def _gather_1d_kernel(
+    src: Any,
     indices: wp.array[int],
-    dst: wp.array[float],
+    dst: Any,
 ):
-    """Gather values from src at specified indices into dst. Index -1 means skip (leave dst unchanged)."""
+    """Gather ``dst[tid] = src[indices[tid]]``. Index -1 means skip (leave dst unchanged)."""
     tid = wp.tid()
     idx = indices[tid]
     if idx >= 0:
@@ -291,26 +291,22 @@ def gather_actuator_by_indices_kernel(
 
 
 @wp.kernel
-def scatter_actuator_with_mask_kernel(
-    values: wp.array2d[float],
+def _scatter_masked_2d_kernel(
+    values: Any,
     mapping: wp.array[int],
     mask: wp.array[bool],
-    dofs_per_world: int,
-    dst: wp.array[float],
+    cols: int,
+    dst: Any,
 ):
-    """Scatter actuator values with articulation mask support.
+    """Scatter ``dst[mapping[row * cols + col]] = values[row, col]`` where ``mask[row]`` is true.
 
-    values: shape (world_count, dofs_per_world)
-    mapping: flat array mapping DOF positions to actuator indices (-1 = not actuated)
-    mask: per-world mask, shape (world_count,)
-    dst: flat actuator parameter array
+    Mapping entries of -1 are skipped.
     """
-    world_idx, local_idx = wp.tid()
-    if mask[world_idx]:
-        flat_idx = world_idx * dofs_per_world + local_idx
-        actuator_idx = mapping[flat_idx]
-        if actuator_idx >= 0:
-            dst[actuator_idx] = values[world_idx, local_idx]
+    row, col = wp.tid()
+    if mask[row]:
+        dst_idx = mapping[row * cols + col]
+        if dst_idx >= 0:
+            dst[dst_idx] = values[row, col]
 
 
 # NOTE: Python slice objects are not hashable in Python < 3.12, so we use this instead.
@@ -1643,15 +1639,13 @@ def _get_actuator_dof_mapping(self, actuator: "Actuator") -> wp.array:
         Build mapping from view DOF positions to actuator parameter indices.
 
         Note:
-            For selection we assume that input_indices is 1D (one input per actuator),
-            not the general 2D case (multiple inputs per actuator) which is supported
-            by the library.
+            Assumes SISO actuators (one DOF per actuator).
 
         Returns array of shape (world_count * dofs_per_world,) where each element is:
         - actuator parameter index if that DOF is actuated
         - -1 if that DOF is not actuated by this actuator
         """
-        num_actuators = actuator.input_indices.shape[0]
+        num_actuators = actuator.indices.shape[0]
         actuators_per_world = num_actuators // self.world_count
 
         dof_layout = self.frequency_layouts[AttributeFrequency.JOINT_DOF]
@@ -1668,7 +1662,7 @@ def _get_actuator_dof_mapping(self, actuator: "Actuator") -> wp.array:
                 build_actuator_dof_mapping_slice_kernel,
                 dim=actuators_per_world,
                 inputs=[
-                    actuator.input_indices,
+                    actuator.indices,
                     actuators_per_world,
                     dof_layout.offset,
                     dof_layout.slice.start,
@@ -1687,7 +1681,7 @@ def _get_actuator_dof_mapping(self, actuator: "Actuator") -> wp.array:
                 build_actuator_dof_mapping_indices_kernel,
                 dim=actuators_per_world,
                 inputs=[
-                    actuator.input_indices,
+                    actuator.indices,
                     dof_layout.indices,
                     dof_layout.offset,
                     dof_layout.stride_within_worlds,
@@ -1703,57 +1697,59 @@ def _get_actuator_dof_mapping(self, actuator: "Actuator") -> wp.array:
 
         return mapping
 
-    def _get_actuator_attribute_array(self, actuator: "Actuator", name: str) -> wp.array:
-        """Get actuator parameter array shaped (world_count, dofs_per_world), zeros for non-actuated DOFs."""
+    def get_actuator_parameter(self, actuator: "Actuator", component: Any, name: str) -> wp.array:
+        """Get actuator parameter values for actuators corresponding to this view's DOFs.
+
+        Args:
+            actuator: Actuator instance (used for DOF index mapping).
+            component: The component that owns the parameter — a
+                :class:`~newton.actuators.Controller`,
+                :class:`~newton.actuators.Clamping`, or
+                :class:`~newton.actuators.Delay` instance.
+            name: Attribute name on *component* (e.g. ``"kp"``, ``"max_force"``,
+                ``"delays"``).
+
+        Returns:
+            Parameter values shaped ``(world_count, dofs_per_world)``.
+        """
         mapping = self._get_actuator_dof_mapping(actuator)
         if len(mapping) == 0:
             return wp.empty((self.world_count, 0), dtype=float, device=self.device)
 
-        src = getattr(actuator, name)
+        src = getattr(component, name)
         dofs_per_world = len(mapping) // self.world_count
 
         dst = wp.zeros(len(mapping), dtype=src.dtype, device=self.device)
         wp.launch(
-            gather_actuator_by_indices_kernel,
+            _gather_1d_kernel,
             dim=len(mapping),
             inputs=[src, mapping],
             outputs=[dst],
             device=self.device,
         )
-
-        batched_shape = (self.world_count, dofs_per_world, *src.shape[1:])
-        return dst.reshape(batched_shape)
-
-    def get_actuator_parameter(self, actuator: "Actuator", name: str) -> wp.array:
-        """
-        Get actuator parameter values for actuators corresponding to this view's DOFs.
-
-        Args:
-            actuator: An actuator instance with input_indices and parameter arrays.
-            name (str): Parameter name (e.g., 'kp', 'kd', 'max_force', 'gear', 'constant_force').
-
-        Returns:
-            wp.array: Parameter values shaped (world_count, dofs_per_world).
-        """
-        return self._get_actuator_attribute_array(actuator, name)
+        return dst.reshape((self.world_count, dofs_per_world))
 
     def set_actuator_parameter(
-        self, actuator: "Actuator", name: str, values: wp.array, mask: wp.array | None = None
+        self, actuator: "Actuator", component: Any, name: str, values: wp.array, mask: wp.array | None = None
     ) -> None:
-        """
-        Set actuator parameter values for actuators corresponding to this view's DOFs.
+        """Set actuator parameter values for actuators corresponding to this view's DOFs.
 
         Args:
-            actuator: An actuator instance with input_indices and parameter arrays.
-            name (str): Parameter name (e.g., 'kp', 'kd', 'max_force', 'gear', 'constant_force').
-            values: New parameter values shaped (world_count, dofs_per_world). Non-actuated DOFs are ignored.
-            mask (array, optional): Per-world mask (world_count,). Only masked worlds are updated.
+            actuator: Actuator instance (used for DOF index mapping).
+            component: The component that owns the parameter — a
+                :class:`~newton.actuators.Controller`,
+                :class:`~newton.actuators.Clamping`, or
+                :class:`~newton.actuators.Delay` instance.
+            name: Attribute name on *component* (e.g. ``"kp"``, ``"max_force"``,
+                ``"delays"``).
+            values: New parameter values shaped ``(world_count, dofs_per_world)``.
+            mask: Per-world mask ``(world_count,)``. Only masked worlds are updated.
         """
         mapping = self._get_actuator_dof_mapping(actuator)
         if len(mapping) == 0:
             return
 
-        dst = getattr(actuator, name)
+        dst = getattr(component, name)
         dofs_per_world = len(mapping) // self.world_count
         expected_shape = (self.world_count, dofs_per_world, *dst.shape[1:])
 
@@ -1772,7 +1768,7 @@ def set_actuator_parameter(
                 raise ValueError(f"Expected mask shape ({self.world_count},), got {mask.shape}")
 
         wp.launch(
-            scatter_actuator_with_mask_kernel,
+            _scatter_masked_2d_kernel,
             dim=(self.world_count, dofs_per_world),
             inputs=[values, mapping, mask, dofs_per_world],
             outputs=[dst],
diff --git a/newton/actuators.py b/newton/actuators.py
new file mode 100644
index 0000000000..1d1f9951a1
--- /dev/null
+++ b/newton/actuators.py
@@ -0,0 +1,43 @@
+# SPDX-FileCopyrightText: Copyright (c) 2025 The Newton Developers
+# SPDX-License-Identifier: Apache-2.0
+
+"""GPU-accelerated actuator models for physics simulations.
+
+This module provides a modular library of actuator components — controllers,
+clamping, and delay — that compute joint forces from simulation state and
+control targets. Components are composed into an :class:`Actuator` instance
+and registered with :meth:`~newton.ModelBuilder.add_actuator` during model
+construction.
+"""
+
+from ._src.actuators import (
+    Actuator,
+    ActuatorParsed,
+    Clamping,
+    ClampingDCMotor,
+    ClampingMaxForce,
+    ClampingPositionBased,
+    Controller,
+    ControllerNetLSTM,
+    ControllerNetMLP,
+    ControllerPD,
+    ControllerPID,
+    Delay,
+    parse_actuator_prim,
+)
+
+__all__ = [
+    "Actuator",
+    "ActuatorParsed",
+    "Clamping",
+    "ClampingDCMotor",
+    "ClampingMaxForce",
+    "ClampingPositionBased",
+    "Controller",
+    "ControllerNetLSTM",
+    "ControllerNetMLP",
+    "ControllerPD",
+    "ControllerPID",
+    "Delay",
+    "parse_actuator_prim",
+]
diff --git a/newton/tests/assets/actuator_test.usda b/newton/tests/assets/actuator_test.usda
index 86d4e7cb64..3eefff3d26 100644
--- a/newton/tests/assets/actuator_test.usda
+++ b/newton/tests/assets/actuator_test.usda
@@ -92,23 +92,23 @@ def Xform "World"
             uniform token physics:axis = "Y"
         }
 
-        # Actuator for Joint 1 (PD Controller)
-        def Actuator "Joint1Actuator" (
-            prepend apiSchemas = ["PDControllerAPI"]
+        # Actuator for Joint 1 (PD Controller + MaxForce clamping)
+        def NewtonActuator "Joint1Actuator" (
+            prepend apiSchemas = ["NewtonControllerPDAPI", "NewtonClampingMaxForceAPI"]
         )
         {
-            rel newton:actuator:target = </World/Robot/Joint1>
+            rel newton:actuator:targets = [</World/Robot/Joint1>]
             float newton:actuator:kp = 100.0
             float newton:actuator:kd = 10.0
             float newton:actuator:maxForce = 50.0
         }
 
         # Actuator for Joint 2 (PD Controller with Delay)
-        def Actuator "Joint2Actuator" (
-            prepend apiSchemas = ["PDControllerAPI", "DelayAPI"]
+        def NewtonActuator "Joint2Actuator" (
+            prepend apiSchemas = ["NewtonControllerPDAPI", "NewtonDelayAPI"]
         )
         {
-            rel newton:actuator:target = </World/Robot/Joint2>
+            rel newton:actuator:targets = [</World/Robot/Joint2>]
             float newton:actuator:kp = 200.0
             float newton:actuator:kd = 20.0
             int newton:actuator:delay = 5
diff --git a/newton/tests/test_actuators.py b/newton/tests/test_actuators.py
index 8e25964173..05f72bdb14 100644
--- a/newton/tests/test_actuators.py
+++ b/newton/tests/test_actuators.py
@@ -1,18 +1,31 @@
 # SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers
 # SPDX-License-Identifier: Apache-2.0
 
-"""Tests for actuator integration with ModelBuilder."""
+"""Tests for Newton actuators."""
 
+import importlib.util
 import math
 import os
 import unittest
+import warnings
 
 import numpy as np
 import warp as wp
-from newton_actuators import ActuatorDelayedPD, ActuatorPD, ActuatorPID, parse_actuator_prim
 
 import newton
 from newton._src.utils.import_usd import parse_usd
+from newton.actuators import (
+    ActuatorParsed,
+    ClampingDCMotor,
+    ClampingMaxForce,
+    ClampingPositionBased,
+    ControllerNetLSTM,
+    ControllerNetMLP,
+    ControllerPD,
+    ControllerPID,
+    Delay,
+    parse_actuator_prim,
+)
 from newton.selection import ArticulationView
 
 try:
@@ -22,252 +35,799 @@
 except ImportError:
     HAS_USD = False
 
+try:
+    from newton_actuators import ActuatorDelayedPD, ActuatorPD, ActuatorPID
 
-class TestActuatorBuilder(unittest.TestCase):
-    """Tests for ModelBuilder.add_actuator - functionality, multi-world, and scalar params."""
+    _HAS_LEGACY_ACTUATORS = True
+except ImportError:
+    _HAS_LEGACY_ACTUATORS = False
+
+_HAS_TORCH = importlib.util.find_spec("torch") is not None
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def _write_dof_values(model, array, dof_indices, values):
+    """Write scalar values into specific DOF positions of a Warp array."""
+    arr_np = array.numpy()
+    for dof, val in zip(dof_indices, values, strict=True):
+        arr_np[dof] = val
+    wp.copy(array, wp.array(arr_np, dtype=float, device=model.device))
+
+
+# ---------------------------------------------------------------------------
+# 1. Controllers
+# ---------------------------------------------------------------------------
+
+
+class TestControllerPD(unittest.TestCase):
+    """PD controller: f = constant + act + kp*(target_pos - q) + kd*(target_vel - v)."""
+
+    def test_compute(self):
+        """Construct controller directly and call compute() with all terms."""
+        n = 2
+        kp_vals = [100.0, 200.0]
+        kd_vals = [10.0, 20.0]
+        const_vals = [5.0, -3.0]
+        q = [0.3, -0.5]
+        qd = [1.0, -2.0]
+        tgt_pos = [1.0, 0.5]
+        tgt_vel = [0.0, 1.0]
+        ff = [3.0, -1.0]
+
+        def _f(vals):
+            return wp.array(vals, dtype=wp.float32)
+
+        indices = wp.array(list(range(n)), dtype=wp.uint32)
+        ctrl = ControllerPD(kp=_f(kp_vals), kd=_f(kd_vals), constant_force=_f(const_vals))
+        forces = wp.zeros(n, dtype=wp.float32)
+
+        ctrl.compute(
+            positions=_f(q),
+            velocities=_f(qd),
+            target_pos=_f(tgt_pos),
+            target_vel=_f(tgt_vel),
+            feedforward=_f(ff),
+            pos_indices=indices,
+            vel_indices=indices,
+            target_pos_indices=indices,
+            target_vel_indices=indices,
+            forces=forces,
+            state=None,
+            dt=0.01,
+        )
 
-    def test_accumulation_and_parameters(self):
-        """Test actuator accumulation, parameters, defaults, and input/output indices."""
-        builder = newton.ModelBuilder()
+        result = forces.numpy()
+        for i in range(n):
+            expected = const_vals[i] + ff[i] + kp_vals[i] * (tgt_pos[i] - q[i]) + kd_vals[i] * (tgt_vel[i] - qd[i])
+            self.assertAlmostEqual(result[i], expected, places=4, msg=f"DOF {i}")
 
-        bodies = [builder.add_body() for _ in range(3)]
-        joints = []
-        for i, body in enumerate(bodies):
-            parent = -1 if i == 0 else bodies[i - 1]
-            joints.append(builder.add_joint_revolute(parent=parent, child=body, axis=newton.Axis.Z))
-        builder.add_articulation(joints)
 
-        dofs = [builder.joint_qd_start[j] for j in joints]
+class TestControllerPID(unittest.TestCase):
+    """PID controller: f = const + act + kp*e + ki*integral + kd*de."""
 
-        # Add actuators - should accumulate into one
-        builder.add_actuator(ActuatorPD, input_indices=[dofs[0]], kp=50.0, gear=2.5, constant_force=1.0)
-        builder.add_actuator(ActuatorPD, input_indices=[dofs[1]], kp=100.0, kd=10.0)
-        builder.add_actuator(ActuatorPD, input_indices=[dofs[1]], output_indices=[dofs[2]], kp=150.0, max_force=50.0)
+    def test_compute(self):
+        """Construct controller directly and call compute() over multiple steps."""
+        kp, ki, kd, const = 50.0, 10.0, 5.0, 2.0
+        dt = 0.01
+        q, qd = [0.0], [0.0]
+        tgt_pos, tgt_vel = [1.0], [0.0]
+        pos_error = tgt_pos[0] - q[0]
+        vel_error = tgt_vel[0] - qd[0]
+        device = wp.get_device()
+
+        def _f(vals):
+            return wp.array(vals, dtype=wp.float32, device=device)
+
+        indices = wp.array([0], dtype=wp.uint32, device=device)
+        ctrl = ControllerPID(
+            kp=_f([kp]),
+            ki=_f([ki]),
+            kd=_f([kd]),
+            integral_max=_f([math.inf]),
+            constant_force=_f([const]),
+        )
+        ctrl.finalize(device, 1)
 
-        model = builder.finalize()
+        state_0 = ctrl.state(1, device)
+        state_1 = ctrl.state(1, device)
 
-        self.assertEqual(len(model.actuators), 1)
-        act = model.actuators[0]
-        self.assertEqual(act.num_actuators, 3)
-
-        np.testing.assert_array_equal(act.input_indices.numpy(), [dofs[0], dofs[1], dofs[1]])
-        np.testing.assert_array_equal(act.output_indices.numpy(), [dofs[0], dofs[1], dofs[2]])
-        np.testing.assert_array_almost_equal(act.kp.numpy(), [50.0, 100.0, 150.0])
-        np.testing.assert_array_almost_equal(act.kd.numpy(), [0.0, 10.0, 0.0])
-        self.assertAlmostEqual(act.gear.numpy()[0], 2.5)
-        self.assertAlmostEqual(act.constant_force.numpy()[0], 1.0)
-        self.assertAlmostEqual(act.max_force.numpy()[2], 50.0)
-        self.assertTrue(math.isinf(act.max_force.numpy()[0]))
-
-    def test_mixed_types_with_replication(self):
-        """Test mixed actuator types, replication, DOF offsets, and input/output indices."""
-        template = newton.ModelBuilder()
+        integral = 0.0
+        for step_i in range(3):
+            forces = wp.zeros(1, dtype=wp.float32, device=device)
+            integral += pos_error * dt
+            expected = const + kp * pos_error + ki * integral + kd * vel_error
+
+            ctrl.compute(
+                positions=_f(q),
+                velocities=_f(qd),
+                target_pos=_f(tgt_pos),
+                target_vel=_f(tgt_vel),
+                feedforward=None,
+                pos_indices=indices,
+                vel_indices=indices,
+                target_pos_indices=indices,
+                target_vel_indices=indices,
+                forces=forces,
+                state=state_0,
+                dt=dt,
+                device=device,
+            )
+            ctrl.update_state(state_0, state_1)
+            state_0, state_1 = state_1, state_0
+
+            self.assertAlmostEqual(forces.numpy()[0], expected, places=4, msg=f"step {step_i}")
+
+
+@unittest.skipUnless(_HAS_TORCH, "torch not installed")
+class TestControllerNetMLP(unittest.TestCase):
+    """ControllerNetMLP — construct controller, call compute() directly."""
+
+    def setUp(self):
+        import torch
+
+        self.torch = torch
+        self.device = wp.get_device()
+        self._torch_dev = torch.device(f"cuda:{self.device.ordinal}" if self.device.is_cuda else "cpu")
+
+    def _mlp(self, in_dim, hidden=32):
+        return self.torch.nn.Sequential(
+            self.torch.nn.Linear(in_dim, hidden),
+            self.torch.nn.ELU(),
+            self.torch.nn.Linear(hidden, 1),
+        ).to(self._torch_dev)
+
+    def test_compute(self):
+        """Constant-bias network produces known output; history rolls after update_state."""
+        net = self.torch.nn.Sequential(self.torch.nn.Linear(2, 1, bias=True)).to(self._torch_dev)
+        with self.torch.no_grad():
+            net[0].weight.fill_(0.0)
+            net[0].bias.fill_(42.0)
+
+        n = 1
+        ctrl = ControllerNetMLP(network=net)
+        ctrl.finalize(self.device, n)
+        state_a = ctrl.state(n, self.device)
+        state_b = ctrl.state(n, self.device)
+
+        indices = wp.array([0], dtype=wp.uint32, device=self.device)
+        positions = wp.zeros(n, dtype=wp.float32, device=self.device)
+        velocities = wp.zeros(n, dtype=wp.float32, device=self.device)
+        target_pos = wp.array([1.0], dtype=wp.float32, device=self.device)
+        target_vel = wp.zeros(n, dtype=wp.float32, device=self.device)
+        forces = wp.zeros(n, dtype=wp.float32, device=self.device)
+
+        ctrl.compute(
+            positions,
+            velocities,
+            target_pos,
+            target_vel,
+            None,
+            indices,
+            indices,
+            indices,
+            indices,
+            forces,
+            state_a,
+            0.01,
+            self.device,
+        )
+        self.assertAlmostEqual(forces.numpy()[0], 42.0, places=3)
+
+        ctrl.update_state(state_a, state_b)
+        self.assertAlmostEqual(
+            state_b.pos_error_history[0, 0].item(),
+            1.0,
+            places=4,
+            msg="history should contain pos error from current step",
+        )
 
-        body0 = template.add_body()
-        body1 = template.add_body()
-        body2 = template.add_body()
 
-        joint0 = template.add_joint_revolute(parent=-1, child=body0, axis=newton.Axis.Z)
-        joint1 = template.add_joint_revolute(parent=body0, child=body1, axis=newton.Axis.Y)
-        joint2 = template.add_joint_revolute(parent=body1, child=body2, axis=newton.Axis.X)
-        template.add_articulation([joint0, joint1, joint2])
+@unittest.skipUnless(_HAS_TORCH, "torch not installed")
+class TestControllerNetLSTM(unittest.TestCase):
+    """ControllerNetLSTM — construct controller, call compute() directly."""
+
+    def setUp(self):
+        import torch
+
+        self.torch = torch
+        self.device = wp.get_device()
+        self._torch_dev = torch.device(f"cuda:{self.device.ordinal}" if self.device.is_cuda else "cpu")
+
+    def _lstm(self, hidden=8, layers=1):
+        import torch
+
+        class Net(torch.nn.Module):
+            def __init__(self):
+                super().__init__()
+                self.lstm = torch.nn.LSTM(2, hidden, layers, batch_first=True)
+                self.dec = torch.nn.Linear(hidden, 1)
+
+            def forward(self, x, hc):
+                out, (h, c) = self.lstm(x, hc)
+                return self.dec(out[:, -1, :]), (h, c)
+
+        return Net().to(self._torch_dev)
+
+    def test_compute(self):
+        """compute() produces non-zero force; hidden state evolves after update_state."""
+        n = 1
+        hidden, layers = 8, 1
+        ctrl = ControllerNetLSTM(network=self._lstm(hidden=hidden, layers=layers))
+        ctrl.finalize(self.device, n)
+
+        state_a = ctrl.state(n, self.device)
+        state_b = ctrl.state(n, self.device)
+        self.assertTrue(self.torch.all(state_a.hidden == 0.0).item())
+
+        indices = wp.array([0], dtype=wp.uint32, device=self.device)
+        positions = wp.zeros(n, dtype=wp.float32, device=self.device)
+        velocities = wp.array([1.0], dtype=wp.float32, device=self.device)
+        target_pos = wp.array([1.0], dtype=wp.float32, device=self.device)
+        target_vel = wp.zeros(n, dtype=wp.float32, device=self.device)
+        forces = wp.zeros(n, dtype=wp.float32, device=self.device)
+
+        ctrl.compute(
+            positions,
+            velocities,
+            target_pos,
+            target_vel,
+            None,
+            indices,
+            indices,
+            indices,
+            indices,
+            forces,
+            state_a,
+            0.01,
+            self.device,
+        )
+        ctrl.update_state(state_a, state_b)
 
-        dof0 = template.joint_qd_start[joint0]
-        dof1 = template.joint_qd_start[joint1]
-        dof2 = template.joint_qd_start[joint2]
+        self.assertNotAlmostEqual(forces.numpy()[0], 0.0, places=5, msg="LSTM should produce non-zero force")
+        self.assertFalse(self.torch.all(state_b.hidden == 0.0).item(), "hidden state should evolve")
 
-        template.add_actuator(ActuatorPD, input_indices=[dof0], kp=100.0, kd=10.0)
-        template.add_actuator(ActuatorPID, input_indices=[dof1], kp=200.0, ki=5.0, kd=20.0)
-        template.add_actuator(ActuatorPD, input_indices=[dof1], output_indices=[dof2], kp=300.0)
 
-        num_worlds = 3
-        builder = newton.ModelBuilder()
-        builder.replicate(template, num_worlds)
+# ---------------------------------------------------------------------------
+# 2. Delay
+# ---------------------------------------------------------------------------
 
-        model = builder.finalize()
 
-        self.assertEqual(model.world_count, num_worlds)
-        self.assertEqual(len(model.actuators), 2)
+class TestDelay(unittest.TestCase):
+    """Delay unit tests — construct Delay directly, call get_delayed_targets/update_state."""
 
-        pd_act = next(a for a in model.actuators if type(a) is ActuatorPD)
-        pid_act = next(a for a in model.actuators if isinstance(a, ActuatorPID))
+    def test_buffer_shape(self):
+        """State buffers have correct shape (buf_depth, N)."""
+        n, max_delay = 2, 5
+        device = wp.get_device()
+        delays = wp.array([max_delay] * n, dtype=wp.int32, device=device)
+        delay = Delay(delays, max_delay)
+        delay.finalize(device, n)
 
-        self.assertEqual(pd_act.num_actuators, 2 * num_worlds)
-        self.assertEqual(pid_act.num_actuators, num_worlds)
+        ds = delay.state(n, device)
+        self.assertEqual(ds.buffer_pos.shape, (max_delay, n))
+        self.assertEqual(ds.buffer_vel.shape, (max_delay, n))
+        self.assertEqual(ds.buffer_act.shape, (max_delay, n))
+        self.assertEqual(ds.write_idx, max_delay - 1)
+        np.testing.assert_array_equal(ds.num_pushes.numpy(), [0, 0])
 
-        np.testing.assert_array_almost_equal(pd_act.kp.numpy(), [100.0, 300.0] * num_worlds)
-        np.testing.assert_array_almost_equal(pid_act.ki.numpy(), [5.0] * num_worlds)
+    def test_latency_behavior(self):
+        """Delay=N gives exactly N steps of delay; empty buffer falls back to current targets."""
+        n, delay_val = 1, 2
+        device = wp.get_device()
+        delays = wp.array([delay_val], dtype=wp.int32, device=device)
+        delay = Delay(delays, delay_val)
+        delay.finalize(device, n)
 
-        pd_in = pd_act.input_indices.numpy()
-        pd_out = pd_act.output_indices.numpy()
-        dofs_per_world = model.joint_dof_count // num_worlds
+        indices = wp.array([0], dtype=wp.uint32, device=device)
+        state_0 = delay.state(n, device)
+        state_1 = delay.state(n, device)
 
-        for w in range(1, num_worlds):
-            self.assertEqual(pd_in[w * 2] - pd_in[(w - 1) * 2], dofs_per_world)
-            self.assertEqual(pd_in[w * 2 + 1] - pd_in[(w - 1) * 2 + 1], dofs_per_world)
-            self.assertEqual(pd_out[w * 2 + 1] - pd_out[(w - 1) * 2 + 1], dofs_per_world)
+        read_history = []
+        for step_i in range(delay_val + 3):
+            target_val = float(step_i + 1) * 10.0
+            tgt_pos = wp.array([target_val], dtype=wp.float32, device=device)
+            tgt_vel = wp.zeros(1, dtype=wp.float32, device=device)
 
-        for w in range(num_worlds):
-            self.assertNotEqual(pd_in[w * 2 + 1], pd_out[w * 2 + 1])
+            out_pos, _out_vel, _out_act = delay.get_delayed_targets(tgt_pos, tgt_vel, None, indices, indices, state_0)
+            read_history.append(out_pos.numpy()[0])
+            delay.update_state(tgt_pos, tgt_vel, None, indices, indices, state_0, state_1)
+            state_0, state_1 = state_1, state_0
 
-    def test_delay_grouping(self):
-        """Test: same delay groups, different delays separate, mixed with simple PD."""
-        builder = newton.ModelBuilder()
+        self.assertAlmostEqual(read_history[0], 10.0, places=4, msg="step 0: empty buffer -> current target")
+        self.assertAlmostEqual(read_history[1], 10.0, places=4, msg="step 1: 1 entry, lag clamped -> oldest (10)")
+        self.assertAlmostEqual(read_history[2], 10.0, places=4, msg="step 2: full delay=2 -> reads step 0 (10)")
+        self.assertAlmostEqual(read_history[3], 20.0, places=4, msg="step 3: full delay=2 -> reads step 1 (20)")
+        self.assertAlmostEqual(read_history[4], 30.0, places=4, msg="step 4: full delay=2 -> reads step 2 (30)")
 
-        bodies = [builder.add_body() for _ in range(6)]
-        joints = []
-        for i, body in enumerate(bodies):
-            parent = -1 if i == 0 else bodies[i - 1]
-            joints.append(builder.add_joint_revolute(parent=parent, child=body, axis=newton.Axis.Z))
-        builder.add_articulation(joints)
 
-        dofs = [builder.joint_qd_start[j] for j in joints]
+# ---------------------------------------------------------------------------
+# 3. Clamping
+# ---------------------------------------------------------------------------
 
-        builder.add_actuator(ActuatorPD, input_indices=[dofs[0]], kp=100.0)
-        builder.add_actuator(ActuatorPD, input_indices=[dofs[1]], kp=150.0)
-        builder.add_actuator(ActuatorDelayedPD, input_indices=[dofs[2]], kp=200.0, delay=3)
-        builder.add_actuator(ActuatorDelayedPD, input_indices=[dofs[3]], kp=250.0, delay=3)
-        builder.add_actuator(ActuatorDelayedPD, input_indices=[dofs[4]], kp=300.0, delay=7)
-        builder.add_actuator(ActuatorDelayedPD, input_indices=[dofs[5]], kp=350.0, delay=7)
 
-        model = builder.finalize()
+class TestClampingMaxForce(unittest.TestCase):
+    """ClampingMaxForce: output is clamped to +/-max_force."""
 
-        self.assertEqual(len(model.actuators), 3)
+    def test_modify_forces(self):
+        """Construct clamping directly and call modify_forces()."""
+        max_f = 50.0
+        n = 3
+        clamp = ClampingMaxForce(max_force=wp.array([max_f] * n, dtype=wp.float32))
 
-        pd_act = next(a for a in model.actuators if type(a) is ActuatorPD)
-        delay3 = next(a for a in model.actuators if isinstance(a, ActuatorDelayedPD) and a.delay == 3)
-        delay7 = next(a for a in model.actuators if isinstance(a, ActuatorDelayedPD) and a.delay == 7)
+        src_vals = [100.0, -80.0, 30.0]
+        src = wp.array(src_vals, dtype=wp.float32)
+        dst = wp.zeros(n, dtype=wp.float32)
+        indices = wp.array(list(range(n)), dtype=wp.uint32)
 
-        self.assertEqual(pd_act.num_actuators, 2)
-        self.assertEqual(delay3.num_actuators, 2)
-        self.assertEqual(delay7.num_actuators, 2)
+        clamp.modify_forces(src, dst, wp.zeros(n, dtype=wp.float32), wp.zeros(n, dtype=wp.float32), indices, indices)
 
-        np.testing.assert_array_almost_equal(delay3.kp.numpy(), [200.0, 250.0])
+        result = dst.numpy()
+        for i, s in enumerate(src_vals):
+            expected = max(min(s, max_f), -max_f)
+            self.assertAlmostEqual(result[i], expected, places=5, msg=f"DOF {i}")
 
-    def test_multi_input_actuator_2d_indices(self):
-        """Test actuators with multiple input indices (2D index arrays)."""
-        builder = newton.ModelBuilder()
 
-        # Create 6 joints for testing
-        bodies = [builder.add_body() for _ in range(6)]
-        joints = []
-        for i, body in enumerate(bodies):
-            parent = -1 if i == 0 else bodies[i - 1]
-            joints.append(builder.add_joint_revolute(parent=parent, child=body, axis=newton.Axis.Z))
-        builder.add_articulation(joints)
+class TestClampingDCMotor(unittest.TestCase):
+    """DC motor torque-speed curve: clamp = saturation * (1 - v/v_limit)."""
 
-        dofs = [builder.joint_qd_start[j] for j in joints]
+    def test_modify_forces(self):
+        """Construct clamping directly and call modify_forces() at several velocity points."""
+        sat, v_lim, max_f = 100.0, 10.0, 200.0
+        clamp = ClampingDCMotor(
+            saturation_effort=wp.array([sat], dtype=wp.float32),
+            velocity_limit=wp.array([v_lim], dtype=wp.float32),
+            max_force=wp.array([max_f], dtype=wp.float32),
+        )
+        indices = wp.array([0], dtype=wp.uint32)
+        raw_force = 500.0
 
-        # Add multi-input actuators: each reads from 2 DOFs
-        builder.add_actuator(ActuatorPD, input_indices=[dofs[0], dofs[1]], kp=100.0)
-        builder.add_actuator(ActuatorPD, input_indices=[dofs[2], dofs[3]], kp=200.0)
-        builder.add_actuator(ActuatorPD, input_indices=[dofs[4], dofs[5]], kp=300.0)
+        for qd in [0.0, 5.0, 10.0, -5.0]:
+            src = wp.array([raw_force], dtype=wp.float32)
+            dst = wp.zeros(1, dtype=wp.float32)
+            vel = wp.array([qd], dtype=wp.float32)
 
-        model = builder.finalize()
+            clamp.modify_forces(src, dst, wp.zeros(1, dtype=wp.float32), vel, indices, indices)
 
-        # Should have 1 ActuatorPD instance with 3 actuators, each with 2 inputs
-        self.assertEqual(len(model.actuators), 1)
-        pd_act = model.actuators[0]
+            tau_max = max(min(sat * (1.0 - qd / v_lim), max_f), 0.0)
+            tau_min = max(min(sat * (-1.0 - qd / v_lim), 0.0), -max_f)
+            expected = max(min(raw_force, tau_max), tau_min)
+            self.assertAlmostEqual(dst.numpy()[0], expected, places=3, msg=f"qd={qd}")
+
+
+class TestClampingPositionBased(unittest.TestCase):
+    """Position-based clamping with angle-dependent lookup table."""
 
-        self.assertEqual(pd_act.num_actuators, 3)
+    def test_modify_forces(self):
+        """Construct clamping directly and verify interpolated angle-dependent limits."""
+        angles = (-1.0, 0.0, 1.0)
+        torques = (10.0, 30.0, 50.0)
+        device = wp.get_device()
+        clamp = ClampingPositionBased(lookup_angles=angles, lookup_torques=torques)
+        clamp.finalize(device, 1)
 
-        # Verify input_indices shape is 2D: (3, 2)
-        input_arr = pd_act.input_indices.numpy()
-        self.assertEqual(input_arr.shape, (3, 2))
+        raw_force = 999.0
+        indices = wp.array([0], dtype=wp.uint32, device=device)
+
+        for pos, expected_limit in [(-1.0, 10.0), (0.0, 30.0), (1.0, 50.0), (-0.5, 20.0), (0.5, 40.0)]:
+            src = wp.array([raw_force], dtype=wp.float32, device=device)
+            dst = wp.zeros(1, dtype=wp.float32, device=device)
+            positions = wp.array([pos], dtype=wp.float32, device=device)
+
+            clamp.modify_forces(
+                src, dst, positions, wp.zeros(1, dtype=wp.float32, device=device), indices, indices, device=device
+            )
 
-        # Verify the indices are correct
-        np.testing.assert_array_equal(input_arr[0], [dofs[0], dofs[1]])
-        np.testing.assert_array_equal(input_arr[1], [dofs[2], dofs[3]])
-        np.testing.assert_array_equal(input_arr[2], [dofs[4], dofs[5]])
+            self.assertAlmostEqual(dst.numpy()[0], expected_limit, places=2, msg=f"pos={pos}")
 
-        # Verify output_indices also has same shape
-        output_arr = pd_act.output_indices.numpy()
-        self.assertEqual(output_arr.shape, (3, 2))
 
-        # Verify kp array has correct values (one per actuator)
-        np.testing.assert_array_almost_equal(pd_act.kp.numpy(), [100.0, 200.0, 300.0])
+# ---------------------------------------------------------------------------
+# 4. Actuator pipeline — full step() integration
+# ---------------------------------------------------------------------------
+
+
+class TestActuatorStep(unittest.TestCase):
+    """Integration test: full Actuator.step() with delay + PD + DC-motor clamping."""
+
+    def test_full_pipeline(self):
+        """Two-joint template x 3 envs, per-DOF delays (2 / 3), PD + DC motor.
+
+        At each of 5 steps we verify:
+            raw   = kp*(delayed_target - q) + kd*(0 - qd)
+            τ_max = clamp(sat*(1 - qd/v_lim),  0,  max_f)
+            τ_min = clamp(sat*(-1 - qd/v_lim), -max_f, 0)
+            force = clamp(raw, τ_min, τ_max)
+        """
+        kp, kd = 50.0, 5.0
+        sat, v_lim = 80.0, 20.0
+        delay_a, delay_b = 2, 3
+        num_envs = 3
+        dt = 0.01
+
+        template = newton.ModelBuilder()
+        link_a = template.add_link()
+        joint_a = template.add_joint_revolute(parent=-1, child=link_a, axis=newton.Axis.Z)
+        link_b = template.add_link()
+        joint_b = template.add_joint_revolute(parent=link_a, child=link_b, axis=newton.Axis.Z)
+        template.add_articulation([joint_a, joint_b])
+        dof_a = template.joint_qd_start[joint_a]
+        dof_b = template.joint_qd_start[joint_b]
+        dc_args = {"saturation_effort": sat, "velocity_limit": v_lim, "max_force": 1e6}
+        template.add_actuator(
+            ControllerPD,
+            index=dof_a,
+            kp=kp,
+            kd=kd,
+            delay=delay_a,
+            clamping=[(ClampingDCMotor, dc_args)],
+        )
+        template.add_actuator(
+            ControllerPD,
+            index=dof_b,
+            kp=kp,
+            kd=kd,
+            delay=delay_b,
+            clamping=[(ClampingDCMotor, dc_args)],
+        )
 
-    def test_dimension_mismatch_raises_error(self):
-        """Test that mixing different input dimensions raises an error."""
         builder = newton.ModelBuilder()
+        builder.replicate(template, num_envs)
+        model = builder.finalize()
 
-        bodies = [builder.add_body() for _ in range(3)]
-        joints = []
-        for i, body in enumerate(bodies):
-            parent = -1 if i == 0 else bodies[i - 1]
-            joints.append(builder.add_joint_revolute(parent=parent, child=body, axis=newton.Axis.Z))
-        builder.add_articulation(joints)
+        self.assertEqual(len(model.actuators), 1, "all DOFs share controller+clamping type")
+        actuator = model.actuators[0]
+        n = actuator.num_actuators
+        self.assertEqual(n, 2 * num_envs)
 
-        dofs = [builder.joint_qd_start[j] for j in joints]
+        delays_np = actuator.delay.delays.numpy()
+        expected_delays = [delay_a, delay_b] * num_envs
+        np.testing.assert_array_equal(delays_np, expected_delays)
 
-        # First call: 1 input
-        builder.add_actuator(ActuatorPD, input_indices=[dofs[0]], kp=100.0)
+        state = model.state()
+        state_0 = actuator.state()
+        state_1 = actuator.state()
 
-        # Second call: 2 inputs - should raise
-        with self.assertRaises(ValueError) as ctx:
-            builder.add_actuator(ActuatorPD, input_indices=[dofs[1], dofs[2]], kp=200.0)
+        qd_val = 2.0
+        dofs = actuator.indices.numpy().tolist()
+        _write_dof_values(model, state.joint_qd, dofs, [qd_val] * n)
 
-        self.assertIn("dimension mismatch", str(ctx.exception))
+        target_schedule = [10.0, 20.0, 30.0, 40.0, 50.0]
+        written_targets: list[float] = []
 
+        def _dc_clamp(raw: float, vel: float) -> float:
+            tau_max = max(min(sat * (1.0 - vel / v_lim), 1e6), 0.0)
+            tau_min = max(min(sat * (-1.0 - vel / v_lim), 0.0), -1e6)
+            return max(min(raw, tau_max), tau_min)
 
-@unittest.skipUnless(HAS_USD, "pxr not installed")
-class TestActuatorUSDParsing(unittest.TestCase):
-    """Tests for parsing actuators from USD files."""
+        def _delayed_target(step_i: int, dof_delay: int) -> float:
+            pushes = step_i
+            if pushes == 0:
+                return target_schedule[step_i]
+            lag = min(dof_delay - 1, pushes - 1)
+            return written_targets[step_i - 1 - lag]
 
-    def test_usd_parsing(self):
-        """Test that USD parsing automatically parses Newton actuators."""
+        for step_i in range(5):
+            control = model.control()
+            tgt = target_schedule[step_i]
+            _write_dof_values(model, control.joint_target_pos, dofs, [tgt] * n)
+            written_targets.append(tgt)
+
+            actuator.step(state, control, state_0, state_1, dt)
+            state_0, state_1 = state_1, state_0
+
+            forces = control.joint_f.numpy()
+            for local_i in range(n):
+                d = dofs[local_i]
+                dof_delay = expected_delays[local_i]
+                delayed_tgt = _delayed_target(step_i, dof_delay)
+                raw = kp * (delayed_tgt - 0.0) + kd * (0.0 - qd_val)
+                expected = _dc_clamp(raw, qd_val)
+                self.assertAlmostEqual(
+                    forces[d],
+                    expected,
+                    places=3,
+                    msg=f"step={step_i} dof={local_i} delay={dof_delay} "
+                    f"delayed_tgt={delayed_tgt} raw={raw} expected={expected}",
+                )
+
+        ds = state_0.delay_state
+        np.testing.assert_array_equal(
+            ds.num_pushes.numpy(),
+            [min(5, actuator.delay.buf_depth)] * n,
+            err_msg="num_pushes should be clamped to buf_depth",
+        )
+
+
+# ---------------------------------------------------------------------------
+# 5. Builder — from USD, programmatic, and free-joint replication
+# ---------------------------------------------------------------------------
+
+
+class TestActuatorBuilder(unittest.TestCase):
+    """ModelBuilder actuator construction — grouping, params, state, and index layouts."""
+
+    @unittest.skipUnless(HAS_USD, "pxr not installed")
+    def test_from_usd(self):
+        """Load actuators from a USD stage and verify params after finalize.
+
+        The asset has two actuators:
+          Joint1Actuator: PD (kp=100, kd=10) + MaxForce(50)
+          Joint2Actuator: PD (kp=200, kd=20) + Delay(5)
+        Different clamping/delay splits them into separate groups.
+        """
         test_dir = os.path.dirname(__file__)
         usd_path = os.path.join(test_dir, "assets", "actuator_test.usda")
-
         if not os.path.exists(usd_path):
             self.skipTest(f"Test USD file not found: {usd_path}")
 
-        # Actuators are parsed automatically
         builder = newton.ModelBuilder()
         result = parse_usd(builder, usd_path)
         self.assertGreater(result["actuator_count"], 0)
         model = builder.finalize()
-        self.assertGreater(len(model.actuators), 0)
 
-        # Verify parsed parameters
-        stage = Usd.Stage.Open(usd_path)
-        actuator_prim = stage.GetPrimAtPath("/World/Robot/Joint1Actuator")
-        parsed = parse_actuator_prim(actuator_prim)
+        self.assertEqual(len(model.actuators), 2)
+        clamped = next(a for a in model.actuators if a.clamping)
+        delayed = next(a for a in model.actuators if a.delay is not None)
+
+        self.assertEqual(clamped.num_actuators, 1)
+        self.assertAlmostEqual(clamped.controller.kp.numpy()[0], 100.0, places=3)
+        self.assertAlmostEqual(clamped.controller.kd.numpy()[0], 10.0, places=3)
+        self.assertIsInstance(clamped.clamping[0], ClampingMaxForce)
+        self.assertAlmostEqual(clamped.clamping[0].max_force.numpy()[0], 50.0, places=3)
 
+        self.assertEqual(delayed.num_actuators, 1)
+        self.assertAlmostEqual(delayed.controller.kp.numpy()[0], 200.0, places=3)
+        self.assertAlmostEqual(delayed.controller.kd.numpy()[0], 20.0, places=3)
+        np.testing.assert_array_equal(delayed.delay.delays.numpy(), [5])
+        self.assertEqual(delayed.delay.buf_depth, 5)
+
+        stage = Usd.Stage.Open(usd_path)
+        parsed = parse_actuator_prim(stage.GetPrimAtPath("/World/Robot/Joint1Actuator"))
         self.assertIsNotNone(parsed)
-        self.assertEqual(parsed.kwargs.get("kp"), 100.0)
-        self.assertEqual(parsed.kwargs.get("kd"), 10.0)
+        self.assertIsInstance(parsed, ActuatorParsed)
+        self.assertEqual(parsed.controller_class, ControllerPD)
 
+    def test_programmatic(self):
+        """Mixed controller types, clamping, and delays via add_actuator.
 
-class TestActuatorSelectionAPI(unittest.TestCase):
-    """Tests for actuator parameter access via ArticulationView.
+        3-joint chain: PD, PID with DC motor clamping, PD with delay=4.
+        Verifies grouping (3 groups), per-DOF params, and state shapes.
+        """
+        builder = newton.ModelBuilder()
+        links = [builder.add_link() for _ in range(3)]
+        joints = []
+        for i, link in enumerate(links):
+            parent = -1 if i == 0 else links[i - 1]
+            joints.append(builder.add_joint_revolute(parent=parent, child=link, axis=newton.Axis.Z))
+        builder.add_articulation(joints)
+        dofs = [builder.joint_qd_start[j] for j in joints]
 
-    Follows the same parameterised pattern as the joint/link selection tests:
-    a single ``run_test_actuator_selection`` helper is driven by four thin
-    entry-point tests that cover (use_mask x use_multiple_artics_per_view).
-    """
+        builder.add_actuator(ControllerPD, index=dofs[0], kp=50.0, kd=5.0, constant_force=1.0)
+        builder.add_actuator(
+            ControllerPID,
+            index=dofs[1],
+            kp=100.0,
+            ki=10.0,
+            kd=20.0,
+            clamping=[(ClampingDCMotor, {"saturation_effort": 80.0, "velocity_limit": 15.0, "max_force": 200.0})],
+        )
+        builder.add_actuator(ControllerPD, index=dofs[2], kp=150.0, delay=4)
 
-    def run_test_actuator_selection(self, use_mask: bool, use_multiple_artics_per_view: bool):
-        """Test an ArticulationView that includes a subset of joints and that we
-        can read/write actuator parameters for the subset with and without a mask.
-        Verifies the full flat actuator parameter array."""
+        model = builder.finalize()
+        self.assertEqual(len(model.actuators), 3)
+
+        pd_plain = next(a for a in model.actuators if isinstance(a.controller, ControllerPD) and a.delay is None)
+        pid_act = next(a for a in model.actuators if isinstance(a.controller, ControllerPID))
+        pd_delay = next(a for a in model.actuators if isinstance(a.controller, ControllerPD) and a.delay is not None)
+
+        self.assertEqual(pd_plain.num_actuators, 1)
+        np.testing.assert_array_almost_equal(pd_plain.controller.kp.numpy(), [50.0])
+        np.testing.assert_array_almost_equal(pd_plain.controller.kd.numpy(), [5.0])
+        np.testing.assert_array_almost_equal(pd_plain.controller.constant_force.numpy(), [1.0])
+        self.assertIsNone(pd_plain.state())
+
+        self.assertEqual(pid_act.num_actuators, 1)
+        np.testing.assert_array_almost_equal(pid_act.controller.kp.numpy(), [100.0])
+        np.testing.assert_array_almost_equal(pid_act.controller.ki.numpy(), [10.0])
+        np.testing.assert_array_almost_equal(pid_act.controller.kd.numpy(), [20.0])
+        self.assertIsInstance(pid_act.clamping[0], ClampingDCMotor)
+        self.assertAlmostEqual(pid_act.clamping[0].saturation_effort.numpy()[0], 80.0, places=3)
+        pid_state = pid_act.state()
+        self.assertIsNotNone(pid_state.controller_state)
+        self.assertEqual(pid_state.controller_state.integral.shape, (1,))
+        np.testing.assert_array_equal(pid_state.controller_state.integral.numpy(), [0.0])
+
+        self.assertEqual(pd_delay.num_actuators, 1)
+        np.testing.assert_array_almost_equal(pd_delay.controller.kp.numpy(), [150.0])
+        np.testing.assert_array_equal(pd_delay.delay.delays.numpy(), [4])
+        self.assertEqual(pd_delay.delay.buf_depth, 4)
+        ds = pd_delay.state().delay_state
+        self.assertEqual(ds.buffer_pos.shape, (4, 1))
+        np.testing.assert_array_equal(ds.num_pushes.numpy(), [0])
+
+    def test_free_joint_with_replication(self):
+        """Free-joint base + 2 revolute children x 3 envs.
+
+        Verifies:
+        - pos_indices != indices when joint_q layout differs from joint_qd
+        - Correct per-DOF parameter replication across environments
+        - State shapes scale with num_envs
+        """
+        num_envs = 3
 
+        template = newton.ModelBuilder()
+        base = template.add_link()
+        j_free = template.add_joint_free(child=base)
+        link1 = template.add_link()
+        j1 = template.add_joint_revolute(parent=base, child=link1, axis=newton.Axis.Z)
+        link2 = template.add_link()
+        j2 = template.add_joint_revolute(parent=link1, child=link2, axis=newton.Axis.Y)
+        template.add_articulation([j_free, j1, j2])
+
+        dof1 = template.joint_qd_start[j1]
+        dof2 = template.joint_qd_start[j2]
+
+        template.add_actuator(
+            ControllerPD, index=dof1, kp=100.0, kd=10.0, pos_index=template.joint_q_start[j1], delay=2
+        )
+        template.add_actuator(
+            ControllerPD, index=dof2, kp=200.0, kd=20.0, pos_index=template.joint_q_start[j2], delay=3
+        )
+
+        builder = newton.ModelBuilder()
+        builder.replicate(template, num_envs)
+        model = builder.finalize()
+
+        self.assertEqual(len(model.actuators), 1)
+        act = model.actuators[0]
+        n = 2 * num_envs
+        self.assertEqual(act.num_actuators, n)
+
+        pos_idx = act.pos_indices.numpy()
+        vel_idx = act.indices.numpy()
+        self.assertFalse(
+            np.array_equal(pos_idx, vel_idx),
+            "pos_indices should differ from indices for free-joint articulations",
+        )
+
+        np.testing.assert_array_almost_equal(act.controller.kp.numpy(), [100.0, 200.0] * num_envs)
+        np.testing.assert_array_almost_equal(act.controller.kd.numpy(), [10.0, 20.0] * num_envs)
+
+        np.testing.assert_array_equal(act.delay.delays.numpy(), [2, 3] * num_envs)
+        self.assertEqual(act.delay.buf_depth, 3)
+
+        act_state = act.state()
+        self.assertEqual(act_state.delay_state.buffer_pos.shape, (3, n))
+        np.testing.assert_array_equal(act_state.delay_state.num_pushes.numpy(), [0] * n)
+
+
+# ---------------------------------------------------------------------------
+# 6. Legacy newton_actuators backward compatibility
+# ---------------------------------------------------------------------------
+
+
+@unittest.skipUnless(_HAS_LEGACY_ACTUATORS, "newton_actuators not installed")
+class TestLegacyActuatorCompat(unittest.TestCase):
+    """Deprecated ``newton_actuators`` calling conventions must still work."""
+
+    def _make_builder(self, n_joints=2):
+        builder = newton.ModelBuilder()
+        links = [builder.add_link() for _ in range(n_joints)]
+        joints = []
+        for i, link in enumerate(links):
+            parent = -1 if i == 0 else links[i - 1]
+            joints.append(builder.add_joint_revolute(parent=parent, child=link, axis=newton.Axis.Z))
+        builder.add_articulation(joints)
+        dofs = [builder.joint_qd_start[j] for j in joints]
+        return builder, dofs
+
+    def test_legacy_positional_list(self):
+        """add_actuator(ActuatorPD, [dof], kp=...) — old positional style."""
+        builder, dofs = self._make_builder()
+        with self.assertWarns(DeprecationWarning):
+            builder.add_actuator(ActuatorPD, [dofs[0]], kp=50.0, kd=5.0)
+        model = builder.finalize()
+        self.assertEqual(len(model.actuators), 1)
+        self.assertIsInstance(model.actuators[0].controller, ControllerPD)
+        np.testing.assert_array_almost_equal(model.actuators[0].controller.kp.numpy(), [50.0])
+
+    def test_legacy_keyword_input_indices(self):
+        """add_actuator(ActuatorPD, input_indices=[dof], kp=...)."""
+        builder, dofs = self._make_builder()
+        with self.assertWarns(DeprecationWarning):
+            builder.add_actuator(ActuatorPD, input_indices=[dofs[0]], kp=100.0)
+        model = builder.finalize()
+        self.assertEqual(len(model.actuators), 1)
+        np.testing.assert_array_almost_equal(model.actuators[0].controller.kp.numpy(), [100.0])
+
+    def test_legacy_keyword_actuator_class(self):
+        """add_actuator(actuator_class=ActuatorPD, input_indices=[dof], kp=...)."""
+        builder, dofs = self._make_builder()
+        with self.assertWarns(DeprecationWarning):
+            builder.add_actuator(actuator_class=ActuatorPD, input_indices=[dofs[0]], kp=75.0)
+        model = builder.finalize()
+        self.assertEqual(len(model.actuators), 1)
+        np.testing.assert_array_almost_equal(model.actuators[0].controller.kp.numpy(), [75.0])
+
+    def test_legacy_delayed_pd(self):
+        """ActuatorDelayedPD maps to ControllerPD + delay."""
+        builder, dofs = self._make_builder()
+        with self.assertWarns(DeprecationWarning):
+            builder.add_actuator(ActuatorDelayedPD, input_indices=[dofs[0]], kp=200.0, delay=3)
+        model = builder.finalize()
+        act = model.actuators[0]
+        self.assertIsInstance(act.controller, ControllerPD)
+        self.assertIsNotNone(act.delay)
+        np.testing.assert_array_equal(act.delay.delays.numpy(), [3])
+
+    def test_legacy_pid(self):
+        """ActuatorPID maps to ControllerPID."""
+        builder, dofs = self._make_builder()
+        with self.assertWarns(DeprecationWarning):
+            builder.add_actuator(ActuatorPID, [dofs[0]], kp=100.0, ki=10.0, kd=20.0)
+        model = builder.finalize()
+        act = model.actuators[0]
+        self.assertIsInstance(act.controller, ControllerPID)
+        np.testing.assert_array_almost_equal(act.controller.ki.numpy(), [10.0])
+
+    def test_legacy_max_force_becomes_clamping(self):
+        """max_force kwarg creates a ClampingMaxForce on the new actuator."""
+        builder, dofs = self._make_builder()
+        with self.assertWarns(DeprecationWarning):
+            builder.add_actuator(ActuatorPD, input_indices=[dofs[0]], kp=150.0, max_force=50.0)
+        model = builder.finalize()
+        act = model.actuators[0]
+        self.assertEqual(len(act.clamping), 1)
+        self.assertIsInstance(act.clamping[0], ClampingMaxForce)
+        np.testing.assert_array_almost_equal(act.clamping[0].max_force.numpy(), [50.0])
+
+    def test_legacy_output_indices_warns(self):
+        """output_indices != input_indices emits extra deprecation warning."""
+        builder, dofs = self._make_builder()
+        with self.assertWarns(DeprecationWarning):
+            builder.add_actuator(ActuatorPD, [dofs[0]], [dofs[1]], kp=50.0)
+
+    def test_legacy_gear_warns(self):
+        """Non-unity gear emits a deprecation warning."""
+        builder, dofs = self._make_builder()
+        with self.assertWarns(DeprecationWarning):
+            builder.add_actuator(ActuatorPD, input_indices=[dofs[0]], kp=50.0, gear=2.0)
+
+    def test_new_api_no_warning(self):
+        """New-style calls must not emit DeprecationWarning."""
+        builder, dofs = self._make_builder()
+
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            builder.add_actuator(ControllerPD, index=dofs[0], kp=50.0)
+        dep = [x for x in w if issubclass(x.category, DeprecationWarning)]
+        self.assertEqual(len(dep), 0, f"Unexpected warnings: {[str(x.message) for x in dep]}")
+
+
+# ---------------------------------------------------------------------------
+# 7. Parameter access via ArticulationView
+# ---------------------------------------------------------------------------
+
+
+class TestActuatorSelectionAPI(unittest.TestCase):
+    """Tests for actuator parameter access via ArticulationView."""
+
+    def run_test_actuator_selection(self, use_mask: bool, use_multiple_artics_per_view: bool):
         mjcf = """<?xml version="1.0" ?>
 <mujoco model="myart">
     <worldbody>
-    <!-- Root body (fixed to world) -->
     <body name="root" pos="0 0 0">
-      <!-- First child link with prismatic joint along x -->
       <body name="link1" pos="0.0 -0.5 0">
         <joint name="joint1" type="slide" axis="1 0 0" range="-50.5 50.5"/>
         <inertial pos="0 0 0" mass="1.0" diaginertia="0.01 0.01 0.01"/>
       </body>
-      <!-- Second child link with prismatic joint along x -->
       <body name="link2" pos="-0.0 -0.7 0">
         <joint name="joint2" type="slide" axis="1 0 0" range="-50.5 50.5"/>
         <inertial pos="0 0 0" mass="1.0" diaginertia="0.01 0.01 0.01"/>
       </body>
-      <!-- Third child link with prismatic joint along x -->
       <body name="link3" pos="-0.0 -0.9 0">
         <joint name="joint3" type="slide" axis="1 0 0" range="-50.5 50.5"/>
         <inertial pos="0 0 0" mass="1.0" diaginertia="0.01 0.01 0.01"/>
@@ -282,11 +842,9 @@ def run_test_actuator_selection(self, use_mask: bool, use_multiple_artics_per_vi
         num_worlds = 3
         num_actuators = num_joints_per_articulation * num_articulations_per_world * num_worlds
 
-        # Create a single articulation with 3 joints.
         single_articulation_builder = newton.ModelBuilder()
         single_articulation_builder.add_mjcf(mjcf)
 
-        # Add an ActuatorPD for each slide joint: kp = 100, 200, 300
         joint_names = [
             "myart/worldbody/root/link1/joint1",
             "myart/worldbody/root/link2/joint2",
@@ -295,38 +853,31 @@ def run_test_actuator_selection(self, use_mask: bool, use_multiple_artics_per_vi
         for i, jname in enumerate(joint_names):
             j_idx = single_articulation_builder.joint_label.index(jname)
             dof = single_articulation_builder.joint_qd_start[j_idx]
-            single_articulation_builder.add_actuator(ActuatorPD, input_indices=[dof], kp=100.0 * (i + 1))
+            single_articulation_builder.add_actuator(ControllerPD, index=dof, kp=100.0 * (i + 1))
 
-        # Create a world with 2 articulations
         single_world_builder = newton.ModelBuilder()
         for _i in range(num_articulations_per_world):
             single_world_builder.add_builder(single_articulation_builder)
 
-        # Customise the articulation labels in single_world_builder
         single_world_builder.articulation_label[1] = "art1"
         if use_multiple_artics_per_view:
             single_world_builder.articulation_label[0] = "art1"
         else:
             single_world_builder.articulation_label[0] = "art0"
 
-        # Create 3 worlds with two articulations per world and 3 actuators per articulation.
         builder = newton.ModelBuilder()
         for _i in range(num_worlds):
             builder.add_world(single_world_builder)
 
-        # Create the model
         model = builder.finalize()
 
-        # Create a view of "art1/joint3"
         joints_to_include = ["joint3"]
         joint_view = ArticulationView(model, "art1", include_joints=joints_to_include)
 
         actuator = model.actuators[0]
 
-        # Get the kp values for the view's DOFs (only joint3 of selected artics)
-        kp_values = joint_view.get_actuator_parameter(actuator, "kp").numpy().copy()
+        kp_values = joint_view.get_actuator_parameter(actuator, actuator.controller, "kp").numpy().copy()
 
-        # Verify shape and initial values
         if use_multiple_artics_per_view:
             self.assertEqual(kp_values.shape, (num_worlds, 2))
             np.testing.assert_array_almost_equal(kp_values, [[300.0, 300.0]] * num_worlds)
@@ -334,7 +885,6 @@ def run_test_actuator_selection(self, use_mask: bool, use_multiple_artics_per_vi
             self.assertEqual(kp_values.shape, (num_worlds, 1))
             np.testing.assert_array_almost_equal(kp_values, [[300.0]] * num_worlds)
 
-        # Modify the kp values with distinguishable per-slot values
         val = 1000.0
         for world_idx in range(kp_values.shape[0]):
             for dof_idx in range(kp_values.shape[1]):
@@ -345,108 +895,104 @@ def run_test_actuator_selection(self, use_mask: bool, use_multiple_artics_per_vi
         if use_mask:
             mask = wp.array([False, True, False], dtype=bool, device=model.device)
 
-        # Set the modified values
         wp_kp = wp.array(kp_values, dtype=float, device=model.device)
-        joint_view.set_actuator_parameter(actuator, "kp", wp_kp, mask=mask)
+        joint_view.set_actuator_parameter(actuator, actuator.controller, "kp", wp_kp, mask=mask)
 
-        # Build expected flat kp array and verify against the full actuator.kp array.
-        # Initial flat layout: [100, 200, 300] per articulation x 6 articulations.
         expected_kp = []
         if use_mask:
             if use_multiple_artics_per_view:
                 expected_kp = [
                     100.0,
                     200.0,
-                    300.0,  # world0/artic0 — not masked
+                    300.0,
                     100.0,
                     200.0,
-                    300.0,  # world0/artic1 — not masked
+                    300.0,
                     100.0,
                     200.0,
-                    1200.0,  # world1/artic0 — masked, joint3 updated
+                    1200.0,
                     100.0,
                     200.0,
-                    1300.0,  # world1/artic1 — masked, joint3 updated
+                    1300.0,
                     100.0,
                     200.0,
-                    300.0,  # world2/artic0 — not masked
+                    300.0,
                     100.0,
                     200.0,
-                    300.0,  # world2/artic1 — not masked
+                    300.0,
                 ]
             else:
                 expected_kp = [
                     100.0,
                     200.0,
-                    300.0,  # world0/artic0 (art0) — not in view
+                    300.0,
                     100.0,
                     200.0,
-                    300.0,  # world0/artic1 (art1) — not masked
+                    300.0,
                     100.0,
                     200.0,
-                    300.0,  # world1/artic0 (art0) — not in view
+                    300.0,
                     100.0,
                     200.0,
-                    1100.0,  # world1/artic1 (art1) — masked, joint3 updated
+                    1100.0,
                     100.0,
                     200.0,
-                    300.0,  # world2/artic0 (art0) — not in view
+                    300.0,
                     100.0,
                     200.0,
-                    300.0,  # world2/artic1 (art1) — not masked
+                    300.0,
                 ]
         else:
             if use_multiple_artics_per_view:
                 expected_kp = [
                     100.0,
                     200.0,
-                    1000.0,  # world0/artic0 — joint3 updated
+                    1000.0,
                     100.0,
                     200.0,
-                    1100.0,  # world0/artic1 — joint3 updated
+                    1100.0,
                     100.0,
                     200.0,
-                    1200.0,  # world1/artic0 — joint3 updated
+                    1200.0,
                     100.0,
                     200.0,
-                    1300.0,  # world1/artic1 — joint3 updated
+                    1300.0,
                     100.0,
                     200.0,
-                    1400.0,  # world2/artic0 — joint3 updated
+                    1400.0,
                     100.0,
                     200.0,
-                    1500.0,  # world2/artic1 — joint3 updated
+                    1500.0,
                 ]
             else:
                 expected_kp = [
                     100.0,
                     200.0,
-                    300.0,  # world0/artic0 (art0) — not in view
+                    300.0,
                     100.0,
                     200.0,
-                    1000.0,  # world0/artic1 (art1) — joint3 updated
+                    1000.0,
                     100.0,
                     200.0,
-                    300.0,  # world1/artic0 (art0) — not in view
+                    300.0,
                     100.0,
                     200.0,
-                    1100.0,  # world1/artic1 (art1) — joint3 updated
+                    1100.0,
                     100.0,
                     200.0,
-                    300.0,  # world2/artic0 (art0) — not in view
+                    300.0,
                     100.0,
                     200.0,
-                    1200.0,  # world2/artic1 (art1) — joint3 updated
+                    1200.0,
                 ]
 
-        # Verify the full flat actuator kp array
-        measured_kp = actuator.kp.numpy()
+        measured_kp = actuator.controller.kp.numpy()
         for i in range(num_actuators):
             self.assertAlmostEqual(
                 expected_kp[i],
                 measured_kp[i],
                 places=4,
-                msg=f"Expected kp value {i}: {expected_kp[i]}, Measured value: {measured_kp[i]}",
+                msg=f"Expected kp[{i}]={expected_kp[i]}, got {measured_kp[i]}",
             )
 
     def test_actuator_selection_one_per_view_no_mask(self):
@@ -462,183 +1008,234 @@ def test_actuator_selection_two_per_view_with_mask(self):
         self.run_test_actuator_selection(use_mask=True, use_multiple_artics_per_view=True)
 
 
-class TestActuatorStepIntegration(unittest.TestCase):
-    """Tests for actuator.step() with real Model/State/Control objects.
-
-    Verifies the end-to-end flow: set targets on Control -> call actuator.step()
-    -> forces written to control.joint_f.
-
-    Uses add_link() (not add_body()) to avoid implicit free joints that would
-    cause DOF/coord index mismatch when the actuator indexes into joint_q.
-    """
+# ---------------------------------------------------------------------------
+# 7. State reset (masked and full)
+# ---------------------------------------------------------------------------
+
+
+class TestStateReset(unittest.TestCase):
+    """Exercise State.reset() for delay, PID, and composed Actuator.State."""
+
+    def test_delay_masked_reset(self):
+        """Push data into 4-DOF delay buffer, reset DOFs 1 and 3, verify others untouched."""
+        n, max_delay = 4, 2
+        device = wp.get_device()
+        delays = wp.array([max_delay] * n, dtype=wp.int32, device=device)
+        delay = Delay(delays, max_delay)
+        delay.finalize(device, n)
+
+        state_0 = delay.state(n, device)
+        state_1 = delay.state(n, device)
+        indices = wp.array(list(range(n)), dtype=wp.uint32, device=device)
+
+        for step in range(3):
+            tgt = wp.array([float(step + 1) * 10] * n, dtype=wp.float32, device=device)
+            vel = wp.zeros(n, dtype=wp.float32, device=device)
+            delay.update_state(tgt, vel, None, indices, indices, state_0, state_1)
+            state_0, state_1 = state_1, state_0
+
+        pushes_before = state_0.num_pushes.numpy().copy()
+        self.assertTrue(all(p > 0 for p in pushes_before), "all DOFs should have data")
+
+        mask = wp.array([False, True, False, True], dtype=wp.bool, device=device)
+        state_0.reset(mask)
+
+        pushes_after = state_0.num_pushes.numpy()
+        self.assertEqual(pushes_after[0], pushes_before[0], "DOF 0 should be untouched")
+        self.assertEqual(pushes_after[1], 0, "DOF 1 should be reset")
+        self.assertEqual(pushes_after[2], pushes_before[2], "DOF 2 should be untouched")
+        self.assertEqual(pushes_after[3], 0, "DOF 3 should be reset")
+
+        buf_pos = state_0.buffer_pos.numpy()
+        for row in range(max_delay):
+            self.assertEqual(buf_pos[row, 1], 0.0, f"buffer_pos[{row}, 1] should be zeroed")
+            self.assertEqual(buf_pos[row, 3], 0.0, f"buffer_pos[{row}, 3] should be zeroed")
+            self.assertNotEqual(buf_pos[row, 0], 0.0, f"buffer_pos[{row}, 0] should be preserved")
+
+    def test_delay_full_reset(self):
+        """Full reset (mask=None) zeros everything and resets write_idx."""
+        n, max_delay = 2, 3
+        device = wp.get_device()
+        delays = wp.array([max_delay] * n, dtype=wp.int32, device=device)
+        delay = Delay(delays, max_delay)
+        delay.finalize(device, n)
+
+        state = delay.state(n, device)
+        indices = wp.array(list(range(n)), dtype=wp.uint32, device=device)
+        state_tmp = delay.state(n, device)
+
+        for step in range(4):
+            tgt = wp.array([float(step + 1)] * n, dtype=wp.float32, device=device)
+            vel = wp.zeros(n, dtype=wp.float32, device=device)
+            delay.update_state(tgt, vel, None, indices, indices, state, state_tmp)
+            state, state_tmp = state_tmp, state
+
+        self.assertTrue(any(p > 0 for p in state.num_pushes.numpy()))
+
+        state.reset()
+
+        np.testing.assert_array_equal(state.num_pushes.numpy(), [0] * n)
+        np.testing.assert_array_equal(state.buffer_pos.numpy(), np.zeros((max_delay, n)))
+        np.testing.assert_array_equal(state.buffer_vel.numpy(), np.zeros((max_delay, n)))
+        np.testing.assert_array_equal(state.buffer_act.numpy(), np.zeros((max_delay, n)))
+        self.assertEqual(state.write_idx, max_delay - 1)
+
+    def test_pid_masked_reset(self):
+        """PID integral accumulator: masked reset zeros selected DOFs only."""
+        n = 3
+        device = wp.get_device()
+
+        def _f(vals):
+            return wp.array(vals, dtype=wp.float32, device=device)
+
+        indices = wp.array(list(range(n)), dtype=wp.uint32, device=device)
+        ctrl = ControllerPID(
+            kp=_f([50.0] * n),
+            ki=_f([10.0] * n),
+            kd=_f([5.0] * n),
+            integral_max=_f([math.inf] * n),
+            constant_force=_f([0.0] * n),
+        )
+        ctrl.finalize(device, n)
+
+        state_0 = ctrl.state(n, device)
+        state_1 = ctrl.state(n, device)
+
+        for _ in range(3):
+            forces = wp.zeros(n, dtype=wp.float32, device=device)
+            ctrl.compute(
+                positions=_f([0.0] * n),
+                velocities=_f([0.0] * n),
+                target_pos=_f([1.0] * n),
+                target_vel=_f([0.0] * n),
+                feedforward=None,
+                pos_indices=indices,
+                vel_indices=indices,
+                target_pos_indices=indices,
+                target_vel_indices=indices,
+                forces=forces,
+                state=state_0,
+                dt=0.01,
+                device=device,
+            )
+            ctrl.update_state(state_0, state_1)
+            state_0, state_1 = state_1, state_0
 
-    def _build_chain_model(self, num_joints, actuator_cls, actuator_kwargs):
-        """Helper: build a revolute chain with add_link, add one actuator per joint, finalize."""
-        builder = newton.ModelBuilder()
-        links = [builder.add_link() for _ in range(num_joints)]
-        joints = []
-        for i, link in enumerate(links):
-            parent = -1 if i == 0 else links[i - 1]
-            joints.append(builder.add_joint_revolute(parent=parent, child=link, axis=newton.Axis.Z))
-        builder.add_articulation(joints)
-        dofs = [builder.joint_qd_start[j] for j in joints]
-        for dof in dofs:
-            builder.add_actuator(actuator_cls, input_indices=[dof], **actuator_kwargs)
-        return builder.finalize(), dofs
-
-    def _set_control_array(self, model, control_array, dof_indices, values):
-        """Write values into a control array at the given DOF indices."""
-        arr_np = control_array.numpy()
-        for dof, val in zip(dof_indices, values, strict=True):
-            arr_np[dof] = val
-        wp.copy(control_array, wp.array(arr_np, dtype=float, device=model.device))
-
-    def test_pd_step_position_error(self):
-        """ActuatorPD: force = kp * (target_pos - q) when kd=0, gear=1."""
-        model, dofs = self._build_chain_model(3, ActuatorPD, {"kp": 100.0})
-        state = model.state()
-        control = model.control()
-        control.joint_f.zero_()
+        integral_before = state_0.integral.numpy().copy()
+        self.assertTrue(all(v > 0 for v in integral_before), "integrals should have accumulated")
 
-        self._set_control_array(model, control.joint_target_pos, dofs, [1.0, 2.0, 3.0])
+        mask = wp.array([True, False, True], dtype=wp.bool, device=device)
+        state_0.reset(mask)
 
-        actuator = model.actuators[0]
-        actuator.step(state, control)
+        integral_after = state_0.integral.numpy()
+        self.assertAlmostEqual(integral_after[0], 0.0, places=6, msg="DOF 0 should be reset")
+        self.assertAlmostEqual(integral_after[1], integral_before[1], places=6, msg="DOF 1 should be untouched")
+        self.assertAlmostEqual(integral_after[2], 0.0, places=6, msg="DOF 2 should be reset")
 
-        forces = control.joint_f.numpy()
-        np.testing.assert_allclose(
-            [forces[d] for d in dofs],
-            [100.0, 200.0, 300.0],
-            rtol=1e-5,
-        )
+    def test_actuator_composed_reset(self):
+        """Actuator.State.reset delegates to both delay and controller sub-states."""
+        num_envs = 2
+        device = wp.get_device()
 
-    def test_pd_step_velocity_error(self):
-        """ActuatorPD: force includes kd * (target_vel - qd)."""
-        model, dofs = self._build_chain_model(2, ActuatorPD, {"kp": 0.0, "kd": 10.0})
-        state = model.state()
-        control = model.control()
-        control.joint_f.zero_()
+        template = newton.ModelBuilder()
+        link = template.add_link()
+        joint = template.add_joint_revolute(parent=-1, child=link, axis=newton.Axis.Z)
+        template.add_articulation([joint])
+        dof = template.joint_qd_start[joint]
+        template.add_actuator(ControllerPID, index=dof, kp=50.0, ki=10.0, kd=5.0, delay=2)
 
-        self._set_control_array(model, control.joint_target_vel, dofs, [5.0, -3.0])
+        builder = newton.ModelBuilder()
+        builder.replicate(template, num_envs)
+        model = builder.finalize()
 
         actuator = model.actuators[0]
-        actuator.step(state, control)
+        n = actuator.num_actuators
+        self.assertEqual(n, num_envs)
 
-        forces = control.joint_f.numpy()
-        np.testing.assert_allclose(
-            [forces[d] for d in dofs],
-            [50.0, -30.0],
-            rtol=1e-5,
-        )
-
-    def test_pd_step_feedforward_joint_act(self):
-        """ActuatorPD: feedforward joint_act term is added to the output force."""
-        model, dofs = self._build_chain_model(2, ActuatorPD, {"kp": 0.0})
         state = model.state()
-        control = model.control()
-        control.joint_f.zero_()
+        state_0 = actuator.state()
+        state_1 = actuator.state()
+        dofs = actuator.indices.numpy().tolist()
 
-        self._set_control_array(model, control.joint_act, dofs, [7.0, -3.0])
+        for _step in range(3):
+            control = model.control()
+            _write_dof_values(model, control.joint_target_pos, dofs, [10.0] * n)
+            actuator.step(state, control, state_0, state_1, 0.01)
+            state_0, state_1 = state_1, state_0
 
-        actuator = model.actuators[0]
-        actuator.step(state, control)
+        self.assertTrue(all(p > 0 for p in state_0.delay_state.num_pushes.numpy()))
+        self.assertTrue(all(v > 0 for v in state_0.controller_state.integral.numpy()))
+
+        mask = wp.array([True, False], dtype=wp.bool, device=device)
+        state_0.reset(mask)
 
-        forces = control.joint_f.numpy()
-        np.testing.assert_allclose(
-            [forces[d] for d in dofs],
-            [7.0, -3.0],
-            rtol=1e-5,
+        self.assertEqual(state_0.delay_state.num_pushes.numpy()[0], 0, "env 0 delay should be reset")
+        self.assertGreater(state_0.delay_state.num_pushes.numpy()[1], 0, "env 1 delay should be untouched")
+        self.assertAlmostEqual(
+            state_0.controller_state.integral.numpy()[0], 0.0, places=6, msg="env 0 integral should be reset"
         )
+        self.assertGreater(state_0.controller_state.integral.numpy()[1], 0.0, msg="env 1 integral should be untouched")
 
-    def test_pd_step_gear_and_clamp(self):
-        """ActuatorPD: gear ratio scales error and force is clamped to max_force."""
-        model, dofs = self._build_chain_model(1, ActuatorPD, {"kp": 100.0, "gear": 2.0, "max_force": 50.0})
-        state = model.state()
-        control = model.control()
-        control.joint_f.zero_()
 
-        self._set_control_array(model, control.joint_target_pos, dofs, [1.0])
+# ---------------------------------------------------------------------------
+# 8. CUDA graph capture
+# ---------------------------------------------------------------------------
 
-        actuator = model.actuators[0]
-        actuator.step(state, control)
 
-        # gear=2, q=0: force = 2 * (100 * (1 - 2*0)) = 200, clamped to 50
-        forces = control.joint_f.numpy()
-        self.assertAlmostEqual(forces[dofs[0]], 50.0, places=5)
+class TestCUDAGraphCapture(unittest.TestCase):
+    """Smoke test: PD + clamping actuator step is capturable in a CUDA graph."""
 
-    def test_pd_step_constant_force(self):
-        """ActuatorPD: constant_force offset is included in output."""
-        model, dofs = self._build_chain_model(1, ActuatorPD, {"kp": 0.0, "constant_force": 42.0})
-        state = model.state()
-        control = model.control()
-        control.joint_f.zero_()
+    @unittest.skipUnless(wp.is_cuda_available(), "CUDA required")
+    def test_graphable_step(self):
+        """Capture PD + MaxForce + Delay actuator.step() and replay, verify same result."""
+        template = newton.ModelBuilder()
+        link = template.add_link()
+        joint = template.add_joint_revolute(parent=-1, child=link, axis=newton.Axis.Z)
+        template.add_articulation([joint])
+        dof = template.joint_qd_start[joint]
+        template.add_actuator(
+            ControllerPD,
+            index=dof,
+            kp=100.0,
+            kd=10.0,
+            delay=2,
+            clamping=[(ClampingMaxForce, {"max_force": 50.0})],
+        )
 
-        actuator = model.actuators[0]
-        actuator.step(state, control)
+        builder = newton.ModelBuilder()
+        builder.replicate(template, 2)
+        model = builder.finalize(device="cuda:0")
 
-        forces = control.joint_f.numpy()
-        self.assertAlmostEqual(forces[dofs[0]], 42.0, places=5)
+        actuator = model.actuators[0]
+        self.assertTrue(actuator.is_graphable())
 
-    def test_pid_step_integral_accumulation(self):
-        """ActuatorPID: integral term accumulates over multiple steps."""
-        model, dofs = self._build_chain_model(1, ActuatorPID, {"kp": 0.0, "ki": 10.0, "kd": 0.0})
         state = model.state()
         control = model.control()
-
-        self._set_control_array(model, control.joint_target_pos, dofs, [1.0])
-
-        actuator = model.actuators[0]
-        state_a = actuator.state()
-        state_b = actuator.state()
+        state_0 = actuator.state()
+        state_1 = actuator.state()
         dt = 0.01
+        dofs = actuator.indices.numpy().tolist()
+        _write_dof_values(model, control.joint_target_pos, dofs, [1.0] * actuator.num_actuators)
 
-        forces_over_time = []
-        for step_i in range(3):
-            control.joint_f.zero_()
-            if step_i % 2 == 0:
-                current, nxt = state_a, state_b
-            else:
-                current, nxt = state_b, state_a
-            actuator.step(state, control, current, nxt, dt)
-            forces_over_time.append(control.joint_f.numpy()[dofs[0]])
-
-        # integral after step 0: 1.0 * 0.01 = 0.01, force = ki * integral = 0.1
-        # integral after step 1: 0.01 + 0.01 = 0.02, force = 0.2
-        # integral after step 2: 0.02 + 0.01 = 0.03, force = 0.3
-        np.testing.assert_allclose(forces_over_time, [0.1, 0.2, 0.3], rtol=1e-4)
-
-    def test_delayed_pd_step_delay_behavior(self):
-        """ActuatorDelayedPD: targets are delayed by N steps with real Model objects."""
-        delay = 2
-        model, dofs = self._build_chain_model(1, ActuatorDelayedPD, {"kp": 1.0, "delay": delay})
-        state = model.state()
+        actuator.step(state, control, state_0, state_1, dt)
+        state_0, state_1 = state_1, state_0
+        ref_forces = control.joint_f.numpy().copy()
 
-        actuator = model.actuators[0]
-        state_a = actuator.state()
-        state_b = actuator.state()
-        dt = 0.01
+        control_graph = model.control()
+        _write_dof_values(model, control_graph.joint_target_pos, dofs, [1.0] * actuator.num_actuators)
+        state_g0 = actuator.state()
+        state_g1 = actuator.state()
 
-        force_history = []
-        for step_i in range(delay + 2):
-            control = model.control()
-            control.joint_f.zero_()
-            target_val = float(step_i + 1) * 10.0
-            self._set_control_array(model, control.joint_target_pos, dofs, [target_val])
+        with wp.ScopedCapture(device="cuda:0") as capture:
+            actuator.step(state, control_graph, state_g0, state_g1, dt)
 
-            if step_i % 2 == 0:
-                current, nxt = state_a, state_b
-            else:
-                current, nxt = state_b, state_a
-            actuator.step(state, control, current, nxt, dt)
-            force_history.append(control.joint_f.numpy()[dofs[0]])
+        wp.capture_launch(capture.graph)
+        wp.synchronize_device("cuda:0")
 
-        # Steps 0..(delay-1) produce zero (buffer filling)
-        for i in range(delay):
-            self.assertEqual(force_history[i], 0.0, f"Step {i}: expected 0 during fill phase")
-
-        # Step 2 uses target from step 0 (10.0), step 3 uses target from step 1 (20.0)
-        self.assertAlmostEqual(force_history[delay], 10.0, places=4)
-        self.assertAlmostEqual(force_history[delay + 1], 20.0, places=4)
+        graph_forces = control_graph.joint_f.numpy()
+        np.testing.assert_array_almost_equal(
+            graph_forces, ref_forces, decimal=5, err_msg="Graph-captured step should match eager step"
+        )
 
 
 if __name__ == "__main__":
diff --git a/pyproject.toml b/pyproject.toml
index e2005adcc0..dba8f0a5f5 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -38,7 +38,7 @@ sim = [
     # MuJoCo simulation
     "mujoco-warp>=3.5.0.2; python_version < '3.14'",  # MuJoCo, warp-accelerated (depends on mujoco which has no 3.14 wheels yet)
     "mujoco>=3.5.0; python_version < '3.14'",        # MuJoCo physics engine (no 3.14 wheels yet)
-    "newton-actuators>=0.1.0",
+    "newton-actuators>=0.1.0",  # deprecated — kept for backward compat, will be removed
 ]
 
 # Asset import and mesh processing dependencies
diff --git a/uv.lock b/uv.lock
index 6873f6e824..bc03587f31 100644
--- a/uv.lock
+++ b/uv.lock
@@ -5555,27 +5555,27 @@ dependencies = [
     { name = "typing-extensions" },
 ]
 wheels = [
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp310-cp310-manylinux_2_28_aarch64.whl", hash = "sha256:e186f57ef1de1aa877943259819468fc6f27efb583b4a91f9215ada7b7f4e6cc" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:36368507b56eaa51acbd3c96ac8893bb9a86991ffcd0699fea3a1a74a2b8bdcb" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp310-cp310-win_amd64.whl", hash = "sha256:14d2831b9292c3a9b0d80116451315a08ffe8db745d403d06000bc47165b1f9e" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:85ed7944655ea6fd69377692e9cbfd7bba28d99696ceae79985e7caa99cf0a95" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:1d01ffaebf64715c0f507a39463149cb19e596ff702bd4bcf862601f2881dabc" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp311-cp311-win_amd64.whl", hash = "sha256:3523fda6e2cfab2b04ae09b1424681358e508bb3faa11ceb67004113d5e7acad" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:6f09cdf2415516be028ae82e6b985bcfc3eac37bc52ab401142689f6224516ca" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:628e89bd5110ced7debee2a57c69959725b7fbc64eab81a39dd70e46c7e28ba5" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp312-cp312-win_amd64.whl", hash = "sha256:fbde8f6a9ec8c76979a0d14df21c10b9e5cab6f0d106a73ca73e2179bc597cae" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:bdbcc703382f948e951c063448c9406bf38ce66c41dd698d9e2733fcf96c037a" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:7b4bd23ed63de97456fcc81c26fea9f02ee02ce1112111c4dac0d8cfe574b23e" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp313-cp313-win_amd64.whl", hash = "sha256:4d1b0b49c54223c7c04050b49eac141d77b6edbc34aea1dfc74a6fdb661baa8c" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:f1f8b840c64b645a4bc61a393db48effb9c92b2dc26c8373873911f0750d1ea7" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:23f58258012bcf1c349cb22af387e33aadca7f83ea617b080e774eb41e4fe8ff" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp313-cp313t-win_amd64.whl", hash = "sha256:01b216e097b17a5277cfb47c383cdcacf06abeadcb0daca0c76b59e72854c3b6" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp314-cp314-manylinux_2_28_aarch64.whl", hash = "sha256:c42377bc2607e3e1c60da71b792fb507c3938c87fd6edab8b21c59c91473c36d" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp314-cp314-manylinux_2_28_x86_64.whl", hash = "sha256:37d71feea068776855686a1512058df3f19f6f040a151f055aa746601678744f" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp314-cp314-win_amd64.whl", hash = "sha256:c57017ca29e62271e362fdeee7d20070e254755a5148b30b553d8a10fc83c7ef" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:777461f50b2daf77e4bdd8e2ad34bdfc5a993bf1bdf2ab9ef39f5edfe4e9c12b" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:7bcba6a7c5f0987a13298b1ca843155dcceceac758fa3c7ccd5c7af4059a1080" },
-    { url = "https://download.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp314-cp314t-win_amd64.whl", hash = "sha256:70d89143c956389d4806cb4e5fe0b1129fe0db280e1073288d17fa76c101cba4" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp310-cp310-manylinux_2_28_aarch64.whl", hash = "sha256:e186f57ef1de1aa877943259819468fc6f27efb583b4a91f9215ada7b7f4e6cc" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:36368507b56eaa51acbd3c96ac8893bb9a86991ffcd0699fea3a1a74a2b8bdcb" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp310-cp310-win_amd64.whl", hash = "sha256:14d2831b9292c3a9b0d80116451315a08ffe8db745d403d06000bc47165b1f9e" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:85ed7944655ea6fd69377692e9cbfd7bba28d99696ceae79985e7caa99cf0a95" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:1d01ffaebf64715c0f507a39463149cb19e596ff702bd4bcf862601f2881dabc" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp311-cp311-win_amd64.whl", hash = "sha256:3523fda6e2cfab2b04ae09b1424681358e508bb3faa11ceb67004113d5e7acad" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:6f09cdf2415516be028ae82e6b985bcfc3eac37bc52ab401142689f6224516ca" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:628e89bd5110ced7debee2a57c69959725b7fbc64eab81a39dd70e46c7e28ba5" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp312-cp312-win_amd64.whl", hash = "sha256:fbde8f6a9ec8c76979a0d14df21c10b9e5cab6f0d106a73ca73e2179bc597cae" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:bdbcc703382f948e951c063448c9406bf38ce66c41dd698d9e2733fcf96c037a" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:7b4bd23ed63de97456fcc81c26fea9f02ee02ce1112111c4dac0d8cfe574b23e" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp313-cp313-win_amd64.whl", hash = "sha256:4d1b0b49c54223c7c04050b49eac141d77b6edbc34aea1dfc74a6fdb661baa8c" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:f1f8b840c64b645a4bc61a393db48effb9c92b2dc26c8373873911f0750d1ea7" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:23f58258012bcf1c349cb22af387e33aadca7f83ea617b080e774eb41e4fe8ff" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp313-cp313t-win_amd64.whl", hash = "sha256:01b216e097b17a5277cfb47c383cdcacf06abeadcb0daca0c76b59e72854c3b6" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp314-cp314-manylinux_2_28_aarch64.whl", hash = "sha256:c42377bc2607e3e1c60da71b792fb507c3938c87fd6edab8b21c59c91473c36d" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp314-cp314-manylinux_2_28_x86_64.whl", hash = "sha256:37d71feea068776855686a1512058df3f19f6f040a151f055aa746601678744f" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp314-cp314-win_amd64.whl", hash = "sha256:c57017ca29e62271e362fdeee7d20070e254755a5148b30b553d8a10fc83c7ef" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:777461f50b2daf77e4bdd8e2ad34bdfc5a993bf1bdf2ab9ef39f5edfe4e9c12b" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:7bcba6a7c5f0987a13298b1ca843155dcceceac758fa3c7ccd5c7af4059a1080" },
+    { url = "https://download-r2.pytorch.org/whl/cu128/torch-2.10.0%2Bcu128-cp314-cp314t-win_amd64.whl", hash = "sha256:70d89143c956389d4806cb4e5fe0b1129fe0db280e1073288d17fa76c101cba4" },
 ]
 
 [[package]]
@@ -5622,27 +5622,27 @@ dependencies = [
     { name = "typing-extensions" },
 ]
 wheels = [
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp310-cp310-manylinux_2_28_aarch64.whl", hash = "sha256:adfb2bca94f12a5baca6ccf9aa40d6c5b1259748ebb38938be670b07a24d4e15" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:0f8566d1b3d8353b7f18d5a15375a00dae89d886bb7d01143bb394f475832286" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp310-cp310-win_amd64.whl", hash = "sha256:0de43ec229a78a2e80ae8578252f9eb14f898457d1cf2be81c8c91a4108c6c79" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:ea3239d544b2e569a8f47db5c7fa4fd42a2fe96aefb84bb1eda45ce213020fd2" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:22cfa45e73f1e8c64f4012737987a727d01d152121b93d196b0ca22f39a3f8e3" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp311-cp311-win_amd64.whl", hash = "sha256:218ae0f323d5ebe8f2770e46cbfb7bbff9af2c8d192d5187878d0964d43c8b71" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:4fc8f67637f4c92b989a07d80ffe755e79a3510ca02ebf23ce66396fb277c88d" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:858f0cbcc78d726fea9499eb3464faa98392fa093845a3262209bd226b7844d6" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp312-cp312-win_amd64.whl", hash = "sha256:224649fa0ab181ec483cc368e3303dda1760e4ba31bea806b88979f855436aaa" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:75780283308df9fede371eeda01e9607c8862a1803a2f2f31a08a2c0deaed342" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:7e0d9922e9e91f780b2761a0c5ebac3c15c9740bab042e1b59149afa6d6474eb" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp313-cp313-win_amd64.whl", hash = "sha256:48af94af745a9dd9b42be81ea15b56aba981666bcfe10394dceca6d9476a50fa" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:46699da91f0367d8dfa1b606cb0352aaf190b5853f463010e75ff08f15a94e7d" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:775d1fff07e302fb669d555a5005f781aa460aa80dff7a512e8e6e723f9def83" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp313-cp313t-win_amd64.whl", hash = "sha256:b38e5b505b015903a51c2b3f12e50a9f152f92fe7e3992e79f504138cf90601d" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp314-cp314-manylinux_2_28_aarch64.whl", hash = "sha256:18f87ae628c02f095f2e97756e4fa249ceef6ed6e87d5a3c79b5338abf842511" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp314-cp314-manylinux_2_28_x86_64.whl", hash = "sha256:db5a61791b7da3c1aa5a496e64cd72dbd4ef3ef2cbb69680fd45dc255b0da2f3" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp314-cp314-win_amd64.whl", hash = "sha256:dda35d473dd34cafa0668be176b9ad2cb69b1ff570d0336715a6541e89e27640" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:7781235583ea06b214075c10fa95f83b9805f06af44efc6e9946808413cff94f" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:526b737db11d632281795484ec729baae5f193a5a0d76a1f7d822f7897c8b4f5" },
-    { url = "https://download.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp314-cp314t-win_amd64.whl", hash = "sha256:d5ea18790a18b660d655f6e75a8ca6e8d6298b55fc338f8c921764b94c886743" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp310-cp310-manylinux_2_28_aarch64.whl", hash = "sha256:adfb2bca94f12a5baca6ccf9aa40d6c5b1259748ebb38938be670b07a24d4e15" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp310-cp310-manylinux_2_28_x86_64.whl", hash = "sha256:0f8566d1b3d8353b7f18d5a15375a00dae89d886bb7d01143bb394f475832286" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp310-cp310-win_amd64.whl", hash = "sha256:0de43ec229a78a2e80ae8578252f9eb14f898457d1cf2be81c8c91a4108c6c79" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp311-cp311-manylinux_2_28_aarch64.whl", hash = "sha256:ea3239d544b2e569a8f47db5c7fa4fd42a2fe96aefb84bb1eda45ce213020fd2" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp311-cp311-manylinux_2_28_x86_64.whl", hash = "sha256:22cfa45e73f1e8c64f4012737987a727d01d152121b93d196b0ca22f39a3f8e3" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp311-cp311-win_amd64.whl", hash = "sha256:218ae0f323d5ebe8f2770e46cbfb7bbff9af2c8d192d5187878d0964d43c8b71" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:4fc8f67637f4c92b989a07d80ffe755e79a3510ca02ebf23ce66396fb277c88d" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:858f0cbcc78d726fea9499eb3464faa98392fa093845a3262209bd226b7844d6" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp312-cp312-win_amd64.whl", hash = "sha256:224649fa0ab181ec483cc368e3303dda1760e4ba31bea806b88979f855436aaa" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:75780283308df9fede371eeda01e9607c8862a1803a2f2f31a08a2c0deaed342" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:7e0d9922e9e91f780b2761a0c5ebac3c15c9740bab042e1b59149afa6d6474eb" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp313-cp313-win_amd64.whl", hash = "sha256:48af94af745a9dd9b42be81ea15b56aba981666bcfe10394dceca6d9476a50fa" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:46699da91f0367d8dfa1b606cb0352aaf190b5853f463010e75ff08f15a94e7d" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:775d1fff07e302fb669d555a5005f781aa460aa80dff7a512e8e6e723f9def83" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp313-cp313t-win_amd64.whl", hash = "sha256:b38e5b505b015903a51c2b3f12e50a9f152f92fe7e3992e79f504138cf90601d" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp314-cp314-manylinux_2_28_aarch64.whl", hash = "sha256:18f87ae628c02f095f2e97756e4fa249ceef6ed6e87d5a3c79b5338abf842511" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp314-cp314-manylinux_2_28_x86_64.whl", hash = "sha256:db5a61791b7da3c1aa5a496e64cd72dbd4ef3ef2cbb69680fd45dc255b0da2f3" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp314-cp314-win_amd64.whl", hash = "sha256:dda35d473dd34cafa0668be176b9ad2cb69b1ff570d0336715a6541e89e27640" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp314-cp314t-manylinux_2_28_aarch64.whl", hash = "sha256:7781235583ea06b214075c10fa95f83b9805f06af44efc6e9946808413cff94f" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp314-cp314t-manylinux_2_28_x86_64.whl", hash = "sha256:526b737db11d632281795484ec729baae5f193a5a0d76a1f7d822f7897c8b4f5" },
+    { url = "https://download-r2.pytorch.org/whl/cu130/torch-2.10.0%2Bcu130-cp314-cp314t-win_amd64.whl", hash = "sha256:d5ea18790a18b660d655f6e75a8ca6e8d6298b55fc338f8c921764b94c886743" },
 ]
 
 [[package]]