Fix multi-world joint indexing in warp-to-mj coord conversion#2332
Fix multi-world joint indexing in warp-to-mj coord conversion#2332jsw7460 wants to merge 5 commits intonewton-physics:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughFix per-world joint indexing in the MuJoCo kernel so joint metadata lookups use a global per-world joint index; add regression tests that verify replicated worlds with per-world body mass/COM yield consistent physics; record the fix in the changelog. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
newton/tests/test_multiworld_body_properties.py (1)
114-121: Remove unused variablebodies_per_worldin the reference loop.Line 117 computes
bodies_per_worldbut it's never used within the loop. For single-world models, direct indexing withbody_mass[0]is correct.♻️ Remove unused variable
for mass_val, label in [(mass_a, "A"), (mass_b, "B")]: model = self._build_articulated_model(num_worlds=1, device=device) - bodies_per_world = model.body_count // model.world_count body_mass = model.body_mass.numpy() body_mass[0] = mass_val # base body model.body_mass.assign(body_mass) ref_q[label] = self._run_sim(model)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@newton/tests/test_multiworld_body_properties.py` around lines 114 - 121, Remove the unused local variable bodies_per_world from the reference loop in test_multiworld_body_properties.py: inside the loop that builds the model (see _build_articulated_model and use of model.body_mass), delete the computation bodies_per_world = model.body_count // model.world_count since it is never used; leave the rest of the loop (setting body_mass[0] and model.body_mass.assign(...), and calling self._run_sim(model) to populate ref_q) unchanged.newton/_src/solvers/mujoco/kernels.py (1)
578-580: Fix is correct; consider extracting the global index for consistency with the sibling kernel.The change correctly uses the global joint index for
joint_typeandjoint_childlookups, aligning with howjoint_q_startandjoint_qd_startare already indexed.For consistency with
convert_mj_coords_to_warp_kernel(line 453), consider precomputing the global index once:♻️ Optional: Extract global index to match sibling kernel pattern
worldid, jntid = wp.tid() + joint_id = joints_per_world * worldid + jntid + # Skip loop joints — they have no MuJoCo qpos/qvel entries q_i = mj_q_start[jntid] if q_i < 0: return qd_i = mj_qd_start[jntid] - type = joint_type[joints_per_world * worldid + jntid] - wq_i = joint_q_start[joints_per_world * worldid + jntid] - wqd_i = joint_qd_start[joints_per_world * worldid + jntid] + type = joint_type[joint_id] + wq_i = joint_q_start[joint_id] + wqd_i = joint_qd_start[joint_id]And at line 610:
# Get CoM offset in world frame - child = joint_child[joints_per_world * worldid + jntid] + child = joint_child[joint_id]Also applies to: 610-611
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@newton/_src/solvers/mujoco/kernels.py` around lines 578 - 580, Extract the global joint index once (e.g., compute a local variable like global_jnt = joints_per_world * worldid + jntid) and use that variable for all array lookups instead of repeating the expression; update usages in the kernel where joint_type, joint_child, joint_q_start, and joint_qd_start are accessed (the occurrences around the current block and the similar lookups at the later spots around the 610-611 area) so indexing is consistent with convert_mj_coords_to_warp_kernel and avoids duplicated arithmetic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@newton/_src/solvers/mujoco/kernels.py`:
- Around line 578-580: Extract the global joint index once (e.g., compute a
local variable like global_jnt = joints_per_world * worldid + jntid) and use
that variable for all array lookups instead of repeating the expression; update
usages in the kernel where joint_type, joint_child, joint_q_start, and
joint_qd_start are accessed (the occurrences around the current block and the
similar lookups at the later spots around the 610-611 area) so indexing is
consistent with convert_mj_coords_to_warp_kernel and avoids duplicated
arithmetic.
In `@newton/tests/test_multiworld_body_properties.py`:
- Around line 114-121: Remove the unused local variable bodies_per_world from
the reference loop in test_multiworld_body_properties.py: inside the loop that
builds the model (see _build_articulated_model and use of model.body_mass),
delete the computation bodies_per_world = model.body_count // model.world_count
since it is never used; leave the rest of the loop (setting body_mass[0] and
model.body_mass.assign(...), and calling self._run_sim(model) to populate ref_q)
unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 47675d84-a083-42ea-bc7a-93c49c729551
📒 Files selected for processing (3)
CHANGELOG.mdnewton/_src/solvers/mujoco/kernels.pynewton/tests/test_multiworld_body_properties.py
❌ 1 Tests Failed:
View the top 1 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Line 88: Edit the CHANGELOG.md entry so it’s a user-facing sentence (avoid
kernel/indexing internals) and move it to the end of the "Fixed" list under the
[Unreleased] section; replace the current line referencing
convert_warp_coords_to_mj_kernel and joint_type/joint_child with a concise
user-facing entry such as "Fix multi-world MuJoCo coordinate conversion using
the wrong body center of mass in replicated worlds" and append it as the last
item in the Fixed section (do not include implementation details like per-world
vs global indices or function symbols).
In `@newton/tests/test_multiworld_body_properties.py`:
- Line 1: The SPDX header in newton/tests/test_multiworld_body_properties.py
uses the wrong creation year (2025); update the SPDX copyright line by replacing
"2025" with "2026" so it reads the correct creation year (2026).
- Around line 135-145: The quaternion comparisons for free joints are
sign-ambiguous; implement a helper named _assert_joint_q_close (place it near
_run_sim()) that compares position parts ([:3]) and tail parts ([7:]) with
np.testing.assert_allclose, but for the quaternion slice [3:7] flips actual when
np.dot(actual_quat, expected_quat) < 0 before asserting; then replace the two
direct np.testing.assert_allclose calls that compare multi_q[0]/multi_q[1] to
ref_q["A"]/ref_q["B"] (and the similar block at lines 176-186) with calls to
_assert_joint_q_close(actual=multi_q[i], expected=ref_q[...], atol=1e-4,
err_msg=...) so q and -q are treated as equivalent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 3a7c4321-241a-4663-b113-2980d40dd0ad
📒 Files selected for processing (2)
CHANGELOG.mdnewton/tests/test_multiworld_body_properties.py
| np.testing.assert_allclose( | ||
| multi_q[0], | ||
| ref_q["A"], | ||
| atol=1e-4, | ||
| err_msg="World 0 (mass_A) diverges from single-world reference", | ||
| ) | ||
| np.testing.assert_allclose( | ||
| multi_q[1], | ||
| ref_q["B"], | ||
| atol=1e-4, | ||
| err_msg="World 1 (mass_B) diverges from single-world reference", |
There was a problem hiding this comment.
Make these joint_q comparisons quaternion-sign invariant.
Both assertion blocks compare the free-joint quaternion verbatim, so q and -q would fail here even though they represent the same pose. That makes the regression vulnerable to false negatives.
🧪 Suggested change
- np.testing.assert_allclose(
+ self._assert_joint_q_close(
multi_q[0],
ref_q["A"],
atol=1e-4,
err_msg="World 0 (mass_A) diverges from single-world reference",
)
- np.testing.assert_allclose(
+ self._assert_joint_q_close(
multi_q[1],
ref_q["B"],
atol=1e-4,
err_msg="World 1 (mass_B) diverges from single-world reference",
)Add a helper once, e.g. near _run_sim():
def _assert_joint_q_close(self, actual, expected, *, atol: float, err_msg: str) -> None:
np.testing.assert_allclose(actual[:3], expected[:3], atol=atol, err_msg=err_msg)
actual_quat = actual[3:7].copy()
expected_quat = expected[3:7]
if np.dot(actual_quat, expected_quat) < 0.0:
actual_quat *= -1.0
np.testing.assert_allclose(actual_quat, expected_quat, atol=atol, err_msg=err_msg)
np.testing.assert_allclose(actual[7:], expected[7:], atol=atol, err_msg=err_msg)Based on learnings, “Quaternions in builder/test assertions use Warp’s XYZW convention ... prefer identity/non-identity checks that account for q and -q equivalence.”
Also applies to: 176-186
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@newton/tests/test_multiworld_body_properties.py` around lines 135 - 145, The
quaternion comparisons for free joints are sign-ambiguous; implement a helper
named _assert_joint_q_close (place it near _run_sim()) that compares position
parts ([:3]) and tail parts ([7:]) with np.testing.assert_allclose, but for the
quaternion slice [3:7] flips actual when np.dot(actual_quat, expected_quat) < 0
before asserting; then replace the two direct np.testing.assert_allclose calls
that compare multi_q[0]/multi_q[1] to ref_q["A"]/ref_q["B"] (and the similar
block at lines 176-186) with calls to _assert_joint_q_close(actual=multi_q[i],
expected=ref_q[...], atol=1e-4, err_msg=...) so q and -q are treated as
equivalent.
|
🟡 Nice fix for the multi-world joint indexing in the coordinate conversion kernels. One more instance of the same per-world indexing pattern was missed in the forward kernel, and the new mass regression test doesn't fully exercise the code path it targets. 🟡 In axis_count = joint_dof_dim[jntid, 0] + joint_dof_dim[jntid, 1]The reverse kernel ( axis_count = joint_dof_dim[joint_id, 0] + joint_dof_dim[joint_id, 1]Today this is functionally benign because Suggested fix: axis_count = joint_dof_dim[joints_per_world * worldid + jntid, 0] + joint_dof_dim[joints_per_world * worldid + jntid, 1] |
| return state_0.joint_q.numpy() | ||
|
|
||
|
|
||
| def test_perworld_mass_produces_consistent_physics(test, device): |
There was a problem hiding this comment.
⚪ Both links in _build_articulated_model use com=(0, 0, 0), so the cross-product term in v_origin = v_com - ω × com_world vanishes no matter which joint_child index is read. This means the test would pass both before and after the indexing fix.
The companion test_perworld_com_produces_consistent_physics is the test that actually exercises the CoM-lookup regression. Additionally, test_randomize_body_mass in test_mujoco_solver.py already covers per-world mass propagation into MuJoCo.
Consider either:
- Giving the base link a non-zero CoM (e.g.
com=wp.vec3(0.02, 0.0, -0.01)) so varying mass also stresses the CoM-lookup path, or - Removing this test if the existing coverage is sufficient, saving ~300 simulation steps.
867c6a0 to
8002c98
Compare
|
Thanks for the review! Both comments addressed:
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
newton/_src/solvers/mujoco/kernels.py (2)
573-645:⚠️ Potential issue | 🟠 MajorApply the same global joint indexing in
convert_qfrc_actuator_from_mj_kernel.This hunk fixes the CoM lookup in
convert_warp_coords_to_mj_kernel, but the laterconvert_qfrc_actuator_from_mj_kernelstill indexes joint-indexed arrays with barejntid. Forworldid > 0, its FREE-joint branch will still use world 0'sjoint_child/body_comwhen shifting actuator torques, so per-world inertial-property updates remain wrong on that path.Suggested follow-up
def convert_qfrc_actuator_from_mj_kernel( ... ): worldid, jntid = wp.tid() + joint_id = joints_per_world * worldid + jntid # Skip loop joints — they have no MuJoCo DOF entries q_i = mj_q_start[jntid] if q_i < 0: return qd_i = mj_qd_start[jntid] wqd_i = joint_qd_start[joints_per_world * worldid + jntid] - type = joint_type[jntid] + jtype = joint_type[joint_id] - if type == JointType.FREE: + if jtype == JointType.FREE: ... - child = joint_child[jntid] + child = joint_child[joint_id] com_world = wp.quat_rotate(rot, body_com[child]) ... else: - axis_count = joint_dof_dim[jntid, 0] + joint_dof_dim[jntid, 1] + axis_count = joint_dof_dim[joint_id, 0] + joint_dof_dim[joint_id, 1]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@newton/_src/solvers/mujoco/kernels.py` around lines 573 - 645, The convert_qfrc_actuator_from_mj_kernel uses bare jntid to index joint-indexed arrays (e.g., joint_child, body_com) causing wrong per-world lookups when worldid > 0; compute joint_id = joints_per_world * worldid + jntid (same as in convert_warp_coords_to_mj_kernel) and use joint_id for all accesses to joint_child, joint_type, joint_q_start, joint_qd_start, body_com, joint_dof_dim, etc., ensuring FREE-joint branches and actuator torque shifts reference the per-world joint entries rather than world-0 entries.
581-645:⚠️ Potential issue | 🟡 MinorRename
typetojtypeto avoid Ruff A001 (builtin shadowing).Line 581 shadows Python's builtin
type. Ruff rule A001 is enabled (via ruff.toml), andjtypeis the consistent naming convention used throughout the same file for similar variables.Minimal rename
- type = joint_type[joint_id] + jtype = joint_type[joint_id] ... - if type == JointType.FREE: + if jtype == JointType.FREE: ... - elif type == JointType.BALL: + elif jtype == JointType.BALL:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@newton/_src/solvers/mujoco/kernels.py` around lines 581 - 645, The local variable named type shadows Python builtin type and should be renamed to jtype: locate the assignment "type = joint_type[joint_id]" and replace it with "jtype = joint_type[joint_id]", then update all subsequent uses in this code block (e.g., the conditionals "type == JointType.FREE", "type == JointType.BALL", and any other references) to use jtype; ensure you also update any other occurrences of this same local variable within the same function so naming is consistent with the file's convention (use symbols joint_type, joint_id, joint_q_start, joint_q, joint_qd_start, joint_qd, JointType).
♻️ Duplicate comments (2)
newton/tests/test_multiworld_body_properties.py (2)
1-1:⚠️ Potential issue | 🟡 MinorUpdate the SPDX header to the 2026 creation year.
This file is new in the current PR, so the header should use
2026, not2025.As per coding guidelines, “In SPDX copyright lines, use the year the file was first created. Do not create date ranges or update the year when modifying a file.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@newton/tests/test_multiworld_body_properties.py` at line 1, Update the SPDX copyright header line in this new file by changing the creation year from 2025 to 2026: modify the SPDX line (the top-of-file comment starting with "SPDX-FileCopyrightText") to use "2026" and do not create a date range; ensure no other SPDX lines or ranges are added.
134-145:⚠️ Potential issue | 🟡 MinorMake the
joint_qcomparisons quaternion-sign invariant.These assertions still compare the free-joint quaternion verbatim.
qand-qencode the same pose, so this regression can fail even when the multi-world and single-world trajectories match. Please route them through a helper that flips one side when the quaternion dot product is negative.Based on learnings, “Quaternions in builder/test assertions use Warp’s XYZW convention (identity [0, 0, 0, 1]). In newton/tests/test_import_mjcf.py (e.g., test_fit_box_to_mesh_inertia_rotated), prefer identity/non-identity checks that account for q and -q equivalence (e.g., via np.abs on components).”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@newton/tests/test_multiworld_body_properties.py` around lines 134 - 145, The quaternion comparisons between multi_q[...] and ref_q[...] need to be made sign-invariant: implement or call a helper (e.g., align_quaternion_sign(quat_ref, quat_test) or flip_quat_if_needed) that computes the dot product between the two 4D quaternions and multiplies one by -1 when the dot is negative, then use the aligned quaternions in the np.testing.assert_allclose calls for multi_q[0] vs ref_q["A"] and multi_q[1] vs ref_q["B"] so q and -q are treated as equivalent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@newton/_src/solvers/mujoco/kernels.py`:
- Around line 573-645: The convert_qfrc_actuator_from_mj_kernel uses bare jntid
to index joint-indexed arrays (e.g., joint_child, body_com) causing wrong
per-world lookups when worldid > 0; compute joint_id = joints_per_world *
worldid + jntid (same as in convert_warp_coords_to_mj_kernel) and use joint_id
for all accesses to joint_child, joint_type, joint_q_start, joint_qd_start,
body_com, joint_dof_dim, etc., ensuring FREE-joint branches and actuator torque
shifts reference the per-world joint entries rather than world-0 entries.
- Around line 581-645: The local variable named type shadows Python builtin type
and should be renamed to jtype: locate the assignment "type =
joint_type[joint_id]" and replace it with "jtype = joint_type[joint_id]", then
update all subsequent uses in this code block (e.g., the conditionals "type ==
JointType.FREE", "type == JointType.BALL", and any other references) to use
jtype; ensure you also update any other occurrences of this same local variable
within the same function so naming is consistent with the file's convention (use
symbols joint_type, joint_id, joint_q_start, joint_q, joint_qd_start, joint_qd,
JointType).
---
Duplicate comments:
In `@newton/tests/test_multiworld_body_properties.py`:
- Line 1: Update the SPDX copyright header line in this new file by changing the
creation year from 2025 to 2026: modify the SPDX line (the top-of-file comment
starting with "SPDX-FileCopyrightText") to use "2026" and do not create a date
range; ensure no other SPDX lines or ranges are added.
- Around line 134-145: The quaternion comparisons between multi_q[...] and
ref_q[...] need to be made sign-invariant: implement or call a helper (e.g.,
align_quaternion_sign(quat_ref, quat_test) or flip_quat_if_needed) that computes
the dot product between the two 4D quaternions and multiplies one by -1 when the
dot is negative, then use the aligned quaternions in the
np.testing.assert_allclose calls for multi_q[0] vs ref_q["A"] and multi_q[1] vs
ref_q["B"] so q and -q are treated as equivalent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 1ecaeed4-5dd2-4ca8-be28-91f1834c6096
📒 Files selected for processing (3)
CHANGELOG.mdnewton/_src/solvers/mujoco/kernels.pynewton/tests/test_multiworld_body_properties.py
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.md
adenzler-nvidia
left a comment
There was a problem hiding this comment.
Thanks for the fix — the core indexing change in kernels.py looks correct. A few things still need attention before this can land (the test device issue is the main CI blocker).
| """Verify that multi-world simulations with per-world body mass/CoM | ||
| produce physically consistent results across all worlds.""" | ||
|
|
||
| def _build_articulated_model(self, num_worlds: int = 2, device: str = "cuda:0"): |
There was a problem hiding this comment.
Hardcoding device="cuda:0" breaks all CPU-only CI runners (ValueError: Invalid device identifier: cuda:0) — this is the sole test failure across ubuntu, ubuntu-arm, macos, and windows.
Please use the get_test_devices() / add_function_test pattern (like test_mujoco_solver.py and other tests in this repo) so the test auto-skips when CUDA is unavailable. An earlier commit (867c6a0) had this conversion but it was lost on the force-push.
| # Set initial height and angular velocity for all worlds. | ||
| # Non-zero angular velocity is critical: the bug in | ||
| # convert_warp_coords_to_mj_kernel affects the velocity | ||
| # conversion v_origin = v_com - ω × com. With ω=0 the cross |
There was a problem hiding this comment.
Unicode × (MULTIPLICATION SIGN) triggers Ruff RUF003, which is what fails pre-commit.ci. Use ASCII x instead.
| # conversion v_origin = v_com - ω × com. With ω=0 the cross | |
| # convert_warp_coords_to_mj_kernel affects the velocity | |
| # conversion v_origin = v_com - w x com. With w=0 the cross |
| ### Fixed | ||
|
|
||
| - Fix GL viewer crash when enabling "Gap + Margin" for soft-body-only states with no rigid body transforms | ||
| - Fix incorrect joint indexing in `convert_warp_coords_to_mj_kernel` that used per-world index instead of global index for `joint_type` and `joint_child` lookups, causing wrong CoM references for `worldid > 0` |
There was a problem hiding this comment.
Per repo guidelines, changelog entries should be user-facing and avoid internal implementation details. This references kernel names, parameter names, and indexing mechanics. Something like:
| - Fix incorrect joint indexing in `convert_warp_coords_to_mj_kernel` that used per-world index instead of global index for `joint_type` and `joint_child` lookups, causing wrong CoM references for `worldid > 0` | |
| - Fix multi-world coordinate conversion using the wrong body center of mass for replicated worlds |
| type = joint_type[jntid] | ||
| wq_i = joint_q_start[joints_per_world * worldid + jntid] | ||
| wqd_i = joint_qd_start[joints_per_world * worldid + jntid] | ||
| type = joint_type[joint_id] |
There was a problem hiding this comment.
Nit: type shadows the Python builtin (Ruff A001, enabled in this repo's ruff.toml). The rest of this file uses jtype for the same purpose — rename for consistency?
| type = joint_type[joint_id] | |
| jtype = joint_type[joint_id] |
(Would need to update the if type == checks below as well.)
cf205cf to
1606d5a
Compare
adenzler-nvidia
left a comment
There was a problem hiding this comment.
Thanks for addressing all the feedback! One more thing: the test should also run on CPU.
Sibling kernel bug filed as #2471.
convert_qfrc_actuator_from_mj_kernel indexed joint_type, joint_child, and joint_dof_dim with per-world jntid instead of the global index (joints_per_world * worldid + jntid). For joint_child this caused body_com lookups to always read world 0's CoM when computing the free-joint force transform, producing wrong angular forces for worldid > 0. The same class of bug was reported for convert_warp_coords_to_mj_kernel in newton-physics#2332. Closes newton-physics#2471
6a32136 to
7a572a3
Compare
| @@ -0,0 +1,167 @@ | |||
| # SPDX-FileCopyrightText: Copyright (c) 2026 The Newton Developers | |||
There was a problem hiding this comment.
A few questions about the test scope — I wonder if it can be tightened.
The bug causes world-N to silently read world-0's per-world-replicated data. So with per-world-differing body properties and identical initial joint state, buggy code should produce identical trajectories across worlds while correct code produces divergent ones. Given that:
-
Do the single-world reference runs add anything beyond "world 0 and world 1 diverge"? It seems like a single multi-world run with a
not np.allclose(multi_q[0], multi_q[1])assertion would catch the same regression at 1/3 the cost. -
The comment "Using multiple joints/bodies is essential because the indexing bug only manifests when joint_child maps to different bodies across the flat array" — is this actually true? For any
worldid > 0, buggy indexing reads slotjntidinstead ofjoints_per_world * worldid + jntid, so the bug should fire with 1 joint per world as long as body properties differ across worlds. Am I missing something about why the child revolute joint is load-bearing here? -
The
body_com.numpy().reshape(...).assign(...)patching post-finalize()(with theBODY_INERTIAL_PROPERTIESnotify) — is there a reason to do it this way rather than building each world with its target CoM directly viaadd_builder? The post-hoc mutation seems to be a consequence of usingreplicate, which forces identical worlds.
There was a problem hiding this comment.
@adenzler-nvidia
Thanks for the review.
After implementing the assertFalse(allclose(multi_q[0], multi_q[1])) check, I found it doesn't actually detect the bug. Wanted to walk through why in case I'm missing something.
Root cause. notify_model_changed(BODY_INERTIAL_PROPERTIES) syncs
model.body_com into the per-world mjw_model.body_ipos array (see _update_model_inertial_properties launching update_body_mass_ipos_kernel). So MuJoCo's internal dynamics always use the correct per-world CoM, independent of the conversion kernel.
The kernel's role is the Newton <-> MuJoCo velocity convention transform v_origin = v_com - ω × rotate(rot, body_com[child]). With the bug, it makes v_origin identical across worlds (when initial v_com / ω / rot are set identically in the test), but MuJoCo then simulates each world with its own per-world body_ipos. Identical initial v_origin + different body_ipos => different dynamics. Trajectories diverge in the buggy case too.
The options I'm considering is either
- Keep the current test as-is.
- Round-trip test at the kernel level:
joint_qd → warp_to_mj → mj_to_warpshould be identity; with the bug it drifts byω × rotate(delta_com)per application. It has no stepping, and directly targets the kernel matth, but calls internal conversion methods rather than the public solver surface.
Any preference?
For your second and third points, I agree. Will modify it after we decide (1).
There was a problem hiding this comment.
You're right — (1) is the right call, and my original (1) suggestion doesn't work. I missed that notify_model_changed(BODY_INERTIAL_PROPERTIES) syncs per-world body_ipos through a separately (and correctly) indexed kernel, so MuJoCo steps with the right per-world inertial state regardless of the conversion bug. Trajectories diverge either way, just to the wrong places in the buggy case. Apologies for the noise.
Between your two options, I'd go with (1). It tests the externally-observable behavior through the public solver surface, which is what we actually want to guard against regressing. (2) is tighter but tests an internal invariant between two kernels — a symmetric break in both would pass.
Happy to proceed with (1) as-is. Could you also resolve the merge conflicts on this branch when you get a chance?
There was a problem hiding this comment.
Rebased onto latest main and simplified the test setup per points (2) and (3):
- Single free-floating body per world (removed the articulated revolute joint)
- Build each world directly via add_builder instead of replicate + post-hoc body_com mutation
Keeping the reference-based comparison as agreed in option (1).
convert_warp_coords_to_mj_kernel used per-world joint index for joint_type, joint_child, and joint_dof_dim lookups instead of the global index. This caused wrong body CoM references for worldid > 0. The sibling convert_mj_coords_to_warp_kernel already uses the global index. Extract joint_id = joints_per_world * worldid + jntid once and reuse it for all per-joint lookups. Rename the shadowed type local to jtype for consistency with other kernels in the file. Add regression test that exercises the CoM lookup path via per-world body_com overrides and non-zero initial angular velocity. The test uses get_test_devices() and auto-skips on CPU-only runners since the buggy code path only executes on CUDA (SolverMuJoCo with use_mujoco_cpu=False).
Reduce the test model to a single free-floating body per world (the articulated 2-link structure was not required to trigger the indexing bug). Build each world directly via add_builder with its target base CoM
7a572a3 to
66f2b01
Compare
Description
convert_warp_coords_to_mj_kernel used per-world joint index for joint_type and joint_child instead of the global index. This caused wrong body CoM lookups for worldid > 0. The sibling kernel convert_mj_coords_to_warp_kernel already uses the global index.
Checklist
CHANGELOG.mdhas been updated (if user-facing change)Test plan
test_perworld_com_produces_consistent_physicsFAILSBug fix
Steps to reproduce:
body_comper world viamodel.body_com+notify_model_changed(BODY_INERTIAL_PROPERTIES)convert_warp_coords_to_mj_kernelreads world 0'sjoint_childMinimal reproduction:
See
newton/tests/test_multiworld_body_properties.pySummary by CodeRabbit
Bug Fixes
Tests