Skip to content

Fixes compatibility issues with latest Isaac Sim updates#5358

Open
kellyguo11 wants to merge 33 commits intoisaac-sim:developfrom
kellyguo11:kellyg/fix-installation
Open

Fixes compatibility issues with latest Isaac Sim updates#5358
kellyguo11 wants to merge 33 commits intoisaac-sim:developfrom
kellyguo11:kellyg/fix-installation

Conversation

@kellyguo11
Copy link
Copy Markdown
Contributor

@kellyguo11 kellyguo11 commented Apr 22, 2026

Description

Fixes two regressions introduced by recent Isaac Sim develop builds (between 2026-04-13 and 2026-04-22) that broke the standard training entry point:

./isaaclab.sh -p scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Cartpole-v0

1. Isaac Sim SimulationManager invalidating the shared physics view

A recent Isaac Sim commit (8df6beeb0, 6.0.0-alpha.180) decorated SimulationManager._on_stop / _on_play / _on_stage_* with @staticmethod so they finally fire from their Carb subscriptions. The newly-working _on_stop calls SimulationManager.invalidate_physics(), which invalidates the shared omni.physics.tensors simulation view that PhysxManager and every articulation _root_view rely on, raising:

[Error] [omni.physx.tensors.plugin] Simulation view object is invalidated and cannot be used again to call getDofVelocities
Exception: Failed to get DOF velocities from backend

on the first scene.update() after sim.reset().

  • source/isaaclab_physx/isaaclab_physx/__init__.py: _patch_isaacsim_simulation_manager() now (a) force-imports isaacsim.core.simulation_manager so the patch always runs deterministically, (b) calls original_class.enable_all_default_callbacks(False) to tear down the original class's Carb subscriptions, and (c) only then swaps the module attribute. PhysxManager becomes the single owner of the simulation view's lifecycle.

2. PhysX tensors API module relocation

Recent Isaac Sim builds removed omni.physics.tensors.impl and now expose the PhysX Tensor API types directly under omni.physics.tensors.api. The old import raises ModuleNotFoundError at Isaac Lab import time, taking down every entry point that touches articulations, rigid objects, deformables, contact sensors, or ray caster sensors.

  • Updated all imports of the form import omni.physics.tensors.impl.api as physximport omni.physics.tensors.api as physx across isaaclab (ray-caster sensors) and isaaclab_physx (assets and contact sensor), including TYPE_CHECKING-gated imports.
  • Updated the PhysX Tensor API Sphinx cross-reference URLs in the articulation data classes and the example console output in the contact sensor / IMU docs.
  • Added a migration note to docs/source/migration/migrating_to_isaaclab_3-0.rst.

Fixes # (issue)

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Screenshots

N/A — runtime fixes with no UI surface.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

Kelly Guo and others added 10 commits April 22, 2026 11:16
…broken links

Combined fixes from PR isaac-sim#5279 on top of deprecation migration from PR isaac-sim#5280:

1. install.py: Add prebundle probe functions that check pip_prebundle paths
   instead of extsDeprecated, and uninstall pip torch when prebundle shadows it
2. isaaclab/__init__.py: Deprioritize ml_archive/pip_prebundle on sys.path
   so pip-installed torch/numpy/etc take priority over prebundled copies
3. setup.py: Pin mujoco==3.5.0
4. Kit files: Remove omni.warp.core to prevent conflict with pip warp-lang
5. check-links.yml: Add ubuntu.com to link checker exclusions (returns 403)
6. New tests: test_install_commands.py, test_install_prebundle.py
- Add isaacsim.pip.newton and omni.warp.core to app.extensions.excluded
  in all 6 kit files to prevent Kit from loading these extensions
- Add isaacsim.pip.newton to _CONFLICTING_EXTS in __init__.py so its
  pip_prebundle paths are deprioritized on sys.path
- isaacsim.pip.newton prebundles newton 1.0.0 which shadows the
  pip-installed newton 1.1.0.dev0 needed by Isaac Lab
- omni.warp.core loads a bundled warp that conflicts with pip-installed
  warp-lang used by Isaac Lab
Remove the pinned SHA digest from ISAACSIM_BASE_VERSION so CI tests
against the most recent Isaac Sim nightly image, which includes the
pip_prebundle path changes that this PR fixes.
The previous approach filtered only omni.isaac.ml_archive and
isaacsim.pip.newton prebundle paths. On newer Isaac Sim nightly
images, additional or renamed extensions may also inject conflicting
packages via pip_prebundle. Match on 'pip_prebundle' in the path
alone to catch all current and future prebundle directories.
The renderer's render() method launches warp kernels that populate the
camera output tensors on the GPU. Without an explicit synchronization
barrier, downstream torch operations (e.g. torch.mean on camera data)
can execute on a different CUDA stream before the warp writes complete,
resulting in 'CUDA error: an illegal memory access was encountered'.

This was exposed by newer Isaac Sim nightly images that changed warp's
default CUDA stream behavior. Add wp.synchronize() in Camera's
_update_buffers_impl after the render pass to guarantee kernel
completion before any torch access.
The render() method launches a warp reshape kernel via wp.launch() then
immediately performs torch operations (depth clipping, inf assignment)
on the same output buffer. These run on different CUDA streams — warp's
stream vs torch's default stream — with no synchronization, causing
illegal memory access errors that corrupt the CUDA context.

Add wp.synchronize() after wp.launch() and before any torch post-
processing within the per-data-type loop. This complements the existing
wp.synchronize() in Camera._update_buffers_impl() which guards the
render→read_output boundary.
Extend _deprioritize_prebundle_paths() to also demote known conflicting
extension directories (omni.warp.core, omni.isaac.ml_archive, etc.)
in addition to pip_prebundle paths.  This prevents kit extensions that
bundle their own Python packages from shadowing pip-installed versions.

Also re-run the sanitization after SimulationApp + extensions load,
since Kit may insert new paths onto sys.path during startup that were
not present when isaaclab/__init__.py first ran.
The nightly Isaac Sim Docker (>= 04/14) ships warp 1.13.x inside
kit/python/lib/python3.12/site-packages. setup_python_env.sh places
that directory on PYTHONPATH ahead of pip's site-packages, so warp
1.13.x was imported despite the _pin_warp_import pre-import added in
4.6.8 — that pre-import was locking the kit warp, not the pip warp.

Warp 1.13.x deprecates warp.types.array with changed semantics.
omni.replicator.core accesses this symbol during rendering, triggering
CUDA error 700 (illegal memory access). That error poisons the CUDA
context: all subsequent warp kernel lookups return NULL, producing the
"Failed to find forward kernel 'update_outdated_envs_kernel'" crash.

Fix: extend _should_demote() to catch kit/python/lib/*/site-packages
paths so that pip-installed warp-lang always wins. Also add explicit
strip_hash=False to isaaclab.sensors.kernels as a defensive safety net,
and broaden the AppLauncher post-startup diagnostic to cover kit paths.
The 4.6.9 path-demotion fix moved kit/python/lib/python3.12/site-packages
to the end of sys.path, but warp-lang was not a pip requirement. With no
pip-managed warp installed, the demoted kit warp 1.13.x was still the only
warp available and was still imported.

Root fix: add warp-lang>=1.0.0,<1.13 to pip requirements. When
./isaaclab.sh --install runs, pip uses kit's own Python
(kit/python/bin/python3) which installs into kit/python/lib/python3.12/
site-packages — the same directory kit shipped warp 1.13.x — overwriting
it with a compatible version.

Also replace path-based warnings in _pin_warp_import and AppLauncher with
version-based checks (>= 1.13). Path-based checks would fire false
positives after pip installs warp-lang into kit's site-packages (a
kit-managed path, but pip-managed content). Version-based checks correctly
identify the actual incompatibility regardless of install location.
@github-actions github-actions Bot added bug Something isn't working isaac-sim Related to Isaac Sim team isaac-lab Related to Isaac Lab team infrastructure labels Apr 22, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 22, 2026

Greptile Summary

This PR fixes three regressions introduced by recent Isaac Sim develop builds: (1) a CUDA context-poisoning bug in IsaacRtxRenderer caused by empty annotator buffers, (2) SimulationManager._on_stop now correctly firing and invalidating the shared PhysX tensors view, and (3) the omni.physics.tensors.impl.api module being removed in favour of omni.physics.tensors.api. It also adds path-ordering fixes to prevent bundled copies of torch/warp from shadowing pip-installed ones, and several install-time helpers to handle prebundle edge cases.

All three core fixes are well-scoped and correctly implemented. The import migration is comprehensive across all affected assets and sensors. The remaining inline findings are P2 style/cleanup items: stale CI comments left over from the now-reverted digest pin, and hardcoded pinocchio version strings that diverge from the project's dependency manifests.

Confidence Score: 5/5

Safe to merge — all three regressions are correctly addressed with no new logic errors introduced

All inline findings are P2 (stale comments, version string maintenance). The three core bug fixes are correct and well-tested. The import migration is exhaustive. Prior P2 concerns about wp.synchronize scope and the Carb subscription fallback remain valid suggestions but do not block merge.

.github/workflows/build.yaml has stale TODO comments that should be cleaned up before merging to avoid misleading future CI maintainers.

Important Files Changed

Filename Overview
source/isaaclab_physx/isaaclab_physx/renderers/isaac_rtx_renderer.py Adds empty-buffer guard before reshape_tiled_image kernel launch to prevent CUDA context poisoning on frames where the annotator hasn't populated data yet
source/isaaclab_physx/isaaclab_physx/init.py Rewrites _patch_isaacsim_simulation_manager to force-import SimulationManager, call enable_all_default_callbacks(False) to tear down Carb subscriptions before swapping the module attribute, and handles older builds gracefully
source/isaaclab/isaaclab/init.py Adds _deprioritize_prebundle_paths() called at import time to move pip_prebundle and conflicting extension paths to the end of sys.path
source/isaaclab/isaaclab/cli/commands/install.py Adds _torch_first_on_sys_path_is_prebundle, _maybe_uninstall_prebundled_torch, _ensure_pinocchio_installed helpers plus a nvidia namespace-package guard in _repoint_prebundle_packages to prevent stripping CUDA shared libraries
source/isaaclab/isaaclab/app/app_launcher.py Re-runs _deprioritize_prebundle_paths() after Kit extensions are loaded to demote any paths injected during extension startup
source/isaaclab_physx/isaaclab_physx/sensors/contact_sensor/contact_sensor.py Updates omni.physics.tensors.impl.api → omni.physics.tensors.api for the relocated PhysX Tensor API module
source/isaaclab/isaaclab/sensors/ray_caster/ray_caster.py Updates omni.physics.tensors.impl.api → omni.physics.tensors.api import path
.github/workflows/build.yaml Reverts pinned digest to latest-develop; stale comment and TODO above the variable still describe the old pinned state and should be removed
docs/source/migration/migrating_to_isaaclab_3-0.rst Adds migration note for the omni.physics.tensors module path change
source/isaaclab/test/cli/test_install_commands.py New test file with extensive coverage of _torch_first_on_sys_path_is_prebundle, _maybe_uninstall_prebundled_torch, and _repoint_prebundle_packages across multiple Python environment types

Sequence Diagram

sequenceDiagram
    participant Kit as Kit/Extension Startup
    participant Init as isaaclab.__init__
    participant PhysxInit as isaaclab_physx.__init__
    participant SM as Isaac Sim SimulationManager
    participant PM as PhysxManager
    participant RTX as IsaacRtxRenderer

    Init->>Init: _deprioritize_prebundle_paths() — demote pip_prebundle on sys.path
    PhysxInit->>SM: import isaacsim.core.simulation_manager (force)
    PhysxInit->>SM: SM.enable_all_default_callbacks(False) — tear down Carb subscriptions
    PhysxInit->>Kit: sys.modules['...'].SimulationManager = PhysxManager
    Note over PM: PhysxManager is now sole owner of physics lifecycle

    Kit->>Init: AppLauncher._load_extensions()
    Init->>Init: _deprioritize_prebundle_paths() (2nd pass — catches Kit-injected paths)

    loop per render frame
        RTX->>RTX: get tiled_data_buffer from annotator
        alt buffer.size == 0 (first 1-2 frames)
            RTX->>RTX: continue — skip kernel launch, output stays zero-initialised
        else buffer populated
            RTX->>RTX: wp.launch(reshape_tiled_image, ...)
            RTX->>RTX: wp.synchronize()
        end
    end
Loading

Reviews (2): Last reviewed commit: "Merge branch 'develop' into kellyg/fix-i..." | Re-trigger Greptile

@kellyguo11 kellyguo11 marked this pull request as draft April 22, 2026 19:15
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot

Summary

This PR fixes two critical regressions introduced by recent Isaac Sim develop builds: (1) an empty-annotator-buffer issue that causes CUDA context poisoning in the RTX renderer, and (2) Isaac Sim's SimulationManager invalidating the shared physics view on timeline stop events. The implementation is solid and well-documented, with comprehensive test coverage for the install-related changes.

Architecture Impact

Low cross-module impact. The two core fixes are self-contained:

  • The RTX renderer fix (isaac_rtx_renderer.py) only affects the render() method's loop over annotators
  • The SimulationManager patch (isaaclab_physx/__init__.py) touches module initialization but correctly tears down the original class's subscriptions before redirecting

The install-related changes (isaaclab/__init__.py, install.py) are path sanitization and probe functions that don't affect runtime behavior after startup.

Implementation Verdict

Ship it — Clean implementation with thorough root-cause analysis. The fixes directly address the documented failure modes, and the defensive fallback in the patch code handles API variations gracefully.

Test Coverage

Excellent for install commands; adequate for runtime fixes.

For the install.py changes:

  • test_install_commands.py (786 lines) provides comprehensive unit tests covering torch probe, prebundle uninstall, CUDA torch installation, and package repointing across all environment types (uv, pip, conda, kit Python, Windows).
  • test_install_prebundle.py adds targeted tests for the probe script content and _split_install_items parser.

For the runtime fixes:

  • The PR description mentions test_tiled_camera.py was used to verify the RTX renderer fix. The empty-buffer guard is a pure defensive check (skip kernel launch when no data), so existing camera tests implicitly cover the happy path.
  • The SimulationManager patch is harder to unit-test (requires full Isaac Sim runtime), but the fix is well-reasoned and the fallback is defensive.

CI Status

All CI failures are infrastructure-related (Docker buildkit errors: mkdir /var/lib/docker/containers/...: no such file or directory), not caused by this PR. Pre-commit ✅, Build Wheel ✅, Check for Broken Links ✅.

Findings

🔵 Improvement: source/isaaclab_physx/isaaclab_physx/renderers/isaac_rtx_renderer.py:321 — wp.synchronize() is called inside the per-data-type loop, which may cause unnecessary GPU pipeline stalls

The sync is now called after every data type's wp.launch(), but only the depth-clipping torch operations actually need it. For cameras with multiple data types (e.g., rgba + depth + semantic_segmentation), this adds unnecessary sync points.

Consider moving the sync outside the loop or only syncing before torch operations that actually modify the buffer:

# After the for loop, sync once before any torch post-processing
wp.synchronize()

# Then do all torch post-processing
for data_type in render_data.annotators:
    if data_type == "distance_to_camera":
        output_data[data_type][output_data[data_type] > cfg.spawn.clipping_range[1]] = torch.inf
    # ... etc

This is a minor performance consideration, not a correctness issue. The current implementation is safe.

🔵 Improvement: source/isaaclab_physx/isaaclab_physx/init.py:80-86 — The fallback attribute-clearing loop could be more explicit about what it's doing

The fallback when enable_all_default_callbacks(False) fails manually sets subscription handles to None. While this works, it relies on internal implementation details of the original SimulationManager class. Adding a log statement would help debugging:

import logging
logger = logging.getLogger(__name__)
# ...
except Exception as e:
    logger.debug(f"enable_all_default_callbacks unavailable ({e}), clearing subscription handles manually")
    for attr in (
        "_default_callback_warm_start",
        # ...
    ):
        if hasattr(original_class, attr):
            setattr(original_class, attr, None)

Comment thread source/isaaclab/isaaclab/sensors/camera/camera.py Outdated
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Apr 22, 2026
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot — Re-review (b0acf8b)

Summary

This push updates all imports of the PhysX tensors API from the removed omni.physics.tensors.impl.api path to the new omni.physics.tensors.api path, tracking an upstream Isaac Sim module relocation. The changes are straightforward and correct.

What Changed Since Last Review (f41b744b0acf8b)

Single commit: "Updates omni.physics.tensor.api import"

This commit updates 17 files across three packages to use the new import path:

Package Files Changed Change Type
isaaclab 4 files Runtime imports + docstring links
isaaclab_physx 11 files Runtime imports + docstring links
isaaclab_newton 1 file Docstring link only
docs 3 files Migration guide + example outputs

Changed imports:

# Before
import omni.physics.tensors.impl.api as physx

# After
import omni.physics.tensors.api as physx

Architecture Impact

No behavioral changes. This is a pure import path update to track an upstream Isaac Sim API relocation. The class identities (ArticulationView, RigidBodyView, SimulationView, etc.) are unchanged — only the module path moved.

The migration documentation added to migrating_to_isaaclab_3-0.rst is clear and helpful for downstream users who need to update their own code.

Implementation Verdict

Ship it — Clean, mechanical update with correct changes across all affected files. The CHANGELOGs are properly updated and migration docs are comprehensive.

Test Coverage

No new tests needed — this is a pure import path update with no behavioral changes. Existing tests (especially test_tiled_camera.py and other sensor tests) will verify the imports work correctly at runtime.

CI Status

  • ✅ pre-commit
  • ✅ Build Wheel
  • ✅ Check for Broken Links
  • ⏳ Docker + Tests (in progress)
  • ⏳ Build Base Docker Image (in progress)
  • ❌ Installation Tests — Infrastructure failure (Docker buildkit resolv.conf.tmp: no such file or directory), not caused by this PR

Findings

No issues with the new changes. The import path updates are correct and comprehensive.

Note from previous review: The two suggestions from my earlier review (batching wp.synchronize() calls and adding debug logging to the fallback path) remain valid but are minor improvements, not blockers. The core fixes are solid.

@kellyguo11 kellyguo11 marked this pull request as ready for review April 23, 2026 04:15
@kellyguo11 kellyguo11 requested a review from jtigue-bdai as a code owner April 23, 2026 04:15
@preeves-nv
Copy link
Copy Markdown
Collaborator

preeves-nv commented Apr 23, 2026

Hi @kellyguo11 @pbarejko

While looking at the CI failures on this PR I - with Claude, traced the root cause of some of the failures to _patch_isaacsim_simulation_manager() in isaaclab_physx/init.py. The call to enable_all_default_callbacks(False) disables all four Carb subscriptions including warm_start (PLAY). The warm_start callback is required to initialise the rendering pipeline on sim.reset() — without it, tiled-camera RGB output stays black and visualization-marker tests fail.

Does this make sense to you - I am not familiar enough with the whole code base to be sure - but I thought it worth trying to investigate?

Only _default_callback_on_stop needs to be unsubscribed — it's the one that calls invalidate_physics() and wrecks the shared omni.physics.tensors view. The other three callbacks, including warm_start, should be left intact.

Looking at the isaaclab (core) [1/3] CI log from this PR, the specific failures consistent with this are:

test_first_frame_textured_rendering (0/1)
test_tiled_camera (6/8)
test_multi_tiled_camera (6/11)
test_visualization_markers (5/6)
The fix is a single-file change to isaaclab_physx/init.py:

original_class = getattr(original_module, "SimulationManager", None)
if original_class is not None and original_class is not PhysxManager:
if hasattr(original_class, "_default_callback_on_stop"):
# Carb subscription objects unsubscribe on destruction — setting to
# None drops the reference and silently cancels the subscription.
original_class._default_callback_on_stop = None
else:
# Fallback for older Isaac Sim builds where the attribute doesn't exist.
# Note: this may cause tiled-camera black frames; the targeted fix above is preferred.
try:
original_class.enable_all_default_callbacks(False)
except Exception:
pass
The commit is cherry-pickable from preeves-nv:preeves/fix-on-stop-targeted (3799153) — also referenced in #5371. Hap

@kellyguo11 kellyguo11 requested a review from ooctipus as a code owner April 24, 2026 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working documentation Improvements or additions to documentation infrastructure isaac-lab Related to Isaac Lab team isaac-sim Related to Isaac Sim team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants