Skip to content
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 12 additions & 6 deletions newton/tests/thirdparty/unittest_parallel.py
Original file line number Diff line number Diff line change
Expand Up @@ -231,12 +231,18 @@ def main(argv=None):
results = pool.map(test_manager.run_tests, test_suites)
else:
# NVIDIA Modification added concurrent.futures
with concurrent.futures.ProcessPoolExecutor(
max_workers=process_count,
mp_context=multiprocessing.get_context(method="spawn"),
initializer=initialize_test_process,
initargs=(manager.Lock(), shared_index, args, temp_dir),
) as executor:
executor_kwargs = {
"max_workers": process_count,
"mp_context": multiprocessing.get_context(method="spawn"),
"initializer": initialize_test_process,
"initargs": (manager.Lock(), shared_index, args, temp_dir),
}
# Recycle workers after each test suite to prevent host
# memory accumulation from Warp kernel compilation and
# other allocations that persist across suites (see #1847).
if sys.version_info >= (3, 11):
executor_kwargs["max_tasks_per_child"] = 1
with concurrent.futures.ProcessPoolExecutor(**executor_kwargs) as executor:
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

--disable-process-pooling no longer controls the default executor path.

On Python 3.11+, this now recycles ProcessPoolExecutor workers unconditionally, while the multiprocessing.Pool branch still keys recycling off args.disable_process_pooling. That makes backend behavior inconsistent and removes the existing opt-in/opt-out control from the default path. If the new default is intentional, please update the CLI contract/help/docs accordingly; otherwise, thread args.disable_process_pooling through here as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@newton/tests/thirdparty/unittest_parallel.py` around lines 243 - 245, The
ProcessPoolExecutor branch unconditionally sets worker recycling on Python 3.11+
causing inconsistent behavior with the multiprocessing.Pool branch; update the
logic around executor_kwargs and ProcessPoolExecutor so it respects the same CLI
flag by checking args.disable_process_pooling before adding max_tasks_per_child
(i.e., only set executor_kwargs["max_tasks_per_child"]=1 when sys.version_info
>= (3,11) AND args.disable_process_pooling is false), ensuring
ProcessPoolExecutor, executor_kwargs and args.disable_process_pooling are used
consistently.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentional — the two paths serve different roles:

  • The ProcessPoolExecutor path is the default that all users hit. This is where Test error on Horde Win11 L40: concurrent.futures.process.BrokenProcessPool #1847 keeps recurring, so it needs to be safe out of the box. Making recycling unconditional here is the fix.
  • The multiprocessing.Pool path is a fallback behind --disable-concurrent-futures, used for debugging. --disable-process-pooling was added as an opt-in for that fallback path specifically.

Gating the fix behind --disable-process-pooling would keep recycling opt-in on the default path, which is exactly the status quo that keeps crashing. Nobody passes that flag today, and requiring it would not fix the issue.

The performance cost of always recycling is negligible — Warp kernels are cached on disk, so worker re-initialization loads from cache rather than recompiling.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vreutskyy, thanks for the clear explanation — that makes the design contract explicit and the reasoning is sound. The two flags operate on completely separate code paths with different roles, so there's no inconsistency. My concern is withdrawn.


✏️ Learnings added
Learnt from: vreutskyy
Repo: newton-physics/newton PR: 2404
File: newton/tests/thirdparty/unittest_parallel.py:243-245
Timestamp: 2026-04-10T08:53:15.343Z
Learning: In newton-physics/newton unittest_parallel.py, `max_tasks_per_child=1` on the `ProcessPoolExecutor` path is intentionally unconditional (guarded only by Python >= 3.11). The `ProcessPoolExecutor` branch is the default path that all users hit and must recycle workers out of the box to prevent BrokenProcessPool (issue `#1847`). The `--disable-process-pooling` flag is scoped exclusively to the `multiprocessing.Pool` fallback path (behind `--disable-concurrent-futures`), which is a debug-only path. Do not flag the absence of `--disable-process-pooling` gating on the `ProcessPoolExecutor` path as inconsistency.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: jumyungc
Repo: newton-physics/newton PR: 1333
File: newton/_src/sim/builder.py:0-0
Timestamp: 2026-01-13T03:11:40.556Z
Learning: Ensure all VBD damping semantics follow Rayleigh-style damping and are unitless. This convention, demonstrated in SolverVBD, should be applied consistently across all VBD-related constraints in the codebase. In newton/_src/sim/builder.py, validate that any damping terms use the same normalization and are treated as dimensionless; add comments and unit tests to enforce consistency across VBD components.

Learnt from: adenzler-nvidia
Repo: newton-physics/newton PR: 1645
File: newton/tests/test_mujoco_solver.py:0-0
Timestamp: 2026-02-16T15:42:56.636Z
Learning: In tests for the Newton solver, prefer importing and using SolverMuJoCo._mujoco_warp instead of importing mujoco_warp directly. This avoids hard dependencies on the external MJWarp package and enables skipping/xfailing tests gracefully when MJWarp isn't installed. Update test imports to import SolverMuJoCo and reference SolverMuJoCo._mujoco_warp, and guard optional behavior with pytest.importorskip or try/except ImportError as appropriate.

Learnt from: adenzler-nvidia
Repo: newton-physics/newton PR: 1773
File: newton/tests/test_mujoco_solver.py:7550-7556
Timestamp: 2026-02-26T12:51:12.942Z
Learning: Do not add defensive try/except guards around SolverMuJoCo construction in tests that rely on mujoco_warp features (e.g., set_length_range, set_const_0). The repo pins mujoco-warp >=3.5.0.1 (PR `#1773`), so tests may assume these APIs exist. If a test would fail due to a missing API, prefer failing fast or asserting the API/version instead of swallowing the error.

Learnt from: jvonmuralt
Repo: newton-physics/newton PR: 1925
File: newton/examples/contacts/example_brick_stacking.py:385-385
Timestamp: 2026-03-05T12:15:57.546Z
Learning: In Warp kernel code within the newton-physics/newton repository, composite type slicing (e.g., wp.vec4[:3], wp.mat33[:-1, 1:]) is supported since v1.9.0, and array-level negative indexing/slicing inside kernels was extended in v1.10.0. Expressions like ee_rot_target[tid][:3] (where ee_rot_target is wp.array(dtype=wp.vec4)) are valid and should not be flagged as unsupported slice syntax. Apply this understanding broadly to Python files in Warp contexts during code reviews.

Learnt from: eric-heiden
Repo: newton-physics/newton PR: 1988
File: newton/tests/thirdparty/unittest_parallel.py:245-261
Timestamp: 2026-03-07T20:07:41.440Z
Learning: In newton-physics/newton, this learning explains that the menagerie pre-download test list covers only non-skipped menagerie tests and test_robot_composer. Do not mark it incomplete based solely on raw robot_folder assignments. Be aware that in test_menagerie_mujoco.py placeholder MJCF classes set skip_reason and TestMenagerieBase.setUpClass() may skip before download_menagerie_asset() runs, and in test_menagerie_usd_mujoco.py placeholder USD stubs skip before super().setUpClass() when usd_asset_folder is unset, so downloads aren’t triggered. When reviewing this area, validate expectations against the actual skip timing and ensure that absence of a download early in test setup is intentional and not a sign of missing test coverage.

Learnt from: preist-nvidia
Repo: newton-physics/newton PR: 1968
File: newton/_src/solvers/xpbd/solver_xpbd.py:249-250
Timestamp: 2026-03-09T14:12:43.599Z
Learning: In the repository newton-physics/newton, PRs labeled '1.0-release' are considered part of the pre-stable API cleanup and are exempt from the deprecation-first policy and the [Unreleased] CHANGELOG entry requirement. Breaking changes included in the 1.0 release do not require a prior deprecation window. This applies to Python files across the codebase.

Learnt from: vreutskyy
Repo: newton-physics/newton PR: 2074
File: newton/tests/test_import_usd.py:0-0
Timestamp: 2026-03-12T22:28:25.778Z
Learning: In test_import_usd.py, if testing USD import cycles, ensure that the TestResolveUsdFromUrl.test_cyclic_references validates that each URL (scene.usd, a.usd, b.usd) is fetched exactly once to guard against root re-download regressions. Use mocks/spies on the fetch mechanism and assert call counts per URL, and verify no extra fetches occur during the cycle. This pattern should apply to other tests that exercise URL-based resource graphs and potential re-downloads.

Learnt from: shi-eric
Repo: newton-physics/newton PR: 2242
File: newton/_src/utils/import_usd.py:184-184
Timestamp: 2026-03-27T16:48:54.491Z
Learning: Because `parse_usd` in `newton/_src/utils/import_usd.py` is a private helper and is not re-exported as part of the public API (the public entry point is `ModelBuilder.add_usd`), avoid Sphinx cross-references (`:func:`) to `parse_usd` in docstrings. If you must mention it, use inline code literals (e.g., ``parse_usd()``) rather than `:func:` references, since there is no valid public Sphinx target for this private function.

Learnt from: eric-heiden
Repo: newton-physics/newton PR: 2261
File: newton/_src/sim/builder.py:0-0
Timestamp: 2026-04-01T14:40:44.664Z
Learning: When implementing a deprecation shim in the Python codebase, avoid using a sentinel/default strategy that can’t reliably distinguish “omitted” from “explicit default” if the only downside is a single corner-case ambiguity. Prefer keeping clean public signatures (no sentinel) unless doing so would violate a documented guarantee. If you keep the clean signature, ensure the documentation/release notes (docs/ and/or CHANGELOG) explicitly forbid mixing the old and new names/parameters in the same call to prevent precedence ambiguity (e.g., which name wins).

Copy link
Copy Markdown
Member

@preist-nvidia preist-nvidia Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two follow up questions:

  1. Can you quantify perf impact on CICD UT runs of this change?
  2. Why is the change not limited to Windows?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

test_manager = ParallelTestManager(manager, args, temp_dir)
results = list(executor.map(test_manager.run_tests, test_suites, timeout=2400))
else:
Expand Down
Loading