Add SWE debug environment by rasdani · Pull Request #1306 · PrimeIntellect-ai/verifiers

rasdani · 2026-05-07T22:03:05Z

Summary

- Add SWEDebugEnv, a no-agent staged debugger for SWE-style SandboxTaskSet instances
- Support optional task setup, one debug_step (none, gold_patch, command, script), and optional test/scoring at exit
- Export SWEDebugEnv from the experimental composable modules

Validation

- uv run pytest tests/test_swe_debug_env.py -q
- uv run ruff check verifiers/envs/experimental/composable/swe_debug_env.py tests/test_swe_debug_env.py
- push hooks: ruff check, ruff format, ty (CI parity)

Note

Medium Risk
Adds a new sandbox-orchestrating environment that can execute arbitrary debug commands/scripts and short-circuit test runs, which could affect resource usage and failure classification. Also changes Multi-SWE dataset construction by no longer excluding C/C++ rows, potentially impacting evaluation mix and runtime.

Overview
Adds SWEDebugEnv, a no-agent experimental environment that creates a SWE-style sandbox, optionally runs task setup, performs one configurable debug step (none, gold_patch, command, script), and optionally runs and scores the task tests, recording timing and output tails plus standardized failure reasons.
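
The staged flow described above can be sketched roughly as follows. This is an illustrative sketch only, not the PR's implementation: `run_debug_pipeline`, `run_stage`, `DebugRun`, `StageResult`, the `execute` callable, and the failure-reason strings are all hypothetical names introduced here; only the four debug step modes and the setup / debug step / tests ordering come from the description.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# The four modes named in the PR description.
DEBUG_STEPS = ("none", "gold_patch", "command", "script")


@dataclass
class StageResult:
    name: str
    seconds: float
    output_tail: str


@dataclass
class DebugRun:
    stages: List[StageResult] = field(default_factory=list)
    failure_reason: Optional[str] = None


def tail(text: str, limit: int = 200) -> str:
    """Keep only the last `limit` characters of an output string."""
    return text[-limit:]


def run_stage(run: DebugRun, name: str, fn: Callable[[], str]) -> bool:
    """Run one stage, record its duration and output tail, and label failures."""
    start = time.perf_counter()
    try:
        output = fn()
    except Exception as exc:  # hypothetical failure-reason format
        run.failure_reason = f"{name}_failed: {exc}"
        output = str(exc)
    run.stages.append(StageResult(name, time.perf_counter() - start, tail(output)))
    return run.failure_reason is None


def run_debug_pipeline(
    execute: Callable[[str], str],  # hypothetical stand-in for running something in a sandbox
    debug_step: str = "none",
    payload: str = "",
    run_setup: bool = True,
    run_tests: bool = True,
) -> DebugRun:
    if debug_step not in DEBUG_STEPS:
        raise ValueError(f"unknown debug_step: {debug_step!r}")
    run = DebugRun()
    # Entry: optional task setup. A failure here short-circuits later stages.
    if run_setup and not run_stage(run, "setup", lambda: execute("setup")):
        return run
    # Debug step: exactly one configurable action, skipped for "none".
    if debug_step != "none" and not run_stage(
        run, debug_step, lambda: execute(f"{debug_step}: {payload}")
    ):
        return run
    # Exit: optional test run.
    if run_tests:
        run_stage(run, "tests", lambda: execute("tests"))
    return run
```

Recording each stage in the same structure keeps timing, output tails, and the first failure reason in one place, which is the kind of bookkeeping the overview attributes to the environment.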

Exports SWEDebugEnv via the experimental __init__ modules and adds focused pytest coverage for the pipeline and failure handling. Separately updates MultiSWETaskSet to stop filtering out C/C++ tasks during dataset build.

^{Reviewed by Cursor Bugbot for commit 4208539. Bugbot is set up for automated code reviews on this repo.}

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{Reviewed by Cursor Bugbot for commit 5845454.}

cursor · 2026-05-07T22:09:07Z

+            state["sandbox_id"],
+            command,
+            working_dir=self._workdir(state),
+            timeout=self.debug_timeout or self.test_timeout,


Falsy check on debug_timeout ignores explicit zero

Low Severity

The expression self.debug_timeout or self.test_timeout uses Python's or operator, which treats 0 as falsy. If a caller explicitly passes debug_timeout=0, it will be silently ignored and self.test_timeout (default 900) will be used instead. The correct pattern for an int | None optional is self.debug_timeout if self.debug_timeout is not None else self.test_timeout.
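
The difference between the two patterns can be shown with a small standalone sketch. The helper names `pick_timeout_or` and `pick_timeout_explicit` are hypothetical and are not part of the PR; only the default of 900 and the two expressions are taken from the comment above.

```python
from typing import Optional

TEST_TIMEOUT = 900  # default test_timeout quoted in the review comment


def pick_timeout_or(debug_timeout: Optional[int], test_timeout: int = TEST_TIMEOUT) -> int:
    """Flagged pattern: `or` falls back on any falsy value, including 0."""
    return debug_timeout or test_timeout


def pick_timeout_explicit(debug_timeout: Optional[int], test_timeout: int = TEST_TIMEOUT) -> int:
    """Suggested pattern: fall back only when the value is None."""
    return debug_timeout if debug_timeout is not None else test_timeout


print(pick_timeout_or(0))           # 900: the explicit zero is discarded
print(pick_timeout_explicit(0))     # 0: the explicit zero is respected
print(pick_timeout_explicit(None))  # 900: None still falls back
```

Whether a timeout of 0 is meaningful to the sandbox client is a separate question that the comment does not address; the sketch only shows that the two expressions disagree for that input.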

cursor · 2026-05-07T22:09:07Z

+    - entry: create sandbox and optionally run ``taskset.setup(state)``
+    - debug step: ``none``, ``gold_patch``, ``command``, or ``script``
+    - exit: optionally run task tests and score them
+    """


New environment class missing documentation updates

Low Severity

SWEDebugEnv is a new user-facing environment class exported from the experimental composable module, but no documentation files are updated. The docs/environments.md file describes the composable module's classes (ComposableEnv, TaskSet, SandboxTaskSet, Harness, SandboxSpec) under the experimental section, and docs/reference.md lists environment classes. The new SWEDebugEnv class is not mentioned in either. This violates the rule requiring documentation updates when adding core user-facing functionality described in docs/.

^{Triggered by project rule: BugBot Instructions}

Add SWE debug environment

5845454

rasdani mentioned this pull request May 7, 2026

Add SWE task debugger environment PrimeIntellect-ai/research-environments#353

Open

cursor Bot reviewed May 7, 2026

Use filtered Multi-SWE dataset directly

4208539

rasdani wants to merge 2 commits into main from codex/swe-debug-env
