Add SWE debug environment #1306

Open
rasdani wants to merge 2 commits into main from codex/swe-debug-env

Contributor

@rasdani rasdani commented May 7, 2026

Summary

  • add SWEDebugEnv, a no-agent staged debugger for SWE-style SandboxTaskSet instances
  • support optional task setup, one debug_step (none, gold_patch, command, script), and optional test/scoring at exit
  • export SWEDebugEnv from the experimental composable modules

Validation

  • uv run pytest tests/test_swe_debug_env.py -q
  • uv run ruff check verifiers/envs/experimental/composable/swe_debug_env.py tests/test_swe_debug_env.py
  • push hooks: ruff check, ruff format, ty (ci parity)

Note

Medium Risk
Adds a new sandbox-orchestrating environment that can execute arbitrary debug commands/scripts and short-circuit test runs, which could affect resource usage and failure classification. Also changes Multi-SWE dataset construction by no longer excluding C/C++ rows, potentially impacting evaluation mix and runtime.

Overview
Adds SWEDebugEnv, a no-agent experimental environment that creates a SWE-style sandbox, optionally runs task setup, performs one configurable debug step (none, gold_patch, command, script), and optionally runs and scores tests, recording timing and output tails plus standardized failure reasons.

Exports SWEDebugEnv via the experimental __init__ modules and adds focused pytest coverage for the pipeline and failure handling. Separately updates MultiSWETaskSet to stop filtering out C/C++ tasks during dataset build.
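The staged flow described above (entry, one debug step, optional tests at exit) could be sketched roughly as follows. This is an illustration only, not the code in this PR: `DebugPipeline`, `StageResult`, the placeholder commands (`setup-task`, `apply-gold-patch`, `run-tests`), and the failure reason strings are all hypothetical names chosen for the example, and the `run` callable stands in for whatever actually executes commands in the sandbox.

```python
from dataclasses import dataclass, field
from typing import Callable, Literal, Optional

# The four debug step modes named in the PR description.
DebugStep = Literal["none", "gold_patch", "command", "script"]


@dataclass
class StageResult:
    """Outcome of one stage: its name, success flag, and a tail of its output."""
    name: str
    ok: bool
    output_tail: str = ""


@dataclass
class DebugPipeline:
    """Hypothetical stand-in for the entry -> debug step -> exit flow."""
    # Assumed interface: runs a command and returns (exit_code, output).
    run: Callable[[str], tuple[int, str]]
    debug_step: DebugStep = "none"
    debug_command: Optional[str] = None
    run_setup: bool = True
    run_tests: bool = True
    results: list[StageResult] = field(default_factory=list)

    def _stage(self, name: str, command: str) -> bool:
        # Record the stage outcome, keeping only the last 200 characters of output.
        code, output = self.run(command)
        self.results.append(StageResult(name, code == 0, output[-200:]))
        return code == 0

    def execute(self) -> Optional[str]:
        """Return None on success, otherwise a failure reason string."""
        if self.run_setup and not self._stage("setup", "setup-task"):
            return "setup_failed"
        if self.debug_step == "gold_patch":
            if not self._stage("debug", "apply-gold-patch"):
                return "debug_step_failed"
        elif self.debug_step in ("command", "script"):
            if self.debug_command is None:
                return "debug_step_missing_command"
            if not self._stage("debug", self.debug_command):
                return "debug_step_failed"
        # A failed earlier stage returns before reaching this point,
        # so tests are skipped when setup or the debug step fails.
        if self.run_tests and not self._stage("tests", "run-tests"):
            return "tests_failed"
        return None


def fake_run(command: str) -> tuple[int, str]:
    """Fake runner for the example: only the command 'false' fails."""
    return (1, "boom") if command == "false" else (0, f"ran {command}")


pipeline = DebugPipeline(run=fake_run, debug_step="command", debug_command="false")
reason = pipeline.execute()
stage_names = [result.name for result in pipeline.results]
```

With the fake runner, the failing debug command stops the pipeline before the test stage, which is the short-circuit behavior the risk note refers to.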

Reviewed by Cursor Bugbot for commit 4208539. Bugbot is set up for automated code reviews on this repo.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Reviewed by Cursor Bugbot for commit 5845454.

state["sandbox_id"],
command,
working_dir=self._workdir(state),
timeout=self.debug_timeout or self.test_timeout,

Falsy check on debug_timeout ignores explicit zero

Low Severity

The expression self.debug_timeout or self.test_timeout uses Python's or operator, which treats 0 as falsy. If a caller explicitly passes debug_timeout=0, it will be silently ignored and self.test_timeout (default 900) will be used instead. The correct pattern for an int | None optional is self.debug_timeout if self.debug_timeout is not None else self.test_timeout.
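The difference between the two fallback patterns can be shown in isolation. This is a minimal sketch: the two helper functions are hypothetical and exist only for the comparison, and the 900 default is the value quoted in the comment above.

```python
from typing import Optional

DEFAULT_TEST_TIMEOUT = 900  # default test_timeout as quoted in the review comment


def timeout_with_or(
    debug_timeout: Optional[int], test_timeout: int = DEFAULT_TEST_TIMEOUT
) -> int:
    # `or` falls back for every falsy value, so both None and 0 are replaced.
    return debug_timeout or test_timeout


def timeout_with_none_check(
    debug_timeout: Optional[int], test_timeout: int = DEFAULT_TEST_TIMEOUT
) -> int:
    # Falls back only when the caller did not provide a value at all.
    return debug_timeout if debug_timeout is not None else test_timeout


explicit_zero_or = timeout_with_or(0)
explicit_zero_checked = timeout_with_none_check(0)
unset_or = timeout_with_or(None)
unset_checked = timeout_with_none_check(None)
```

The two helpers agree for None and for any positive value. They differ only for an explicit 0, which the `or` version replaces with the test timeout.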


- entry: create sandbox and optionally run ``taskset.setup(state)``
- debug step: ``none``, ``gold_patch``, ``command``, or ``script``
- exit: optionally run task tests and score them
"""

New environment class missing documentation updates

Low Severity

SWEDebugEnv is a new user-facing environment class exported from the experimental composable module, but no documentation files are updated. The docs/environments.md file describes the composable module's classes (ComposableEnv, TaskSet, SandboxTaskSet, Harness, SandboxSpec) under the experimental section, and docs/reference.md lists environment classes. The new SWEDebugEnv class is not mentioned in either. This violates the rule requiring documentation updates when adding core user-facing functionality described in docs/.

Triggered by project rule: BugBot Instructions
