Capture RLM agent patches in state #1260

Draft
rasdani wants to merge 4 commits into main from codex/rlm-swe-agent-patch

Contributor

@rasdani rasdani commented Apr 27, 2026

Summary

  • replace the patch-specific Harness.agent_patch_state_key / ComposableEnv git helpers with a generic Harness.state_collectors lifecycle hook
  • add GitPatchCollector, which snapshots the post-setup worktree with git tree plumbing through a temporary index and stores the post-rollout diff in state["agent_patch"]
  • wire rlm_harness() to use GitPatchCollector() so research-environments2's rlm_swe composition gets patch capture without duplicating SWE taskset or RLM harness internals

Notes

  • Collection still runs before scoring, so SWE rubrics can mutate test files afterward without corrupting the captured agent patch.
  • The collector does not create commits, stage in the real index, reset the worktree, or depend on SWE-specific taskset code.
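
The temporary-index approach described above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's `GitPatchCollector` code: the helper names (`git`, `snapshot_tree`, `demo`) are hypothetical, and it runs git locally through `subprocess` rather than inside a sandbox. It relies only on standard git behavior: `GIT_INDEX_FILE` redirects staging to a throwaway index, `git write-tree` turns that index into a tree object, and `git diff --binary --full-index` compares two trees.

```python
import os
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

GIT = shutil.which("git") or "git"


def git(repo: Path, *args: str, index_file: Optional[str] = None) -> str:
    """Run a git command in `repo`, optionally against an alternate index file."""
    env = dict(os.environ)
    if index_file is not None:
        env["GIT_INDEX_FILE"] = index_file
    result = subprocess.run(
        [GIT, *args], cwd=repo, env=env, check=True, capture_output=True, text=True
    )
    return result.stdout


def snapshot_tree(repo: Path) -> str:
    """Write the worktree (tracked and untracked files) as a tree object.

    Staging happens in a throwaway index, so the real index, HEAD, and the
    worktree are left untouched and no commit is created.
    """
    with tempfile.TemporaryDirectory() as tmp:
        index_file = os.path.join(tmp, "index")  # does not exist yet; git creates it
        git(repo, "add", "-A", index_file=index_file)
        return git(repo, "write-tree", index_file=index_file).strip()


def demo() -> tuple:
    """Baseline snapshot, simulated agent edits, then a tree-to-tree diff."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init", "-q")
        (repo / "app.py").write_text("print('old')\n")
        base_tree = snapshot_tree(repo)  # post-setup baseline

        (repo / "app.py").write_text("print('new')\n")  # simulated agent edit
        (repo / "notes.txt").write_text("untracked\n")  # simulated new file
        final_tree = snapshot_tree(repo)  # post-rollout state

        patch = git(repo, "diff", "--binary", "--full-index", base_tree, final_tree)
        real_index = git(repo, "ls-files")  # stays empty: nothing was staged for real
        return patch, real_index
```

Because the baseline is a tree object rather than a commit, the resulting diff covers edited, staged, and untracked files alike, and later changes to the worktree (for example by a rubric touching test files) do not alter a patch that has already been computed.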

Tests

  • uv run pytest tests/test_composable_env.py tests/test_rlm_composable_env.py -q
  • uv run ruff check verifiers/envs/experimental/composable/composable_env.py verifiers/envs/experimental/composable/harness.py verifiers/envs/experimental/composable/harnesses/rlm.py verifiers/envs/experimental/composable/state_collectors.py tests/test_composable_env.py tests/test_rlm_composable_env.py
  • uv run ruff format --check verifiers/envs/experimental/composable/composable_env.py verifiers/envs/experimental/composable/harness.py verifiers/envs/experimental/composable/harnesses/rlm.py verifiers/envs/experimental/composable/state_collectors.py tests/test_composable_env.py tests/test_rlm_composable_env.py
  • pre-push hooks: ruff check, ruff format, Sync AGENTS.md from docs, ty (ci parity)

Note

Medium Risk
Adds new lifecycle hooks that execute extra sandbox commands during setup/rollout, which can affect timing and state mutation if collectors misbehave (though failures are logged and ignored). Git-based patch collection depends on repo state and tool availability inside the sandbox, so it may introduce intermittent missing/empty artifacts.

Overview
Adds harness-owned state collectors to ComposableEnv, running optional post_sandbox_setup and post_rollout hooks to persist rollout artifacts into state while swallowing collector failures.
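
A minimal sketch of how such a lifecycle might look, assuming each collector exposes optional async `post_sandbox_setup` and `post_rollout` methods that receive the rollout state. The hook names come from this PR; the signatures, the `StateCollector` base class, the `run_hook` helper, and the example collectors are hypothetical and may differ from the actual `ComposableEnv` implementation.

```python
import asyncio
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)

State = Dict[str, Any]


class StateCollector:
    """Hypothetical base class: both hooks default to no-ops."""

    async def post_sandbox_setup(self, state: State) -> None:
        return None

    async def post_rollout(self, state: State) -> None:
        return None


class RecordingCollector(StateCollector):
    """Stores a baseline after setup and an artifact after the rollout."""

    async def post_sandbox_setup(self, state: State) -> None:
        state["baseline"] = "snapshot"

    async def post_rollout(self, state: State) -> None:
        state["artifact"] = f"diff-from-{state['baseline']}"


class FailingCollector(StateCollector):
    """Simulates a collector whose sandbox command fails."""

    async def post_rollout(self, state: State) -> None:
        raise RuntimeError("sandbox command failed")


async def run_hook(collectors: List[StateCollector], hook_name: str, state: State) -> None:
    """Call one hook on every collector; log and ignore individual failures."""
    for collector in collectors:
        hook = getattr(collector, hook_name, None)
        if hook is None:
            continue
        try:
            await hook(state)
        except Exception:
            logger.warning(
                "collector %s failed in %s",
                type(collector).__name__,
                hook_name,
                exc_info=True,
            )


async def demo() -> State:
    state: State = {}
    collectors = [FailingCollector(), RecordingCollector()]
    await run_hook(collectors, "post_sandbox_setup", state)
    # ... the agent rollout would happen here, before scoring ...
    await run_hook(collectors, "post_rollout", state)
    return state
```

In this sketch a failing collector costs only its own artifact: the remaining collectors still run and the rollout proceeds, which matches the "logged and ignored" behavior described in the risk note.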

Introduces GitPatchCollector, which snapshots a post-setup git tree using a temporary index and then stores a binary/full-index diff of agent edits into state["agent_patch"] after rollout. Enables this collector by default in rlm_harness, exports it from verifiers.envs.experimental.composable, and updates docs/tests to cover collector execution and patch correctness (including untracked/staged changes without modifying HEAD).

Reviewed by Cursor Bugbot for commit cb12ae9. Bugbot is set up for automated code reviews on this repo.

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 1476d7d.

Comment thread verifiers/envs/experimental/composable/harness.py Outdated
@rasdani rasdani marked this pull request as draft May 3, 2026 17:27