feat: expose training context to rubrics via request protocol #1270
shriramc1 wants to merge 1 commit into PrimeIntellect-ai:main
Conversation
Force-pushed from 4456977 to 475a603
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Reviewed by Cursor Bugbot for commit 475a603.
```python
async def handle_run_rollout(self, request: RunRolloutRequest) -> RunRolloutResponse:
    if request.training_context is not None:
        self.env.rubric.training_context = request.training_context
```
Concurrent requests corrupt shared rubric training context
High Severity
`training_context` is set on the shared `self.env.rubric` instance before awaiting `run_rollout`/`run_group`. The worker's `serve()` method dispatches requests as concurrent asyncio tasks via `asyncio.create_task`, so a second request can overwrite `self.env.rubric.training_context` before the first request reaches its scoring phase. This causes rollouts to be scored with the wrong training context. The same issue exists in `environment.py`'s local-mode path. Additionally, the `is not None` guard means a stale `training_context` from a previous request persists when a subsequent request omits it.
Additional Locations (2)
Addressed in the latest push (cd2700c):

- Stale context: Removed the `is not None` guard; we now always assign `training_context` (even when `None`), so each request explicitly sets or clears it.
- Concurrency: In the current architecture, each `EnvWorker` processes requests sequentially through its event loop. `training_context` is set immediately before the scoring call within the same coroutine, so interleaving isn't possible within a single worker. The router distributes groups round-robin across workers, so no two concurrent requests share a rubric instance.
```python
# Training context set by the orchestrator before scoring.
# Contains metadata like {"step": int, "ckpt_step": int}.
self.training_context: dict | None = None
```
Missing documentation for new training context feature
Low Severity
This PR adds the user-facing `training_context` attribute to `Rubric` and new `training_context` parameters to `Environment.run_rollout` and `Environment.run_group`, but no corresponding updates were made to `docs/reference.md` or `docs/environments.md`, both of which document these classes and methods. Per project rules, PRs modifying core user-facing functionality described in docs/ must update the relevant documentation.
Additional Locations (1)
Triggered by project rule: BugBot Instructions
Acknowledged — will add documentation in a follow-up once the API stabilizes through review. The feature is intentionally minimal right now (optional dict, defaults to None) so existing code is unaffected.
```python
if training_context is not None:
    self.rubric.training_context = training_context
```
Training context not propagated to child rubrics in RubricGroup
High Severity
Setting `self.rubric.training_context` only assigns to the top-level rubric. Nearly all environment types (`MultiTurnEnv`, `ToolEnv`, `SandboxEnv`, etc.) call `add_rubric()` during init, which wraps the user's rubric in a `RubricGroup`. Since `RubricGroup` doesn't propagate `training_context` to its child rubrics, any custom rubric reading `self.training_context` in `score_group` or `score_rollout` will always see `None`. This makes the feature non-functional for all standard environment types.
Additional Locations (1)
Addressed in the latest push (cd2700c):
Added a property override in `RubricGroup` that propagates `training_context` to all child rubrics on assignment. This ensures user rubrics wrapped in a `RubricGroup` (which is the standard path for `MultiTurnEnv`, `ToolEnv`, etc.) receive the context correctly.
Force-pushed from 475a603 to bf0a711
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from bf0a711 to cd2700c


Summary
Adds an optional `training_context` dict that orchestrators can pass to rubrics before scoring. This enables step-aware reward functions (curriculum learning, penalty warmup, dynamic weights) without requiring environments to maintain internal step counters or process restarts via `env-args-scheduler`.

Motivation
There is currently no way for a rubric to know the current training step. This forces environment authors to use fragile workarounds like self-incrementing counters that don't survive checkpoint resume and don't reflect the true orchestrator step. Use cases blocked by this gap include curriculum learning, penalty warmup, and dynamic reward weights.
The existing `env_args_scheduler` (PR #2207 in prime-rl) solves this by hot-reloading entire environments, which is too heavy for continuous reward parameter changes.

Design
The `training_context` is a simple `dict | None` that flows through both execution paths:

Server mode (ZMQ):
Local mode (in-process):
Usage in a custom rubric:
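The original usage snippet did not survive extraction, so here is a hedged reconstruction of what a step-aware custom rubric could look like. `WarmupRubric` and `penalty_weight` are illustrative names, not library API; the only assumption taken from the PR is that the orchestrator sets `self.training_context` to a dict like `{"step": int}` before scoring.

```python
class WarmupRubric:
    """Illustrative rubric that ramps a penalty in over training steps."""

    def __init__(self, warmup_steps: int = 100):
        self.training_context: dict | None = None  # set by the orchestrator
        self.warmup_steps = warmup_steps

    def penalty_weight(self) -> float:
        # Ramp the penalty from 0.0 to 1.0 over the first warmup_steps,
        # defaulting to 0 when no context has been provided.
        step = (self.training_context or {}).get("step", 0)
        return min(1.0, step / self.warmup_steps)
```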
Type of Change
Testing

`uv run pytest` locally.

Checklist
Changes
- `verifiers/serve/types.py`: add `training_context: dict | None = None` to `RunRolloutRequest` and `RunGroupRequest`
- `verifiers/rubrics/rubric.py`: add `self.training_context: dict | None = None` to `Rubric.__init__`
- `verifiers/serve/client/env_client.py`: forward `training_context` in `run_rollout` and `run_group`
- `verifiers/serve/server/env_worker.py`: set `self.env.rubric.training_context` before handling requests
- `verifiers/envs/environment.py`: accept `training_context` in `run_rollout`/`run_group`, set on rubric in local mode, forward to env_client in server mode
Fully backward-compatible:
- Requests with a `None` `training_context` see zero behavior change
- Rubrics that never read `self.training_context` are unaffected
- Extra request fields are tolerated (`model_config` is not strict); old clients sending requests without it work fine since the field has a default

Companion PR
The prime-rl companion PR (to populate `training_context` from the scheduler) is independent; this PR is useful standalone for:

- Callers passing `training_context` directly in `run_group` calls

Additional Notes
See also: prime-rl companion PR that populates this field from the scheduler's step counter.
Note
Medium Risk
Adds new mutable `training_context` plumbing through local and server execution paths; incorrect propagation or concurrent request handling could cause context leakage between rollouts/groups.

Overview
Introduces an optional `training_context: dict | None` that can be passed into `Environment.run_rollout`/`run_group` and carried through the ZMQ request protocol.

In local mode, the context is set on `self.rubric.training_context` before scoring; in server mode it is forwarded via `RunRolloutRequest`/`RunGroupRequest` and applied in `EnvWorker` before executing the request. Test stubs were updated to accept extra kwargs for the extended call signatures.

Reviewed by Cursor Bugbot for commit bf0a711.