
Add per-step reward field to Action and Observation schemas #183

Open
henryjcee wants to merge 1 commit into neulab:main from henryjcee:henry/add-reward-to-the-schema


henryjcee commented Apr 9, 2026

Summary

  • Adds an optional reward: float | None field to the base Action class (schema/action/action.py), inherited by ApiAction, CodeAction, and MessageAction
  • Adds an optional reward: float | None field to the base Observation class (schema/observation/observation.py), inherited by TextObservation, WebObservation, and ImageObservation
  • Updates schema/SCHEMA.md to document the new field on both base classes

Motivation

ADP currently has no mechanism to attach reward signals to individual trajectory steps. This makes it difficult to use ADP-formatted data for reinforcement learning, where per-step rewards are a core primitive.

This change adds reward as a first-class optional field on every action and observation, allowing datasets to record the reward received at each step of a trajectory. Some RL settings may deliver the reward together with an observation and others at action time; this change supports both.

Design notes

  • reward defaults to None, so the change is fully backwards-compatible: all existing sample_std.json files validate without modification
  • Modelled as a plain float scalar (not a distribution or vector) to keep the schema simple and composable
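Both design notes can be illustrated with a small loader. This is a sketch only: `parse_reward` is a hypothetical helper, not the schema's actual validation path, and the `content` key in the sample JSON is illustrative. A missing or null `reward` loads as `None`, and anything other than a single number (such as a vector) is rejected.

```python
from __future__ import annotations

import json


def parse_reward(raw: dict) -> float | None:
    """Read the optional `reward` key from one serialized step.

    A missing key or an explicit null yields None, so files written before
    the field existed still load. Anything other than a single real number
    (e.g. a list or a dict) is rejected, matching the scalar-only design.
    """
    value = raw.get("reward")
    if value is None:
        return None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"reward must be a number or null, got {type(value).__name__}")
    return float(value)


old_step = json.loads('{"content": "ls"}')
new_step = json.loads('{"content": "ls", "reward": 1}')
print(parse_reward(old_step))  # None
print(parse_reward(new_step))  # 1.0
```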

Tests

  • pytest tests/test_standardized_schemas.py — all 33 datasets pass
  • Full test suite — 136 passed, 0 failures

I don't think new tests are required by this but happy to add if useful.

…optional `reward: float | None` field to the base `Action` and `Observation` classes, enabling RL training data to carry per-step reward signals. All six concrete action/observation types inherit the field. Existing datasets are unaffected as the field defaults to None.