`synth_ai.data.rewards`

Reward data structures.

This module defines pure data types for representing rewards in training and evaluation contexts. These are actual data records, not API abstractions.

Synth AI uses two primary reward scopes:

Event Rewards: Fine-grained rewards attached to individual events within a session (e.g., each tool call, each LLM response). Use EventRewardRecord to annotate specific events with reward values.
Outcome Rewards: Episode-level rewards that summarize the overall success of a complete session. Use OutcomeRewardRecord for aggregate metrics.

Example usage:

```python
from synth_ai.data.rewards import EventRewardRecord, OutcomeRewardRecord

# Annotate a specific event with a reward
event_reward = EventRewardRecord(
    event_id="evt_123",
    session_id="sess_abc",
    reward_value=0.8,
    reward_type="evaluator",
    annotation={"reason": "Correct tool selection"},
)

# Record episode-level outcome
outcome = OutcomeRewardRecord(
    session_id="sess_abc",
    total_reward=0.85,
    achievements_count=3,
    total_steps=10,
    metadata={"task": "code_generation"},
)
```

Classes

`OutcomeRewardRecord`

Episode-level reward summary. Aggregates reward information for a complete episode/session, including total reward, achievements, and step counts. This is the primary data structure for outcome rewards used in training.

Attributes:

session_id: Unique identifier linking to the SessionTrace.
total_reward: Aggregate reward for the entire episode (typically 0.0-1.0).
objective_key: Objective identifier for this reward (defaults to "reward").
achievements_count: Number of achievements/milestones reached.
total_steps: Total number of steps in the episode.
metadata: Task-specific metadata (e.g., {"task": "code_gen", "difficulty": "hard"}).
annotation: Human or evaluator annotations explaining the score.
created_at: When this record was created.
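
To make the attribute list concrete, here is a minimal sketch of a record with the same field names. `OutcomeRewardSketch` and `reward_per_step` are hypothetical stand-ins, not the library's classes or helpers: the field types and every default except `objective_key="reward"` are assumptions, and the real `OutcomeRewardRecord` may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class OutcomeRewardSketch:
    """Hypothetical stand-in mirroring the documented attribute names."""

    session_id: str
    total_reward: float
    objective_key: str = "reward"  # documented default
    achievements_count: int = 0  # assumed default
    total_steps: int = 0  # assumed default
    metadata: dict[str, Any] = field(default_factory=dict)  # assumed type
    annotation: Optional[dict[str, Any]] = None  # assumed type
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )  # assumed to default to "now"


def reward_per_step(record: OutcomeRewardSketch) -> float:
    """Average reward per step; returns 0.0 when no steps were recorded."""
    if record.total_steps <= 0:
        return 0.0
    return record.total_reward / record.total_steps


outcome_sketch = OutcomeRewardSketch(
    session_id="sess_abc",
    total_reward=0.85,
    achievements_count=3,
    total_steps=10,
    metadata={"task": "code_generation"},
)
per_step = reward_per_step(outcome_sketch)
```

Normalizing by `total_steps` is only one possible derived metric; whether it is meaningful depends on how the task defines `total_reward`.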

`EventRewardRecord`

Event-level reward annotation. Links a reward to a specific event in a trace, with optional annotations and source information. Event rewards provide fine-grained feedback on individual actions or decisions within a session.

Attributes:

event_id: Unique identifier of the event being rewarded.
session_id: Session containing this event.
reward_value: Reward for this specific event (typically 0.0-1.0).
objective_key: Objective identifier for this reward (defaults to "reward").
reward_type: Category of reward (e.g., "tool_success", "reasoning", "progress").
key: Rubric criterion or achievement key this reward relates to.
turn_number: Turn/step within the session where event occurred.
source: Origin of the reward ("environment", "evaluator", "human").
annotation: Explanation or details about why this reward was given.
created_at: When this record was created.
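
As an illustration of how the `source` attribute might be used downstream, the sketch below groups event rewards by origin and averages them. `EventRewardSketch` is a hypothetical stand-in carrying a subset of the documented attribute names. Its types and optional defaults are assumptions, and `mean_reward_by_source` is not a library function.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventRewardSketch:
    """Hypothetical stand-in with a subset of the documented attribute names."""

    event_id: str
    session_id: str
    reward_value: float
    objective_key: str = "reward"  # documented default
    reward_type: Optional[str] = None  # assumed optional
    turn_number: Optional[int] = None  # assumed optional
    source: Optional[str] = None  # assumed optional


def mean_reward_by_source(events: list[EventRewardSketch]) -> dict[str, float]:
    """Average reward_value per source, skipping events that have no source."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for event in events:
        if event.source is not None:
            grouped[event.source].append(event.reward_value)
    return {source: sum(values) / len(values) for source, values in grouped.items()}


events = [
    EventRewardSketch(
        "evt_1", "sess_abc", 1.0,
        reward_type="tool_success", turn_number=1, source="environment",
    ),
    EventRewardSketch(
        "evt_2", "sess_abc", 0.5,
        reward_type="reasoning", turn_number=2, source="evaluator",
    ),
    EventRewardSketch(
        "evt_3", "sess_abc", 0.0,
        reward_type="tool_success", turn_number=3, source="environment",
    ),
]
by_source = mean_reward_by_source(events)
```

Keeping environment, evaluator, and human rewards separate like this can help when the sources are on different scales or differ in reliability.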

`RewardAggregates`

Aggregated statistics for a set of rewards.
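
The fields of `RewardAggregates` are not listed here, so the following is only a sketch of the kind of summary such a type could hold. `AggregatesSketch`, its field names, and `summarize` are all hypothetical and are not taken from the library.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregatesSketch:
    """Hypothetical summary; these field names are not the library's."""

    count: int
    mean: float
    minimum: float
    maximum: float


def summarize(values: list[float]) -> AggregatesSketch:
    """Compute count, mean, min, and max over a non-empty list of rewards."""
    if not values:
        raise ValueError("cannot summarize an empty list of rewards")
    return AggregatesSketch(
        count=len(values),
        mean=statistics.fmean(values),
        minimum=min(values),
        maximum=max(values),
    )


summary = summarize([0.25, 0.5, 0.75])
```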

`CalibrationExample`

A calibration example for few-shot verifier evaluation. Contains a full execution trace with its ground truth rewards. Used to teach the verifier evaluation patterns through labeled examples.
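
One way labeled examples could be turned into few-shot context is sketched below. `CalibrationSketch` is a hypothetical stand-in that pairs a trace (assumed here to be a plain dict) with a single ground-truth reward. The real `CalibrationExample` fields and any prompt format used by the verifier are not documented here and may differ.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class CalibrationSketch:
    """Hypothetical stand-in: a trace plus its ground-truth outcome reward."""

    trace: dict[str, Any]  # assumed representation of an execution trace
    outcome_reward: float


def format_few_shot(examples: list[CalibrationSketch]) -> str:
    """Render labeled examples as numbered text blocks for a verifier prompt."""
    blocks = []
    for index, example in enumerate(examples, start=1):
        trace_json = json.dumps(example.trace, sort_keys=True)
        blocks.append(
            f"Example {index}\n"
            f"Trace: {trace_json}\n"
            f"Ground-truth reward: {example.outcome_reward:.2f}"
        )
    return "\n\n".join(blocks)


prompt_section = format_few_shot(
    [
        CalibrationSketch({"session_id": "sess_a", "steps": 4}, 0.9),
        CalibrationSketch({"session_id": "sess_b", "steps": 7}, 0.2),
    ]
)
```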

`GoldExample`

A gold-standard example for contrastive verifier evaluation. Contains a correctly scored trace example that the verifier’s judgment will be compared against. Used to evaluate verifier consistency.
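
Comparing a verifier's judgment against gold scores can be as simple as a mean absolute gap, sketched below. `GoldSketch`, `example_id`, `gold_score`, and `mean_absolute_gap` are hypothetical names chosen for illustration. The actual `GoldExample` fields and the consistency metric Synth AI uses are not specified here.

```python
from dataclasses import dataclass


@dataclass
class GoldSketch:
    """Hypothetical stand-in: an example identifier with its trusted score."""

    example_id: str
    gold_score: float


def mean_absolute_gap(
    gold: list[GoldSketch], verifier_scores: dict[str, float]
) -> float:
    """Mean |verifier - gold| over all gold examples; lower means more consistent."""
    if not gold:
        raise ValueError("need at least one gold example")
    gaps = [abs(verifier_scores[g.example_id] - g.gold_score) for g in gold]
    return sum(gaps) / len(gaps)


gold_examples = [GoldSketch("g1", 1.0), GoldSketch("g2", 0.0)]
gap = mean_absolute_gap(gold_examples, {"g1": 0.75, "g2": 0.25})
```

A gap of 0.0 would mean the verifier reproduces every gold score exactly. A missing verifier score raises `KeyError` in this sketch rather than being silently skipped.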
