
synth_ai.data.rewards

Reward data structures. This module defines pure data types for representing rewards in training and evaluation contexts. These are actual data records, not API abstractions.

Classes

RewardRecord

A single reward observation. Represents a reward signal at a specific point in a trajectory, with metadata about its source and scope.

OutcomeRewardRecord

Episode-level reward summary. Aggregates reward information for a complete episode/session, including total reward, achievements, and step counts.
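To make the relationship between step-level and episode-level records concrete, here is a minimal sketch of rolling single reward observations up into an episode summary. The class names (`StepReward`, `EpisodeSummary`), their fields, and the `summarize` helper are all hypothetical stand-ins invented for illustration; they are not the actual definitions of `RewardRecord` or `OutcomeRewardRecord`, whose fields this page does not list.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class StepReward:
    """Hypothetical stand-in for a single reward observation."""
    step_index: int               # position in the trajectory (assumed field)
    value: float                  # reward value (assumed field)
    source: str = "environment"   # origin of the signal (assumed field)


@dataclass(frozen=True)
class EpisodeSummary:
    """Hypothetical stand-in for an episode-level reward summary."""
    session_id: str
    total_reward: float
    num_steps: int
    achievements: Tuple[str, ...] = ()


def summarize(
    session_id: str,
    rewards: Sequence[StepReward],
    achievements: Iterable[str] = (),
) -> EpisodeSummary:
    """Collapse per-step rewards into one episode-level record."""
    return EpisodeSummary(
        session_id=session_id,
        total_reward=sum(r.value for r in rewards),
        num_steps=len(rewards),
        achievements=tuple(achievements),
    )


rewards = [StepReward(0, 0.5), StepReward(1, 1.0), StepReward(2, -0.25)]
summary = summarize("ep-1", rewards, achievements=["collect_wood"])
print(summary.total_reward, summary.num_steps)
```

Frozen dataclasses are used here because the module is described as pure data types; whether the real records are immutable is an assumption.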

EventRewardRecord

Event-level reward annotation. Links a reward to a specific event in a trace, with optional annotations and source information.

RewardAggregates

Aggregated statistics for a set of rewards.
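As an illustration of what aggregated statistics over a set of rewards might contain, the sketch below computes a count, mean, median, minimum, and maximum. The `Aggregates` class, its field names, and the `aggregate` function are assumptions made for this example; the page does not say which statistics `RewardAggregates` actually stores.

```python
import statistics
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Aggregates:
    """Hypothetical stand-in for aggregated reward statistics."""
    count: int
    mean: float
    median: float
    minimum: float
    maximum: float


def aggregate(values: Iterable[float]) -> Aggregates:
    """Summarize a non-empty collection of reward values."""
    values = list(values)
    if not values:
        raise ValueError("cannot aggregate an empty set of rewards")
    return Aggregates(
        count=len(values),
        mean=statistics.fmean(values),
        median=statistics.median(values),
        minimum=min(values),
        maximum=max(values),
    )


stats = aggregate([1.0, 2.0, 4.0, 5.0])
print(stats)
```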

CalibrationExample

A calibration example for few-shot verifier evaluation. Contains a full execution trace with its ground truth rewards. Labeled examples of this kind show the verifier which evaluation patterns to apply.

GoldExample

A gold-standard example for contrastive verifier evaluation. Contains a correctly scored trace against which the verifier’s judgment is compared. Used to evaluate verifier consistency.
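One way to picture comparing a verifier against gold-standard examples is sketched below. Everything here is hypothetical: `LabeledTrace`, its `trace` and `gold_score` fields, and the choice of mean absolute error as the comparison metric are assumptions for illustration, not the structure of `GoldExample` or the method the library uses.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Sequence


@dataclass(frozen=True)
class LabeledTrace:
    """Hypothetical stand-in for a trace paired with its gold score."""
    trace: Dict[str, Any]   # execution trace payload (assumed shape)
    gold_score: float       # trusted reference score (assumed field)


def mean_absolute_error(
    examples: Sequence[LabeledTrace],
    verifier: Callable[[Dict[str, Any]], float],
) -> float:
    """Average gap between the verifier's scores and the gold scores."""
    if not examples:
        raise ValueError("need at least one gold example")
    total = sum(abs(verifier(ex.trace) - ex.gold_score) for ex in examples)
    return total / len(examples)


examples = [
    LabeledTrace(trace={"steps": 2}, gold_score=0.5),
    LabeledTrace(trace={"steps": 4}, gold_score=1.0),
]


def toy_verifier(trace: Dict[str, Any]) -> float:
    """Toy scoring rule used only for this example."""
    return trace["steps"] / 4


print(mean_absolute_error(examples, toy_verifier))
```

A lower value means the verifier's scores sit closer to the gold labels; other agreement measures would serve equally well in this sketch.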