synth_ai.data.rewards
Reward data structures.
This module defines pure data types for representing rewards in training
and evaluation contexts. These are actual data records, not API abstractions.
Classes
RewardRecord
A single reward observation.
Represents a reward signal at a specific point in a trajectory,
with metadata about its source and scope.
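The page does not list the fields of RewardRecord. The sketch below shows one plausible shape for "a reward signal at a specific point in a trajectory, with metadata about its source and scope". Every field name here (value, step, source, scope, metadata) is a hypothetical assumption for illustration, not the library's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class RewardRecord:
    """Hypothetical stand-in for a single reward observation.

    Field names are assumptions; check synth_ai.data.rewards for the real ones.
    """
    value: float                       # the reward signal itself
    step: Optional[int] = None         # position in the trajectory (assumed field)
    source: str = "environment"        # where the reward came from (assumed field)
    scope: str = "step"                # granularity, e.g. "step" or "episode" (assumed)
    metadata: dict[str, Any] = field(default_factory=dict)  # free-form extra info


# A reward emitted by a verifier at step 3 of a trajectory.
r = RewardRecord(value=1.0, step=3, source="verifier", metadata={"rubric": "correctness"})
```

Making the record frozen reflects the module's description of these types as pure data records; whether the real class is immutable is not stated.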
OutcomeRewardRecord
Episode-level reward summary.
Aggregates reward information for a complete episode/session,
including total reward, achievements, and step counts.
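An episode-level summary like OutcomeRewardRecord can be thought of as a roll-up of per-step rewards. The sketch below assumes the record holds a session id, a total reward, a list of achievements, and a step count; those field names and the summarize_episode helper are hypothetical, and the real class may compute or store these values differently (for example, with discounting).

```python
from dataclasses import dataclass, field


@dataclass
class OutcomeRewardRecord:
    """Hypothetical stand-in for an episode-level reward summary (fields assumed)."""
    session_id: str
    total_reward: float
    achievements: list[str] = field(default_factory=list)
    num_steps: int = 0


def summarize_episode(
    session_id: str, step_rewards: list[float], achievements: list[str]
) -> OutcomeRewardRecord:
    """Roll per-step rewards up into one episode record (hypothetical helper)."""
    return OutcomeRewardRecord(
        session_id=session_id,
        total_reward=sum(step_rewards),          # undiscounted sum of step rewards
        achievements=sorted(set(achievements)),  # de-duplicated, sorted for stable output
        num_steps=len(step_rewards),
    )


outcome = summarize_episode("ep-001", [0.0, 0.5, 1.0], ["found_key", "found_key", "opened_door"])
```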
EventRewardRecord
Event-level reward annotation.
Links a reward to a specific event in a trace, with optional
annotations and source information.
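To illustrate how an event-level reward "links" to a trace, the sketch below keys each annotation by an event id and checks that every id exists in a toy trace. The field names (event_id, reward_value, source, annotation), the dict-based trace, and the rewards_for_event helper are all assumptions for illustration, not the library's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EventRewardRecord:
    """Hypothetical stand-in: fields are assumptions, not the library's schema."""
    event_id: int                    # id of the trace event this reward refers to
    reward_value: float              # reward attached to that event
    source: Optional[str] = None     # who produced the reward, e.g. "verifier"
    annotation: dict[str, str] = field(default_factory=dict)  # optional free-form notes


def rewards_for_event(records: list[EventRewardRecord], event_id: int) -> list[EventRewardRecord]:
    """Return every reward annotation linked to one event (hypothetical helper)."""
    return [r for r in records if r.event_id == event_id]


# A toy trace keyed by event id (illustrative structure only).
trace = {0: "observe", 1: "tool_call", 2: "answer"}

event_rewards = [
    EventRewardRecord(1, -0.1, source="heuristic", annotation={"reason": "redundant call"}),
    EventRewardRecord(2, 1.0, source="verifier"),
]

# Annotations pointing at events missing from the trace would indicate a broken link.
dangling = [r for r in event_rewards if r.event_id not in trace]
```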
RewardAggregates
Aggregated statistics for a set of rewards.
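The page does not say which statistics RewardAggregates holds. As a sketch, assuming it stores count, total, mean, standard deviation, minimum, and maximum, it could be built from raw reward values with the standard-library statistics module. The field names and the from_values constructor are hypothetical.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardAggregates:
    """Hypothetical stand-in; the set of statistics is an assumption."""
    count: int
    total: float
    mean: float
    std: float        # population standard deviation (a design assumption)
    min_value: float
    max_value: float

    @classmethod
    def from_values(cls, values: Sequence[float]) -> "RewardAggregates":
        """Compute summary statistics over a non-empty set of reward values."""
        if not values:
            raise ValueError("cannot aggregate an empty set of rewards")
        return cls(
            count=len(values),
            total=sum(values),
            mean=statistics.fmean(values),
            std=statistics.pstdev(values),
            min_value=min(values),
            max_value=max(values),
        )


agg = RewardAggregates.from_values([0.0, 1.0, 1.0, 2.0])
```

Using the population standard deviation (pstdev) treats the given rewards as the whole set being summarized; statistics.stdev would give the sample estimate instead. Which one the real class uses is not stated.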
CalibrationExample
A calibration example for few-shot verifier evaluation.
Contains a full execution trace together with its ground-truth rewards.
Used as a labeled example that shows the verifier how traces should be scored.
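One way calibration examples might feed a few-shot verifier is by rendering each labeled trace into a prompt block. The sketch below simplifies a trace to a list of event strings and assumes one ground-truth reward per event plus an outcome reward; these fields and the render_few_shot helper are hypothetical, and the real prompt format used by the verifier is not documented here.

```python
from dataclasses import dataclass


@dataclass
class CalibrationExample:
    """Hypothetical stand-in; a trace is simplified to a list of event strings."""
    trace: list[str]              # events in the execution trace (simplified, assumed)
    event_rewards: list[float]    # ground-truth reward for each event (assumed)
    outcome_reward: float         # ground-truth episode-level reward (assumed)

    def __post_init__(self) -> None:
        # Each event needs exactly one ground-truth label.
        if len(self.trace) != len(self.event_rewards):
            raise ValueError("need one ground-truth reward per trace event")


def render_few_shot(examples: list[CalibrationExample]) -> str:
    """Format labeled examples as few-shot text for a verifier prompt (hypothetical)."""
    blocks = []
    for i, ex in enumerate(examples, start=1):
        lines = [f"Example {i}:"]
        for event, reward in zip(ex.trace, ex.event_rewards):
            lines.append(f"  {event} -> {reward:+.2f}")
        lines.append(f"  outcome -> {ex.outcome_reward:+.2f}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


prompt = render_few_shot([
    CalibrationExample(["search docs", "answer correctly"], [0.0, 1.0], 1.0),
])
```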
GoldExample
A gold-standard example for contrastive verifier evaluation.
Contains a trace with a known-correct score, against which the verifier’s judgment
is compared. Used to evaluate verifier consistency.
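As an illustration of comparing a verifier's judgment against gold examples, the sketch below computes two simple consistency measures: the fraction of traces scored within a tolerance of the gold score, and the mean absolute error. This is a plain agreement check swapped in for illustration, not necessarily the contrastive procedure the library uses; the GoldExample fields, the consistency_report helper, and the 0.1 tolerance are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldExample:
    """Hypothetical stand-in; fields are assumptions."""
    trace_id: str
    gold_score: float   # known-correct score for this trace


def consistency_report(
    gold: list[GoldExample], verifier_scores: dict[str, float], tolerance: float = 0.1
) -> tuple[float, float]:
    """Return (agreement rate within tolerance, mean absolute error).

    Raises KeyError if the verifier did not score one of the gold traces.
    """
    if not gold:
        raise ValueError("need at least one gold example")
    errors = [abs(verifier_scores[g.trace_id] - g.gold_score) for g in gold]
    agreement = sum(e <= tolerance for e in errors) / len(errors)
    mae = sum(errors) / len(errors)
    return agreement, mae


gold = [GoldExample("t1", 1.0), GoldExample("t2", 0.0), GoldExample("t3", 0.5)]
scores = {"t1": 0.95, "t2": 0.4, "t3": 0.5}
agreement, mae = consistency_report(gold, scores)
```

Here two of the three traces fall within the tolerance, so the agreement rate is 2/3, while the single large miss on t2 dominates the mean absolute error.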