Using objectives for prompt optimization
GEPA/MIPRO can optimize prompts against objectives that combine reward, time spent, and cost. The typical pattern (see the sketch after this list):
- Record rewards during the rollout (event or outcome)
- Record time/cost usage for the same run
- Define objectives that trade off quality vs. efficiency:
  - Quality: maximize reward (outcome total or verifier score)
  - Latency: minimize wall time / steps
  - Cost: minimize token or USD usage
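A minimal sketch of such a combined objective, assuming each run produces a stats record with reward, wall time, and spend. The `RolloutStats` class, its field names, and the weights are illustrative assumptions, not a GEPA/MIPRO API:

```python
from dataclasses import dataclass

@dataclass
class RolloutStats:
    total_reward: float   # quality: outcome total or verifier score
    wall_time_s: float    # latency: wall time for the run
    usd_cost: float       # cost: USD (or token-equivalent) spend

def combined_objective(stats: RolloutStats,
                       latency_weight: float = 0.01,
                       cost_weight: float = 1.0) -> float:
    """Higher is better: maximize reward, penalize time and spend."""
    return (stats.total_reward
            - latency_weight * stats.wall_time_s
            - cost_weight * stats.usd_cost)

# Two candidate prompts with equal reward: the faster, cheaper one scores higher.
fast = RolloutStats(total_reward=5.0, wall_time_s=12.0, usd_cost=0.02)
slow = RolloutStats(total_reward=5.0, wall_time_s=45.0, usd_cost=0.10)
assert combined_objective(fast) > combined_objective(slow)
```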
Event Rewards
Per-step rewards attached to individual events. Use for credit assignment.
Schema
| Field | Type | Description |
|---|---|---|
| event_id | int | FK to event |
| reward_value | float | Reward (positive or negative) |
| reward_type | str | achievement, achievement_delta, unique_achievement_delta, shaped, sparse, penalty, evaluator, human |
| key | str | Achievement name or reward identifier |
| source | str | environment, runner, evaluator, human |
| annotation | dict | Additional context |
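The same schema expressed as a hypothetical Python dataclass. The `EventReward` name, the default for `annotation`, and the example achievement key are assumptions; the fields mirror the table above:

```python
from dataclasses import dataclass, field

@dataclass
class EventReward:
    event_id: int        # FK to event
    reward_value: float  # reward (positive or negative)
    reward_type: str     # one of the reward types listed below
    key: str             # achievement name or reward identifier
    source: str          # environment, runner, evaluator, human
    annotation: dict = field(default_factory=dict)  # additional context

# Example: the agent unlocked the "collect_wood" achievement at event 42.
reward = EventReward(event_id=42, reward_value=1.0,
                     reward_type="achievement", key="collect_wood",
                     source="environment")
```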
Reward Types
- achievement: Binary, one per achievement unlocked
- achievement_delta: Count of achievements unlocked this step
- unique_achievement_delta: Count of new achievements this episode
- shaped: Dense signal for incremental progress
- sparse: Reward only at milestones
- evaluator: From automated verifier
- human: Human annotation
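A small illustrative helper (not part of the schema; the function and its arguments are assumptions) showing how achievement_delta and unique_achievement_delta differ for the same step:

```python
def achievement_deltas(unlocked_this_step: set[str],
                       seen_this_episode: set[str]) -> tuple[int, int]:
    # achievement_delta counts everything unlocked at this step;
    # unique_achievement_delta counts only first-time unlocks for the episode.
    achievement_delta = len(unlocked_this_step)
    unique_achievement_delta = len(unlocked_this_step - seen_this_episode)
    return achievement_delta, unique_achievement_delta

# Step unlocks "collect_wood" (seen earlier this episode) and "eat_cow" (new):
print(achievement_deltas({"collect_wood", "eat_cow"}, {"collect_wood"}))  # (2, 1)
```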
Outcome Rewards
Episode-level summary. Use for filtering and ranking.
Schema
| Field | Type | Description |
|---|---|---|
| session_id | str | FK to session |
| total_reward | int | Episode score (e.g., unique achievements) |
| achievements_count | int | Milestones reached |
| total_steps | int | Episode length |
| reward_metadata | dict | Achievements list, final state, etc. |
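A hypothetical sketch of this schema and the filter/rank use it supports. The `OutcomeReward` class, the session IDs, and the score-per-step ranking are assumptions, not a fixed API:

```python
from dataclasses import dataclass, field

@dataclass
class OutcomeReward:
    session_id: str          # FK to session
    total_reward: int        # episode score (e.g., unique achievements)
    achievements_count: int  # milestones reached
    total_steps: int         # episode length
    reward_metadata: dict = field(default_factory=dict)  # achievements, final state, etc.

episodes = [
    OutcomeReward("sess-a", total_reward=4, achievements_count=4, total_steps=120),
    OutcomeReward("sess-b", total_reward=4, achievements_count=4, total_steps=300),
    OutcomeReward("sess-c", total_reward=0, achievements_count=0, total_steps=50),
]

# Filter out episodes with no milestones, then rank by score per step
# (one way to trade off quality against episode length).
kept = [e for e in episodes if e.achievements_count > 0]
ranked = sorted(kept, key=lambda e: e.total_reward / e.total_steps, reverse=True)
print([e.session_id for e in ranked])  # ['sess-a', 'sess-b']
```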