Config Sections

[algorithm]
[services]
[policy]        # or legacy [model]
[rollout]
[training]
[evaluation]
[compute]       # optional but recommended
[judge]         # optional (rubric/judge settings)
[tags]          # optional metadata

Field Requirements (from synth_ai/train/configs/rl.py)

  • [algorithm]
    • type must be "online"
    • method must be one of "policy_gradient", "ppo", or "gspo"
    • variety (string identifying the training variant)
  • [services]
    • task_url (required; judge_url optional) – matches RLServicesConfig
  • [policy] (preferred) or legacy [model]
    • trainer_mode and label are required
    • Either source or base must be set, but not both (see ModelConfig validator); both forms are sketched after this list
  • [rollout]
    • env_name
    • policy_name
    • max_turns
    • episodes_per_batch
    • max_concurrent_rollouts
  • [training]
    • num_epochs
    • iterations_per_epoch
    • max_turns
    • batch_size
    • group_size
    • learning_rate
    • Optional: gradient_accumulation_steps, weight_sync, lora, rewards (per RLTrainingConfig); sketched after the Sample TOML
  • [evaluation]
    • instances
    • every_n_iters
    • seeds (list of ints)
  • [compute] (optional but strongly recommended)
    • Standard fields from ComputeConfig (e.g., gpu_type, gpu_count, nodes, topology.reference_placement); also sketched after the Sample TOML
  • [judge] / [rubric]
    • Optional fields for judge/rubric weighting; see JudgeConfig if you need blended rewards
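
Because either source or base must be set (never both), the model section can take two shapes. Below is a minimal sketch of both, using placeholder identifiers rather than real defaults; an actual config contains only one of these tables:

# Preferred form: [policy] resuming from a source checkpoint (identifier is illustrative)
[policy]
trainer_mode = "ppo"
label = "my-policy"
source = "ft:my-checkpoint"

# Legacy form: [model] starting from a base model instead of a source (model id is illustrative)
[model]
trainer_mode = "ppo"
label = "my-policy"
base = "Qwen/Qwen2.5-7B-Instruct"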

Sample TOML

[algorithm]
type = "online"
method = "gspo"
variety = "default"

[services]
task_url = "https://my-task-app.modal.run"

[policy]
trainer_mode = "ppo"
label = "gpt-4o-mini"
source = "gpt-4o-mini"

[rollout]
env_name = "crafter"
policy_name = "policy-gspo"
max_turns = 32
episodes_per_batch = 4
max_concurrent_rollouts = 16

[training]
num_epochs = 50
iterations_per_epoch = 200
max_turns = 32
batch_size = 64
group_size = 4
learning_rate = 3e-4

[evaluation]
instances = 32
every_n_iters = 20
seeds = [1, 2, 3, 4]

[compute]
gpu_type = "A100"
gpu_count = 4
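
The sample above omits several optional pieces. The sketch below shows where they attach; the concrete values and the tag key are placeholders, and the nested shapes of weight_sync, lora, rewards, and [judge] come from RLTrainingConfig and JudgeConfig rather than being spelled out here:

[services]
task_url = "https://my-task-app.modal.run"
judge_url = "https://my-judge.modal.run"      # optional judge service (illustrative URL)

[training]
# ...required fields as in the sample above...
gradient_accumulation_steps = 2               # optional (illustrative value)
# weight_sync, lora, and rewards are optional sub-tables; see RLTrainingConfig

[compute]
gpu_type = "A100"
gpu_count = 4
nodes = 1                                     # optional (illustrative value)

[compute.topology]
reference_placement = "..."                   # see ComputeConfig for accepted values

[tags]
experiment = "crafter-gspo"                   # free-form metadata (illustrative key/value)

# [judge] / [rubric]: optional weighting fields; see JudgeConfig for blended rewards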