Required Sections
Field Requirements (from synth_ai/train/configs/rl.py)
- `[algorithm]`
  - `type` must be `"online"`
  - `method` must be one of `"policy_gradient"`, `"ppo"`, or `"gspo"`
  - `variety` (string identifying the training variant)
- `[services]`
  - `task_url` (required; `judge_url` is optional); matches `RLServicesConfig`
- `[policy]` (preferred) or legacy `[model]`
  - `trainer_mode` and `label` are required
  - Either `source` or `base` must be set, but not both (see the `ModelConfig` validator)
- `[rollout]`
  - `env_name`
  - `policy_name`
  - `max_turns`
  - `episodes_per_batch`
  - `max_concurrent_rollouts`
- `[training]`
  - `num_epochs`
  - `iterations_per_epoch`
  - `max_turns`
  - `batch_size`
  - `group_size`
  - `learning_rate`
  - Optional: `gradient_accumulation_steps`, `weight_sync`, `lora`, `rewards` (per `RLTrainingConfig`)
- `[evaluation]`
  - `instances`
  - `every_n_iters`
  - `seeds` (list of ints)
- `[compute]` (optional but strongly recommended)
  - Standard fields from `ComputeConfig` (e.g., `gpu_type`, `gpu_count`, `nodes`, `topology.reference_placement`)
- `[judge]` / `[rubric]`
  - Optional fields for judge/rubric weighting; see `JudgeConfig` if you need blended rewards