- `examples/rl/configs/` — single-step math task configs.
- `examples/warming_up_to_rl/configs/` — Crafter and multi-step task configs (RL + SFT + eval).
### Math (single-step)
- `rl_from_base_qwen.toml` — trains from the base Qwen3-4B model.
- `rl_from_base_qwen17.toml` — variant targeting Qwen3-1.7B.
- `rl_from_ft_qwen.toml` — continues training from a fine-tuned checkpoint.
- `eval_base_qwen.toml` / `eval_rl_qwen.toml` — evaluate the base vs. RL-trained policies.
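As a rough orientation, an RL config in this family might combine the `[model]` and `[services]` tables mentioned under "Customising configs" below. This is a hedged sketch, not a copy of the real file; the exact schema, key names beyond those cited in this README, and all values are illustrative:

```toml
# Hypothetical sketch of an rl_from_base_qwen.toml-style RL config.
# Only [model].base and [services].task_url are named elsewhere in this README;
# everything else here is a placeholder.

[model]
base = "Qwen/Qwen3-4B"              # base policy to start RL from

[services]
task_url = "http://localhost:8001"  # task app endpoint; can also come from TASK_APP_URL in .env
```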
### Crafter & Multi-step
- `rl_from_base_qwen4b.toml` — on-policy RL starting from Qwen3-4B.
- `rl_from_ft.toml` — resume RL from an SFT checkpoint.
- `crafter_fft.toml` / `crafter_fft_4b.toml` — supervised fine-tuning configs.
- `eval_groq_qwen32b.toml`, `eval_stepwise_*.toml` — hosted evaluation templates.
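For the SFT side, the README notes below say the model is set via `[job].model` rather than `[model].base`. A minimal, assumed sketch of a `crafter_fft.toml`-style fragment (value is a placeholder):

```toml
# Hypothetical sketch of a crafter_fft.toml-style SFT config.
# [job].model is the key named in "Customising configs"; the value is illustrative.

[job]
model = "Qwen/Qwen3-4B"   # model to fine-tune on collected rollouts
```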
### Customising configs
- Task app URL — set `[services].task_url` in RL configs, or pass `TASK_APP_URL` via `.env`.
- Model overrides — update `[model].base` for RL configs or `[job].model` for SFT configs.
- Provider credentials — many evaluation configs accept `[policy.extra_headers]` for API keys.
- Tracing — ensure `TASKAPP_TRACING_ENABLED=1` when you want RL jobs to write rollouts for later SFT.
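Putting those overrides together, a hedged sketch of how the credential override might look in an evaluation config; the header name and env-var interpolation scheme are assumptions, not taken from the repo:

```toml
# Hypothetical eval config fragment: provider credentials via extra headers.
# [policy.extra_headers] is the table named above; the header below is a placeholder
# and should match whatever auth scheme your provider expects.

[policy.extra_headers]
Authorization = "Bearer YOUR_PROVIDER_API_KEY"
```

The environment-based settings, by contrast, would live in `.env` rather than the TOML — e.g. `TASK_APP_URL` for the task app endpoint and `TASKAPP_TRACING_ENABLED=1` to record rollouts for later SFT.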