- `examples/rl/configs/` — single-step math task configs.
- `examples/warming_up_to_rl/configs/` — Crafter and multi-step task configs (RL + SFT + eval).
### Math (single-step)
- `rl_from_base_qwen.toml` — trains from the base Qwen3-4B model.
- `rl_from_base_qwen17.toml` — variant targeting Qwen3-1.7B.
- `rl_from_ft_qwen.toml` — continues training from a fine-tuned checkpoint.
- `eval_base_qwen.toml` / `eval_rl_qwen.toml` — evaluate the base vs. RL-trained policies.
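As a rough orientation, an RL config in this family might combine the `[model]` and `[services]` tables mentioned under "Customising configs" below. This is a hedged sketch, not a copy of the real file; the exact schema, key names beyond those cited in this README, and all values are illustrative:

```toml
# Hypothetical sketch of an rl_from_base_qwen.toml-style RL config.
# Only [model].base and [services].task_url are named elsewhere in this README;
# everything else here is a placeholder.

[model]
base = "Qwen/Qwen3-4B"              # base policy to start RL from

[services]
task_url = "http://localhost:8001"  # task app endpoint; can also come from TASK_APP_URL in .env
```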
### Crafter & Multi-step
- `rl_from_base_qwen4b.toml` — on-policy RL starting from Qwen3-4B.
- `rl_from_ft.toml` — resume RL from an SFT checkpoint.
- `crafter_fft.toml` / `crafter_fft_4b.toml` — supervised fine-tuning configs.
- `eval_groq_qwen32b.toml`, `eval_stepwise_*.toml` — hosted evaluation templates.
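For the SFT side, the README notes below say the model is set via `[job].model` rather than `[model].base`. A minimal, assumed sketch of a `crafter_fft.toml`-style fragment (value is a placeholder):

```toml
# Hypothetical sketch of a crafter_fft.toml-style SFT config.
# [job].model is the key named in "Customising configs"; the value is illustrative.

[job]
model = "Qwen/Qwen3-4B"   # model to fine-tune on collected rollouts
```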
### Customising configs
- Task app URL — set `[services].task_url` in RL configs, or pass `TASK_APP_URL` via `.env`.
- Model overrides — update `[model].base` for RL configs or `[job].model` for SFT configs.
- Provider credentials — many evaluation configs accept `[policy.extra_headers]` for API keys.
- Tracing — ensure `TASKAPP_TRACING_ENABLED=1` when you want RL jobs to write rollouts for later SFT.
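Putting those overrides together, a hedged sketch of how the credential override might look in an evaluation config; the header name and env-var interpolation scheme are assumptions, not taken from the repo:

```toml
# Hypothetical eval config fragment: provider credentials via extra headers.
# [policy.extra_headers] is the table named above; the header below is a placeholder
# and should match whatever auth scheme your provider expects.

[policy.extra_headers]
Authorization = "Bearer YOUR_PROVIDER_API_KEY"
```

The environment-based settings, by contrast, would live in `.env` rather than the TOML — e.g. `TASK_APP_URL` for the task app endpoint and `TASKAPP_TRACING_ENABLED=1` to record rollouts for later SFT.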