The SDK publishes ready-to-run RL configurations you can copy or adapt. They live under:
  • examples/rl/configs/ — single-step math task configs.
  • examples/warming_up_to_rl/configs/ — Crafter and multi-step task configs (RL + SFT + eval).

Math (single-step)

  • rl_from_base_qwen.toml — trains from the base Qwen3-4B model.
  • rl_from_base_qwen17.toml — variant targeting Qwen3-1.7B.
  • rl_from_ft_qwen.toml — continues training from a fine-tuned checkpoint.
  • eval_base_qwen.toml / eval_rl_qwen.toml — evaluate the base vs. RL-trained policies.
Train with the CLI and evaluate with the example script:
uvx synth-ai train --config examples/rl/configs/rl_from_base_qwen.toml --env-file .env
uv run python examples/rl/run_eval.py --toml examples/rl/configs/eval_base_qwen.toml

Crafter & Multi-step

  • rl_from_base_qwen4b.toml — on-policy RL starting from Qwen3-4B.
  • rl_from_ft.toml — resume RL from an SFT checkpoint.
  • crafter_fft.toml / crafter_fft_4b.toml — supervised fine-tuning configs.
  • eval_groq_qwen32b.toml, eval_stepwise_*.toml — hosted evaluation templates.
Run them with the helper scripts or the CLI:
uvx synth-ai train --config examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml --env-file .env
uv run python examples/warming_up_to_rl/run_eval.py --toml examples/warming_up_to_rl/configs/eval_groq_qwen32b.toml
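
If you launch these runs from automation, it can help to build the command lines in one place rather than retyping paths. The sketch below only assembles the argument lists for the two commands shown above and prints them; it does not call the SDK. The helper names `train_command` and `eval_command` are hypothetical, and the sketch assumes each example directory keeps its `run_eval.py` next to a `configs/` folder, as the paths above suggest.

```python
import shlex
from pathlib import PurePosixPath


def train_command(config: PurePosixPath, env_file: str = ".env") -> list[str]:
    """Argument list for the `uvx synth-ai train` invocation shown above."""
    return ["uvx", "synth-ai", "train", "--config", str(config), "--env-file", env_file]


def eval_command(example_dir: PurePosixPath, config_name: str) -> list[str]:
    """Argument list for the `run_eval.py` helper inside an example directory."""
    script = example_dir / "run_eval.py"
    config = example_dir / "configs" / config_name
    return ["uv", "run", "python", str(script), "--toml", str(config)]


if __name__ == "__main__":
    crafter = PurePosixPath("examples/warming_up_to_rl")
    # Print the commands instead of running them, so they can be reviewed first.
    print(shlex.join(train_command(crafter / "configs" / "rl_from_base_qwen4b.toml")))
    print(shlex.join(eval_command(crafter, "eval_groq_qwen32b.toml")))
```

To execute a command rather than print it, pass the returned list to `subprocess.run(..., check=True)` from the repository root.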

Customising configs

  • Task app URL — set [services].task_url in RL configs or pass TASK_APP_URL via .env.
  • Model overrides — update [model].base for RL or [job].model for SFT configs.
  • Provider credentials — many evaluation configs accept [policy.extra_headers] for API keys.
  • Tracing — ensure TASKAPP_TRACING_ENABLED=1 when you want RL jobs to write rollouts for later SFT.
Use these files as starting points: copy them into your repo, adjust hyperparameters, and keep them under version control so teammates (and automation) can reproduce the exact runs.