Two representative RL examples: a simple Math single-step task and a multi-step Crafter environment.

Math RL (single-step)

Minimal hosted RL loop against a math environment.
uvx synth-ai train \
  --type rl \
  --config examples/rl/configs/math_small.toml \
  --task-url https://<your-task-app>
  • Task code: examples/rl/task_app/math_single_step.py
  • Start with a small base model (0.6B–1.7B) and 1x GPU
  • Use short horizons and fast reward signals for quick iteration
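To illustrate what a fast, single-step reward signal can look like, here is a minimal sketch of an exact-match reward for a math answer. It is not the code in examples/rl/task_app/math_single_step.py; the names `parse_number` and `math_reward` are hypothetical, and the sketch assumes the final answer is the last whitespace-separated token of the completion.

```python
from fractions import Fraction


def parse_number(text):
    """Parse the last whitespace-separated token as an exact number, or return None."""
    tokens = text.strip().split()
    if not tokens:
        return None
    try:
        # Drop a trailing sentence period ("42." -> "42") before parsing.
        return Fraction(tokens[-1].rstrip("."))
    except (ValueError, ZeroDivisionError):
        return None


def math_reward(completion, expected):
    """Hypothetical single-step reward: 1.0 on an exact numeric match, else 0.0."""
    predicted = parse_number(completion)
    target = parse_number(expected)
    if predicted is None or target is None:
        return 0.0
    return 1.0 if predicted == target else 0.0


print(math_reward("The answer is 42.", "42"))
print(math_reward("0.75", "3/4"))
print(math_reward("I think 41", "42"))
```

Because the episode ends after one completion, the reward is available immediately, which is what keeps iteration quick. Using `Fraction` rather than `float` means equivalent forms such as `0.75` and `3/4` compare equal without rounding issues.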

Crafter RL (multi-step)

Multi-step RL with a richer state/action space.
uvx synth-ai train \
  --type rl \
  --config examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml \
  --task-url https://<your-task-app>
  • Task app entry: examples/warming_up_to_rl/task_app/grpo_crafter.py
  • Environment and policies: examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/
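The difference between the single-step and multi-step settings can be sketched with a toy rollout loop. This is an illustration only, not the Crafter environment or the synth-ai API: `ToyEnv`, `rollout`, and the discount factor `gamma` are assumptions introduced for the example.

```python
class ToyEnv:
    """Hypothetical stand-in environment: reach `goal` on a number line."""

    def __init__(self, goal=3, max_steps=10):
        self.goal = goal
        self.max_steps = max_steps

    def reset(self):
        self.pos = 0
        self.t = 0
        return self.pos

    def step(self, action):
        # The action is an integer move along the line (+1 or -1).
        self.pos += action
        self.t += 1
        reached = self.pos == self.goal
        done = reached or self.t >= self.max_steps
        reward = 1.0 if reached else 0.0
        return self.pos, reward, done


def rollout(env, policy, gamma=0.99):
    """Run one episode and return the trajectory and its discounted return."""
    obs = env.reset()
    trajectory = []
    done = False
    while not done:
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
    # Accumulate rewards backwards so earlier steps see discounted future reward.
    discounted_return = 0.0
    for _, _, reward in reversed(trajectory):
        discounted_return = reward + gamma * discounted_return
    return trajectory, discounted_return


trajectory, ret = rollout(ToyEnv(), lambda obs: 1)
print(len(trajectory), ret)
```

Unlike the single-step case, the reward here arrives only after several actions, so credit has to be assigned across the whole trajectory. That is the main reason multi-step tasks tend to need longer horizons and more compute per update.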