Reinforcement Learning Examples - Synth AI

This page covers two representative RL examples: a simple single-step Math task and a multi-step Crafter environment.

Math RL (single-step)

Minimal hosted RL loop against a math environment.

uvx synth-ai train \
  --type rl \
  --config examples/rl/configs/math_small.toml \
  --task-url https://<your-task-app>

Task code: examples/rl/task_app/math_single_step.py
Start with a small base model (0.6B–1.7B parameters) on a single GPU
Use short horizons and fast reward signals so each iteration stays quick
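
To make "single-step" concrete, here is a minimal sketch of the kind of reward such a task could compute: the policy emits one completion, it is scored once, and the episode ends. The names `extract_final_number` and `math_reward`, and the rule of taking the last number in the completion as the answer, are illustrative assumptions; they are not taken from examples/rl/task_app/math_single_step.py.

```python
import re
from typing import Optional


def extract_final_number(text: str) -> Optional[float]:
    """Return the last number that appears in the text, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return float(matches[-1]) if matches else None


def math_reward(completion: str, expected: float, tol: float = 1e-6) -> float:
    """Hypothetical single-step reward: 1.0 for a correct final answer, else 0.0."""
    answer = extract_final_number(completion)
    if answer is None:
        # No parseable answer counts as incorrect.
        return 0.0
    return 1.0 if abs(answer - expected) <= tol else 0.0
```

A binary reward like this is cheap to evaluate, which is what makes the fast-iteration advice above practical.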

Crafter RL (multi-step)

Multi-step RL with a richer state/action space.

uvx synth-ai train \
  --type rl \
  --config examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml \
  --task-url https://<your-task-app>

Task app entry: examples/warming_up_to_rl/task_app/grpo_crafter.py
Environment and policies: examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/
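
In a multi-step environment, a reward often arrives several actions after the decision that earned it. One common, generic way to credit earlier steps is the discounted return, sketched below. This is an illustration of the general idea only: `discounted_returns` and `gamma` are hypothetical names, and nothing here is claimed to match how grpo_crafter.py or the hosted trainer assigns credit.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

For an episode whose only reward comes at the final step, every earlier step still receives a nonzero (discounted) return, which is the main practical difference from the single-step case.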
