Math RL (single-step)
Minimal hosted RL loop against a math environment.
- Task code: examples/rl/task_app/math_single_step.py
- Start with a small base model (0.6B–1.7B parameters) on a single GPU
- Use short horizons and fast reward signals for quick iteration
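To make "fast reward signal" concrete, here is a minimal sketch of a single-step math reward: one completion in, one scalar out, with no learned judge in the loop. It is an illustration only and is not taken from math_single_step.py; the names `parse_number` and `math_reward`, and the convention that the answer is the last token of the completion, are assumptions made for this example.

```python
from fractions import Fraction


def parse_number(text: str):
    """Parse an integer, decimal, or simple fraction; return None on failure."""
    try:
        return Fraction(text.strip().replace(",", ""))
    except (ValueError, ZeroDivisionError):
        return None


def math_reward(completion: str, expected: str) -> float:
    """Score one single-step episode: 1.0 for a matching final answer, else 0.0."""
    # Assumed convention (hypothetical): the answer is the last
    # whitespace-separated token, optionally followed by a period.
    tokens = completion.split()
    if not tokens:
        return 0.0
    got = parse_number(tokens[-1].rstrip("."))
    want = parse_number(expected)
    if got is None or want is None:
        return 0.0
    # Exact rational comparison, so "0.5" and "1/2" count as equal.
    return 1.0 if got == want else 0.0
```

Because the check is a pure function of the completion, each episode ends after a single model call, which keeps the horizon at one step and the iteration loop quick.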
Crafter RL (multi-step)
Multi-step RL with a richer state/action space.
- Task app entry: examples/warming_up_to_rl/task_app/grpo_crafter.py
- Environment and policies: examples/warming_up_to_rl/task_app/synth_envs_hosted/envs/crafter/
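For contrast with the single-step case, the following sketch shows the general shape of a multi-step rollout: the policy acts repeatedly, and reward accumulates over the episode. It does not reproduce the Crafter task app or its environment API; `ToyEnv`, `rollout`, and the `(obs, reward, done)` step signature are hypothetical stand-ins chosen to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass
class ToyEnv:
    """Hypothetical stand-in environment: walk from 0 to `goal` on a line."""
    goal: int = 3
    pos: int = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int):
        # `action` is -1 (left) or +1 (right); reward arrives only at the goal.
        self.pos += action
        done = self.pos == self.goal
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


def rollout(env, policy, max_steps: int = 20):
    """Run one episode; return the (obs, action, reward) steps and total reward."""
    obs = env.reset()
    trajectory, total = [], 0.0
    for _ in range(max_steps):
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        total += reward
        obs = next_obs
        if done:
            break
    return trajectory, total


# Usage: a policy that always moves right reaches the goal in three steps.
steps, total_reward = rollout(ToyEnv(goal=3), lambda obs: 1)
```

The `max_steps` cap is the horizon: in a multi-step setting the reward may arrive only after many actions, so a longer horizon means slower feedback per episode.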