Examples

Evals Demo
- Compare models on the Crafter environment with parallel episodes and stacked progress bars
- Post-run: filter traces to JSONL and view summary stats
- Uses OpenAI-compatible API; bring your OPENAI_API_KEY
Rejection Finetuning
- End-to-end: generate traces → filter to SFT JSONL → kick off SFT → run fine-tuned model
- Qwen/Qwen3-4B Instruct with tool-calling in Crafter; fine-tunes via Synth API
- Requires SYNTH_API_KEY and local tracing (uvx synth-ai serve) for dataset prep
On-Policy RL
- Deploy a Task App, mint ENVIRONMENT_API_KEY, validate wiring, then kick off an RL job
- Smoke test provider access from inside the Task App (OpenAI), then run full backend‑orchestrated RL

Overview