Examples

  • Evals Demo
    • Compare models on the Crafter environment with parallel episodes and stacked progress bars
    • Post-run: filter traces to JSONL and view summary stats
    • Uses OpenAI-compatible API; bring your OPENAI_API_KEY
  • Rejection Finetuning
    • End-to-end: generate traces → filter to SFT JSONL → kick off SFT → run fine-tuned model
    • Qwen/Qwen3-4B Instruct with tool-calling in Crafter; fine-tunes via Synth API
    • Requires SYNTH_API_KEY and local tracing (uvx synth-ai serve) for dataset prep
  • On-Policy RL
    • Deploy a Task App, mint ENVIRONMENT_API_KEY, validate wiring, then kick off an RL job
    • Smoke test provider access from inside the Task App (OpenAI), then run full backend‑orchestrated RL