Examples
- Evals Demo
  - Compare models on the Crafter environment with parallel episodes and stacked progress bars
  - Post-run: filter traces to JSONL and view summary stats
  - Uses OpenAI-compatible API; bring your OPENAI_API_KEY
- Rejection Finetuning
  - End-to-end: generate traces → filter to SFT JSONL → kick off SFT → run fine-tuned model
  - Qwen/Qwen3-4B Instruct with tool-calling in Crafter; fine-tunes via Synth API
  - Requires SYNTH_API_KEY and local tracing (uvx synth-ai serve) for dataset prep
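
The "filter to SFT JSONL" step in the Rejection Finetuning example can be pictured with a small sketch. This is not the synth-ai implementation: the trace schema (`reward`, `messages`), the threshold, and the function names `filter_traces_to_sft` and `to_jsonl` are all hypothetical, chosen only to illustrate the idea of keeping high-reward episodes and emitting one chat-style record per line.

```python
import json
from typing import Iterable


def filter_traces_to_sft(traces: Iterable[dict], min_reward: float) -> list[dict]:
    """Keep traces at or above min_reward and reshape them into chat-style rows.

    Assumes each trace is a dict with a numeric "reward" and a "messages" list;
    both field names are hypothetical and not taken from synth-ai.
    """
    rows = []
    for trace in traces:
        reward = trace.get("reward", 0.0)
        messages = trace.get("messages") or []
        # Rejection step: drop low-reward or empty episodes.
        if reward >= min_reward and messages:
            rows.append({"messages": messages})
    return rows


def to_jsonl(rows: list[dict]) -> str:
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)
```

A caller would pass the collected traces and a threshold, then write the returned string to a `.jsonl` file for the SFT job. Filtering on a single scalar reward is the simplest possible rule; a real pipeline might instead filter on Crafter achievements or per-step scores.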
- On-Policy RL
  - Deploy a Task App, mint ENVIRONMENT_API_KEY, validate wiring, then kick off an RL job
  - Smoke test provider access from inside the Task App (OpenAI), then run full backend-orchestrated RL