The Crafter SFT tutorial walks through the exact workflow our team uses when bootstrapping supervised fine-tuning data from RL rollouts. Everything lives in the SDK under examples/warming_up_to_rl/.

1. Initialise the demo

uvx synth-ai demo --force
Choose Crafter GRPO (local FastAPI) when prompted. The CLI materialises the project (task app, configs, helper scripts) and a .env template.

2. Pair the CLI

uvx synth-ai setup
Run this inside the demo directory to populate .env with SYNTH_API_KEY and ENVIRONMENT_API_KEY.
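Before moving on, you may want to confirm that both keys actually landed in .env. The following is a minimal sketch, assuming the file uses plain KEY=VALUE lines; the helper names parse_env and missing_keys are hypothetical and not part of the SDK.

```python
from pathlib import Path

# The two keys the setup step is expected to write.
REQUIRED_KEYS = ("SYNTH_API_KEY", "ENVIRONMENT_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env_text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty (hypothetical helper)."""
    values = parse_env(env_text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_keys(text)
    print("missing:", missing if missing else "none")
```

If either key is reported as missing, rerun uvx synth-ai setup inside the demo directory.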

3. Start the task app locally

uvx synth-ai deploy \
  --runtime local \
  --task-app task_app.py \
  --host 0.0.0.0 \
  --port 8001 \
  --env-file .env \
  --trace traces/v3 \
  --trace-db traces/v3/crafter_demo.db
Keep this process running; it captures rollouts and writes them to traces/v3.
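To check that traces are accumulating, you can inspect the database file directly. This sketch assumes crafter_demo.db is a standard SQLite file, which this tutorial does not state explicitly. It makes no assumption about the trace schema: it only lists whatever tables exist and their row counts. The function name table_row_counts is hypothetical.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master lists all schema objects; skip SQLite's internal tables.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


if __name__ == "__main__":
    # Path taken from the --trace-db flag used above.
    db = Path("traces/v3/crafter_demo.db")
    if db.exists():
        for name, count in table_row_counts(str(db)).items():
            print(f"{name}: {count} rows")
    else:
        print(f"{db} not found; start the task app and run a rollout first")
```

Running it before and after a rollout should show at least one row count increasing if tracing is working.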

4. Collect traced rollouts

In a second shell, run the bundled rollout script:
uv run python run_local_rollout_traced.py
Repeat until you see a non-zero outcome reward in the summary output. Each run appends trajectories to the trace database referenced above.
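If you would rather not rerun the script by hand, the repeat-until-reward loop can be sketched as below. The exact format of the summary output is not documented here, so REWARD_PATTERN is a hypothetical regular expression that you would need to adapt to what the script really prints; run_until_reward and extract_reward are likewise illustrative names.

```python
import re
import subprocess

# Hypothetical pattern: adjust it to match the summary line your script prints.
REWARD_PATTERN = re.compile(
    r"outcome[ _]reward[^\d\n-]*(-?\d+(?:\.\d+)?)", re.IGNORECASE
)


def extract_reward(output: str, pattern=REWARD_PATTERN):
    """Return the last reward matched in the output, or None if none is found."""
    matches = pattern.findall(output)
    return float(matches[-1]) if matches else None


def run_until_reward(cmd, max_runs=10, pattern=REWARD_PATTERN):
    """Run cmd up to max_runs times; stop at the first non-zero reward.

    Returns (runs_used, reward), with reward set to None if no run produced
    a non-zero value.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        reward = extract_reward(result.stdout, pattern)
        if reward:  # None and 0.0 both mean "keep going"
            return attempt, reward
    return max_runs, None
```

A call such as run_until_reward(["uv", "run", "python", "run_local_rollout_traced.py"]) would then stop at the first run whose output matches the pattern with a non-zero value. The max_runs cap keeps the loop from running indefinitely if the pattern never matches.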

5. Export JSONL

uv run python export_trace_sft.py \
  --db traces/v3/crafter_demo.db \
  --out ft_data/crafter_traces.jsonl
The script walks the trace database, optionally filters trajectories by achievement, and produces an SFT-ready JSONL file.
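A quick sanity check on the exported file can catch problems before you upload it. The sketch below only verifies that every non-blank line parses as JSON and tallies the top-level keys it finds. It does not assume any particular record schema, since the export format is not specified in this tutorial. The function name summarize_jsonl is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path: str) -> dict:
    """Parse every non-blank line as JSON and tally the top-level keys."""
    records = 0
    bad_lines = []
    key_counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for lineno, line in enumerate(handle, start=1):
            if not line.strip():
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                # Remember the line number so the bad record is easy to find.
                bad_lines.append(lineno)
                continue
            records += 1
            if isinstance(row, dict):
                key_counts.update(row.keys())
    return {"records": records, "bad_lines": bad_lines, "keys": dict(key_counts)}


if __name__ == "__main__":
    # Path taken from the --out flag used above.
    out = Path("ft_data/crafter_traces.jsonl")
    if out.exists():
        print(summarize_jsonl(str(out)))
    else:
        print(f"{out} not found; run the export step first")
```

An empty bad_lines list and a consistent set of keys across all records suggest the export completed cleanly.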

6. Launch the SFT job

uvx synth-ai train \
  --config configs/crafter_fft.toml \
  --dataset ft_data/crafter_traces.jsonl \
  --env-file .env
The CLI validates the dataset, uploads it, submits the job, and streams status updates. When the run finishes you receive a checkpoint ID (ft:…) suitable for future RL or eval jobs.

7. Evaluate the checkpoint

uv run python run_eval.py \
  --toml configs/eval_groq_qwen32b.toml \
  --model ft:Qwen/Qwen3-4B:ftjob-XXXX \
  --use-rollout
The ftjob-XXXX value above is a placeholder: replace the --model argument with the fine-tuned model ID returned in step 6, then compare the results against the baseline.
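A mistyped or truncated model ID is easy to catch before launching an evaluation. The sketch below assumes the ID has the shape ft:<base model>:<job id>, which is inferred only from the example above and not from a documented specification. Both parse_ft_model_id and FineTunedModel are hypothetical names.

```python
from typing import NamedTuple


class FineTunedModel(NamedTuple):
    base_model: str
    job_id: str


def parse_ft_model_id(model_id: str) -> FineTunedModel:
    """Split an ID shaped like 'ft:<base model>:<job id>' (assumed format)."""
    prefix = "ft:"
    if not model_id.startswith(prefix):
        raise ValueError(f"expected an ID starting with {prefix!r}: {model_id!r}")
    # The job ID is taken to be everything after the last colon.
    base_model, sep, job_id = model_id[len(prefix):].rpartition(":")
    if not sep or not base_model or not job_id:
        raise ValueError(f"expected 'ft:<base model>:<job id>': {model_id!r}")
    return FineTunedModel(base_model, job_id)


if __name__ == "__main__":
    # The placeholder ID from the command above.
    print(parse_ft_model_id("ft:Qwen/Qwen3-4B:ftjob-XXXX"))
```

Splitting on the last colon leaves any slashes in the base model name intact.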

Tips

  • Keep ft_data/ under version control so you can track dataset revisions.
  • Use uvx synth-ai train --no-poll if you prefer to submit jobs and monitor them later with synth-ai status jobs ….
  • The same trace database can feed multiple filtered JSONL exports, so you can experiment with different achievement filters before retraining.
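When you keep several exports side by side, a content fingerprint makes it easier to tell which dataset revision a given training run used. The sketch below is one possible approach and not part of the SDK: it hashes the file with SHA-256 and counts newline-terminated records. The function name dataset_fingerprint is hypothetical.

```python
import hashlib


def dataset_fingerprint(path: str, chunk_size: int = 1 << 20) -> dict:
    """Return the SHA-256 digest and newline count of a dataset file."""
    digest = hashlib.sha256()
    lines = 0
    with open(path, "rb") as handle:
        # Read in chunks so large exports do not have to fit in memory.
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
            lines += chunk.count(b"\n")
    return {"sha256": digest.hexdigest(), "lines": lines}
```

Recording the output of dataset_fingerprint("ft_data/crafter_traces.jsonl") next to each job submission gives you a simple way to match checkpoints to the exact file they were trained on.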