Crafter SFT Demo

The Crafter SFT tutorial walks through the exact workflow our team uses when bootstrapping supervised fine-tuning data from RL rollouts. Everything lives in the SDK under examples/warming_up_to_rl/.

1. Initialise the demo

uvx synth-ai demo --force

Choose Crafter GRPO (local FastAPI) when prompted. The CLI materialises the project (task app, configs, helper scripts) and a .env template.

2. Pair the CLI

uvx synth-ai setup

Run this inside the demo directory to populate .env with SYNTH_API_KEY and ENVIRONMENT_API_KEY.

3. Start the task app locally

uvx synth-ai deploy \
  --runtime local \
  --task-app task_app.py \
  --host 0.0.0.0 \
  --port 8001 \
  --env-file .env \
  --trace traces/v3 \
  --trace-db traces/v3/crafter_demo.db

Keep this process running; it captures rollouts and writes them to traces/v3.

4. Collect traced rollouts

In a second shell, run the bundled rollout script:

uv run python run_local_rollout_traced.py

Repeat until you see a non-zero outcome reward in the summary output. Each run appends trajectories to the trace database referenced above.

5. Export JSONL

uv run python export_trace_sft.py \
  --db traces/v3/crafter_demo.db \
  --out ft_data/crafter_traces.jsonl

The script walks the trace database, filters optional achievements, and produces an SFT-ready JSONL file.

6. Launch the SFT job

uvx synth-ai train \
  --config configs/crafter_fft.toml \
  --dataset ft_data/crafter_traces.jsonl \
  --env-file .env

The CLI validates the dataset, uploads it, submits the job, and streams status updates. When the run finishes you receive a checkpoint ID (ft:…) suitable for future RL or eval jobs.

7. Evaluate the checkpoint

uv run python run_eval.py \
  --toml configs/eval_groq_qwen32b.toml \
  --model ft:Qwen/Qwen3-4B:ftjob-XXXX \
  --use-rollout

Swap in the fine-tuned model ID from the previous step to compare performance against the baseline.

Tips

Keep ft_data/ under version control so you can track dataset revisions.
Use uvx synth-ai train --no-poll if you prefer to submit jobs and monitor them later with synth-ai status jobs ….
The same trace database can feed multiple filtered JSONL exports—experiment with different achievement filters before retraining.

Get Started

Fine-Tuning

Reinforcement Learning

CLI Commands

1. Initialise the demo

2. Pair the CLI

3. Start the task app locally

4. Collect traced rollouts

5. Export JSONL

6. Launch the SFT job

7. Evaluate the checkpoint

Tips

Get Started

Fine-Tuning

Reinforcement Learning

CLI Commands

​1. Initialise the demo

​2. Pair the CLI

​3. Start the task app locally

​4. Collect traced rollouts

​5. Export JSONL

​6. Launch the SFT job

​7. Evaluate the checkpoint

​Tips

1. Initialise the demo

2. Pair the CLI

3. Start the task app locally

4. Collect traced rollouts

5. Export JSONL

6. Launch the SFT job

7. Evaluate the checkpoint

Tips