This example turns the Crafter demo into a rejection-finetuning loop. You will:
  1. Run an RL job with tracing enabled on your hosted task app.
  2. Convert the resulting trace database into a curated JSONL dataset.
  3. Submit a fine-tuning job through uvx synth-ai train --type sft.
  4. Evaluate the produced checkpoint back on the same task app.

Prerequisites

  • You have followed the Get Started guide and deployed the Crafter task app to Modal.
  • Tracing is enabled on the deployment (set TASKAPP_TRACING_ENABLED=1 when deploying or mount a trace volume for the task app).
  • Your .env contains SYNTH_API_KEY, ENVIRONMENT_API_KEY, and TASK_APP_URL.
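
Before launching anything, it can help to confirm that the three variables listed above are actually present. The following is a minimal sketch, not part of the synth-ai tooling: the helper name `missing_env_keys` is hypothetical, and it assumes a simple `KEY=value` layout in `.env` (optionally prefixed with `export`).

```python
from pathlib import Path

# Keys named in the prerequisites above.
REQUIRED_KEYS = ("SYNTH_API_KEY", "ENVIRONMENT_API_KEY", "TASK_APP_URL")


def missing_env_keys(env_text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in .env-style text.

    Hypothetical helper: assumes plain KEY=value lines, nothing fancier.
    """
    found = {}
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        found[key.strip()] = value.strip().strip("'\"")
    return [key for key in required if not found.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_env_keys(text)
    print("missing keys:", ", ".join(missing) if missing else "none")
```

Run it from the directory that holds your `.env`; an empty result means all three keys have a non-empty value.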

1. Run a traced RL job

uvx synth-ai train \
  --type rl \
  --config examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml
The run only produces usable data if your deployment persists traces (for example, by setting SQLD_DB_PATH in Modal), so confirm this before launching. After the job completes, download the SQLite trace database from your trace volume.
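
Once the database is downloaded, a quick look at its contents can confirm that traces were actually written. The sketch below makes no assumption about the trace schema: it only lists whatever tables the file contains and counts their rows, using Python's standard `sqlite3` module. The helper name `table_row_counts` is hypothetical, and the path is the one used in the export step of this guide.

```python
import os
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every user table on a connection."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Quote the identifier so unusual table names do not break the query.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


if __name__ == "__main__":
    db_path = "traces/v3/synth_ai.db"  # adjust to wherever you saved the file
    if os.path.exists(db_path):
        conn = sqlite3.connect(db_path)
        try:
            for table, rows in table_row_counts(conn).items():
                print(f"{table}: {rows} rows")
        finally:
            conn.close()
    else:
        print(f"{db_path} not found")
```

If every table reports zero rows, tracing was probably not enabled for the run.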

2. Filter traces into JSONL

uv run python examples/warming_up_to_rl/export_trace_sft.py \
  --db traces/v3/synth_ai.db \
  --out datasets/crafter_reject_sft.jsonl \
  --require-achievement collect_wood
Repeat --require-achievement for each outcome you want to retain. The JSONL output follows Synth’s SFT schema and is ready for upload.
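
Although the CLI validates the dataset at submission time, a local sanity check can catch an empty or truncated export earlier. This sketch checks only that each non-blank line parses as a JSON object; it does not verify Synth's SFT schema, whose fields are not described here. The helper name `check_jsonl` is hypothetical.

```python
import json
import os


def check_jsonl(lines):
    """Return (ok_count, bad_line_numbers) for an iterable of JSONL lines.

    A line counts as OK when it parses as a JSON object; blank lines are ignored.
    """
    ok, bad = 0, []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad.append(lineno)
            continue
        if isinstance(record, dict):
            ok += 1
        else:
            bad.append(lineno)
    return ok, bad


if __name__ == "__main__":
    path = "datasets/crafter_reject_sft.jsonl"
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            ok, bad = check_jsonl(handle)
        print(f"{ok} records parsed, {len(bad)} malformed lines: {bad[:10]}")
    else:
        print(f"{path} not found")
```

A record count of zero usually means the achievement filter was too strict for the traces you collected.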

3. Launch the SFT job

uvx synth-ai train \
  --type sft \
  --config examples/warming_up_to_rl/configs/crafter_fft.toml \
  --dataset datasets/crafter_reject_sft.jsonl
The CLI validates the dataset, uploads it, and polls the job until it finishes. The response includes a fine-tuned model ID such as ft:Qwen/Qwen3-4B:ftjob-….

4. Evaluate the tuned checkpoint

Copy examples/warming_up_to_rl/configs/eval_groq_qwen32b.toml, replace the model entry with your fine-tuned model ID, and run:
uv run python examples/warming_up_to_rl/run_eval.py \
  --toml path/to/your_eval_config.toml \
  --use-rollout
Compare metrics and achievements against the baseline run to confirm improvements.
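
One simple way to make that comparison concrete is to diff per-achievement rates between the two runs. The sketch below assumes you have already collected, by hand or from the eval output, a mapping from achievement name to the fraction of episodes that unlocked it; that data shape and the helper name `achievement_deltas` are assumptions for illustration, not something `run_eval.py` is known to emit.

```python
def achievement_deltas(baseline, tuned):
    """Return {achievement: tuned_rate - baseline_rate}.

    Achievements missing from one side are treated as a rate of 0.0.
    """
    names = sorted(set(baseline) | set(tuned))
    return {
        name: round(tuned.get(name, 0.0) - baseline.get(name, 0.0), 4)
        for name in names
    }


if __name__ == "__main__":
    # Placeholder values only; substitute the rates from your own runs.
    baseline_rates = {"collect_wood": 0.5}
    tuned_rates = {"collect_wood": 0.75, "place_table": 0.25}
    for name, delta in achievement_deltas(baseline_rates, tuned_rates).items():
        print(f"{name}: {delta:+.2f}")
```

Positive deltas on the achievements you filtered for suggest the fine-tune moved behavior in the intended direction; with few evaluation episodes, small deltas may be noise.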

Tips

  • Maintain a versioned datasets/ directory; uvx synth-ai train --type sft will auto-suggest recent files from it.
  • Use --dry-run on both SFT and RL commands to inspect payloads before launching production jobs.
  • Once satisfied with the tuned checkpoint, set [model].source in your RL configs so future runs resume from it.
This cycle (RL for exploration, trace filtering for curation, SFT for refinement) forms the backbone of Synth’s iteration workflow.