Rejection Finetuning Demo

This example turns the Crafter demo into a rejection-finetuning loop. You will:

Run an RL job with tracing enabled on your hosted task app.
Convert the resulting trace database into a curated JSONL dataset.
Submit a fine-tuning job through uvx synth-ai train --type sft.
Evaluate the produced checkpoint back on the same task app.
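
The core of this loop is the rejection step: only rollouts that satisfy a success criterion are kept as supervised training data. The sketch below illustrates that idea in isolation. The `Episode` record and its fields are hypothetical stand-ins and do not reflect the real trace schema or the behavior of the export script.

```python
from dataclasses import dataclass, field


@dataclass
class Episode:
    """Hypothetical stand-in for one traced rollout (not the real trace schema)."""
    episode_id: str
    achievements: set = field(default_factory=set)


def reject_filter(episodes, required):
    """Keep only episodes that unlocked every achievement in `required`."""
    required = set(required)
    # Subset test: every required achievement must appear in the episode.
    return [ep for ep in episodes if required <= ep.achievements]


episodes = [
    Episode("a", {"collect_wood", "place_table"}),
    Episode("b", {"collect_sapling"}),
    Episode("c", {"collect_wood"}),
]
kept = reject_filter(episodes, ["collect_wood"])
print([ep.episode_id for ep in kept])
```

Everything that fails the criterion is discarded, so the resulting dataset contains only behavior you want the model to imitate.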

Prerequisites

You have followed the Get Started guide and deployed the Crafter task app to Modal.
Tracing is enabled on the deployment (set TASKAPP_TRACING_ENABLED=1 when deploying or mount a trace volume for the task app).
Your .env contains SYNTH_API_KEY, ENVIRONMENT_API_KEY, and TASK_APP_URL.
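
Before launching anything, it can help to confirm that the three variables listed above are actually present. The following is a minimal sketch that assumes a plain `KEY=VALUE` file named `.env` in the working directory. It does not handle `export` prefixes or multi-line values, and the helper names are illustrative.

```python
from pathlib import Path

REQUIRED_KEYS = ("SYNTH_API_KEY", "ENVIRONMENT_API_KEY", "TASK_APP_URL")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of optional quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_keys(text)
    print("missing:", missing if missing else "none")
```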

1. Run a traced RL job

uvx synth-ai train \
  --type rl \
  --config examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml

Before launching the job, confirm that your deployment persists traces (for example, by setting SQLD_DB_PATH in Modal). After the job completes, download the SQLite trace database from your trace volume.
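
Once the database is downloaded, a quick look at its tables can confirm that traces were recorded before you run the export. This sketch uses only Python's standard `sqlite3` module and makes no assumption about the trace schema: it lists whatever tables exist and counts their rows. The function name is illustrative.

```python
import sqlite3


def summarize_tables(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Names come from sqlite_master; quote them so unusual names still work.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

For example, `summarize_tables("traces/v3/synth_ai.db")` would report the tables in the database used in the next step. Note that `sqlite3.connect` creates an empty file when the path does not exist, so an empty result may simply mean the path is wrong.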

2. Filter traces into JSONL

uv run python examples/warming_up_to_rl/export_trace_sft.py \
  --db traces/v3/synth_ai.db \
  --out datasets/crafter_reject_sft.jsonl \
  --require-achievement collect_wood

Repeat --require-achievement for each outcome you want to retain. The JSONL output follows Synth’s SFT schema and is ready for upload.
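
A structural sanity check on the exported file can catch an empty or truncated export before you upload it. The sketch below only verifies that each line is a JSON object and counts the records. It deliberately does not check Synth's SFT schema, whose fields are not described here, and the function name is an assumption for illustration.

```python
import json


def check_jsonl(path):
    """Return (record_count, problems) for a JSONL file."""
    count = 0
    problems = []
    with open(path, encoding="utf-8") as handle:
        for lineno, line in enumerate(handle, start=1):
            if not line.strip():
                problems.append(f"line {lineno}: blank line")
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {lineno}: expected a JSON object")
                continue
            count += 1
    return count, problems
```

Calling `check_jsonl("datasets/crafter_reject_sft.jsonl")` returns the number of well-formed records and a list of problem descriptions. A count of zero usually means the filter rejected every trace.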

3. Launch the SFT job

uvx synth-ai train \
  --type sft \
  --config examples/warming_up_to_rl/configs/crafter_fft.toml \
  --dataset datasets/crafter_reject_sft.jsonl

The CLI validates the dataset, uploads it, and polls the job until it finishes. The response includes a fine-tuned model ID such as ft:Qwen/Qwen3-4B:ftjob-….

4. Evaluate the tuned checkpoint

Copy examples/warming_up_to_rl/configs/eval_groq_qwen32b.toml, replace the model entry with your fine-tuned model ID, and run:

uv run python examples/warming_up_to_rl/run_eval.py \
  --toml path/to/your_eval_config.toml \
  --use-rollout

Compare metrics and achievements against the baseline run to check whether the fine-tuned checkpoint actually improved.
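
One simple way to make that comparison concrete is to convert per-achievement counts into per-episode rates and look at the difference between runs. The sketch below assumes you have already tallied, by hand or from the eval output, how many episodes unlocked each achievement. The input format, the function names, and the numbers are all illustrative and are not the output format of run_eval.py.

```python
def achievement_rates(counts, episodes):
    """Convert {achievement: episodes_unlocked} into per-episode rates."""
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    return {name: unlocked / episodes for name, unlocked in counts.items()}


def compare_runs(baseline, tuned):
    """Return {achievement: tuned_rate - baseline_rate}; missing entries count as 0."""
    names = sorted(set(baseline) | set(tuned))
    return {name: tuned.get(name, 0.0) - baseline.get(name, 0.0) for name in names}


# Illustrative numbers only, not results from a real run.
baseline = achievement_rates({"collect_wood": 4, "place_table": 1}, episodes=20)
tuned = achievement_rates(
    {"collect_wood": 9, "place_table": 3, "collect_stone": 1}, episodes=20
)
for name, delta in compare_runs(baseline, tuned).items():
    print(f"{name}: {delta:+.2f}")
```

Positive deltas indicate achievements the tuned checkpoint unlocks more often. With small episode counts the differences are noisy, so use the same number of episodes and the same seeds for both runs where possible.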

Tips

Maintain a versioned datasets/ directory—uvx synth-ai train --type sft will auto-suggest recent files.
Use --dry-run on both SFT and RL commands to inspect payloads before launching production jobs.
Once satisfied with the tuned checkpoint, set [model].source in your RL configs so future runs resume from it.

This cycle—RL for exploration, trace filtering, SFT for refinement—forms the backbone of Synth’s iteration workflow.
