This workflow turns traced RL experience into higher-quality supervised fine-tuning data, then trains and evaluates a checkpoint on it. The scripts live under examples/warming_up_to_rl/ in the SDK.

1. Run a traced RL job

uvx synth-ai train \
  --config examples/warming_up_to_rl/configs/rl_from_base_qwen4b.toml \
  --env-file .env
Ensure tracing is enabled on your task app (mount a trace volume or set TASKAPP_TRACING_ENABLED=1). When the job completes, download the trace database (the Turso/libSQL database or local .db file) referenced by the deployment.
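
As a quick sanity check before filtering, you can confirm the downloaded database actually contains traces with any SQLite-compatible client (an optional sketch; the table names depend on the tracing schema):

sqlite3 traces/v3/synth_ai.db ".tables"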

2. Filter traces into JSONL

uv run python examples/warming_up_to_rl/export_trace_sft.py \
  --db traces/v3/synth_ai.db \
  --out datasets/crafter_reject_sft.jsonl \
  --require-achievement collect_wood \
  --require-achievement craft_table
Repeat --require-achievement for each condition you want to enforce; only trajectories that satisfy every condition are kept. The exporter writes JSONL in Synth's SFT schema.
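
Before launching training, it is worth confirming the export is non-empty and parses cleanly (a quick inspection sketch; the exact fields are whatever Synth's SFT schema defines):

wc -l datasets/crafter_reject_sft.jsonl
head -n 1 datasets/crafter_reject_sft.jsonl | python -m json.tool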

3. Launch the SFT job

uvx synth-ai train \
  --config examples/warming_up_to_rl/configs/crafter_fft.toml \
  --dataset datasets/crafter_reject_sft.jsonl \
  --env-file .env
The CLI uploads the dataset, submits the job, and streams progress. Copy the resulting fine-tuned model ID for the next step.
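
To avoid copy-paste mistakes, you can stash the returned ID in a shell variable and reuse it below (a small convenience sketch; the ID shown is a placeholder):

export FT_MODEL=ft:Qwen/Qwen3-4B:ftjob-XXXX

Pass $FT_MODEL to --model in the next step.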

4. Evaluate the tuned checkpoint

uv run python examples/warming_up_to_rl/run_eval.py \
  --toml examples/warming_up_to_rl/configs/eval_groq_qwen32b.toml \
  --model ft:Qwen/Qwen3-4B:ftjob-XXXX \
  --use-rollout
Run the eval with your fine-tuned checkpoint in the model field and compare its metrics against a baseline run of the same config, as sketched below.
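
To make the comparison concrete, one option is to run the same eval config twice and capture each run's output, once for the baseline and once for the tuned checkpoint (a sketch; <baseline-model> is a placeholder for whichever model you are comparing against):

uv run python examples/warming_up_to_rl/run_eval.py \
  --toml examples/warming_up_to_rl/configs/eval_groq_qwen32b.toml \
  --model <baseline-model> \
  --use-rollout | tee eval_baseline.log

uv run python examples/warming_up_to_rl/run_eval.py \
  --toml examples/warming_up_to_rl/configs/eval_groq_qwen32b.toml \
  --model ft:Qwen/Qwen3-4B:ftjob-XXXX \
  --use-rollout | tee eval_ft.log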

Tips

  • Use uvx synth-ai status jobs list --status succeeded --json to pull job IDs for audit trails.
  • Keep datasets versioned; each tweak to the filtering criteria should produce a new JSONL for reproducibility.
  • Once you settle on a fine-tuned checkpoint, reference it under [model].source in your RL configs (see the sketch below) so future runs start from the refined policy.
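
For the last tip, an RL config that starts from the tuned checkpoint would carry something like this (a minimal sketch; the ID is a placeholder and the other [model] keys stay as in your existing config):

[model]
source = "ft:Qwen/Qwen3-4B:ftjob-XXXX"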