Crafter environment
This walkthrough mirrors the example under synth-ai/examples/finetuning/synth_qwen/. Requirements
  • Have uv installed and use uvx/uv run
  • SYNTH_API_KEY exported in your shell
  • Local tracing and environment service deployed with uvx synth-ai serve
What this demo shows
  • End-to-end flow in four steps: Generate traces → Filter to SFT JSONL → Kick off SFT → Run fine-tuned model
  • Uses Qwen/Qwen3-4B-Instruct-2507 with tool-calling in a Crafter environment
  • Central configuration via examples/finetuning/synth_qwen/config.toml

Overview: ReAct agent + tool-calling in Crafter

  • Agent loop: A ReAct-style LLM agent runs inside the Crafter environment. Each turn the model thinks in text and issues a structured tool call (OpenAI functions) to act in the world.
  • Tool-calling: We send OpenAI-compatible messages plus function tools (e.g., step/look). For Qwen3 we use its native chat template and support tool_choice and stop_after_tool_calls to ensure a clean, single action per turn.
  • API usage:
    • Initial rollouts use a dev-only instance of Qwen/Qwen3-4B-Instruct-2507 via the Synth inference API to generate traces.
    • We filter those traces into an OpenAI-format SFT JSONL and kick off fine-tuning through the same Synth API.
    • Fine-tuning returns a model id like ft:Qwen/Qwen3-4B-Instruct-2507:ftjob-<full-uuid>, which we then use for inference in Crafter.
  • Observability: Full tracing (SQLite/Turso) captures sessions, tool calls, rewards, and tokens for analysis and dataset creation.
Quick setup
uvx synth-ai serve  # optional, for local tracing

# Auth (prod)
export SYNTH_API_KEY="$SYNTH_API_KEY"

# Optional: copy example env and adjust
cp synth-ai/examples/finetuning/synth_qwen/.env.example synth-ai/examples/finetuning/synth_qwen/.env
  1. Generate traces (Qwen 4B)
uvpm examples.finetuning.synth_qwen.run_crafter_qwen4b
Example output (abridged)
✅ Crafter service is healthy
Running 10 episodes (concurrency=5)...
✅ Completed 10 episodes in ~366s
📊 EVALUATION RESULTS
Episodes completed: 10/10
Average reward per episode: 1.10
Average steps per episode: 87.00
💾 Results: traces/synth_ai.db
  1. Filter traces → SFT JSONL
Option A (generic thresholds)
uvpm examples.finetuning.synth_qwen.filter_traces
Option B (require achievements)
uvpm examples.finetuning.synth_qwen.filter_traces_achievements
Example output
Using database: sqlite+aiosqlite:///$PWD/traces/synth_ai.db/dbs/default/data
Output file: ft_data/qwen4b_crafter_sft_collect_wood.jsonl
✅ Wrote 13 examples from 13 sessions
  1. Finetune (SFT)
uvpm examples.finetuning.synth_qwen.sft_kickoff
Example output (abridged)
🚀 Starting Qwen 4B SFT
⏳ poll ...
🟢 Qwen4B SFT fine-tune succeeded → ft:Qwen/Qwen3-4B-Instruct-2507:ftjob-6cedf721e0ca4c80968834b71e2bdace
  1. Evaluate the fine-tuned adapter
CRAFTER_MODEL="ft:Qwen/Qwen3-4B-Instruct-2507:ftjob-6cedf721e0ca4c80968834b71e2bdace" \
uvpm examples.finetuning.synth_qwen.run_crafter_qwen4b
Example output (abridged)
✅ Model warmed up successfully!
Running 5 episodes (concurrency=5)...
✅ Completed 5 episodes in 58s
📊 EVALUATION RESULTS
Average reward per episode: 0.60
💾 Results: traces/synth_ai.db
Inspecting traces
uvx synth-ai traces