Crafter environment
This walkthrough mirrors the v1 example under synth-ai/examples/finetuning/synth_qwen_v1/ in the repo synth-laboratories/synth-ai. Requirements
  • Have uv installed and use uvx/uv run
  • SYNTH_API_KEY exported in your shell
  • Local tracing and environment service deployed with uvx synth-ai serve
What this demo shows (current v1 flow)
  • End-to-end flow: generate v3 traces → filter to SFT JSONL → create/run SFT job → use the fine‑tuned adapter
  • Uses Qwen/Qwen3-0.6B or Qwen/Qwen3-4B-Instruct-2507 with tool-calling in Crafter
  • v1 scripts live under examples/finetuning/synth_qwen_v1/ and call the backend directly

Overview: ReAct agent + tool-calling in Crafter (v1)

  • Agent loop: A ReAct-style LLM agent runs inside the Crafter environment. Each turn the model thinks in text and issues a structured tool call (OpenAI functions) to act in the world.
  • Tool-calling: We send OpenAI-compatible messages plus function tools (e.g., step/look). For Qwen3 we use its native chat template and support tool_choice and stop_after_tool_calls to ensure a clean, single action per turn.
  • API usage (v1):
    • Rollouts use Synth inference (OpenAI-compatible) to generate traces with Qwen.
    • Traces are filtered to OpenAI-format SFT JSONL.
    • SFT is kicked off via the backend; returns an id like ft:Qwen/Qwen3-0.6B:ftjob-<uuid>.
  • Observability: Full tracing (SQLite/Turso) captures sessions, tool calls, rewards, and tokens for analysis and dataset creation.
Quick setup (v1)
uvx synth-ai serve  # optional, for local tracing

# Auth (prod)
export SYNTH_API_KEY="$SYNTH_API_KEY"

# Optional: copy example env and adjust
cp synth-ai/examples/finetuning/synth_qwen/.env.example synth-ai/examples/finetuning/synth_qwen/.env
  1. Generate traces (Qwen 0.6B by default)
set -a; MONOREPO_BACKEND=${MONOREPO_BACKEND:-../monorepo/backend}; source "$MONOREPO_BACKEND/.env.dev"; set +a; \
export SYNTH_BASE_URL="$(uv run python -c 'from examples.common.backend import resolve_backend_url;print(resolve_backend_url())')"; \
export SYNTH_API_KEY="${DEV_SYNTH_API_KEY:-${SYNTH_API_KEY:-${SYNTH_API_KEY_TEST:-sk-local}}}"; \
uv run python synth-ai/examples/finetuning/synth_qwen_v1/react_agent_lm.py --model "Qwen/Qwen3-0.6B" --episodes 10 --max-steps 10 --quiet --no-daemon
Example output (abridged)
✅ Crafter service is healthy
Running 10 episodes (concurrency=5)...
✅ Completed 10 episodes in ~366s
📊 EVALUATION RESULTS
Episodes completed: 10/10
Average reward per episode: 1.10
Average steps per episode: 87.00
💾 Results: traces/synth_ai.db
  1. Filter traces → SFT JSONL (v1 helpers)
Option A (generic thresholds)
uv run python synth-ai/examples/finetuning/synth_qwen_v1/filter_traces.py
Option B (require achievements)
uv run python synth-ai/examples/finetuning/synth_qwen_v1/filter_traces_achievements.py
Example output
Using database: sqlite+aiosqlite:///$PWD/traces/synth_ai.db/dbs/default/data
Output file: ft_data/qwen4b_crafter_sft_collect_wood.jsonl
✅ Wrote 13 examples from 13 sessions
  1. Finetune (SFT)
set -a; MONOREPO_BACKEND=${MONOREPO_BACKEND:-../monorepo/backend}; source "$MONOREPO_BACKEND/.env.dev"; set +a; \
SYNTH_BACKEND_URL_OVERRIDE=prod \
DEV_BACKEND_URL="$(uv run python -c 'from examples.common.backend import resolve_backend_url;print(resolve_backend_url())')" \
uv run python synth-ai/examples/finetuning/synth_qwen_v1/run_ft_job.py --mode dev
Example output (abridged)
🚀 Starting Qwen 4B SFT
⏳ poll ...
🟢 Qwen4B SFT fine-tune succeeded → ft:Qwen/Qwen3-4B-Instruct-2507:ftjob-6cedf721e0ca4c80968834b71e2bdace
  1. Evaluate the fine-tuned adapter
CRAFTER_MODEL="$(jq -r .fine_tuned_model synth-ai/examples/finetuning/synth_qwen_v1/state.json)" \
uv run python synth-ai/examples/finetuning/synth_qwen_v1/react_agent_lm.py --model "$CRAFTER_MODEL" --episodes 5 --max-steps 10 --quiet --no-daemon --no-traces
Example output (abridged)
✅ Model warmed up successfully!
Running 5 episodes (concurrency=5)...
✅ Completed 5 episodes in 58s
📊 EVALUATION RESULTS
Average reward per episode: 0.60
💾 Results: traces/synth_ai.db
Inspecting traces
uvx synth-ai traces