
What you’ll need

  1. A Synth account: create one at https://usesynth.ai/signup
  2. A Groq API key
  3. uv installed (it provides the uvx and uv run commands used below)

Step 1: Set up the demo

uvx synth-ai demo
When prompted, choose option 2, Crafter GRPO:
Select a demo template:
  [1] Math Single-Step (Modal deployment) (math-modal)
      Packaged modal task app matching examples/rl math environment.
  [2] Crafter GRPO (local FastAPI) (crafter-local)
      Lightweight wrapper around examples/warming_up_to_rl/task_app/grpo_crafter for local experimentation.
  Enter choice [1-2] (default 1): 2
Once you choose a demo, its files are saved to your current working directory. Press Enter to accept the suggested destination or type another path:
Destination directory [/Users/jacob-roddy-beck/sft/crafter_demo]: 
Demo template 'Crafter GRPO (local FastAPI)' materialised at /Users/jacob-roddy-beck/sft/crafter_demo
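As an optional check, the following Python sketch confirms that the destination directory contains the files used by later steps of this guide. The file list is taken from the paths and commands shown in Steps 3 to 6; that the template materialises exactly these paths is an assumption, so adjust the hypothetical EXPECTED_FILES list if your template differs.

```python
from pathlib import Path

# Paths taken from later steps of this guide. That the demo template
# materialises exactly these files is an assumption; edit as needed.
EXPECTED_FILES = [
    "task_app.py",
    "run_local_rollout_traced.py",
    "export_trace_sft.py",
    "configs/crafter_fft_4b.toml",
]


def missing_demo_files(demo_dir: str) -> list[str]:
    """Return the expected relative paths that do not exist under demo_dir."""
    root = Path(demo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_demo_files(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected demo files are present.")
```

Run it from inside the demo directory (for example, /Users/jacob-roddy-beck/sft/crafter_demo above).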

Step 2: Fetch your Synth credentials

Run the setup command to connect to your Synth account and store the credentials this demo needs locally:
uvx synth-ai setup
The credentials are written to your demo’s .env file (created by uvx synth-ai demo) for use in the next steps:
Keys saved to: /Users/jacob-roddy-beck/sft/crafter_demo/.env
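To confirm the credentials were written without printing the secrets themselves, you can list only the key names in .env. This is a minimal sketch with a hand-rolled parser for simple KEY=VALUE lines (not the python-dotenv library), and it assumes nothing about which key names synth-ai writes.

```python
from pathlib import Path


def read_env_keys(env_path: str) -> dict[str, bool]:
    """Map each KEY in a simple KEY=VALUE .env file to whether it has a value.

    Minimal parser sketch: skips blank lines and '#' comments, strips an
    optional 'export ' prefix and surrounding quotes. Multi-line values
    are not supported.
    """
    keys: dict[str, bool] = {}
    for raw in Path(env_path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        value = value.strip().strip('"').strip("'")
        keys[key.strip()] = bool(value)
    return keys


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.is_file():
        for key, has_value in read_env_keys(str(env_file)).items():
            # Print key names and status only, so secrets never reach the terminal.
            print(f"{key}: {'set' if has_value else 'EMPTY'}")
    else:
        print("No .env found; run this from your demo directory.")
```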

Step 3: Run your local demo server

Start the local task app server that your rollouts will send requests to:
uvx synth-ai serve
When prompted, enable tracing and accept the default trace locations:
Enable tracing? [Y/n]: Y (default)
Trace directory [traces/v3]: traces/v3 (default)
Trace DB path [traces/v3/synth_ai.db]: traces/v3/synth_ai.db (default)
Then, select your task app:
Select a task app:
[1] grpo-crafter-demo (discovered) – TaskAppConfig in task_app.py (line 56)
...
Enter choice [1]: 1 (default)
Finally, enter your Groq API key when prompted.
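Once Step 4 has produced rollouts, you can peek inside the trace database configured above. This sketch assumes traces/v3/synth_ai.db is a SQLite-compatible file (an assumption based on the .db extension) and makes no assumption about its schema: it lists every table and its row count. Run it from the demo directory.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in a SQLite file."""
    if not Path(db_path).is_file():
        # sqlite3.connect would silently create an empty file, so fail early.
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        counts: dict[str, int] = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # escape the identifier
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    db = "traces/v3/synth_ai.db"
    if Path(db).is_file():
        for name, count in table_row_counts(db).items():
            print(f"{name}: {count} rows")
    else:
        print(f"{db} not found; run this from your demo directory after Step 4.")
```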

Step 4: Run and collect traced rollouts

Leave the server from uvx synth-ai serve running, open a second terminal, and run:
cd {path to your demo directory}
uv run python run_local_rollout_traced.py
The output of uvx synth-ai serve also prints this command with your actual demo directory filled in:
uvx synth-ai serve
...
============================================================
Next step: Collect traced rollouts
============================================================

In another terminal, run:
cd /Users/jacob-roddy-beck/sft/crafter_demo
uv run python run_local_rollout_traced.py
Repeat the rollout until it reports a non-zero outcome reward. For example, an outcome reward of 1.0 means you are ready to continue:
 Reward summary:
  Environment rewards per step (trajectory): [1.0]
  Environment reward total: 1.000
  Decision rewards:
    turn=0, ach_delta=0, unique_delta=0, achievements=[]
    turn=1, ach_delta=0, unique_delta=0, achievements=[]
    turn=2, ach_delta=0, unique_delta=0, achievements=[]
    turn=3, ach_delta=0, unique_delta=0, achievements=[]
    turn=4, ach_delta=0, unique_delta=0, achievements=[]
    turn=5, ach_delta=0, unique_delta=0, achievements=[]
    turn=6, ach_delta=0, unique_delta=0, achievements=[]
    turn=7, ach_delta=0, unique_delta=0, achievements=[]
    turn=8, ach_delta=0, unique_delta=0, achievements=[]
    turn=9, ach_delta=0, unique_delta=0, achievements=[]
    turn=10, ach_delta=0, unique_delta=0, achievements=[]
    turn=11, ach_delta=0, unique_delta=0, achievements=[]
    turn=12, ach_delta=0, unique_delta=0, achievements=[]
    turn=13, ach_delta=0, unique_delta=0, achievements=[]
    turn=14, ach_delta=0, unique_delta=0, achievements=[]
    turn=15, ach_delta=0, unique_delta=0, achievements=[]
    turn=16, ach_delta=0, unique_delta=0, achievements=[]
    turn=17, ach_delta=0, unique_delta=0, achievements=[]
    turn=18, ach_delta=0, unique_delta=0, achievements=[]
    turn=19, ach_delta=0, unique_delta=0, achievements=[]
  Outcome rewards (episode returns): [1.0]
If the outcome reward is 0, re-run uv run python run_local_rollout_traced.py until it is non-zero. Once you have a non-zero reward, you can stop the server started with uvx synth-ai serve.
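Instead of re-running the rollout by hand, you could script the retry loop. This sketch assumes the rollout script prints the "Outcome rewards (episode returns): [...]" line exactly as in the summary above, that uv is on your PATH, and that you run it from the demo directory while the server is still up. The helper names and the max_attempts limit of 5 are hypothetical choices, not part of synth-ai.

```python
import re
import subprocess
from pathlib import Path

# Matches the "Outcome rewards (episode returns): [1.0]" line from the reward
# summary; the exact wording may change between synth-ai versions.
OUTCOME_RE = re.compile(r"Outcome rewards \(episode returns\):\s*\[([^\]]*)\]")


def parse_outcome_rewards(output: str) -> list[float]:
    """Extract the outcome reward values from a rollout's printed summary."""
    match = OUTCOME_RE.search(output)
    if match is None:
        return []
    return [float(v) for v in match.group(1).split(",") if v.strip()]


def rollout_until_nonzero(max_attempts: int = 5) -> bool:
    """Re-run the rollout script until some outcome reward is non-zero."""
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(
            ["uv", "run", "python", "run_local_rollout_traced.py"],
            capture_output=True,
            text=True,
        )
        # Search both streams, since which one carries the summary is an assumption.
        rewards = parse_outcome_rewards(result.stdout + result.stderr)
        print(f"attempt {attempt}: outcome rewards = {rewards}")
        if any(r != 0 for r in rewards):
            return True
    return False


if __name__ == "__main__":
    if Path("run_local_rollout_traced.py").is_file():
        ok = rollout_until_nonzero()
        print("Got a non-zero reward." if ok else "No non-zero reward yet; try again.")
    else:
        print("run_local_rollout_traced.py not found; run this from your demo directory.")
```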

Step 5: Export rollout traces to dataset

From the same demo directory you used in Step 4, run:
uv run python export_trace_sft.py
Expected output:
uv run python export_trace_sft.py
Found trace database: /Users/jacob-roddy-beck/sft/crafter_demo/traces/v3/synth_ai.db
Output will be written to: /Users/jacob-roddy-beck/sft/crafter_demo/ft_data/crafter_traces.jsonl
Minimum unique achievements filter: 0 (all traces)
Wrote 40 examples from 1 session(s) -> /Users/jacob-roddy-beck/sft/crafter_demo/ft_data/crafter_traces.jsonl
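To sanity-check the exported dataset, you can count its records and see which top-level fields they carry. This sketch makes no assumption about the record schema; the record count it reports is expected to match the "Wrote N examples" line above.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path: str) -> tuple[int, Counter]:
    """Count records in a JSONL file and how often each top-level key appears.

    Raises ValueError with the line number if a line is not valid JSON.
    """
    key_counts: Counter = Counter()
    n_records = 0
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {lineno}: invalid JSON ({exc})") from exc
            n_records += 1
            if isinstance(record, dict):
                key_counts.update(record.keys())
    return n_records, key_counts


if __name__ == "__main__":
    data = "ft_data/crafter_traces.jsonl"
    if Path(data).is_file():
        n, keys = summarize_jsonl(data)
        print(f"{n} records")
        for key, count in keys.most_common():
            print(f"  {key}: present in {count} records")
    else:
        print(f"{data} not found; run this from your demo directory.")
```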

Step 6: Train

uvx synth-ai train
Select the training config to use. The default choice should be your most recently used SFT config:
Select a training config:
  1) [sft] /Users/jacob-roddy-beck/sft/crafter_demo/configs/crafter_fft_4b.toml (last used)
  ...
  0) Abort
  Enter choice [1]: 1
After you confirm your choice, the training job starts.
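To see what a config such as configs/crafter_fft_4b.toml contains before picking it in the train prompt, you can load it with a TOML parser. This sketch only lists the tables and keys in the file; it does not assume which settings synth-ai expects, and it needs Python 3.11+ (for the standard-library tomllib) or the tomli package.

```python
from pathlib import Path

try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:  # older Pythons: pip install tomli
    import tomli as tomllib


def describe_config(path: str) -> dict[str, list[str]]:
    """Map each top-level TOML table to its sorted keys; other values go under ''."""
    with Path(path).open("rb") as f:  # tomllib requires a binary file handle
        data = tomllib.load(f)
    summary: dict[str, list[str]] = {}
    for key, value in data.items():
        if isinstance(value, dict):
            summary[key] = sorted(value.keys())
        else:
            summary.setdefault("", []).append(key)
    return summary


if __name__ == "__main__":
    config = "configs/crafter_fft_4b.toml"
    if Path(config).is_file():
        for table, keys in describe_config(config).items():
            print(f"[{table or 'root'}] {', '.join(keys)}")
    else:
        print(f"{config} not found; run this from your demo directory.")
```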

Step 7: View your run

Your run’s live status appears in your dashboard at https://usesynth.ai.