SFT (Supervised Fine-Tuning) trains model weights to clone behavior from demonstration data.

1. Install Demo

uvx synth-ai demo sft
cd demo_sft
Creates task_app.py (a Crafter task app), train_cfg.toml (training config), and sample JSONL data.

2. Setup Credentials

uvx synth-ai setup
Opens a browser to fetch SYNTH_API_KEY and ENVIRONMENT_API_KEY, then saves them to .env.
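Scripts can then read the keys from .env. A minimal sketch, assuming the python-dotenv package is installed (the variable names come from the setup step above):

import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads .env from the current directory
synth_api_key = os.environ["SYNTH_API_KEY"]
environment_api_key = os.environ["ENVIRONMENT_API_KEY"]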

3. Prepare Data

SFT requires a JSONL file where each line contains a list of chat messages:
{"messages": [{"role": "user", "content": "What is 2+2?"}, {"role": "assistant", "content": "4"}]}
{"messages": [{"role": "user", "content": "Capital of France?"}, {"role": "assistant", "content": "Paris"}]}
Or collect from task app:
# Deploy task app with tracing
uvx synth-ai deploy tunnel task_app.py --env .env

# Run evaluations to collect traces
uvx synth-ai eval my-task --url http://127.0.0.1:8001 --trace-db traces.sqlite --seeds 0-99

# Export successful traces to JSONL
uvx synth-ai filter --db traces.sqlite --output train.jsonl --min-score 0.5
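Before training, it is worth sanity-checking the exported file. A minimal sketch, assuming only the messages schema shown above (no extra fields):

import json

with open("train.jsonl") as f:
    for i, line in enumerate(f, start=1):
        record = json.loads(line)
        messages = record["messages"]
        assert messages, f"line {i}: empty messages list"
        roles = [m["role"] for m in messages]
        # each example should end with the assistant turn being cloned
        assert roles[-1] == "assistant", f"line {i}: last turn is {roles[-1]}"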

4. Train

uvx synth-ai train --type sft --config train_cfg.toml --dataset train.jsonl --poll
Streams training progress to the terminal. Runtime: 10-60 minutes, depending on model and dataset size.

Minimal Config

[algorithm]
type = "offline"
method = "sft"
variety = "lora"           # or "fft" for full fine-tune

[job]
model = "Qwen/Qwen3-4B"
data = "train.jsonl"

[compute]
gpu_type = "H100"
gpu_count = 2

[hyperparameters]
n_epochs = 1
global_batch = 8
per_device_batch = 2
learning_rate = 5e-6
sequence_length = 2048
train_kind = "peft"        # or "fft" for full

[training.lora]
r = 16
alpha = 32
dropout = 0.1
target_modules = ["q_proj", "v_proj"]

Key Parameters

Parameter                       Purpose
algorithm.variety               "lora", "qlora", or "fft"
hyperparameters.n_epochs        Number of training epochs
hyperparameters.global_batch    Total batch size across all devices
hyperparameters.learning_rate   Optimizer learning rate
training.lora.r                 LoRA rank (higher = more capacity)
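To build intuition for training.lora.r: each targeted weight matrix W of shape (d_out, d_in) gains trainable factors A (r x d_in) and B (d_out x r), so trainable parameters grow linearly with r. A rough sketch with a hypothetical dimension (the hidden size here is a placeholder, not the real Qwen3-4B value):

# hypothetical square attention projection, e.g. q_proj
hidden_size = 4096  # placeholder; look up the real value for your model
r = 16
trainable_per_matrix = r * (hidden_size + hidden_size)  # params in A plus B
print(trainable_per_matrix)  # 131072 trainable params per targeted matrix at r=16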

Get Results

from synth_ai.sdk.api.train.sft import SFTJob

job = SFTJob.from_config("train_cfg.toml")
job.submit()
result = job.poll_until_complete()

model_id = result.get("fine_tuned_model")
# e.g., "ft:Qwen/Qwen3-4B:job_abc123"
Use the trained model:
# Dev inference
curl -X POST https://agent-learning.onrender.com/api/inference/chat \
  -H "Authorization: Bearer $SYNTH_API_KEY" \
  -d '{"model": "ft:Qwen/Qwen3-4B:job_abc123", "messages": [...]}'

# Export to HuggingFace
uvx synth-ai artifacts export ft:Qwen/Qwen3-4B:job_abc123 --repo-id myorg/model
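The same dev-inference call from Python, assuming the endpoint accepts an OpenAI-style chat payload as the curl example suggests (the response shape is an assumption; inspect it before parsing):

import os
import requests

resp = requests.post(
    "https://agent-learning.onrender.com/api/inference/chat",
    headers={"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"},
    json={
        "model": "ft:Qwen/Qwen3-4B:job_abc123",
        "messages": [{"role": "user", "content": "What is 2+2?"}],
    },
)
resp.raise_for_status()
print(resp.json())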

Self-Training Loop

Combine SFT with RL for iterative improvement (a code sketch follows the list):
  1. Train initial model with SFT on seed data
  2. Run RL to improve from environment feedback
  3. Filter successful RL trajectories
  4. SFT on combined data
  5. Repeat
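A minimal sketch of the loop driven through the CLI commands above via subprocess; the RL step is left as a placeholder since this page only covers SFT, and the round count and file names are illustrative:

import subprocess

def run(cmd: str) -> None:
    subprocess.run(cmd, shell=True, check=True)

# 1. Train the initial model with SFT on seed data
run("uvx synth-ai train --type sft --config train_cfg.toml --dataset train.jsonl --poll")

for round_num in range(3):  # illustrative number of rounds
    # 2. Run RL to improve from environment feedback (see the RL docs; placeholder here)
    # 3. Filter successful trajectories from the trace database
    run("uvx synth-ai filter --db traces.sqlite --output round.jsonl --min-score 0.5")
    # 4. SFT on the combined data (seed plus filtered trajectories), then 5. repeat
    run("cat train.jsonl round.jsonl > combined.jsonl")
    run("uvx synth-ai train --type sft --config train_cfg.toml --dataset combined.jsonl --poll")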