SFT (Supervised Fine-Tuning) trains model weights to clone behavior from demonstration data.

1. Install Demo

uvx synth-ai demo sft
cd demo_sft
Creates task_app.py (a Crafter task app), train_cfg.toml (training config), and sample JSONL data.

2. Setup Credentials

uvx synth-ai setup
Opens a browser to fetch SYNTH_API_KEY and ENVIRONMENT_API_KEY, then saves them to .env.
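Scripts can then read the keys from .env. A minimal sketch, assuming the python-dotenv package is installed (the variable names come from the setup step above):

import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads .env from the current directory
synth_api_key = os.environ["SYNTH_API_KEY"]
environment_api_key = os.environ["ENVIRONMENT_API_KEY"]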

3. Prepare Data

SFT requires a JSONL file where each line contains a list of chat messages:
{"messages": [{"role": "user", "content": "What is 2+2?"}, {"role": "assistant", "content": "4"}]}
{"messages": [{"role": "user", "content": "Capital of France?"}, {"role": "assistant", "content": "Paris"}]}
Or collect from task app:
# Deploy task app with tracing
uvx synth-ai deploy tunnel task_app.py --env .env

# Run evaluations to collect traces
uvx synth-ai eval my-task --url http://127.0.0.1:8001 --trace-db traces.sqlite --seeds 0-99

# Export successful traces to JSONL
uvx synth-ai filter --db traces.sqlite --output train.jsonl --min-score 0.5
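Before training, it is worth sanity-checking the exported file. A minimal sketch, assuming only the messages schema shown above (no extra fields):

import json

with open("train.jsonl") as f:
    for i, line in enumerate(f, start=1):
        record = json.loads(line)
        messages = record["messages"]
        assert messages, f"line {i}: empty messages list"
        roles = [m["role"] for m in messages]
        # each example should end with the assistant turn being cloned
        assert roles[-1] == "assistant", f"line {i}: last turn is {roles[-1]}"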

4. Train

uvx synth-ai train --type sft --config train_cfg.toml --dataset train.jsonl --poll
Streams training progress to the terminal. Runtime: 10-60 minutes, depending on model and dataset size.

Minimal Config

[algorithm]
type = "offline"
method = "sft"
variety = "lora"           # or "fft" for full fine-tune

[job]
model = "Qwen/Qwen3-4B"
data = "train.jsonl"

[compute]
gpu_type = "H100"
gpu_count = 2

[hyperparameters]
n_epochs = 1
global_batch = 8
per_device_batch = 2
learning_rate = 5e-6
sequence_length = 2048
train_kind = "peft"        # or "fft" for full

[training.lora]
r = 16
alpha = 32
dropout = 0.1
target_modules = ["q_proj", "v_proj"]

Key Parameters

Parameter                       Purpose
algorithm.variety               "lora", "qlora", or "fft"
hyperparameters.n_epochs        Number of training epochs
hyperparameters.global_batch    Total batch size across all devices
hyperparameters.learning_rate   Optimizer learning rate
training.lora.r                 LoRA rank (higher = more capacity)
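To build intuition for training.lora.r: each targeted weight matrix W of shape (d_out, d_in) gains trainable factors A (r x d_in) and B (d_out x r), so trainable parameters grow linearly with r. A rough sketch with a hypothetical dimension (the hidden size here is a placeholder, not the real Qwen3-4B value):

# hypothetical square attention projection, e.g. q_proj
hidden_size = 4096  # placeholder; look up the real value for your model
r = 16
trainable_per_matrix = r * (hidden_size + hidden_size)  # params in A plus B
print(trainable_per_matrix)  # 131072 trainable params per targeted matrix at r=16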

Get Results

from synth_ai.sdk.api.train.sft import SFTJob

job = SFTJob.from_config("train_cfg.toml")
job.submit()
result = job.poll_until_complete()

model_id = result.get("fine_tuned_model")
# e.g., "ft:Qwen/Qwen3-4B:job_abc123"
Use the trained model:
# Dev inference
curl -X POST https://agent-learning.onrender.com/api/inference/chat \
  -H "Authorization: Bearer $SYNTH_API_KEY" \
  -d '{"model": "ft:Qwen/Qwen3-4B:job_abc123", "messages": [...]}'

# Export to HuggingFace
uvx synth-ai artifacts export ft:Qwen/Qwen3-4B:job_abc123 --repo-id myorg/model
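The same dev-inference call from Python, assuming the endpoint accepts an OpenAI-style chat payload as the curl example suggests (the response shape is an assumption; inspect it before parsing):

import os
import requests

resp = requests.post(
    "https://agent-learning.onrender.com/api/inference/chat",
    headers={"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"},
    json={
        "model": "ft:Qwen/Qwen3-4B:job_abc123",
        "messages": [{"role": "user", "content": "What is 2+2?"}],
    },
)
resp.raise_for_status()
print(resp.json())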

Self-Training Loop

Combine SFT with RL for iterative improvement (a code sketch follows the list):
  1. Train initial model with SFT on seed data
  2. Run RL to improve from environment feedback
  3. Filter successful RL trajectories
  4. SFT on combined data
  5. Repeat
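A minimal sketch of the loop driven through the CLI commands above via subprocess; the RL step is left as a placeholder since this page only covers SFT, and the round count and file names are illustrative:

import subprocess

def run(cmd: str) -> None:
    subprocess.run(cmd, shell=True, check=True)

# 1. Train the initial model with SFT on seed data
run("uvx synth-ai train --type sft --config train_cfg.toml --dataset train.jsonl --poll")

for round_num in range(3):  # illustrative number of rounds
    # 2. Run RL to improve from environment feedback (see the RL docs; placeholder here)
    # 3. Filter successful trajectories from the trace database
    run("uvx synth-ai filter --db traces.sqlite --output round.jsonl --min-score 0.5")
    # 4. SFT on the combined data (seed plus filtered trajectories), then 5. repeat
    run("cat train.jsonl round.jsonl > combined.jsonl")
    run("uvx synth-ai train --type sft --config train_cfg.toml --dataset combined.jsonl --poll")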