Submit supervised fine-tuning (SFT) or reinforcement learning (RL) training jobs to the Synth platform. The CLI handles dataset upload, config validation, job submission, and live progress streaming.

Usage

# Auto-discover config
uvx synth-ai train

# Specify config explicitly
uvx synth-ai train --config configs/rl.toml

# Dry-run (preview without submitting)
uvx synth-ai train --config sft.toml --dry-run

Quick Start

SFT Training

Create an SFT config (sft.toml):
type = "sft"

[model]
base = "Qwen/Qwen3-4B"
label = "my-sft-model"

[training]
batch_size = 8
learning_rate = 1e-5
num_train_epochs = 3
Run:
uvx synth-ai train --config sft.toml --dataset datasets/train.jsonl
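
Each line of the dataset file is one training example in chat format. A minimal sketch of a single line of datasets/train.jsonl, assuming the messages layout the validator checks for (a messages list with user and assistant roles):

{"messages": [{"role": "user", "content": "What is 2 + 2?"}, {"role": "assistant", "content": "4"}]}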

RL Training

Create an RL config (rl.toml):
type = "rl"

[algorithm]
type = "online"
method = "policy_gradient"
variety = "gspo"

[policy]
model_name = "Qwen/Qwen3-4B"
label = "my-rl-model"

[services]
task_url = "https://my-task-app.modal.run"
Run:
uvx synth-ai train --config rl.toml --task-url https://my-task-app.modal.run

Configuration Types

The CLI detects the training type from the type field at the top of the config (override with --type if needed):

SFT Config Structure

type = "sft"

[model]
base = "Qwen/Qwen3-4B"           # Base model to fine-tune
label = "my-model-v1"              # Human-readable label

[training]
batch_size = 8
learning_rate = 1e-5
num_train_epochs = 3
warmup_steps = 100
gradient_accumulation_steps = 4

[lora]  # Optional: LoRA fine-tuning
r = 64
alpha = 16
dropout = 0.05
target_modules = ["q_proj", "v_proj"]

RL Config Structure

type = "rl"

[algorithm]
type = "online"                    # "online" or "offline"
method = "policy_gradient"
variety = "gspo"                   # "gspo", "ppo", etc.

[policy]
model_name = "Qwen/Qwen3-4B"
label = "my-rl-policy"
trainer_mode = "full"              # "full" or "lora"

[services]
task_url = "https://my-task-app.modal.run"

[training]
num_episodes = 1000
batch_size = 32
learning_rate = 1e-5

CLI Options

--config PATH              # TOML config file (auto-discovered if omitted)
--type {auto,sft,rl}       # Override auto-detection
--dataset PATH             # SFT dataset JSONL (overrides config)
--task-url URL             # RL task app URL (overrides config)
--dry-run                  # Preview payload without submitting
--no-poll                  # Submit and exit (don't wait for completion)
--stream-format {default,chart}  # Output format

Examples

SFT with Custom Dataset

uvx synth-ai train \
  --config sft.toml \
  --dataset datasets/my_data.jsonl

RL with Task App Override

uvx synth-ai train \
  --config rl.toml \
  --task-url https://my-app-v2.modal.run

Dry Run (Preview)

# Preview the payload without submitting
uvx synth-ai train --config sft.toml --dry-run

Submit Without Polling

# Submit and exit immediately (check status later)
uvx synth-ai train --config rl.toml --no-poll

Chart Mode (Loss Curve)

# Display live training loss curve
uvx synth-ai train --config sft.toml --stream-format chart

Output

The command streams live job events:
✓ Config validated
✓ Dataset uploaded: datasets/train.jsonl (1,234 examples)
✓ Job submitted: job_abc123

Streaming job progress...

[10:30:15] sft.training.started
[10:30:20] sft.progress          epoch=1/3 step=10 loss=2.45
[10:35:10] sft.progress          epoch=1/3 step=100 loss=1.23
[10:40:00] sft.validation        val_loss=1.15
[10:55:00] sft.training.finish   final_loss=0.89

✓ Training complete!
  Fine-tuned model: ft:org-abc:job-abc123
  Dashboard: https://www.usesynth.ai/jobs/abc123

Validation

The CLI performs pre-flight checks:

SFT Validation

  • JSONL format correctness
  • Required fields (messages)
  • Message roles (user, assistant)
  • File size limits

RL Validation

  • Task app reachability (GET /health); see the manual check below
  • Task app authentication
  • Model compatibility
  • GPU requirements
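
You can run these checks yourself before submitting. A sketch using the health endpoint and the smoke command referenced elsewhere on this page:

# confirm the task app answers its health check
curl https://my-task-app.modal.run/health

# exercise the task app end to end before a paid run
uvx synth-ai smoke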

Troubleshooting

“No config files found”

  • Create a TOML file with type = "sft" or type = "rl"
  • Or specify --config path/to/config.toml

“Dataset file not found” (SFT)

  • Verify --dataset path is correct
  • Or set dataset in your SFT config:
    [data]
    dataset = "datasets/train.jsonl"
    

“Task app unreachable” (RL)

  • Verify task app is deployed and running
  • Check task_url in config
  • Test manually: curl https://your-app.modal.run/health

“Authentication failed”

  • Run uvx synth-ai setup to configure API keys
  • Verify SYNTH_API_KEY is set (quick check below)
  • Check API key permissions on dashboard
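
A quick shell check that the key is visible to the CLI (assumes a POSIX shell):

# prints "set" only if SYNTH_API_KEY is non-empty
test -n "$SYNTH_API_KEY" && echo set || echo missing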

“Validation error: Invalid model”

  • Check model name matches Synth supported models
  • See Models for supported base models
  • Verify spelling and organization prefix

“Insufficient credits”

  • Check your organization's credit balance on the dashboard
  • Add credits, then resubmit the job

Job Management

To check job status after submission:
uvx synth-ai status jobs --job-id abc123

Best Practices

SFT

  • Start with small datasets (100-1000 examples) for faster iteration
  • Use LoRA for large models to reduce training time
  • Validate JSONL format before uploading (see the quick check below)
  • Monitor val_loss to detect overfitting
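
One way to spot format problems locally is jq (an assumption; the CLI's own validator is the source of truth). This prints any object missing a messages field and errors on malformed lines:

# prints nothing when every line parses and contains "messages"
jq -c 'select(has("messages") | not)' datasets/train.jsonl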

RL

  • Test task app locally first (deploy --runtime uvicorn); see the sketch after this list
  • Run smoke tests before training (uvx synth-ai smoke)
  • Start with short episodes (10-20 steps)
  • Use eval to measure policy improvement
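
A sketch of that local loop before a full run. The deploy invocation is an assumption based on the --runtime uvicorn hint above; adjust it to match how you deploy your task app:

# serve the task app locally for fast iteration (assumed invocation)
uvx synth-ai deploy --runtime uvicorn

# smoke-test the task app before submitting a paid job
uvx synth-ai smoke

# then submit the RL job
uvx synth-ai train --config rl.toml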

Cost Optimization

  • LoRA vs Full: LoRA is 5-10x cheaper for large models
  • Batch size: Larger batches = faster training but more GPU memory
  • Early stopping: Use validation loss to stop early
  • Dataset size: More data ≠ better; 1K high-quality > 10K noisy

Next Steps