Submit supervised fine-tuning (SFT) or reinforcement learning (RL) training jobs to the Synth platform. The CLI handles dataset upload, config validation, job submission, and live progress streaming.

Usage

# Auto-discover config
uvx synth-ai train

# Specify config explicitly
uvx synth-ai train --config configs/rl.toml

# Dry-run (preview without submitting)
uvx synth-ai train --config sft.toml --dry-run

Quick Start

SFT Training

Create an SFT config (sft.toml):
type = "sft"

[model]
base = "Qwen/Qwen3-4B"
label = "my-sft-model"

[training]
batch_size = 8
learning_rate = 1e-5
num_train_epochs = 3
Run:
uvx synth-ai train --config sft.toml --dataset datasets/train.jsonl
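
Each line of the dataset file is one training example in chat format. A minimal sketch of a single line of datasets/train.jsonl, assuming the messages layout the validator checks for (a messages list with user and assistant roles):

{"messages": [{"role": "user", "content": "What is 2 + 2?"}, {"role": "assistant", "content": "4"}]}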

RL Training

Create an RL config (rl.toml):
type = "rl"

[algorithm]
type = "online"
method = "policy_gradient"
variety = "gspo"

[policy]
model_name = "Qwen/Qwen3-4B"
label = "my-rl-model"

[services]
task_url = "https://my-task-app.modal.run"
Run:
uvx synth-ai train --config rl.toml --task-url https://my-task-app.modal.run

Configuration Types

The CLI detects the training type from the type field at the top of the config (override with --type if needed):

SFT Config Structure

type = "sft"

[model]
base = "Qwen/Qwen3-4B"           # Base model to fine-tune
label = "my-model-v1"              # Human-readable label

[training]
batch_size = 8
learning_rate = 1e-5
num_train_epochs = 3
warmup_steps = 100
gradient_accumulation_steps = 4

[lora]  # Optional: LoRA fine-tuning
r = 64
alpha = 16
dropout = 0.05
target_modules = ["q_proj", "v_proj"]

RL Config Structure

type = "rl"

[algorithm]
type = "online"                    # "online" or "offline"
method = "policy_gradient"
variety = "gspo"                   # "gspo", "ppo", etc.

[policy]
model_name = "Qwen/Qwen3-4B"
label = "my-rl-policy"
trainer_mode = "full"              # "full" or "lora"

[services]
task_url = "https://my-task-app.modal.run"

[training]
num_episodes = 1000
batch_size = 32
learning_rate = 1e-5

CLI Options

--config PATH              # TOML config file (auto-discovered if omitted)
--type {auto,sft,rl}       # Override auto-detection
--dataset PATH             # SFT dataset JSONL (overrides config)
--task-url URL             # RL task app URL (overrides config)
--dry-run                  # Preview payload without submitting
--no-poll                  # Submit and exit (don't wait for completion)
--stream-format {default,chart}  # Output format

Examples

SFT with Custom Dataset

uvx synth-ai train \
  --config sft.toml \
  --dataset datasets/my_data.jsonl

RL with Task App Override

uvx synth-ai train \
  --config rl.toml \
  --task-url https://my-app-v2.modal.run

Dry Run (Preview)

# Preview the payload without submitting
uvx synth-ai train --config sft.toml --dry-run

Submit Without Polling

# Submit and exit immediately (check status later)
uvx synth-ai train --config rl.toml --no-poll

Chart Mode (Loss Curve)

# Display live training loss curve
uvx synth-ai train --config sft.toml --stream-format chart

Output

The command streams live job events:
✓ Config validated
✓ Dataset uploaded: datasets/train.jsonl (1,234 examples)
✓ Job submitted: job_abc123

Streaming job progress...

[10:30:15] sft.training.started
[10:30:20] sft.progress          epoch=1/3 step=10 loss=2.45
[10:35:10] sft.progress          epoch=1/3 step=100 loss=1.23
[10:40:00] sft.validation        val_loss=1.15
[10:55:00] sft.training.finish   final_loss=0.89

✓ Training complete!
  Fine-tuned model: ft:org-abc:job-abc123
  Dashboard: https://www.usesynth.ai/jobs/abc123

Validation

The CLI performs pre-flight checks:

SFT Validation

  • JSONL format correctness
  • Required fields (messages)
  • Message roles (user, assistant)
  • File size limits

RL Validation

  • Task app reachability (GET /health); see the manual check below
  • Task app authentication
  • Model compatibility
  • GPU requirements
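
You can run these checks yourself before submitting. A sketch using the health endpoint and the smoke command referenced elsewhere on this page:

# confirm the task app answers its health check
curl https://my-task-app.modal.run/health

# exercise the task app end to end before a paid run
uvx synth-ai smoke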

Troubleshooting

“No config files found”

  • Create a TOML file with type = "sft" or type = "rl"
  • Or specify --config path/to/config.toml

“Dataset file not found” (SFT)

  • Verify --dataset path is correct
  • Or set dataset in your SFT config:
    [data]
    dataset = "datasets/train.jsonl"
    

“Task app unreachable” (RL)

  • Verify task app is deployed and running
  • Check task_url in config
  • Test manually: curl https://your-app.modal.run/health

“Authentication failed”

  • Run uvx synth-ai setup to configure API keys
  • Verify SYNTH_API_KEY is set (quick check below)
  • Check API key permissions on dashboard
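
A quick shell check that the key is visible to the CLI (assumes a POSIX shell):

# prints "set" only if SYNTH_API_KEY is non-empty
test -n "$SYNTH_API_KEY" && echo set || echo missing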

“Validation error: Invalid model”

  • Check model name matches Synth supported models
  • See Models for supported base models
  • Verify spelling and organization prefix

“Insufficient credits”

  • Check your organization's credit balance on the dashboard
  • Add credits, then resubmit the job

Job Management

To check job status after submission:
uvx synth-ai status jobs --job-id abc123

Best Practices

SFT

  • Start with small datasets (100-1000 examples) for faster iteration
  • Use LoRA for large models to reduce training time
  • Validate JSONL format before uploading (see the quick check below)
  • Monitor val_loss to detect overfitting
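
One way to spot format problems locally is jq (an assumption; the CLI's own validator is the source of truth). This prints any object missing a messages field and errors on malformed lines:

# prints nothing when every line parses and contains "messages"
jq -c 'select(has("messages") | not)' datasets/train.jsonl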

RL

  • Test task app locally first (deploy --runtime uvicorn); see the sketch after this list
  • Run smoke tests before training (uvx synth-ai smoke)
  • Start with short episodes (10-20 steps)
  • Use eval to measure policy improvement
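
A sketch of that local loop before a full run. The deploy invocation is an assumption based on the --runtime uvicorn hint above; adjust it to match how you deploy your task app:

# serve the task app locally for fast iteration (assumed invocation)
uvx synth-ai deploy --runtime uvicorn

# smoke-test the task app before submitting a paid job
uvx synth-ai smoke

# then submit the RL job
uvx synth-ai train --config rl.toml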

Cost Optimization

  • LoRA vs Full: LoRA is 5-10x cheaper for large models
  • Batch size: Larger batches = faster training but more GPU memory
  • Early stopping: Use validation loss to stop early
  • Dataset size: More data ≠ better; 1K high-quality > 10K noisy

Next Steps