# Usage

## Quick Start

### SFT Training

Create an SFT config (`sft.toml`):
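A minimal config might look like the sketch below. The `type` and `dataset` fields are documented on this page; the `model` field name and the example model string are assumptions:

```toml
# sft.toml — minimal SFT config sketch.
type = "sft"
model = "Qwen/Qwen2.5-7B-Instruct"   # example name only; check the supported-models list
dataset = "data/train.jsonl"         # local JSONL training file
```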
### RL Training

Create an RL config (`rl.toml`):
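A minimal sketch, on the same assumptions as above (`type` and `task_url` appear elsewhere on this page; `model` is assumed):

```toml
# rl.toml — minimal RL config sketch.
type = "rl"
model = "Qwen/Qwen2.5-7B-Instruct"        # example name only
task_url = "https://your-app.modal.run"   # deployed task app endpoint
```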
## Configuration Types

The CLI detects the training type from the `type` field in the config.

### SFT Config Structure
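An annotated sketch of the SFT config shape. Only `type` and `dataset` are confirmed by this page; the commented hyperparameter names are illustrative assumptions, not documented fields:

```toml
type = "sft"                    # selects the SFT pipeline
model = "org/model-name"        # base model, with organization prefix
dataset = "data/train.jsonl"    # local JSONL dataset

# Hyperparameter field names below are assumptions:
# epochs = 3
# learning_rate = 2e-5
# use_lora = true               # LoRA is recommended for large models (see Best Practices)
```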
### RL Config Structure
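Similarly for RL; `type` and `task_url` are confirmed by this page, the rest is an illustrative assumption:

```toml
type = "rl"                               # selects the RL pipeline
model = "org/model-name"                  # base model, with organization prefix
task_url = "https://your-app.modal.run"   # must answer GET /health (see Validation)

# Episode settings below are assumptions, not documented field names:
# max_steps = 20                          # short episodes recommended at first
```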
## CLI Options
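The current option list can be printed from the CLI itself. The `train` subcommand name here is an assumption (only `setup`, `status`, and `smoke` are confirmed by this page):

```bash
# Print available flags; `train` subcommand name is assumed
uvx synth-ai train --help
```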
## Examples

### SFT with Custom Dataset
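A hypothetical invocation: the `--config` and `--dataset` flags appear elsewhere on this page, but the `train` subcommand name is an assumption:

```bash
# Override the dataset from the command line (subcommand name assumed)
uvx synth-ai train --config sft.toml --dataset data/custom.jsonl
```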
### RL with Task App Override
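A hypothetical invocation; both the `train` subcommand and the `--task-url` override flag name are assumptions:

```bash
# Point the job at a different task app than the one in rl.toml (flag name assumed)
uvx synth-ai train --config rl.toml --task-url https://your-app.modal.run
```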
### Dry Run (Preview)
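A hypothetical invocation; the `--dry-run` flag name is an assumption:

```bash
# Validate the config and preview the job without submitting (flag name assumed)
uvx synth-ai train --config sft.toml --dry-run
```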
### Submit Without Polling
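A hypothetical invocation; the `--no-poll` flag name is an assumption:

```bash
# Submit and return immediately instead of streaming events (flag name assumed)
uvx synth-ai train --config sft.toml --no-poll
```

You can check on the job later with `uvx synth-ai status jobs --limit 10` (see Job Management below).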
### Chart Mode (Loss Curve)
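A hypothetical invocation; the `--chart` flag name is an assumption:

```bash
# Render the loss curve in the terminal while the job runs (flag name assumed)
uvx synth-ai train --config sft.toml --chart
```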
## Output

The command streams live job events while the job runs.

## Validation

The CLI performs pre-flight checks before submission.

### SFT Validation
- JSONL format correctness
- Required fields (`messages`)
- Message roles (`user`, `assistant`)
- File size limits
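You can approximate these checks locally before uploading. A minimal sketch, not the CLI's actual implementation (the `system` role and the size limit value are assumptions):

```python
import json

def validate_sft_jsonl(lines, max_bytes=500_000_000):
    """Rough SFT-style pre-flight checks on JSONL lines.

    Returns a list of error strings (empty list = passed).
    """
    errors = []
    size = 0
    for i, line in enumerate(lines, start=1):
        size += len(line.encode("utf-8"))
        try:
            record = json.loads(line)              # JSONL format correctness
        except json.JSONDecodeError as exc:
            errors.append(f"line {i}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages")          # required field
        if not isinstance(messages, list) or not messages:
            errors.append(f"line {i}: missing or empty 'messages'")
            continue
        for m in messages:                         # allowed message roles
            if m.get("role") not in {"system", "user", "assistant"}:
                errors.append(f"line {i}: bad role {m.get('role')!r}")
    if size > max_bytes:                           # file size limit (assumed value)
        errors.append("file exceeds size limit")
    return errors

good = '{"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]}'
bad = '{"messages": [{"role": "robot", "content": "beep"}]}'
print(validate_sft_jsonl([good]))   # []
print(validate_sft_jsonl([bad]))
```

Running this over your dataset before submission catches format problems without waiting on a failed job.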
### RL Validation

- Task app reachability (`GET /health`)
- Task app authentication
- Model compatibility
- GPU requirements
## Troubleshooting

### "No config files found"

- Create a TOML file with `type = "sft"` or `type = "rl"`
- Or specify `--config path/to/config.toml`
### "Dataset file not found" (SFT)

- Verify that the `--dataset` path is correct
- Or set `dataset` in your SFT config:
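For example (the path is a placeholder for your own file):

```toml
# In sft.toml
dataset = "path/to/data.jsonl"
```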
### "Task app unreachable" (RL)
- Verify task app is deployed and running
- Check `task_url` in the config
- Test manually: `curl https://your-app.modal.run/health`
### "Authentication failed"

- Run `uvx synth-ai setup` to configure API keys
- Verify that `SYNTH_API_KEY` is set
- Check API key permissions on the dashboard
### "Validation error: Invalid model"

- Check that the model name matches a Synth-supported model
- See Models for supported base models
- Verify spelling and organization prefix
### "Insufficient credits"
- Training jobs require credits
- Check dashboard: https://www.usesynth.ai/dashboard
- Contact support for credit allocation
## Job Management

After submission, manage jobs via:

- Dashboard: https://www.usesynth.ai/jobs
- Status command: `uvx synth-ai status jobs --limit 10`
- API: direct REST calls to the backend
## Best Practices

### SFT
- Start with small datasets (100-1000 examples) for faster iteration
- Use LoRA for large models to reduce training time
- Validate JSONL format before uploading
- Monitor `val_loss` to detect overfitting
### RL

- Test the task app locally first (`deploy --runtime uvicorn`)
- Run smoke tests before training (`uvx synth-ai smoke`)
- Start with short episodes (10-20 steps)
- Use eval to measure policy improvement
### Cost Optimization

- LoRA vs. full fine-tuning: LoRA is 5-10x cheaper for large models
- Batch size: larger batches train faster but need more GPU memory
- Early stopping: use validation loss to stop before overfitting
- Dataset size: more data ≠ better; 1K high-quality examples beat 10K noisy ones