The `synth-ai train` CLI walks you through every stage of launching a training job, from picking the right TOML config to streaming live results. This page explains what you should prepare beforehand and how the three supported flows (Reinforcement Learning, Supervised Fine Tuning, Prompt Learning) differ in day-to-day usage.
## When to Use Each Flow
| Flow | Best For | Key Artifacts You Provide |
|---|---|---|
| Reinforcement Learning (RL) | Live task apps where the model interacts with an environment | Running task app URL, environment API key, RL config |
| Supervised Fine Tuning (SFT) | Offline tuning from labeled conversational JSONL data | Train/validation JSONL files, SFT config |
| Prompt Learning (GEPA/MIPRO) | Rapid prompt iteration with automatic scoring | Task app URL, environment API key, prompt-learning config |
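As a quick orientation, the table above maps to invocations like the sketch below. The config filenames are hypothetical; each `.toml` is assumed to describe the settings for its flow.

```bash
# Hypothetical config filenames; each .toml describes one flow's run.
uvx synth-ai train --config configs/rl.toml       # Reinforcement Learning against a running task app
uvx synth-ai train --config configs/sft.toml      # Supervised Fine Tuning from JSONL data
uvx synth-ai train --config configs/prompt.toml   # Prompt Learning (GEPA/MIPRO)
```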
## Common Prerequisites
- Synth backend access: `SYNTH_API_KEY` must be present in your `.env` or shell.
- Training config: One or more `.toml` files describing the run; pass them with `--config` or let the CLI discover them.
- `.env` file: Store secrets locally so the resolver can auto-fill prompts; use `--env-file path/.env` to skip the picker.
- Updated CLI: Run under Python 3.11+ with `uvx synth-ai train` to ensure dependency parity.
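For reference, here is a minimal sketch of a local `.env` the secret resolver could pick up. Only `SYNTH_API_KEY` is documented above; the second variable name is a placeholder whose exact name depends on your task app.

```bash
# Hypothetical .env contents; SYNTH_API_KEY is the only key documented above.
# The second variable name is a placeholder for your task app's environment key.
cat > .env <<'EOF'
SYNTH_API_KEY=sk-synth-xxxxxxxx
ENVIRONMENT_API_KEY=replace-with-your-task-app-key
EOF

# Point the CLI at the file explicitly to skip the interactive picker.
uvx synth-ai train --config path/to/rl.toml --env-file .env
```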
## Typical Workflow
- Select configs: `synth-ai train --config path/to/rl.toml` (repeatable); omit `--config` to pick interactively.
- Resolve secrets: The CLI reads your `.env`, fetches Modal secrets if allowed, and masks values in the log.
- Confirm overrides: Optional flags such as `--backend`, `--model`, `--dataset`, `--examples`, or `--task-url` tailor the run.
- Review payload preview: Before submitting, the CLI prints the synthesized payload so you can double-check models, datasets, and IDs.
- Watch progress: Leave `--poll` enabled to stream status via the CLI, or choose `--stream-format chart` for a live loss/score panel.
- Collect artifacts: Prompt-learning runs produce a Markdown summary; SFT uploads remain in `/files`; RL jobs stream the final status JSON.
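Putting those steps together, an invocation might look like the sketch below; the config path, model name, and task URL are placeholders, and only flags listed above are used.

```bash
# Illustrative only: the config path, model name, and URL are placeholders.
uvx synth-ai train \
  --config configs/rl.toml \
  --env-file .env \
  --model my-base-model \
  --task-url https://my-task-app.example.com \
  --stream-format chart
```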
## Choosing a Flow
- Start with RL if your task app exposes interactive endpoints and you want policy-gradient style updates.
- Use SFT when you already have labeled JSONL conversations and just need to fine-tune weights offline.
- Pick Prompt Learning for GEPA or MIPRO optimization when you want faster iteration without full training cycles.