The synth-ai train CLI walks you through every stage of launching a training job—from picking the right TOML config to streaming live results. This page explains what you should prepare beforehand and how the three supported flows (Reinforcement Learning, Supervised Fine Tuning, Prompt Learning) differ in day-to-day usage.

When to Use Each Flow

| Flow | Best For | Key Artifacts You Provide |
| --- | --- | --- |
| Reinforcement Learning (RL) | Live task apps where the model interacts with an environment | Running task app URL, environment API key, RL config |
| Supervised Fine Tuning (SFT) | Offline tuning from labeled conversational JSONL data | Train/validation JSONL files, SFT config |
| Prompt Learning (GEPA/MIPRO) | Rapid prompt iteration with automatic scoring | Task app URL, environment API key, prompt-learning config |
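All three flows share the same `synth-ai train` entry point; the TOML config you pass selects the flow. The sketch below uses only flags documented on this page, while the config paths and task-app URL are placeholders you would replace with your own:

```bash
# Reinforcement Learning: point the run at a live task app (placeholder URL).
synth-ai train --config configs/rl.toml --task-url https://my-task-app.example.com

# Supervised Fine Tuning: override the dataset if it isn't set in the config
# (assumes --dataset accepts a path; adjust to your setup).
synth-ai train --config configs/sft.toml --dataset data/train.jsonl

# Prompt Learning (GEPA/MIPRO): same command, prompt-learning config.
synth-ai train --config configs/prompt_learning.toml --task-url https://my-task-app.example.com
```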

Common Prerequisites

  • Synth backend access: SYNTH_API_KEY must be present in your .env or shell.
  • Training config: One or more .toml files describing the run; pass them with --config or let the CLI discover them.
  • .env file: Store secrets locally so the resolver can auto-fill prompts; use --env-file path/.env to skip the picker.
  • Updated CLI: Run under Python 3.11+ with uvx synth-ai train to ensure dependency parity.
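For a concrete starting point, the sketch below assumes a local `.env` holding `SYNTH_API_KEY` plus an environment key (the exact variable name your task app expects may differ) and launches the CLI via `uvx`:

```bash
# Store secrets in a local .env so the resolver can auto-fill prompts.
cat > .env <<'EOF'
# Required backend credential (from the list above).
SYNTH_API_KEY=sk-your-synth-key
# Placeholder variable name; use whatever key your task app expects.
ENVIRONMENT_API_KEY=your-environment-key
EOF

# Run under Python 3.11+ via uvx, pointing at the .env to skip the picker.
uvx synth-ai train --config configs/rl.toml --env-file .env
```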

Typical Workflow

  1. Select configs: synth-ai train --config path/to/rl.toml (repeatable); omit --config to pick interactively.
  2. Resolve secrets: The CLI reads your .env, fetches Modal secrets if allowed, and masks values in the log.
  3. Confirm overrides: Optional flags such as --backend, --model, --dataset, --examples, or --task-url tailor the run.
  4. Review payload preview: Before submitting, the CLI prints the synthesized payload so you can double-check models, datasets, and IDs.
  5. Watch progress: Leave --poll enabled to stream status via the CLI or choose --stream-format chart for a live loss/score panel.
  6. Collect artifacts: Prompt-learning runs produce a Markdown summary; SFT uploads remain in /files; RL jobs stream the final status JSON.
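
Putting those steps together, a fully specified run might look like the sketch below. Every flag appears in the steps above, but the config path, model identifier, and dataset path are placeholders, and the accepted values for `--model` depend on your backend:

```bash
# End-to-end sketch: select a config, resolve secrets from .env,
# override the model and dataset, then stream progress as a live chart.
synth-ai train \
  --config configs/sft.toml \
  --env-file .env \
  --model Qwen/Qwen2.5-7B-Instruct \
  --dataset data/train.jsonl \
  --poll \
  --stream-format chart
```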

Choosing a Flow

  • Start with RL if your task app exposes interactive endpoints and you want policy-gradient style updates.
  • Use SFT when you already have labeled JSONL conversations and just need to fine-tune weights offline.
  • Pick Prompt Learning for GEPA or MIPRO optimization when you want faster iteration without full training cycles.

Each linked page dives into the practical steps that matter to you as a user: required inputs, exact CLI commands, what the prompts look like, and how to interpret the streaming output.