The `synth-ai train` CLI walks you through every stage of launching a training job, from picking the right TOML config to streaming live results. This page explains what you should prepare beforehand and how the three supported flows (Reinforcement Learning, Supervised Fine Tuning, Prompt Learning) differ in day-to-day usage.
## When to Use Each Flow
| Flow | Best For | Key Artifacts You Provide |
|---|---|---|
| Reinforcement Learning (RL) | Live task apps where the model interacts with an environment | Running task app URL, environment API key, RL config |
| Supervised Fine Tuning (SFT) | Offline tuning from labeled conversational JSONL data | Train/validation JSONL files, SFT config |
| Prompt Learning (GEPA/MIPRO) | Rapid prompt iteration with automatic scoring | Task app URL, environment API key, prompt-learning config |
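As a quick orientation, the table above maps to invocations like the sketch below. The config filenames are hypothetical; each `.toml` is assumed to describe the settings for its flow.

```bash
# Hypothetical config filenames; each .toml describes one flow's run.
uvx synth-ai train --config configs/rl.toml       # Reinforcement Learning against a running task app
uvx synth-ai train --config configs/sft.toml      # Supervised Fine Tuning from JSONL data
uvx synth-ai train --config configs/prompt.toml   # Prompt Learning (GEPA/MIPRO)
```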
## Common Prerequisites
- Synth backend access: `SYNTH_API_KEY` must be present in your `.env` or shell.
- Training config: One or more `.toml` files describing the run; pass them with `--config` or let the CLI discover them.
- `.env` file: Store secrets locally so the resolver can auto-fill prompts; use `--env-file path/.env` to skip the picker.
- Updated CLI: Run under Python 3.11+ with `uvx synth-ai train` to ensure dependency parity.
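For reference, here is a minimal sketch of a local `.env` the secret resolver could pick up. Only `SYNTH_API_KEY` is documented above; the second variable name is a placeholder whose exact name depends on your task app.

```bash
# Hypothetical .env contents; SYNTH_API_KEY is the only key documented above.
# The second variable name is a placeholder for your task app's environment key.
cat > .env <<'EOF'
SYNTH_API_KEY=sk-synth-xxxxxxxx
ENVIRONMENT_API_KEY=replace-with-your-task-app-key
EOF

# Point the CLI at the file explicitly to skip the interactive picker.
uvx synth-ai train --config path/to/rl.toml --env-file .env
```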
## Typical Workflow
- Select configs: `synth-ai train --config path/to/rl.toml` (repeatable); omit `--config` to pick interactively.
- Resolve secrets: The CLI reads your `.env`, fetches Modal secrets if allowed, and masks values in the log.
- Confirm overrides: Optional flags such as `--backend`, `--model`, `--dataset`, `--examples`, or `--task-url` tailor the run.
- Review payload preview: Before submitting, the CLI prints the synthesized payload so you can double-check models, datasets, and IDs.
- Watch progress: Leave `--poll` enabled to stream status via the CLI, or choose `--stream-format chart` for a live loss/score panel.
- Collect artifacts: Prompt-learning runs produce a Markdown summary; SFT uploads remain in `/files`; RL jobs stream the final status JSON.
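Putting those steps together, an invocation might look like the sketch below; the config path, model name, and task URL are placeholders, and only flags listed above are used.

```bash
# Illustrative only: the config path, model name, and URL are placeholders.
uvx synth-ai train \
  --config configs/rl.toml \
  --env-file .env \
  --model my-base-model \
  --task-url https://my-task-app.example.com \
  --stream-format chart
```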
## Choosing a Flow
- Start with RL if your task app exposes interactive endpoints and you want policy-gradient style updates.
- Use SFT when you already have labeled JSONL conversations and just need to fine-tune weights offline.
- Pick Prompt Learning for GEPA or MIPRO optimization when you want faster iteration without full training cycles.