Train Your Model with Reinforcement Learning

Reinforcement Learning (RL) trains your model directly against a live task app. The CLI validates your environment, submits the job to the Synth backend, and streams rewards and metrics so you can stop or rerun a job quickly.

Before You Begin

Config: A TOML file that sets algorithm, task URLs, model defaults, and hyperparameters.
Secrets: SYNTH_API_KEY, ENVIRONMENT_API_KEY, and TASK_APP_URL stored in a .env file. Pass the path up front with --env-file /path/.env if you want to skip the interactive prompts.
Task app health: Ensure /health and /task_info endpoints respond; the CLI calls them and will abort if they fail.
Optional overrides:
- --model MODEL_ID to force a specific backend model.
- --task-url https://... to override what’s in the config for this run.
- --idempotency some-uuid so retried submissions don’t duplicate jobs.
- --allow-experimental (or --no-allow-experimental) to temporarily change the SDK experimental flag.
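
Because the CLI prompts for any secret it cannot find, it can help to check the .env file before launching a run. The following is a minimal sketch, not part of the Synth SDK: the helper names parse_env and missing_keys are hypothetical, and the parser only handles plain KEY=VALUE lines (optionally prefixed with export), which is an assumption about how your file is written.

```python
# Keys this guide says the CLI expects to find in the .env file.
REQUIRED_KEYS = ("SYNTH_API_KEY", "ENVIRONMENT_API_KEY", "TASK_APP_URL")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Tolerate shell-style "export KEY=VALUE" lines.
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(text: str) -> list:
    """Return the required keys that are absent or empty, in a stable order."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling missing_keys(open(".env.rl").read()) returns an empty list when all three secrets are present; anything it lists is a value the CLI would otherwise ask you for.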

Run the CLI

uvx synth-ai train \
  --type rl \
  --config configs/rl/alpha.toml \
  --env-file .env.rl

Config selection: If you omit --config, the CLI lists discovered TOMLs and remembers your last choice.
Env resolution: Required keys are shown with masked values; select another .env, fetch Modal secrets, or enter values manually if something is missing.
Verification calls: The CLI hits POST /rl/verify_task_app using every org credential combination to make sure the backend can talk to your task app. Failures print a full diagnostics payload so you can fix auth without guessing.
Task-app health check: check_task_app_health pings the task app directly with your ENVIRONMENT_API_KEY. If it fails, no job is created—fix the task app first.
Job creation: The CLI prints the payload preview and runs POST {backend}/rl/jobs. The response must include a job_id.
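
If the health check fails, you can reproduce it outside the CLI. The sketch below pings the two endpoints this guide names, /health and /task_info, and reports their status codes. It is not the CLI's check_task_app_health implementation. The X-API-Key header name is an assumption, so confirm how your task app expects ENVIRONMENT_API_KEY to be sent. The function names are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable

# Endpoints the guide says must respond before a job is submitted.
HEALTH_PATHS = ("/health", "/task_info")


def http_status(url: str, headers: dict, timeout: float = 10.0) -> int:
    """Issue a GET and return the HTTP status code (0 if the host is unreachable)."""
    request = urllib.request.Request(url, headers=headers, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 401 or 503.
        return exc.code
    except (urllib.error.URLError, TimeoutError):
        # DNS failure, refused connection, or timeout.
        return 0


def check_task_app(base_url: str, api_key: str,
                   fetch: Callable[..., int] = http_status) -> dict:
    """Return {path: status} for each health endpoint."""
    # ASSUMPTION: the key travels in an X-API-Key header; verify for your task app.
    headers = {"X-API-Key": api_key}
    base = base_url.rstrip("/")
    return {path: fetch(base + path, headers) for path in HEALTH_PATHS}


def is_healthy(results: dict) -> bool:
    """Healthy here means every endpoint answered with a 2xx status."""
    return all(200 <= status < 300 for status in results.values())
```

For example, is_healthy(check_task_app(task_app_url, environment_api_key)) returning False tells you to fix the task app before retrying the submission. The fetch parameter exists only so the check can be exercised without a network.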

Live Monitoring

Leave --poll (default) enabled to launch the JobStreamer.
--stream-format cli (default) prints concise status + event updates while hiding noisy Hatchet/Modal logs.
--stream-format chart opens a live loss/score view that tracks gepa.transformation.mean_score.
--poll-timeout (seconds) and --poll-interval (seconds) control how long and how often the streamer checks in.
Disable polling (--no-poll) when you only need the job ID—for example, triggering runs from CI and checking status later.
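
To make the interaction between --poll-timeout and --poll-interval concrete, here is a generic polling loop. It is a sketch of the general technique, not the JobStreamer's code: get_status stands in for whatever call fetches the job status, and the terminal state names are illustrative assumptions rather than values confirmed by the backend.

```python
import time
from typing import Callable, Optional

# ASSUMPTION: illustrative terminal state names, not confirmed by the Synth backend.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def poll_until_terminal(
    get_status: Callable[[], str],
    poll_timeout: float,
    poll_interval: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Check the status every poll_interval seconds until it is terminal.

    Returns the terminal status, or None if poll_timeout seconds pass first.
    """
    deadline = clock() + poll_timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            return None
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))
```

A short interval gives faster feedback at the cost of more requests. The timeout only bounds how long the client waits, so in this sketch a None result means "stopped watching", not "job failed".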

What You See

Verification summary listing candidate credentials and status codes.
Task app health result (✓ Task app healthy or a detailed failure reason).
Payload preview plus the raw backend response (truncated to 400 characters for readability).
Streaming events emitted by the RL job (status transitions, environment events, metrics).
Final status JSON once the job reaches a terminal state.

Troubleshooting Tips

Authentication errors usually mean the .env file lacks ENVIRONMENT_API_KEY, or that the key is scoped to another task app. Rerun with --env-file pointing to the correct secrets file.
If the CLI hangs at “Verifying task app…”, the task app is likely offline; test its /health endpoint manually.
Use --idempotency whenever you expect to rerun commands (e.g., in scripts) to avoid duplicate jobs.
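
For scripted runs, one way to get a stable --idempotency value is to derive it from inputs that stay the same across retries. The sketch below builds the command line from flags named in this guide. The helper names and the run_label parameter (for example, a CI commit SHA) are hypothetical, and deriving the key with uuid5 is a design choice for illustration, not something the CLI requires.

```python
import uuid


def idempotency_key(config_path: str, run_label: str) -> str:
    """Derive a stable UUID from the config path and a caller-chosen run label."""
    # uuid5 is deterministic: the same inputs always produce the same key,
    # so a retried submission reuses the key instead of minting a new one.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{config_path}#{run_label}"))


def build_train_command(config_path: str, env_file: str, run_label: str,
                        poll: bool = False) -> list:
    """Assemble the argv for `uvx synth-ai train` using flags from this guide."""
    command = [
        "uvx", "synth-ai", "train",
        "--type", "rl",
        "--config", config_path,
        "--env-file", env_file,
        "--idempotency", idempotency_key(config_path, run_label),
    ]
    if not poll:
        # Return as soon as the job is submitted, e.g. when triggered from CI.
        command.append("--no-poll")
    return command
```

Passing the result to subprocess.run(command, check=True) submits the job. Rerunning the script with the same config path and label produces the same key, while changing the label produces a new one when you do want a fresh job.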
