Reinforcement Learning (RL) trains directly against a live task app. The CLI validates your environment, submits the job to the Synth backend, and streams rewards/metrics so you can stop or rerun quickly.

Before You Begin

  • Config: A TOML file that sets algorithm, task URLs, model defaults, and hyperparameters.
  • Secrets: SYNTH_API_KEY, ENVIRONMENT_API_KEY, and TASK_APP_URL stored in a .env file. To skip the interactive prompts, pass the path up front with --env-file /path/.env.
  • Task app health: Ensure /health and /task_info endpoints respond; the CLI calls them and will abort if they fail.
  • Optional overrides:
    • --model MODEL_ID to force a specific backend model.
    • --task-url https://... to override what’s in the config for this run.
    • --idempotency some-uuid so retried submissions don’t duplicate jobs.
    • --allow-experimental (or --no-allow-experimental) to temporarily change the SDK experimental flag.
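
The secrets listed above can be checked before the CLI is ever invoked. The following is a minimal pre-flight sketch in Python, assuming a plain KEY=VALUE .env layout; the helper names parse_env and missing_keys are hypothetical and are not part of synth-ai.

```python
# Pre-flight check for the three keys named in "Before You Begin".
# Assumption: the .env file uses simple KEY=VALUE lines (optionally
# prefixed with "export" and optionally quoted).

REQUIRED_KEYS = ("SYNTH_API_KEY", "ENVIRONMENT_API_KEY", "TASK_APP_URL")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding single or double quotes, if any.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Reading the file with pathlib.Path(".env.rl").read_text() and passing the result to parse_env gives a quick answer about which prompts the CLI is likely to raise.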

Run the CLI

uvx synth-ai train \
  --type rl \
  --config configs/rl/alpha.toml \
  --env-file .env.rl

The CLI then walks through the following steps:

  1. Config selection: If you omit --config, the CLI lists discovered TOMLs and remembers your last choice.
  2. Env resolution: Required keys are shown with masked values; select another .env, fetch Modal secrets, or enter values manually if something is missing.
  3. Verification calls: The CLI calls POST /rl/verify_task_app with each org credential combination to confirm that the backend can reach your task app. Failures print a full diagnostics payload so you can fix auth without guessing.
  4. Task-app health check: check_task_app_health pings the task app directly with your ENVIRONMENT_API_KEY. If it fails, no job is created, so fix the task app first.
  5. Job creation: The CLI prints a payload preview and sends POST {backend}/rl/jobs. The response must include a job_id.
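
The health check in step 4 can be approximated by hand when debugging. This is a rough sketch, not the CLI's check_task_app_health implementation: the function names are hypothetical, and sending the key in an X-API-Key header is an assumption, since this page does not say how ENVIRONMENT_API_KEY is transmitted.

```python
import urllib.error
import urllib.request


def endpoint_urls(task_app_url: str) -> list[str]:
    """Build the two endpoint URLs this page says the CLI calls."""
    base = task_app_url.rstrip("/")
    return [f"{base}/health", f"{base}/task_info"]


def probe(url: str, api_key: str, timeout: float = 10.0) -> tuple[bool, str]:
    """GET one URL and report (ok, detail) without raising."""
    # Assumption: the key travels in an X-API-Key header.
    request = urllib.request.Request(url, headers={"X-API-Key": api_key})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 401, 500).
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, refused connections, and timeouts.
        return False, str(exc)


def check_task_app(task_app_url: str, api_key: str) -> dict[str, tuple[bool, str]]:
    """Probe both endpoints and return the result for each URL."""
    return {url: probe(url, api_key) for url in endpoint_urls(task_app_url)}
```

An HTTP 401 or 403 from either endpoint points at the key, while a connection error points at the task app being offline.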

Live Monitoring

  • Leave --poll (default) enabled to launch the JobStreamer.
  • --stream-format cli (default) prints concise status and event updates while hiding noisy Hatchet/Modal logs.
  • --stream-format chart opens a live loss/score view that tracks gepa.transformation.mean_score.
  • --poll-timeout sets how long the streamer keeps checking before it gives up, and --poll-interval sets how often it checks. Both values are in seconds.
  • Disable polling (--no-poll) when you only need the job ID, for example when triggering runs from CI and checking status later.
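
To illustrate how a timeout and an interval interact, here is a generic polling loop. It is a sketch only and does not reproduce JobStreamer; fetch_status is a hypothetical callable, and the terminal status names are assumptions rather than values documented on this page.

```python
import time
from typing import Callable, Optional

# Assumption: illustrative terminal status names, not taken from the backend.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def poll_until_terminal(
    fetch_status: Callable[[], str],
    *,
    timeout: float,
    interval: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Return the terminal status, or None if the timeout would be exceeded."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        # Stop if waiting one more interval would overrun the deadline.
        if clock() + interval > deadline:
            return None
        sleep(interval)
```

With a timeout of 10 seconds and an interval of 5 seconds, this loop checks at most three times (at 0, 5, and 10 seconds) before giving up. The sleep and clock parameters exist so the loop can be exercised without real waiting.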

What You See

  • Verification summary listing candidate credentials and status codes.
  • Task app health result (✓ Task app healthy or a detailed failure reason).
  • Payload preview plus the raw backend response (truncated to 400 characters for readability).
  • Streaming events emitted by the RL job (status transitions, environment events, metrics).
  • Final status JSON once the job reaches a terminal state.

Troubleshooting Tips

  • Authentication errors usually mean the .env file lacks ENVIRONMENT_API_KEY or the key is scoped to another task app. Rerun with --env-file pointing to the correct secrets file.
  • If the CLI hangs at “Verifying task app…”, the task app is likely offline; test its /health endpoint manually.
  • Use --idempotency whenever you expect to rerun commands (e.g., in scripts) to avoid duplicate jobs.
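
For scripted or CI runs, the --idempotency and --no-poll flags can be combined. The sketch below derives a deterministic UUID from the config path and a revision string, so a retried submission of the same logical run reuses the same key. The helper names and the key-derivation scheme are illustrative assumptions; this page only says that the flag takes a UUID, and only flags documented here are used.

```python
import shlex
import uuid


def idempotency_key(config_path: str, revision: str) -> str:
    """Derive a stable UUID (version 5) for one config at one revision."""
    # Assumption: the "synth-rl:" prefix is an arbitrary namespace label.
    name = f"synth-rl:{config_path}@{revision}"
    return str(uuid.uuid5(uuid.NAMESPACE_URL, name))


def build_train_command(
    config_path: str, env_file: str, revision: str, *, poll: bool = False
) -> list[str]:
    """Assemble the argv list for a non-interactive RL submission."""
    command = [
        "uvx", "synth-ai", "train",
        "--type", "rl",
        "--config", config_path,
        "--env-file", env_file,
        "--idempotency", idempotency_key(config_path, revision),
    ]
    if not poll:
        # Return as soon as the job ID is known; check status later.
        command.append("--no-poll")
    return command


if __name__ == "__main__":
    argv = build_train_command("configs/rl/alpha.toml", ".env.rl", "abc123")
    print(shlex.join(argv))
```

Passing the list to subprocess.run(argv, check=True) avoids shell quoting issues. Using a commit hash as the revision means a new commit produces a new key, while a retry of the same commit does not create a second job.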