This example mirrors the workflow you will use in production once your task app is ready.

Prerequisites

  • uvx synth-ai setup has been run and your .env contains required keys.
  • The task app you want to train against is registered (@register_task_app) and ready to deploy.
  • Modal CLI is installed and logged in (modal token new).

1. Deploy the task app

uvx synth-ai deploy your-task-app --name your-modal-app
The CLI bundles your code, encrypts ENVIRONMENT_API_KEY, and prints the hosted URL. Store it in .env as TASK_APP_URL so future commands find it automatically.
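If you script the deploy step, recording the URL can be automated too. The following is a minimal sketch, assuming your .env uses plain KEY=VALUE lines; the helper name `upsert_env_var` and the example URL are hypothetical and not part of synth-ai.

```python
def upsert_env_var(env_text: str, key: str, value: str) -> str:
    """Return env_text with `key` set to `value`, replacing any existing entry.

    Hypothetical helper: assumes simple KEY=VALUE lines with no `export` prefix.
    """
    new_line = f"{key}={value}"
    output = []
    replaced = False
    for line in env_text.splitlines():
        if line.strip().startswith(f"{key}="):
            if not replaced:
                output.append(new_line)  # overwrite the first occurrence
                replaced = True
            # any later duplicates of the same key are dropped
        else:
            output.append(line)  # keep unrelated lines and comments untouched
    if not replaced:
        output.append(new_line)  # key was absent, so append it
    return "\n".join(output) + "\n"


if __name__ == "__main__":
    from pathlib import Path

    env_path = Path(".env")
    current = env_path.read_text() if env_path.exists() else ""
    # Placeholder URL: substitute the hosted URL printed by the deploy command.
    env_path.write_text(
        upsert_env_var(current, "TASK_APP_URL", "https://your-modal-app.example.invalid")
    )
```

Replacing the entry in place, rather than appending, keeps a stale URL from shadowing the new one after a redeploy.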

2. Smoke-test with modal-serve (optional)

uvx synth-ai modal-serve your-task-app --env-file path/to/.env
Use this to confirm secrets and tracing settings before redeploying to production.

3. Verify wiring

Run the built-in verifications before launching RL:
uvx synth-ai train \
  --type rl \
  --config path/to/rl_config.toml \
  --dry-run
--dry-run prints the payload and runs all checks except job submission: .env resolution, /rl/verify_task_app, /health, and /task_info. Fix any issues before removing --dry-run.
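When a dry run reports a failure, it can help to probe the task app directly. The sketch below assumes /health and /task_info are served at TASK_APP_URL and answer plain GET requests; the function names are hypothetical, and any authentication headers your task app expects must be supplied by you.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional, Sequence


def http_status(url: str, headers: Optional[dict] = None, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; report the code instead of failing.
        return err.code


def probe(
    base_url: str,
    paths: Sequence[str] = ("/health", "/task_info"),
    fetch: Callable[[str], int] = http_status,
) -> Dict[str, int]:
    """Map each endpoint path to the status code it returned."""
    base = base_url.rstrip("/")  # avoid a double slash when joining
    return {path: fetch(base + path) for path in paths}


if __name__ == "__main__":
    import os

    # Connection-level failures (DNS, refused connection) raise URLError here.
    for path, status in probe(os.environ["TASK_APP_URL"]).items():
        print(f"{path}: {status}")
```

A non-200 status on either path points at the deployment or its secrets rather than at the RL config.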

4. Launch the RL job

uvx synth-ai train \
  --type rl \
  --config path/to/rl_config.toml
Watch status updates stream in your terminal, or open the Synth dashboard for richer charts. The CLI prints the resulting job ID and checkpoint identifiers.
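In automation, steps 3 and 4 can be chained so the job is only submitted after a clean dry run. This is a minimal sketch using only the flags shown above; it assumes the CLI exits with a non-zero code when a check fails, which the text here does not confirm, and the function names are hypothetical.

```python
import subprocess
from typing import Callable, List


def build_train_command(config_path: str, dry_run: bool = False) -> List[str]:
    """Build the argv for `uvx synth-ai train` with the flags used in this guide."""
    command = ["uvx", "synth-ai", "train", "--type", "rl", "--config", config_path]
    if dry_run:
        command.append("--dry-run")
    return command


def verify_then_launch(config_path: str, run: Callable = subprocess.run) -> int:
    """Run the dry-run checks first; submit the job only if they pass.

    Assumption: a failed check produces a non-zero exit code.
    """
    check = run(build_train_command(config_path, dry_run=True))
    if check.returncode != 0:
        return check.returncode  # stop here; nothing was submitted
    return run(build_train_command(config_path)).returncode


if __name__ == "__main__":
    raise SystemExit(verify_then_launch("path/to/rl_config.toml"))
```

Passing the arguments as a list avoids shell quoting problems when the config path contains spaces.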

5. Inspect results & iterate

  • Inspect checkpoints and logs; adjust reward shaping and hyperparameters
  • Use --idempotency in automation to avoid duplicate job submissions

Tips

  • Keep environment-agnostic configs under version control; the train command embeds them into job payloads for reproducibility.
  • Use --idempotency if you automate submissions and want the backend to reject accidental duplicates.
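One way to make the idempotency tip above deterministic is to derive a key from the config contents, so resubmitting an unchanged config yields the same key. This sketch assumes --idempotency accepts a caller-supplied string, which this page does not spell out; check the CLI help before relying on it. The `idempotency_key` helper and its `rl` prefix are hypothetical.

```python
import hashlib


def idempotency_key(config_bytes: bytes, prefix: str = "rl") -> str:
    """Derive a stable key from the config contents.

    Identical configs produce identical keys; any edit to the file changes the key.
    """
    digest = hashlib.sha256(config_bytes).hexdigest()
    return f"{prefix}-{digest[:16]}"  # shortened for readability


if __name__ == "__main__":
    from pathlib import Path

    key = idempotency_key(Path("path/to/rl_config.toml").read_bytes())
    # Assumed usage: pass the key as the value of --idempotency.
    print(key)
```

Hashing the file bytes ties the key to the version-controlled config, which fits the reproducibility tip above.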
This hosted workflow removes the need to manage trainers, GPU pools, or rollout schedulers manually, so you can focus on your task app and reward shaping while Synth handles the rest.