On-Policy RL Demo

This example mirrors the workflow you will use in production once your task app is ready.

Prerequisites

uvx synth-ai setup has been run and your .env contains the required keys.
The task app you want to train against is registered (@register_task_app) and ready to deploy.
Modal CLI is installed and logged in (modal token new).
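You can check the .env prerequisite before going further. This minimal sketch parses plain KEY=VALUE lines and reports required keys that are missing or empty. The key list is an assumption: ENVIRONMENT_API_KEY is named in the deploy step, and SYNTH_API_KEY is assumed to be the name of your Synth key. Adjust REQUIRED_KEYS to match your setup.

```python
from pathlib import Path
from typing import Dict, List, Sequence

# Assumption: adjust to the keys your setup actually requires. ENVIRONMENT_API_KEY
# is named in the deploy step; SYNTH_API_KEY is an assumed name for your Synth key.
REQUIRED_KEYS = ("SYNTH_API_KEY", "ENVIRONMENT_API_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env: Dict[str, str], required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    print("missing keys:", missing_keys(env) or "none")
```

Run it from the directory that holds .env. It does not handle multi-line values or export prefixes.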

1. Deploy the task app

uvx synth-ai deploy your-task-app --name your-modal-app

The CLI bundles your code, encrypts ENVIRONMENT_API_KEY, and prints the hosted URL. Store that URL in .env as TASK_APP_URL so later commands pick it up automatically.
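Recording the URL can be scripted. This hypothetical helper sets TASK_APP_URL in .env, replacing an existing entry or appending a new one. It is a sketch that only understands plain KEY=VALUE lines, not a Synth utility.

```python
from pathlib import Path


def upsert_env_var(text: str, key: str, value: str) -> str:
    """Return .env text with key set to value, replacing existing lines or appending one."""
    lines = text.splitlines()
    updated = False
    for i, line in enumerate(lines):
        # Match "KEY=" at the start of a line, ignoring leading whitespace.
        if line.lstrip().startswith(f"{key}="):
            lines[i] = f"{key}={value}"
            updated = True
    if not updated:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def save_task_app_url(url: str, env_path: Path = Path(".env")) -> None:
    """Write TASK_APP_URL into the given .env file, creating it if needed."""
    text = env_path.read_text() if env_path.exists() else ""
    env_path.write_text(upsert_env_var(text, "TASK_APP_URL", url))
```

Call save_task_app_url with the URL the deploy command printed.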

2. Smoke-test with modal-serve (optional)

uvx synth-ai modal-serve your-task-app --env-file path/to/.env

Use this to confirm secrets and tracing settings before redeploying to production.
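To script a quick check against the served app, a minimal sketch can probe its /health endpoint, one of the endpoints the dry-run also checks. Sending the key in an "X-API-Key" header is an assumption; substitute whatever authentication your task app expects. The fetch parameter lets you swap in a stub so the logic can be exercised offline.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# Assumption: the app accepts its key in an "X-API-Key" header. Change this if
# your task app authenticates differently.
AUTH_HEADER = "X-API-Key"

# A fetcher takes (url, headers) and returns an HTTP status code.
Fetcher = Callable[[str, Dict[str, str]], int]


def http_status(url: str, headers: Dict[str, str]) -> int:
    """GET the URL and return its HTTP status code."""
    req = urllib.request.Request(url, headers=headers, method="GET")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; report the code instead.
        # Connection failures (URLError) still propagate to the caller.
        return err.code


def smoke_test(base_url: str, api_key: str, fetch: Fetcher = http_status) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    return fetch(url, {AUTH_HEADER: api_key}) == 200
```

Call smoke_test with the URL your served app is reachable at and your ENVIRONMENT_API_KEY.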

3. Verify wiring

Run the built-in verifications before launching RL:

uvx synth-ai train \
  --type rl \
  --config path/to/rl_config.toml \
  --dry-run

--dry-run prints the job payload and runs every preflight check (.env resolution, /rl/verify_task_app, /health, and /task_info) without submitting the job. Fix any reported issues before removing --dry-run.
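In CI, the dry-run can act as a gate. This minimal sketch runs the dry-run first and submits the real job only if it succeeds. It assumes the CLI exits with a non-zero code when verification fails, and it uses only the flags shown in this guide. The runner parameter is injectable so the gating logic can be tested without calling uvx.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner executes a command and returns its exit code.
Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command, letting its output stream to the terminal, and return the exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def train_cmd(config: str, dry_run: bool) -> List[str]:
    """Build the train command from this guide, optionally with --dry-run."""
    cmd = ["uvx", "synth-ai", "train", "--type", "rl", "--config", config]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def launch_if_verified(config: str, runner: Runner = run) -> bool:
    """Run the dry-run first; submit the real job only if it exits with 0."""
    # Assumption: a failed verification produces a non-zero exit code.
    if runner(train_cmd(config, dry_run=True)) != 0:
        return False
    return runner(train_cmd(config, dry_run=False)) == 0
```

For example, launch_if_verified("path/to/rl_config.toml") performs both steps with the real CLI.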

4. Launch the RL job

uvx synth-ai train \
  --type rl \
  --config path/to/rl_config.toml

Status updates stream to your terminal; open the Synth dashboard for richer charts. The CLI prints the resulting job ID and checkpoint identifiers.
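The CLI streams statuses for you. If you wrap job monitoring in your own script, a polling loop with capped exponential backoff is a common pattern. In this sketch get_status is a hypothetical callable you would implement against whatever status source you use, and the terminal state names are assumptions, not documented Synth statuses.

```python
import time
from typing import Callable

# Hypothetical terminal states; replace with the statuses your job actually reports.
TERMINAL = {"succeeded", "failed", "cancelled"}


def wait_for_job(
    job_id: str,
    get_status: Callable[[str], str],
    poll_seconds: float = 5.0,
    max_poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status(job_id) until it reports a terminal state, doubling the delay up to a cap."""
    delay = poll_seconds
    while True:
        status = get_status(job_id)
        if status in TERMINAL:
            return status
        sleep(delay)
        delay = min(delay * 2, max_poll_seconds)
```

Backoff keeps long jobs from generating a burst of status requests while still reacting quickly to short ones.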

5. Inspect results & iterate

Inspect checkpoints and logs, then adjust reward shaping and hyperparameters.
Use --idempotency in automation to avoid duplicate job submissions.
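A useful idempotency key stays stable across retries of the same automation run but changes between runs. This sketch derives one from a CI run identifier plus the config text. The run_id input and the "rl-" prefix are illustrative assumptions, and the sketch assumes --idempotency accepts the key as its value.

```python
import hashlib
import json


def idempotency_key(run_id: str, config_text: str) -> str:
    """Derive a stable key: same run ID and same config text give the same key."""
    payload = json.dumps({"run": run_id, "config": config_text}, sort_keys=True)
    return "rl-" + hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]
```

Because the key is a pure function of its inputs, a retried CI step recomputes the same value, letting the backend recognize the resubmission as a duplicate.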

Tips

Keep environment-agnostic configs under version control; the train command embeds them into job payloads for reproducibility.
Use --idempotency if you automate submissions and want the backend to reject accidental duplicates.
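The train command already embeds the config in the job payload. If you also want your own record of which config produced which job, one minimal sketch is to keep the config text alongside a SHA-256 of its contents and the commit it came from. The record's field names are hypothetical and not part of any Synth payload.

```python
import hashlib
from pathlib import Path
from typing import Any, Dict, Optional


def config_record(config_text: str, path: str, commit: Optional[str] = None) -> Dict[str, Any]:
    """Bundle the exact config text with a content hash for later comparison."""
    return {
        "config_path": path,
        "config_sha256": hashlib.sha256(config_text.encode("utf-8")).hexdigest(),
        "config_toml": config_text,
        # Hypothetical field: the VCS revision the config was committed at.
        "commit": commit,
    }


def load_record(config_path: Path, commit: Optional[str] = None) -> Dict[str, Any]:
    """Read a config file from disk and build its record."""
    return config_record(config_path.read_text(encoding="utf-8"), str(config_path), commit)


def same_config(a: Dict[str, Any], b: Dict[str, Any]) -> bool:
    """Two runs used identical configs if their content hashes match."""
    return a["config_sha256"] == b["config_sha256"]
```

Comparing hashes rather than paths catches the case where the same file was edited between runs.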

This hosted workflow removes the need to manage trainers, GPU pools, or rollout schedulers manually. Focus on your task app and reward shaping while Synth handles the rest.
