The math single-step demo exercises an entire RL loop: deploy the bundled task app, run smoke tests, launch an on-policy job, and inspect the results. All assets live in the SDK repo under examples/rl/.

1. Initialise the demo

From an empty directory:
uvx synth-ai demo --list
Pick Math Single-Step (Modal deployment) and materialise it into your workspace:
uvx synth-ai demo --force
This writes a self-contained project (task app, configs, helper scripts) plus a .env stub.

2. Pair the CLI

Inside the demo directory:
uvx synth-ai setup
This handshake saves SYNTH_API_KEY and ENVIRONMENT_API_KEY into the local .env. Re-run it whenever you clone the demo elsewhere.
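Before moving on, it can help to confirm that the pairing step populated both keys. The following is a minimal sketch, not part of the SDK: it assumes the .env file uses plain KEY=VALUE lines (optionally prefixed with export), and the helper names parse_env and missing_keys are made up for this example.

```python
from pathlib import Path

# Key names taken from the step above.
REQUIRED_KEYS = ("SYNTH_API_KEY", "ENVIRONMENT_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.

    Assumption: no multi-line values or variable interpolation.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env_text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    values = parse_env(env_text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_keys(text)
    print("missing keys:", ", ".join(missing) if missing else "none")
```

If either key is reported missing, re-running uvx synth-ai setup inside the demo directory is the documented fix.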

3. Serve locally (optional)

Run the task app with uvicorn for local debugging:
uvx synth-ai deploy \
  --runtime local \
  --task-app task_app.py \
  --host 0.0.0.0 \
  --port 8101 \
  --env-file .env \
  --trace traces/v3 \
  --trace-db traces/v3/math_demo.db
When tracing is enabled the CLI wires TASKAPP_TRACING_ENABLED, creates the trace directory, and prints follow-up commands (run_local_rollout_traced.py) so you can capture trajectories immediately.

4. Deploy to Modal

uvx synth-ai deploy \
  --runtime modal \
  --task-app task_app.py \
  --modal-app modal_app.py \
  --name synth-math-demo \
  --env-file .env
The CLI verifies Modal auth, builds an image with the math environment dependencies, and persists the public URL to .env (TASK_APP_BASE_URL).

5. Smoke-test

uvx synth-ai smoke \
  --url http://localhost:8101 \
  --env-file .env \
  --policy mock \
  --max-steps 1
This auto-starts sqld (if needed), runs a handful of rollouts with the bundled mock policy, and validates that the task app emits the fields required by the trainer. Swap --url for your Modal endpoint when testing the hosted deployment.
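The kind of field validation described above can be sketched as a small presence check over a JSON-like payload. This is an illustration only: the real list of fields the trainer requires is not given here, so the dotted paths in the usage comment are placeholders, and missing_fields is a hypothetical helper rather than anything the CLI exposes.

```python
from typing import Any, Iterable


def missing_fields(payload: dict[str, Any], required: Iterable[str]) -> list[str]:
    """Return the dotted paths from `required` that are absent in `payload`.

    A path like "a.b" is present when payload["a"] is a dict containing "b".
    A key whose value is None still counts as present.
    """
    missing: list[str] = []
    for path in required:
        node: Any = payload
        for part in path.split("."):
            if isinstance(node, dict) and part in node:
                node = node[part]
            else:
                missing.append(path)
                break
    return missing


# Usage with placeholder field names (NOT the trainer's actual schema):
# response = {"trajectory": {"steps": []}, "metrics": {}}
# missing_fields(response, ["trajectory.steps", "metrics.mean_return"])
```

A check like this is mainly useful while editing the task app, where it can flag a dropped field before a full smoke run.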

6. Evaluate the baseline

uv run python run_eval.py \
  --toml configs/eval_base_qwen.toml
Make sure the TOML includes task_app_url (or export TASK_APP_URL). The script hits /task_info, executes deterministic seeds, and prints accuracy plus failure categories. Swap the TOML for configs/eval_rl_qwen.toml or update the model field once you have an RL checkpoint. Add --use-rollout if you want to exercise the task app’s rollout endpoint instead of direct step calls.

7. Launch the RL job

uvx synth-ai train \
  --config configs/rl_from_base_qwen.toml \
  --env-file .env
The CLI resolves .env entries, calls /rl/verify_task_app, and streams metrics until completion. --dry-run is deprecated; run the real command to exercise verification.

8. Inspect the run

uvx synth-ai status jobs get rl_job_123 --json
uvx synth-ai status jobs metrics rl_job_123
uvx synth-ai status runs list rl_job_123
Replace rl_job_123 with the job ID printed by the trainer. Add --follow to status jobs logs to tail events live.
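For scripting around the status commands, the --json output can be captured from Python. The sketch below assumes the command prints a single JSON document on stdout; the shape of that document is not specified here, so the code returns it unmodified. job_get_argv and fetch_job are hypothetical helper names, and rl_job_123 remains a placeholder ID.

```python
import json
import subprocess


def job_get_argv(job_id: str) -> list[str]:
    """Build the argv for the documented `status jobs get ... --json` call."""
    return ["uvx", "synth-ai", "status", "jobs", "get", job_id, "--json"]


def fetch_job(job_id: str, run=subprocess.run):
    """Run the status command and decode its stdout as JSON.

    `run` is injectable so the function can be exercised without the CLI.
    Assumption: stdout contains exactly one JSON document.
    """
    result = run(job_get_argv(job_id), capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Usage (requires the CLI and a real job ID):
# job = fetch_job("rl_job_123")
# print(json.dumps(job, indent=2))
```

Passing check=True makes a non-zero exit status raise subprocess.CalledProcessError, so a failed lookup is not mistaken for an empty result.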

Tips

  • The demo’s README.md summarises additional helper scripts (run_eval.py, run_rl_and_save.py, etc.).
  • Capture Modal deployment URLs from the CLI output—TASK_APP_BASE_URL is written to .env automatically when the deploy succeeds.
  • Keep the local uvicorn server and smoke command in separate shells for faster iteration while editing prompts or reward logic.