Math Single-Step RL

The math single-step demo exercises an entire RL loop: deploy the bundled task app, run smoke tests, launch an on-policy job, and inspect the results. All assets live in the SDK repo under examples/rl/.

1. Initialise the demo

From an empty directory:

uvx synth-ai demo --list

Pick Math Single-Step (Modal deployment) and materialise it into your workspace:

uvx synth-ai demo --force

This writes a self-contained project (task app, configs, helper scripts) plus a .env stub.

2. Pair the CLI

Inside the demo directory:

uvx synth-ai setup

This handshake saves SYNTH_API_KEY and ENVIRONMENT_API_KEY into the local .env. Re-run it whenever you clone the demo elsewhere.
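
To confirm the handshake wrote both keys, a small check along these lines can be run from the demo directory. It is a minimal sketch, not part of the SDK: `parse_env` and `missing_keys` are hypothetical helper names, and the parser assumes plain `KEY=VALUE` lines (optionally prefixed with `export`), which may not cover every syntax a real .env file allows.

```python
from pathlib import Path

# Key names taken from the setup step above.
REQUIRED_KEYS = ("SYNTH_API_KEY", "ENVIRONMENT_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip().removeprefix("export ").strip()
        # Drop one layer of surrounding quotes, if any.
        values[key] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    env = parse_env(text)
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_keys(text)
    print("missing keys:", ", ".join(missing) if missing else "none")
```

If either key is reported missing, re-running uvx synth-ai setup is the fix described above.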

3. Serve locally (optional)

Run the task app with uvicorn for local debugging:

uvx synth-ai deploy \
  --runtime local \
  --task-app task_app.py \
  --host 0.0.0.0 \
  --port 8101 \
  --env-file .env \
  --trace traces/v3 \
  --trace-db traces/v3/math_demo.db

When tracing is enabled, the CLI wires TASKAPP_TRACING_ENABLED, creates the trace directory, and prints follow-up commands (run_local_rollout_traced.py) so you can capture trajectories immediately.

4. Deploy to Modal

uvx synth-ai deploy \
  --runtime modal \
  --task-app task_app.py \
  --modal-app modal_app.py \
  --name synth-math-demo \
  --env-file .env

The CLI verifies Modal auth, builds an image with the math environment dependencies, and persists the public URL to .env (TASK_APP_BASE_URL).

5. Smoke-test

uvx synth-ai smoke \
  --url http://localhost:8101 \
  --env-file .env \
  --policy mock \
  --max-steps 1

This auto-starts sqld (if needed), runs a handful of rollouts with the bundled mock policy, and validates that the task app emits the fields required by the trainer. Swap --url for your Modal endpoint when testing the hosted deployment.
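
Swapping the URL between the local server and the hosted deployment can be scripted. The sketch below only assembles the smoke command shown above; it does not run it. `pick_url` and `smoke_command` are hypothetical helpers, and the fallback to http://localhost:8101 when TASK_APP_BASE_URL is unset is an assumption made for illustration.

```python
import shlex

LOCAL_URL = "http://localhost:8101"  # port used in the local serve step


def pick_url(env: dict[str, str], local: str = LOCAL_URL) -> str:
    """Prefer the Modal URL persisted in .env; otherwise use the local server."""
    return env.get("TASK_APP_BASE_URL") or local


def smoke_command(url: str, env_file: str = ".env") -> list[str]:
    """Build the argument list for the smoke test shown above."""
    return [
        "uvx", "synth-ai", "smoke",
        "--url", url,
        "--env-file", env_file,
        "--policy", "mock",
        "--max-steps", "1",
    ]


if __name__ == "__main__":
    # An empty mapping stands in for the parsed .env, so this prints the local variant.
    print(shlex.join(smoke_command(pick_url({}))))
```

Passing the list to subprocess.run would execute it; printing it first makes it easy to see which endpoint is about to be tested.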

6. Evaluate the baseline

uv run python run_eval.py \
  --toml configs/eval_base_qwen.toml

Make sure the TOML includes task_app_url (or export TASK_APP_URL). The script hits /task_info, executes deterministic seeds, and prints accuracy plus failure categories. Swap the TOML for configs/eval_rl_qwen.toml or update the model field once you have an RL checkpoint. Add --use-rollout if you want to exercise the task app’s rollout endpoint instead of direct step calls.

7. Launch the RL job

uvx synth-ai train \
  --config configs/rl_from_base_qwen.toml \
  --env-file .env

The CLI resolves .env entries, calls /rl/verify_task_app, and streams metrics until completion. --dry-run is deprecated; run the real command to exercise verification.

8. Inspect the run

uvx synth-ai status jobs get rl_job_123 --json
uvx synth-ai status jobs metrics rl_job_123
uvx synth-ai status runs list rl_job_123

Change rl_job_123 to the job ID printed by the trainer. Use --follow on status jobs logs to tail events live.
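
Substituting the job ID by hand is easy to get wrong across three commands, so a tiny generator can help. This sketch just rebuilds the commands listed above for a given ID; `status_commands` is a hypothetical helper, and nothing here calls the CLI.

```python
import shlex


def status_commands(job_id: str) -> list[list[str]]:
    """Return the three inspection commands from above for one job ID."""
    base = ["uvx", "synth-ai", "status"]
    return [
        base + ["jobs", "get", job_id, "--json"],
        base + ["jobs", "metrics", job_id],
        base + ["runs", "list", job_id],
    ]


if __name__ == "__main__":
    # Replace the placeholder with the ID printed by the trainer.
    for command in status_commands("rl_job_123"):
        print(shlex.join(command))
```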

Tips

The demo’s README.md summarises additional helper scripts (run_eval.py, run_rl_and_save.py, etc.).
Capture Modal deployment URLs from the CLI output—TASK_APP_BASE_URL is written to .env automatically when the deploy succeeds.
Keep the local uvicorn server and smoke command in separate shells for faster iteration while editing prompts or reward logic.
