examples/rl/.
1. Initialise the demo
From an empty directory:
.env stub.
2. Pair the CLI
Inside the demo directory:
SYNTH_API_KEY and ENVIRONMENT_API_KEY into the local .env. Re-run it whenever you clone the demo elsewhere.
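To confirm that pairing populated both keys, a small check along the following lines may help. It is a minimal sketch, not part of the demo: `parse_env` and `missing_keys` are hypothetical helper names, and the parser assumes plain `KEY=VALUE` lines (optionally prefixed with `export`) rather than the full dotenv syntax.

```python
from pathlib import Path

# Key names taken from the step above.
REQUIRED_KEYS = ("SYNTH_API_KEY", "ENVIRONMENT_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    missing = missing_keys(env)
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it from the demo directory; an empty `missing` list suggests the pairing step wrote both entries.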
3. Serve locally (optional)
Run the task app with uvicorn for local debugging:
TASKAPP_TRACING_ENABLED, creates the trace directory, and prints follow-up commands (run_local_rollout_traced.py) so you can capture trajectories immediately.
4. Deploy to Modal
.env (TASK_APP_BASE_URL).
5. Smoke-test
--url for your Modal endpoint when testing the hosted deployment.
6. Evaluate the baseline
task_app_url (or export TASK_APP_URL). The script hits /task_info, executes deterministic seeds, and prints accuracy plus failure categories. Swap the TOML for configs/eval_rl_qwen.toml or update the model field once you have an RL checkpoint. Add --use-rollout if you want to exercise the task app’s rollout endpoint instead of direct step calls.
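The kind of summary described above (accuracy plus failure categories) can be sketched as follows. This is an illustration only, not the demo's evaluation script: the record shape (`seed`, `correct`, `failure`) and the category names are assumptions made for the example.

```python
from collections import Counter


def summarise(results):
    """Return (accuracy, failure_counts) for a list of per-seed results.

    Each result is assumed to be a dict with a boolean "correct" field and,
    for failures, an optional "failure" category string (hypothetical shape).
    """
    total = len(results)
    correct = sum(1 for r in results if r["correct"])
    failures = Counter(
        r.get("failure") or "unknown" for r in results if not r["correct"]
    )
    accuracy = correct / total if total else 0.0
    return accuracy, dict(failures)


if __name__ == "__main__":
    # Illustrative data, not real evaluation output.
    demo = [
        {"seed": 0, "correct": True},
        {"seed": 1, "correct": False, "failure": "wrong_answer"},
        {"seed": 2, "correct": False, "failure": "wrong_answer"},
        {"seed": 3, "correct": False},
    ]
    accuracy, failures = summarise(demo)
    print(f"accuracy={accuracy:.2f} failures={failures}")
```

Because the seeds are deterministic, a summary computed this way for the baseline and for an RL checkpoint should be comparable seed for seed.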
7. Launch the RL job
.env entries, calls /rl/verify_task_app, and streams metrics until completion. --dry-run is deprecated; run the real command to exercise verification.
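If you script around the trainer, the "watch until completion" behaviour can be approximated with a polling loop. The sketch below uses polling rather than streaming and does not call the real trainer API: `fetch_status` is a caller-supplied function, and the status names and update shape are hypothetical.

```python
import time

# Hypothetical terminal states; the real job states may differ.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def wait_for_job(fetch_status, poll_seconds=5.0, max_polls=1000, sleep=time.sleep):
    """Call fetch_status() repeatedly until it reports a terminal state.

    fetch_status must return a dict with a "status" key and, optionally,
    a "metrics" dict. The final update is returned.
    """
    for _ in range(max_polls):
        update = fetch_status()
        print(update.get("status"), update.get("metrics", {}))
        if update.get("status") in TERMINAL_STATES:
            return update
        sleep(poll_seconds)
    raise TimeoutError("job did not reach a terminal state")


if __name__ == "__main__":
    # Stand-in for a real status call: two canned updates.
    canned = iter([
        {"status": "running", "metrics": {"reward": 0.1}},
        {"status": "succeeded", "metrics": {"reward": 0.9}},
    ])
    wait_for_job(lambda: next(canned), poll_seconds=0)
```

Injecting `fetch_status` and `sleep` keeps the loop testable without network access or real delays.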
8. Inspect the run
rl_job_123 to the job ID printed by the trainer. Use --follow on status jobs logs to tail events live.
Tips
- The demo’s README.md summarises additional helper scripts (run_eval.py, run_rl_and_save.py, etc.).
- Capture Modal deployment URLs from the CLI output; TASK_APP_BASE_URL is written to .env automatically when the deploy succeeds.
- Keep the local uvicorn server and smoke command in separate shells for faster iteration while editing prompts or reward logic.