Synth gives you a hosted reinforcement-learning stack: you define a task app, point the CLI at it, and Synth handles rollout orchestration, training jobs, and observability. Run setup once per project directory:
uvx synth-ai setup
Setup opens a browser window, links the CLI to your authenticated Synth session, and writes two secrets into the local .env file:
  • SYNTH_API_KEY – authorizes calls to Synth’s backend (training, datasets, tracing).
  • ENVIRONMENT_API_KEY – grants the backend access to your hosted task apps.
Secrets are written to disk only; they never appear in your terminal output. If you organize work across multiple folders, rerun setup in each one so the CLI discovers the right .env automatically.
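Because setup never echoes the secrets, it can be useful to confirm that both keys landed in the local .env without printing their values. The following is a minimal sketch, not part of the Synth CLI: the helper names `read_env_keys` and `missing_keys` are hypothetical, and the parser assumes the common `NAME=value` .env layout (optionally prefixed with `export`).

```python
from pathlib import Path

# Key names taken from the setup description above.
REQUIRED_KEYS = ("SYNTH_API_KEY", "ENVIRONMENT_API_KEY")


def read_env_keys(env_path):
    """Return the names of variables that have a non-empty value in a .env file."""
    keys = set()
    path = Path(env_path)
    if not path.is_file():
        return keys
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        name, _, value = line.partition("=")
        # Treat empty or quote-only values as missing.
        if value.strip().strip("'\""):
            keys.add(name.strip())
    return keys


def missing_keys(env_path=".env"):
    """List required keys that are absent; secret values are never printed."""
    present = read_env_keys(env_path)
    return [key for key in REQUIRED_KEYS if key not in present]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("both keys present")
```

If the script reports a missing key, rerunning `uvx synth-ai setup` in that directory is the simplest fix.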

3. Bootstrap the hosted demo

The SDK ships a “Crafter” task app and RL workflow that you can deploy immediately. From any empty directory:
uvx synth-ai rl_demo init --dest crafter-demo
cd crafter-demo
The template includes a Modal-ready task app, TOML configs, and helper scripts. You now own a copy that you can edit freely.

4. Deploy the task app (Modal)

uvx synth-ai demo deploy --name crafter-demo-app
This command packages the task app, uploads secrets, and spins up the service in Modal. During deployment the CLI:
  1. Encrypts ENVIRONMENT_API_KEY so Synth’s backend can call your app securely.
  2. Builds a Modal image that bundles the Crafter dependencies.
  3. Prints the hosted URL and stores it in your .env for later commands.

5. Launch the curated RL run

uvx synth-ai demo run \
  --batch-size 4 \
  --group-size 16 \
  --model Qwen/Qwen3-0.6B
demo run submits an RL job using the deployed task app and polls until it finishes. While the job runs you will see:
  • Verification of your hosted task app via Synth’s backend (/rl/verify_task_app).
  • Live status updates as rollouts complete and metrics stream back.
  • The final checkpoint identifier, which you can use to continue fine-tuning or to run evaluations.
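The submit-then-poll behavior described above follows a common pattern that can be sketched in a few lines. This is an illustration only, not the CLI's actual implementation: `poll_until_done`, the `fetch_status` callback, and the status names in `TERMINAL_STATES` are all hypothetical stand-ins for whatever the Synth backend really returns.

```python
import time

# Hypothetical status names; the real backend may use different ones.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def poll_until_done(fetch_status, interval_s=5.0, timeout_s=3600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports a terminal state or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} seconds")
        # Wait before asking again so the backend is not hammered.
        sleep(interval_s)


if __name__ == "__main__":
    # Stubbed status feed standing in for a real backend call.
    fake = iter(["queued", "running", "running", "succeeded"])
    print(poll_until_done(lambda: next(fake), interval_s=0.0))
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without real waiting.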

6. Inspect results

Open the Synth dashboard to explore traces, reward summaries, and checkpoints produced by the demo run. Because everything is hosted, teammates can review the same data without reproducing the environment locally.

7. Take it further

  • Customize the task app: wrap your own environment with the FastAPI harness described in Configure Your Task App, then redeploy with uvx synth-ai deploy.
  • Manage datasets: describe evaluation seeds and upload supervised JSONL files using the patterns in Configure Your Task Datasets.
  • Submit RL and SFT jobs directly: use uvx synth-ai train --type rl or --type sft with your TOML configs as documented in About Reinforcement Learning.
  • Automate rollouts: the CLI Commands section breaks down advanced flags such as --dry-run, --idempotency, and Modal deploy options.
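For scripted use, the deploy and run commands from this guide can be chained from Python. The sketch below assumes `uvx` is on your PATH and that you run it from the demo directory; `run_steps` and its `dry_run` argument are hypothetical helpers local to this script (the argument only controls whether the script executes anything, and is unrelated to the CLI's own --dry-run flag).

```python
import shlex
import subprocess

# Commands copied from the steps above; adjust names and flags for your project.
DEMO_STEPS = [
    ["uvx", "synth-ai", "demo", "deploy", "--name", "crafter-demo-app"],
    ["uvx", "synth-ai", "demo", "run",
     "--batch-size", "4", "--group-size", "16", "--model", "Qwen/Qwen3-0.6B"],
]


def run_steps(steps, cwd=".", dry_run=True, runner=subprocess.run):
    """Print each command and, unless dry_run is set, execute it in order.

    check=True makes the default runner raise on a non-zero exit code,
    so later steps do not run after a failure.
    """
    executed = []
    for cmd in steps:
        printable = shlex.join(cmd)
        print("$ " + printable)
        if not dry_run:
            runner(cmd, cwd=cwd, check=True)
        executed.append(printable)
    return executed


if __name__ == "__main__":
    # Prints the commands without running them; pass dry_run=False to execute.
    run_steps(DEMO_STEPS)
```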
With the demo deployed and a first RL job completed, you have the blueprint for your own task app: implement the same interfaces, redeploy, and keep iterating with Synth’s hosted training stack.