Crafter On-Policy Loop

This demo mirrors the hosted Crafter workflow. It assumes you have either cloned the SDK repo (for the task app code and configs) or spun up the Crafter demo with uvx synth-ai demo.

Prerequisites

You have run uvx synth-ai setup in the current directory.
The Modal CLI is installed and authenticated (modal token new), unless you are staying on local uvicorn.
Your task app is registered (the demo registers grpo-crafter-demo automatically).
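If you want to confirm the command-line prerequisites before deploying, a small shell check can help. This is a minimal bash sketch; require_tools is a hypothetical helper, not part of synth-ai, and it only checks that the commands are on PATH. It does not verify that modal token new has been run.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of synth-ai): report every named
# command that is not on PATH and return non-zero if any is missing.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    # command -v succeeds for executables, builtins, and shell functions.
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For the Modal runtime you might run require_tools uvx modal; when staying on local uvicorn, require_tools uvx is enough.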

1. Deploy the task app

uvx synth-ai deploy \
  --runtime modal \
  --task-app task_app.py \
  --modal-app modal_app.py \
  --name crafter-prod \
  --env-file .env

The CLI encrypts ENVIRONMENT_API_KEY, builds a Modal image with your code, and stores the resulting TASK_APP_BASE_URL in .env. For local testing, replace --runtime modal with --runtime local.
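After a deploy, you may want to confirm that TASK_APP_BASE_URL actually landed in .env. The bash sketch below assumes .env uses plain KEY=value lines; read_env_value is a hypothetical helper, not something the CLI provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of synth-ai): print the value of KEY from a
# dotenv-style file. Assumes simple KEY=value lines, optionally prefixed with
# "export" and optionally wrapped in one pair of single or double quotes.
read_env_value() {
  local key="$1" file="${2:-.env}" line value
  [ -f "$file" ] || { echo "missing $file" >&2; return 1; }
  # Use the last assignment of KEY so later lines override earlier ones.
  line=$(grep -E "^[[:space:]]*(export[[:space:]]+)?${key}=" "$file" | tail -n 1)
  [ -n "$line" ] || return 1
  value="${line#*=}"
  # Strip one matching pair of surrounding quotes, if present.
  case "$value" in
    \"*\" | \'*\') value="${value:1:${#value}-2}" ;;
  esac
  printf '%s\n' "$value"
}
```

For example, read_env_value TASK_APP_BASE_URL .env should print the deployed URL; an empty result with a non-zero exit suggests the deploy did not write it.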

2. Run smoke tests

uvx synth-ai smoke \
  --config configs/crafter_smoke.toml \
  --env-file .env

The smoke command uses the same environment resolution as the trainer and verifies that your task app can serve rollouts, respond with the expected metadata, and log traces.
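Since the smoke test resolves the environment the same way the trainer does, one option is to use it as a gate before training. This bash sketch assumes that uvx synth-ai smoke exits non-zero when a check fails (not confirmed here); smoke_then_train is a hypothetical wrapper that uses only the commands and flags shown in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of synth-ai): run the smoke test and submit
# the RL job only if the smoke test exits successfully.
# Assumption: the smoke command returns a non-zero status on failure.
smoke_then_train() {
  local env_file="${1:-.env}"
  if ! uvx synth-ai smoke --config configs/crafter_smoke.toml --env-file "$env_file"; then
    echo "smoke test failed; not launching training" >&2
    return 1
  fi
  uvx synth-ai train --config configs/rl_from_base_qwen4b.toml --env-file "$env_file"
}
```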

3. Launch the RL job

uvx synth-ai train \
  --config configs/rl_from_base_qwen4b.toml \
  --env-file .env

Key points:

--dry-run is deprecated; run the command directly. Before submitting work, the trainer performs /rl/verify_task_app, /health, and /task_info checks.
The CLI streams job events until completion. Press Ctrl+C if you prefer to monitor via synth-ai status jobs … later.
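Because the CLI streams job events until completion, you may want to keep a copy of that stream for later reference. The bash sketch below tees the output to a timestamped file; train_with_log and the logs/ directory are hypothetical choices, not CLI features.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of synth-ai): show the train command's
# output in the terminal while saving a timestamped copy on disk.
train_with_log() {
  local env_file="${1:-.env}" log_dir="${2:-logs}" log_file rc
  mkdir -p "$log_dir"
  log_file="$log_dir/train-$(date +%Y%m%d-%H%M%S).log"
  uvx synth-ai train --config configs/rl_from_base_qwen4b.toml --env-file "$env_file" 2>&1 | tee "$log_file"
  # PIPESTATUS[0] is the exit status of train, not of tee.
  rc="${PIPESTATUS[0]}"
  echo "log saved to $log_file" >&2
  return "$rc"
}
```

Reading PIPESTATUS[0] keeps the train command's exit status; without it, the function would report tee's status, which is almost always 0.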

4. Monitor jobs

uvx synth-ai status jobs list --status running --limit 5
uvx synth-ai status jobs logs rl_job_123 --follow

Use the status suite to tail metrics and inspect timelines.
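For a lightweight watch loop over running jobs, you can re-run the list command on an interval. poll_running_jobs is a hypothetical bash helper built only from the status command shown above; the default 30-second interval and 10 polls are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of synth-ai): re-run the running-jobs listing
# a fixed number of times, pausing between calls.
poll_running_jobs() {
  local interval="${1:-30}" count="${2:-10}" i
  for ((i = 1; i <= count; i++)); do
    echo "--- poll $i/$count ($(date +%H:%M:%S)) ---"
    # Stop early, propagating the exit status, if the status command fails.
    uvx synth-ai status jobs list --status running --limit 5 || return
    # Skip the pause after the final poll.
    if ((i < count)); then
      sleep "$interval"
    fi
  done
}
```

For example, poll_running_jobs 60 5 lists running jobs five times, one minute apart.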

5. Iterate

Adjust rewards and hyperparameters in configs/rl_from_base_qwen4b.toml.
Reference the latest checkpoint in [model].source once you have a good run.
Combine this loop with the Rejection Loop to feed curated traces into SFT jobs.
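When you point [model].source at a new checkpoint, it can help to double-check what the config currently references. This is a minimal sketch assuming the TOML uses simple one-line key = "value" entries with no inline comments; toml_get is a hypothetical helper, and a real TOML parser is more robust.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of synth-ai): print KEY from [SECTION] of a
# TOML file. Assumes one-line key = "value" entries without inline comments.
toml_get() {
  local section="$1" key="$2" file="$3"
  awk -v section="$section" -v key="$key" '
    # Track the current [table] header.
    /^[ \t]*\[/ {
      line = $0
      gsub(/^[ \t]*\[|\][ \t]*$/, "", line)
      current = line
      next
    }
    current == section && match($0, "^[ \t]*" key "[ \t]*=") {
      # Take everything after "=", trim whitespace, drop surrounding quotes.
      value = substr($0, RLENGTH + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      gsub(/^"|"$/, "", value)
      print value
      found = 1
      exit
    }
    END { if (!found) exit 1 }
  ' "$file"
}
```

For example, toml_get model source configs/rl_from_base_qwen4b.toml prints the current value, or exits non-zero if the key is absent.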
