1. Install Demo
Installs task_app.py (the Crafter task), train_cfg.toml (the training config), and sample JSONL data.
2. Setup Credentials
Set SYNTH_API_KEY and ENVIRONMENT_API_KEY; both are saved to .env.
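A minimal sketch of persisting the two keys to a local .env file. The key names come from this step; the helper itself (`save_env`) is hypothetical, and it simply writes standard dotenv `KEY=value` lines from the current environment:

```python
import os
from pathlib import Path

def save_env(path: str = ".env") -> None:
    """Write SYNTH_API_KEY and ENVIRONMENT_API_KEY to a dotenv file.

    Hypothetical helper: reads each key from the current environment
    (empty string if unset) and emits one KEY=value line per key.
    """
    keys = ["SYNTH_API_KEY", "ENVIRONMENT_API_KEY"]
    lines = [f"{key}={os.environ.get(key, '')}" for key in keys]
    Path(path).write_text("\n".join(lines) + "\n")

save_env()
```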
3. Prepare Data
SFT requires JSONL data where each line is a record containing a list of chat messages.

4. Train
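A sketch of writing and validating such a file. The exact schema is an assumption: each line is taken to be a JSON object with a "messages" list in the common {"role", "content"} chat format; the Crafter-flavored contents are illustrative only.

```python
import json

# One training example per line; each line holds a list of chat messages
# in the common {"role", "content"} shape (schema assumed, not confirmed).
examples = [
    {"messages": [
        {"role": "system", "content": "You are playing Crafter."},
        {"role": "user", "content": "Observation: a tree is ahead."},
        {"role": "assistant", "content": "collect_wood"},
    ]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Validate: every line parses and has a non-empty message list.
with open("train.jsonl") as f:
    for line in f:
        record = json.loads(line)
        assert record["messages"], "each record needs at least one message"
```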
Minimal Config
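A sketch of what a minimal train_cfg.toml might contain, built only from the parameter names documented under Key Parameters; the values and any sections not listed there (e.g. the model name) are placeholder assumptions, not the shipped config:

```toml
# Sketch of a minimal SFT config; section/key names follow the
# Key Parameters table, values are illustrative placeholders.
[algorithm]
variety = "lora"            # "lora", "qlora", or "fft"

[hyperparameters]
n_epochs = 2
global_batch = 32
learning_rate = 2e-4

[training.lora]
r = 16                      # LoRA rank
```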
Key Parameters
| Parameter | Purpose |
|---|---|
| algorithm.variety | "lora", "qlora", or "fft" |
| hyperparameters.n_epochs | Number of training epochs |
| hyperparameters.global_batch | Total batch size |
| hyperparameters.learning_rate | Learning rate |
| training.lora.r | LoRA rank (higher = more capacity) |
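To make "higher rank = more capacity" concrete: LoRA adapts a weight matrix with two low-rank factors A (d_in x r) and B (r x d_out), so the trainable parameter count grows linearly with r. A quick back-of-envelope check (the 4096x4096 layer size is an arbitrary example):

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out weight matrix."""
    return r * (d_in + d_out)

base = 4096 * 4096  # parameters in the full (frozen) weight matrix
for r in (8, 16, 64):
    added = lora_params(4096, 4096, r)
    # r=8 adds 65,536 params, about 0.39% of the 16.8M-param base layer
    print(r, added, f"{added / base:.2%}")
```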
Get Results
Self-Training Loop
Combine SFT with RL for iterative improvement:

- Train an initial model with SFT on seed data
- Run RL to improve it from environment feedback
- Filter for successful RL trajectories
- Run SFT on the combined seed and filtered data
- Repeat
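The loop above can be sketched as plain control flow. Everything here is illustrative: run_sft and run_rl are hypothetical stubs standing in for real training jobs, and the 30% success rate is made up; only the filter-and-combine logic mirrors the steps listed.

```python
import random

random.seed(0)  # make the illustrative run reproducible

def run_sft(dataset):
    """Stub for an SFT job; returns a token 'model' record."""
    return {"trained_on": len(dataset)}

def run_rl(model, episodes=100):
    """Stub for an RL run; tags each trajectory with a success flag."""
    return [{"messages": [], "success": random.random() < 0.3}
            for _ in range(episodes)]

seed_data = [{"messages": []} for _ in range(50)]
data = list(seed_data)
for iteration in range(3):
    model = run_sft(data)                             # SFT on current data
    trajectories = run_rl(model)                      # improve via environment
    kept = [t for t in trajectories if t["success"]]  # keep successes only
    data = seed_data + kept                           # combine, then repeat
```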