2025-10-28 – Terminal Training Logs

🚀 New Features

  • Full terminal streaming logs: uvx synth-ai train now provides comprehensive real-time training logs directly in the terminal for both SFT and RL. Users see live status updates (QUEUED, RUNNING, etc.), detailed event logs with timestamps and sequence numbers, full metrics (training loss, learning rate, GPU utilization, KL divergence, rollout times), and timeline progression throughout the entire training run.

2025-10-27 – Rubrics, Hosted Judges & Qwen-VL RL

🚀 New Features

  • Hosted Synth judges (configurable): Rollout filtering and on-policy RL can now invoke hosted judges with per-job overrides, including rubric selection, concurrency caps, and fallback behavior.
  • Rubric-aware filtering: SFT filtering pipelines accept structured rubric definitions; traces are scored and trimmed according to your criteria before export.
  • Qwen-VL support across SFT & RL: Qwen3-VL models can be fine-tuned and trained with RL, with built-in vision collators, LoRA projector targeting, and rollout plumbing.
  • Instruct-model RL guidance: Added documentation and defaults for running RL on Qwen instruct SKUs, including semaphore tuning to avoid premature episode completion.
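As a rough illustration of rubric-aware filtering, each trace can be scored as a weighted average over rubric criteria and dropped if it falls below a cutoff. The criterion names, weights, and scoring interface below are illustrative sketches, not the synth-ai API:

```python
# Hypothetical sketch of rubric-aware trace filtering: score each trace
# against weighted rubric criteria, then keep only traces above a cutoff.

def score_trace(trace_scores: dict, rubric: dict) -> float:
    """Weighted average of per-criterion scores (each in [0, 1])."""
    total_weight = sum(rubric.values())
    return sum(w * trace_scores.get(c, 0.0) for c, w in rubric.items()) / total_weight

def filter_traces(traces: list, rubric: dict, min_score: float = 0.7) -> list:
    """Keep traces whose weighted rubric score meets the cutoff."""
    return [t for t in traces if score_trace(t["scores"], rubric) >= min_score]

rubric = {"correctness": 2.0, "clarity": 1.0}  # weights are illustrative
traces = [
    {"id": "a", "scores": {"correctness": 1.0, "clarity": 0.5}},  # 2.5/3 ≈ 0.83
    {"id": "b", "scores": {"correctness": 0.5, "clarity": 0.5}},  # 1.5/3 = 0.50
]
print([t["id"] for t in filter_traces(traces, rubric)])  # ['a']
```

In practice the per-criterion scores would come from a judge model rather than being hand-written, but the trim-before-export step reduces to this kind of thresholding.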

2025-10-17 – Qwen Coder, Turso, H200 Topologies & RL Throughput

🚀 New Features

  • Qwen Coder models supported: Qwen Coder variants are now available across SFT and inference workflows.
  • SDK migrated to Turso for concurrency: Storage moved to Turso to unlock reliable concurrent writes and higher throughput in multi-process runs.
  • More training topologies on H200s: Added configurations for larger models with additional tensor/pipeline/data parallel layouts.
  • Full LoRA support for Policy Gradient: LoRA integrated end-to-end into Policy Gradient training flows.
  • Pipelined RL async rollouts: Improved throughput via asynchronous rollouts with importance sampling adjustments for stable updates.
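The importance-sampling adjustment mentioned above compensates for rollouts generated by a slightly stale policy: each sample's gradient contribution is reweighted by the (clipped) ratio of current-policy to behavior-policy probability. A minimal sketch, with illustrative log-probs and clip range (not the synth-ai trainer internals):

```python
import math

def is_weight(logp_current: float, logp_behavior: float, clip: float = 2.0) -> float:
    """Ratio pi_current / pi_behavior, clipped to limit variance from stale samples."""
    ratio = math.exp(logp_current - logp_behavior)
    return max(1.0 / clip, min(clip, ratio))

def corrected_loss(advantage: float, logp_current: float, logp_behavior: float) -> float:
    """Policy-gradient loss term reweighted by the clipped importance ratio."""
    return -is_weight(logp_current, logp_behavior) * advantage

# On-policy sample: ratio is exactly 1, no correction needed.
print(is_weight(-1.0, -1.0))   # 1.0
# Stale sample the new policy now strongly prefers: exp(1.5) ≈ 4.48, clipped to 2.0.
print(is_weight(-0.5, -2.0))   # 2.0
```

Clipping the ratio trades a little bias for much lower variance, which is what keeps updates stable when rollouts are pipelined asynchronously.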

2025-10-09 – LoRA, MoE & Large Model Support

🚀 New Features

  • Expanded Qwen catalog: Simple Training now ships SFT and inference presets for every Qwen release outside the existing qwen3-{0.6B–32B} range, giving full coverage for the remaining Qwen 1.x/2.x/2.5 checkpoints.
  • Large-model inference & training topologies: Added 2×, 4×, and 8× layouts across B200, H200, and H100 fleets, all MoE-ready for advanced Qwen variants in both SFT and inference workflows.
  • Turnkey rollout: API and UI selectors automatically surface the new Qwen SKUs so jobs can be scheduled without manual topology overrides.
  • LoRA-first SFT: Low-Rank Adaptation is now a first-class training mode across every new Qwen topology, providing parameter-efficient finetuning defaults out of the box.
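The parameter efficiency behind LoRA-first SFT comes from factoring a d_out × d_in weight update as B (d_out × r) times A (r × d_in), so only r · (d_in + d_out) parameters are trained instead of d_in · d_out. A back-of-envelope sketch with an illustrative hidden size and rank (not a specific Qwen topology):

```python
# Trainable-parameter count: full fine-tuning vs. a rank-r LoRA update.

def full_update_params(d_in: int, d_out: int) -> int:
    """Full update trains the entire d_out x d_in matrix."""
    return d_in * d_out

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA trains only the two low-rank factors B and A."""
    return rank * (d_in + d_out)

d, r = 4096, 16  # illustrative projection size and LoRA rank
print(full_update_params(d, d))  # 16777216
print(lora_params(d, d, r))      # 131072, i.e. under 1% of the full update
```

Scaled across every projection in a large model, this is why LoRA presets fit on much smaller topologies than full fine-tuning does.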

2025-09-24 – Platform Updates

🚀 New Features

  • Rollout Viewer: Enhanced visualization and monitoring interface for training rollouts with real-time metrics and progress tracking.
  • B200 & H200 GPU Support: Added support for NVIDIA’s latest flagship GPUs (B200, H200) for both training and inference workloads.
  • Faster Inference: Optimized inference pipeline with improved throughput and reduced latency across all model sizes.
  • GSPO Support: Integrated the Group Sequence Policy Optimization (GSPO) algorithm for advanced reinforcement-learning training.
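Two ingredients of GSPO-style training can be sketched in a few lines: a group-relative advantage (each rollout's reward normalized against the other rollouts for the same prompt) and a sequence-level importance ratio (length-normalized, rather than per-token). The numbers and function names below are illustrative, not the synth-ai implementation:

```python
import math

def group_advantages(rewards: list) -> list:
    """Normalize rewards within a rollout group: (r - mean) / std."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards)) or 1.0
    return [(r - mean) / std for r in rewards]

def sequence_ratio(logps_new: list, logps_old: list) -> float:
    """Length-normalized sequence-level importance ratio."""
    return math.exp((sum(logps_new) - sum(logps_old)) / len(logps_new))

# Four rollouts for one prompt, rewarded 1 or 0:
print(group_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```

Normalizing within the group removes the need for a learned value baseline, and the sequence-level ratio is what distinguishes GSPO from token-level clipping schemes.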

2025-09-17 – Online RL (customer‑visible features)

  • Organization‑scoped environment credentials
    • Upload your environment API key once (sealed‑box encrypted). The platform decrypts and injects it at run time; plaintext is never transmitted or stored.
  • First‑party Task App integration
    • Run environments behind a managed Task App with authenticated rollouts. Online RL calls your Task App endpoints directly during training.
  • Single‑node, multi‑GPU Online RL
    • Out-of-the-box split between vLLM inference GPUs and training GPUs on a single node (e.g., 6 inference / 2 training on H100). Multi-node training is complete in dev; reach out if interested.
    • Supports a reference model (for KL) either stacked on the inference GPUs or on its own GPU, plus configurable tensor parallelism for inference.
  • Production run flow
    • Start an Online RL job against your deployed Task App, monitor progress/events, and run inference using the produced checkpoint when training completes.

0.2.2.dev2 — Aug 8, 2025

  • Fine-tuning (SFT) endpoints available and documented end-to-end
  • Interactive demo launcher (uvx synth-ai demo) with finetuning flow for Qwen 4B
  • Live polling output during training with real-time status updates
  • CLI Reference for uvx synth-ai serve, uvx synth-ai traces, and demo launcher

0.2.2.dev1 — Aug 7, 2025

  • New backend balance APIs and CLI for account visibility
  • CLI utilities: balance, traces, and man commands
  • Traces inventory view with per-DB counts and storage footprint
  • Standardized one-off usage: uvx synth-ai <command> (removed interactive watch)
  • Improved .env loading and API key resolution

0.2.2.dev0 — Jul 30, 2025

  • Environment Registration API for custom environments
  • Turso/sqld daemon support with local-first replicas
  • Environment Service Daemon via uvx synth-ai serve

0.2.1.dev1 — Jul 29, 2025

  • Initial development release

Feb 3, 2025

  • Cuvier Error Search (deprecated)

Jan 2025

  • Langsmith integration for Enterprise partners
  • Python SDK v0.3 (simplified API, Anthropic support)