2025-10-28 – Terminal Training Logs

🚀 New Features

  • Full terminal streaming logs: uvx synth-ai train now provides comprehensive real-time training logs directly in the terminal for both SFT and RL. Users see live status updates (QUEUED, RUNNING, etc.), detailed event logs with timestamps and sequence numbers, full metrics (training loss, learning rate, GPU utilization, KL divergence, rollout times), and timeline progression throughout the entire training run.

2025-10-27 – Rubrics, Hosted Judges & Qwen-VL RL

🚀 New Features

  • Hosted Synth judges (configurable): Rollout filtering and on-policy RL can now invoke hosted judges with per-job overrides, including rubric selection, concurrency caps, and fallback behavior.
  • Rubric-aware filtering: SFT filtering pipelines accept structured rubric definitions; traces are scored and trimmed according to your criteria before export.
  • Qwen-VL support across SFT & RL: Qwen3-VL models can be fine-tuned and trained with RL, with built-in vision collators, LoRA projector targeting, and rollout plumbing.
  • Instruct-model RL guidance: Added documentation and defaults for running RL on Qwen instruct SKUs, including semaphore tuning to avoid premature episode completion.
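As a rough illustration of rubric-aware filtering, each trace can be scored as a weighted average over rubric criteria and dropped if it falls below a cutoff. The criterion names, weights, and scoring interface below are illustrative sketches, not the synth-ai API:

```python
# Hypothetical sketch of rubric-aware trace filtering: score each trace
# against weighted rubric criteria, then keep only traces above a cutoff.

def score_trace(trace_scores: dict, rubric: dict) -> float:
    """Weighted average of per-criterion scores (each in [0, 1])."""
    total_weight = sum(rubric.values())
    return sum(w * trace_scores.get(c, 0.0) for c, w in rubric.items()) / total_weight

def filter_traces(traces: list, rubric: dict, min_score: float = 0.7) -> list:
    """Keep traces whose weighted rubric score meets the cutoff."""
    return [t for t in traces if score_trace(t["scores"], rubric) >= min_score]

rubric = {"correctness": 2.0, "clarity": 1.0}  # weights are illustrative
traces = [
    {"id": "a", "scores": {"correctness": 1.0, "clarity": 0.5}},  # 2.5/3 ≈ 0.83
    {"id": "b", "scores": {"correctness": 0.5, "clarity": 0.5}},  # 1.5/3 = 0.50
]
print([t["id"] for t in filter_traces(traces, rubric)])  # ['a']
```

In practice the per-criterion scores would come from a judge model rather than being hand-written, but the trim-before-export step reduces to this kind of thresholding.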

2025-10-17 – Qwen Coder, Turso, H200 Topologies & RL Throughput

🚀 New Features

  • Qwen Coder models supported: Qwen Coder variants are now available across SFT and inference workflows.
  • SDK migrated to Turso for concurrency: Storage moved to Turso to unlock reliable concurrent writes and higher throughput in multi-process runs.
  • More training topologies on H200s: Added configurations for larger models with additional tensor/pipeline/data parallel layouts.
  • Full LoRA support for Policy Gradient: LoRA integrated end-to-end into Policy Gradient training flows.
  • Pipelined RL async rollouts: Improved throughput via asynchronous rollouts with importance sampling adjustments for stable updates.
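The importance-sampling adjustment mentioned above compensates for rollouts generated by a slightly stale policy: each sample's gradient contribution is reweighted by the (clipped) ratio of current-policy to behavior-policy probability. A minimal sketch, with illustrative log-probs and clip range (not the synth-ai trainer internals):

```python
import math

def is_weight(logp_current: float, logp_behavior: float, clip: float = 2.0) -> float:
    """Ratio pi_current / pi_behavior, clipped to limit variance from stale samples."""
    ratio = math.exp(logp_current - logp_behavior)
    return max(1.0 / clip, min(clip, ratio))

def corrected_loss(advantage: float, logp_current: float, logp_behavior: float) -> float:
    """Policy-gradient loss term reweighted by the clipped importance ratio."""
    return -is_weight(logp_current, logp_behavior) * advantage

# On-policy sample: ratio is exactly 1, no correction needed.
print(is_weight(-1.0, -1.0))   # 1.0
# Stale sample the new policy now strongly prefers: exp(1.5) ≈ 4.48, clipped to 2.0.
print(is_weight(-0.5, -2.0))   # 2.0
```

Clipping the ratio trades a little bias for much lower variance, which is what keeps updates stable when rollouts are pipelined asynchronously.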

2025-10-09 – LoRA, MoE & Large Model Support

🚀 New Features

  • Expanded Qwen catalog: Simple Training now ships SFT and inference presets for every Qwen release outside the existing qwen3-{0.6B–32B} range, giving full coverage for the remaining Qwen 1.x/2.x/2.5 checkpoints.
  • Large-model inference & training topologies: Added 2×, 4×, and 8× layouts across B200, H200, and H100 fleets, all MoE-ready for advanced Qwen variants in both SFT and inference workflows.
  • Turnkey rollout: API and UI selectors automatically surface the new Qwen SKUs so jobs can be scheduled without manual topology overrides.
  • LoRA-first SFT: Low-Rank Adaptation is now a first-class training mode across every new Qwen topology, providing parameter-efficient finetuning defaults out of the box.
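The parameter efficiency behind LoRA-first SFT comes from factoring a d_out × d_in weight update as B (d_out × r) times A (r × d_in), so only r · (d_in + d_out) parameters are trained instead of d_in · d_out. A back-of-envelope sketch with an illustrative hidden size and rank (not a specific Qwen topology):

```python
# Trainable-parameter count: full fine-tuning vs. a rank-r LoRA update.

def full_update_params(d_in: int, d_out: int) -> int:
    """Full update trains the entire d_out x d_in matrix."""
    return d_in * d_out

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA trains only the two low-rank factors B and A."""
    return rank * (d_in + d_out)

d, r = 4096, 16  # illustrative projection size and LoRA rank
print(full_update_params(d, d))  # 16777216
print(lora_params(d, d, r))      # 131072, i.e. under 1% of the full update
```

Scaled across every projection in a large model, this is why LoRA presets fit on much smaller topologies than full fine-tuning does.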

2025-09-24 – Platform Updates

🚀 New Features

  • Rollout Viewer: Enhanced visualization and monitoring interface for training rollouts with real-time metrics and progress tracking.
  • B200 & H200 GPU Support: Added support for NVIDIA’s latest flagship GPUs (B200, H200) for both training and inference workloads.
  • Faster Inference: Optimized inference pipeline with improved throughput and reduced latency across all model sizes.
  • GSPO Support: Integrated the Group Sequence Policy Optimization (GSPO) algorithm for advanced reinforcement-learning training.
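Two ingredients of GSPO-style training can be sketched in a few lines: a group-relative advantage (each rollout's reward normalized against the other rollouts for the same prompt) and a sequence-level importance ratio (length-normalized, rather than per-token). The numbers and function names below are illustrative, not the synth-ai implementation:

```python
import math

def group_advantages(rewards: list) -> list:
    """Normalize rewards within a rollout group: (r - mean) / std."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards)) or 1.0
    return [(r - mean) / std for r in rewards]

def sequence_ratio(logps_new: list, logps_old: list) -> float:
    """Length-normalized sequence-level importance ratio."""
    return math.exp((sum(logps_new) - sum(logps_old)) / len(logps_new))

# Four rollouts for one prompt, rewarded 1 or 0:
print(group_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```

Normalizing within the group removes the need for a learned value baseline, and the sequence-level ratio is what distinguishes GSPO from token-level clipping schemes.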

2025-09-17 – Online RL (customer‑visible features)

  • Organization‑scoped environment credentials
    • Upload your environment API key once (sealed‑box encrypted). The platform decrypts and injects it at run time; plaintext is never transmitted or stored.
  • First‑party Task App integration
    • Run environments behind a managed Task App with authenticated rollouts. Online RL calls your Task App endpoints directly during training.
  • Single‑node, multi‑GPU Online RL
    • Out-of-the-box split between vLLM inference GPUs and training GPUs on a single node (e.g., 6 inference / 2 training on H100). Multi-node training is complete in dev; reach out if interested.
    • Supports a reference model (for KL) either stacked on the inference GPUs or on its own GPU, plus configurable tensor parallelism for inference.
  • Production run flow
    • Start an Online RL job against your deployed Task App, monitor progress/events, and run inference using the produced checkpoint when training completes.

0.2.2.dev2 — Aug 8, 2025

  • Fine-tuning (SFT) endpoints available and documented end-to-end
  • Interactive demo launcher (uvx synth-ai demo) with finetuning flow for Qwen 4B
  • Live polling output during training with real-time status updates
  • CLI Reference for uvx synth-ai serve, uvx synth-ai traces, and demo launcher

0.2.2.dev1 — Aug 7, 2025

  • New backend balance APIs and CLI for account visibility
  • CLI utilities: balance, traces, and man commands
  • Traces inventory view with per-DB counts and storage footprint
  • Standardized one-off usage: uvx synth-ai <command> (removed interactive watch)
  • Improved .env loading and API key resolution

0.2.2.dev0 — Jul 30, 2025

  • Environment Registration API for custom environments
  • Turso/sqld daemon support with local-first replicas
  • Environment Service Daemon via uvx synth-ai serve

0.2.1.dev1 — Jul 29, 2025

  • Initial development release

Feb 3, 2025

  • Cuvier Error Search (deprecated)

Jan 2025

  • Langsmith integration for Enterprise partners
  • Python SDK v0.3 (simplified API, Anthropic support)