TL;DR
- Unified Graphs API for running optimized graphs and verifiers
- Graph Gen workflows for dataset-in, graph-out training
- Verifier graph training + scoring endpoints
- RLM graphs for massive-context inference
- Live monitoring dashboard for graph execution
- Expanded model provider support
- VLM judge support for multi-modal evaluation
- Hosted SWE agent judges
- Clearer end-to-end onboarding paths across paid products
Graphs API: One Inference Surface
The Graphs API is now the single entry point for running optimized graphs and built-in zero-shot graphs.
What this unlocks
- Single endpoint: Run Graph Gen and graph-evolve outputs through a unified API.
- Consistent validation: Input schemas plus non-blocking output validation, which surfaces warnings without breaking inference.
- Unified UX: One way to run policy graphs, verifier graphs, and RLM graphs.
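Non-blocking output validation can be pictured as checking a graph's output against an expected schema and collecting warnings instead of raising, so the result is still returned. The sketch below is a generic illustration in plain Python, not the Graphs API itself; the `validate_output` helper and the schema format (field name mapped to a Python type) are hypothetical.

```python
from typing import Any


def validate_output(output: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Compare output against a hypothetical field->type schema.

    Returns a list of warning strings and never raises, so the caller can
    still hand the output back to the user (non-blocking validation).
    """
    warnings: list[str] = []
    for field, expected_type in schema.items():
        if field not in output:
            warnings.append(f"missing field: {field}")
        elif not isinstance(output[field], expected_type):
            actual = type(output[field]).__name__
            warnings.append(
                f"field {field!r}: expected {expected_type.__name__}, got {actual}"
            )
    # Extra fields are reported too, in sorted order for stable output.
    for field in sorted(output.keys() - schema.keys()):
        warnings.append(f"unexpected field: {field}")
    return warnings


if __name__ == "__main__":
    schema = {"answer": str, "confidence": float}
    result = {"answer": "42", "confidence": "high"}
    # The output is kept as-is; the mismatch only produces a warning.
    print(validate_output(result, schema))
```

The design choice illustrated here is that validation feeds diagnostics alongside the output rather than gating it, which matches the "warnings without breaking inference" behavior described above.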
Graph Gen: Workflows From Datasets
Graph Gen is the dataset-in, graph-out product surface for building reliable LLM workflows.
Highlights
- Built-in judging: Rubric, contrastive, and gold-examples modes.
- Graph types: Policy, verifier, and RLM graphs from the same dataset format.
- Production inference: Run optimized graphs via the Graphs API.
Verifier Graphs
Verifier graphs let you score traces with calibrated, structured rewards.
What’s new
- Graph judge endpoint: Submit a trace and get back a score, reasoning, and event- and outcome-level rewards.
- Training path: Use Graph Gen with verifier datasets to train custom judges.
RLM Graphs (Massive Context)
RLM graphs handle large contexts by materializing content and searching it locally instead of stuffing everything into the prompt.
Use cases
- Multi-document QA
- Codebase analysis
- Large trace evaluation
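The materialize-then-search idea can be sketched with plain keyword retrieval: split the large input into overlapping chunks once, then pull back only the chunks that match the question and build a small prompt from them. This is a minimal illustration under stated assumptions, not how RLM graphs are implemented; `chunk_text`, `search`, and `build_prompt`, the word-window chunking, and the term-count scoring are all hypothetical choices.

```python
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Materialize text as overlapping windows of `chunk_size` words."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, max(len(words) - overlap, 1), step)
    ]


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric/underscore tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def search(chunks: list[str], query: str, k: int = 3) -> list[str]:
    """Rank chunks by total occurrences of query terms; return the top k."""
    query_terms = set(tokenize(query))
    scored = []
    for index, chunk in enumerate(chunks):
        counts = Counter(tokenize(chunk))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, index, chunk))
    # Highest score first; ties keep original document order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [chunk for _, _, chunk in scored[:k]]


def build_prompt(query: str, chunks: list[str], k: int = 3) -> str:
    """Send only the retrieved chunks to the model, not the whole corpus."""
    context = "\n---\n".join(search(chunks, query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

A real system would likely use a stronger ranker (BM25, embeddings, or code-aware search); term counting just keeps the sketch self-contained.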
UX: End-to-End Paid Product Flows
We’ve tightened the end-to-end onboarding story across paid products:
- Task app requirements are now explicit (dataset, rubric/judge config, on-demand execution).
- Clearer first-run paths for Graph Gen, GEPA, GSPO, SFT, and verifier training.
- Unified language for how users set up task apps, rollouts, and inference.
Monitoring: Graph Execution
A new live monitoring dashboard provides real-time visibility into graph execution, latency, and failures.
Judges: VLM + SWE
- VLM judge support: Multi-modal evaluation for traces with image inputs.
- Hosted SWE agent judges: Hosted judge graphs tuned for software engineering workflows.
Provider Support
Expanded model provider coverage for graph execution and judging (see the provider lists in the docs).
Documentation
- Workflows overview: /product/workflows
- Judging in Graph Gen: /product/workflows/judging
- RLM graphs: /product/workflows/rlm
- Graphs overview: /sdk/graphs/overview