TL;DR

  • Unified Graphs API for running optimized graphs and verifiers
  • Graph Gen workflows for dataset-in, graph-out training
  • Verifier graph training + scoring endpoints
  • RLM graphs for massive-context inference
  • Live monitoring dashboard for graph execution
  • Expanded model provider support
  • VLM judge support for multi-modal evaluation
  • Hosted SWE agent judges
  • Clearer end-to-end onboarding paths across paid products

Graphs API: One Inference Surface

The Graphs API is now the single entry point for running optimized graphs and built-in zero-shot graphs.

What this unlocks

  • Single endpoint: Run Graph Gen and graph-evolve outputs through a unified API.
  • Consistent validation: Input schemas and non-blocking output validation surface problems as warnings instead of failing the inference call.
  • Unified UX: One way to run policy graphs, verifier graphs, and RLM graphs.
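As a rough illustration of what "non-blocking" output validation means, here is a minimal, self-contained sketch. It does not call the Graphs API; the schema shape, field names, and the helper names validate_output and run_with_validation are all hypothetical and chosen only for this example. The point is that a mismatch produces a warning attached to the result rather than an exception.

```python
from typing import Any

# Hypothetical output schema for illustration: field name -> expected type.
OUTPUT_SCHEMA: dict[str, type] = {"answer": str, "confidence": float}


def validate_output(output: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Collect schema mismatches as warnings; never raise on a mismatch."""
    warnings: list[str] = []
    for field, expected in schema.items():
        if field not in output:
            warnings.append(f"missing field: {field}")
        elif not isinstance(output[field], expected):
            warnings.append(
                f"field {field!r} expected {expected.__name__}, "
                f"got {type(output[field]).__name__}"
            )
    return warnings


def run_with_validation(output: dict[str, Any], schema: dict[str, type]) -> dict[str, Any]:
    # The output is passed through unchanged; warnings travel alongside it.
    return {"output": output, "warnings": validate_output(output, schema)}


result = run_with_validation({"answer": "42", "confidence": "high"}, OUTPUT_SCHEMA)
print(result["warnings"])
```

The caller still receives the graph output and can decide whether a given warning matters for its use case.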

Graph Gen: Workflows From Datasets

Graph Gen is the dataset-in, graph-out product surface for building reliable LLM workflows.

Highlights

  • Built-in judging: Rubric, contrastive, and gold-examples modes.
  • Graph types: Policy, verifier, and RLM graphs from the same dataset format.
  • Production inference: Run optimized graphs via the Graphs API.

Verifier Graphs

Verifier graphs let you score traces with calibrated, structured rewards.

What’s new

  • Graph judge endpoint: Submit a trace and get back a score, reasoning, and event and outcome rewards.
  • Training path: Use Graph Gen with verifier datasets to train custom judges.
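To make the shape of a judge result concrete, the sketch below models the four pieces named above (score, reasoning, event rewards, outcome reward) as Python dataclasses and parses them from a dictionary. This is an assumption-heavy illustration: the key names (score, reasoning, event_rewards, event_id, reward, outcome_reward) are hypothetical and are not taken from the documented response schema, so check the API reference before relying on any of them.

```python
from dataclasses import dataclass, field


@dataclass
class EventReward:
    event_id: str   # hypothetical identifier of an event in the trace
    reward: float


@dataclass
class JudgeResult:
    score: float
    reasoning: str
    event_rewards: list[EventReward] = field(default_factory=list)
    outcome_reward: float = 0.0


def parse_judge_result(payload: dict) -> JudgeResult:
    """Build a JudgeResult from a dict; all key names here are assumed."""
    return JudgeResult(
        score=float(payload["score"]),
        reasoning=payload.get("reasoning", ""),
        event_rewards=[
            EventReward(e["event_id"], float(e["reward"]))
            for e in payload.get("event_rewards", [])
        ],
        outcome_reward=float(payload.get("outcome_reward", 0.0)),
    )


# Example payload invented for this sketch, not real endpoint output.
example = {
    "score": 0.8,
    "reasoning": "Final answer is correct; one redundant tool call.",
    "event_rewards": [{"event_id": "step-1", "reward": 1.0},
                      {"event_id": "step-2", "reward": -0.25}],
    "outcome_reward": 1.0,
}
parsed = parse_judge_result(example)
print(parsed.score, len(parsed.event_rewards), parsed.outcome_reward)
```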

RLM Graphs (Massive Context)

RLM graphs handle large contexts by materializing the content and searching it locally, rather than stuffing the entire context into the prompt.
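The general idea of "materialize, then search locally" can be sketched in a few lines. This is not how RLM graphs are implemented; it is a simplified stand-in that writes documents to a temporary directory, runs a plain keyword search over them, and builds a prompt from only the matching lines. The helper names materialize and search, the sample documents, and the keyword-matching strategy are all assumptions made for the example.

```python
import tempfile
from pathlib import Path


def materialize(documents: dict[str, str], root: Path) -> None:
    """Write each document to its own file under root."""
    for name, text in documents.items():
        (root / name).write_text(text, encoding="utf-8")


def search(root: Path, keyword: str, max_hits: int = 5) -> list[str]:
    """Return up to max_hits matching lines as 'file:line: text' strings."""
    hits: list[str] = []
    for path in sorted(root.iterdir()):
        lines = path.read_text(encoding="utf-8").splitlines()
        for line_no, line in enumerate(lines, start=1):
            if keyword.lower() in line.lower():
                hits.append(f"{path.name}:{line_no}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits
    return hits


# Sample documents invented for this sketch.
docs = {
    "a.txt": "Billing runs nightly.\nRefunds take five days.",
    "b.txt": "Support hours are 9 to 5.\nRefunds require a receipt.",
}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    materialize(docs, root)
    snippets = search(root, "refunds")

# Only the matching excerpts reach the prompt, not the full corpus.
prompt = "Answer using only these excerpts:\n" + "\n".join(snippets)
print(prompt)
```

The prompt size here depends on the number of hits rather than the size of the corpus, which is the property that makes this pattern attractive for very large contexts.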

Use cases

  • Multi-document QA
  • Codebase analysis
  • Large trace evaluation

UX: End-to-End Paid Product Flows

We’ve tightened the end-to-end onboarding story across paid products:

  • Task app requirements are explicit (dataset, rubric/judge config, on-demand execution).
  • Clearer first-run paths for Graph Gen, GEPA, GSPO, SFT, and verifier training.
  • Unified language for how users set up task apps, rollouts, and inference.

Monitoring: Graph Execution

A new live monitoring dashboard provides real-time visibility into graph execution, latency, and failures.

Judges: VLM + SWE

  • VLM judge support: Multi-modal evaluation for traces with image inputs.
  • Hosted SWE agent judges: Hosted judge graphs tuned for software engineering workflows.

Provider Support

Model provider coverage for graph execution and judging has expanded; see the provider lists in the docs.

Documentation

  • Workflows overview: /product/workflows
  • Judging in Graph Gen: /product/workflows/judging
  • RLM graphs: /product/workflows/rlm
  • Graphs overview: /sdk/graphs/overview