synth_ai.sdk.api.train.graphgen
Alpha
First-class SDK API for GraphGen (Automated Design of Agentic Systems).
GraphGen is a simplified “Workflows API” for prompt optimization that:
- Uses a simple JSON dataset format (GraphGenTaskSet) instead of TOML configs
- Auto-generates task apps from the dataset (no user-managed task apps)
- Has built-in judge configurations (rubric, contrastive, gold_examples)
- Wraps GEPA internally for the actual optimization
synth_ai.sdk.api.train.graphgen_models
GraphGen (Automated Design of Agentic Systems) data models.
This module provides Pydantic models for defining GraphGen datasets and job configurations.
GraphGen is a simplified “Workflows API” for prompt optimization that wraps GEPA with
auto-generated task apps and built-in judge configurations.
Example:
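A minimal sketch of defining and validating a taskset in code. The JSON keys shown (tasks, gold_outputs, judge) are assumptions inferred from the model names documented below, not confirmed field names:

```python
from synth_ai.sdk.api.train.graphgen_models import parse_graphgen_taskset

# Illustrative payload; key names are assumptions based on the
# GraphGenTaskSet / GraphGenTask / GraphGenGoldOutput models below.
raw = {
    "metadata": {"name": "sentiment-demo"},
    "tasks": [
        {"task_id": "t1", "input": {"text": "I loved this movie!"}},
        {"task_id": "t2", "input": {"text": "Terrible service."}},
    ],
    "gold_outputs": [
        {"task_id": "t1", "output": {"label": "positive"}},
        {"task_id": "t2", "output": {"label": "negative"}},
    ],
    "judge": {"mode": "gold_examples"},
}

taskset = parse_graphgen_taskset(raw)  # raises ValueError if invalid
```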
Functions
parse_graphgen_taskset
data: Dictionary containing the taskset data (from JSON)
- Validated GraphGenTaskSet
ValueError: If validation fails
load_graphgen_taskset
path: Path to JSON file
- Validated GraphGenTaskSet
FileNotFoundError: If file doesn’t exist
ValueError: If validation fails
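Loading the same format from disk runs the same validation; the path below is purely illustrative:

```python
from pathlib import Path
from synth_ai.sdk.api.train.graphgen_models import load_graphgen_taskset

taskset = load_graphgen_taskset(Path("datasets/sentiment.json"))
```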
Job API
GraphGenJobResult
Result from a GraphGen job.
GraphGenSubmitResult
Result from submitting a GraphGen job.
GraphGenJob
High-level SDK class for running GraphGen workflow optimization jobs.
GraphGen (Automated Design of Agentic Systems) provides a simplified API for
graph/workflow optimization that doesn’t require users to manage task apps.
Key differences from PromptLearningJob:
- Uses JSON dataset format (GraphGenTaskSet) instead of TOML configs
- No task app management required - GraphGen builds it internally
- Built-in judge modes (rubric, contrastive, gold_examples)
- Graph-first: trains multi-node workflows by default (Graph-GEPA)
- Public graph downloads are redacted .txt exports only
- Simpler configuration with sensible defaults
from_dataset
dataset: Dataset as file path, dict, or GraphGenTaskSet object
graph_type: Type of graph to train:
- “policy”: Maps inputs to outputs (default).
- “verifier”: Judges/scores traces (requires verifier-compliant dataset).
- “rlm”: Recursive Language Model - handles massive contexts via tool-based search and recursive LLM calls. Requires configured_tools parameter.
policy_model: Model to use for policy inference
rollout_budget: Total number of rollouts for optimization
proposer_effort: Proposer effort level (“medium” or “high”). “low” is not allowed, as gpt-4.1-mini is too weak for graph generation.
judge_model: Override judge model from dataset
judge_provider: Override judge provider from dataset
population_size: Population size for GEPA
num_generations: Number of generations (auto-calculated if not specified)
problem_spec: Detailed problem specification for the graph proposer. Include domain-specific info like valid output labels for classification.
target_llm_calls: Target number of LLM calls for the graph (1-10). Controls how many LLM nodes the graph should use. Defaults to 5.
configured_tools: Optional list of tool bindings for RLM graphs. Required for graph_type=“rlm”. Each tool should be a dict with ‘name’, ‘kind’, and ‘stateful’. Example: [{‘name’: ‘materialize_context’, ‘kind’: ‘rlm_materialize’, ‘stateful’: True}]
backend_url: Backend API URL (defaults to env or production)
api_key: API key (defaults to SYNTH_API_KEY env var)
auto_start: Whether to start the job immediately
metadata: Additional metadata for the job
- GraphGenJob instance
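A hedged end-to-end sketch; the dataset path, model name, and budget are illustrative values, not defaults:

```python
from synth_ai.sdk.api.train.graphgen import GraphGenJob

job = GraphGenJob.from_dataset(
    dataset="datasets/sentiment.json",  # file path, dict, or GraphGenTaskSet
    graph_type="policy",
    policy_model="gpt-4.1-mini",        # illustrative model name
    rollout_budget=200,
    problem_spec="Classify sentiment; valid labels are positive/negative.",
    target_llm_calls=3,
    auto_start=False,                   # submit explicitly below
)
submit_result = job.submit()            # GraphGenSubmitResult with job IDs
```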
from_job_id
job_id: GraphGen job ID (“graphgen_”) or underlying GEPA job ID (“pl_”)
backend_url: Backend API URL (defaults to env or production)
api_key: API key (defaults to SYNTH_API_KEY env var)
- GraphGenJob instance for the existing job
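Reattaching to a previously submitted job; the ID shown is hypothetical:

```python
job = GraphGenJob.from_job_id("graphgen_abc123")  # also accepts "pl_" GEPA IDs
print(job.get_status()["status"])
```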
from_graph_evolve_job_id
job_id
graph_evolve_job_id
submit
- GraphGenSubmitResult with job IDs and initial status
RuntimeError: If job submission fails
get_status
- Job status dictionary containing ‘status’, ‘best_score’, etc.
RuntimeError: If job hasn’t been submitted yet or API call fails.
start
- Updated job status dictionary.
get_events
since_seq: Return events with sequence number greater than this.
limit: Maximum number of events to return.
- Backend envelope: {“events”: […], “has_more”: bool, “next_seq”: int}.
get_metrics
name: Optional metric name filter.
after_step: Optional step filter.
limit: Maximum number of metrics to return.
run_id: Optional run identifier filter.
- Dictionary containing ‘metrics’ list.
stream_until_complete
- job_started: Job execution began
- generation_started: New generation of candidates started
- candidate_evaluated: A candidate graph was evaluated
- generation_completed: Generation finished
- optimization_completed: Job finished successfully
- job_failed: Job encountered an error
timeout: Maximum seconds to wait for completion
interval: Seconds between status checks (for SSE reconnects)
handlers: Optional StreamHandler instances for custom event handling. Defaults to GraphGenHandler, which provides formatted CLI output.
on_event: Optional callback function called on each event. Receives the event dict as argument.
- Final job status dictionary containing ‘status’, ‘best_score’, etc.
RuntimeError: If job hasn’t been submitted yet
TimeoutError: If timeout exceeded before job completion
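A sketch of consuming events with a custom callback, continuing from the from_dataset sketch above; only the event types listed here are documented, so the sketch assumes each event dict carries a “type” field and treats everything else as opaque:

```python
def log_event(event: dict) -> None:
    # Each event dict is passed through as-is; we only rely on its type.
    print(event.get("type"))

final = job.stream_until_complete(
    timeout=3600,        # give up after an hour
    on_event=log_event,  # called once per event
)
print(final["status"], final.get("best_score"))
```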
download_prompt
Prefer download_graph_txt(); this method is mainly useful for legacy single-node prompt workflows.
Returns:
- Optimized prompt text
RuntimeError: If job hasn’t been submitted or isn’t complete
download_graph_txt
Downloads the public (redacted) .txt export from:
GET /api/graphgen/jobs/{job_id}/graph.txt
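A short sketch of saving the export, assuming the method returns the .txt contents as a string:

```python
from pathlib import Path

Path("best_graph.txt").write_text(job.download_graph_txt())
```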
run_inference
input_data: Input data matching the task format
model: Override model (default: use job’s policy model)
prompt_snapshot_id: Legacy alias for selecting a specific snapshot.
graph_snapshot_id: Specific GraphSnapshot to use (default: best). Preferred for graph-first jobs. If provided, it is sent as prompt_snapshot_id for backward-compatible backend routing.
- Output dictionary containing ‘output’, ‘usage’, etc.
RuntimeError: If job hasn’t been submitted or inference fails.
ValueError: If both prompt_snapshot_id and graph_snapshot_id are provided.
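Once the job is complete, a minimal inference sketch; the input keys are illustrative and must match the taskset’s task format:

```python
result = job.run_inference(input_data={"text": "The plot dragged on."})
print(result["output"])  # model output
print(result["usage"])   # token usage
```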
run_inference_output
run_verifier
session_trace: V3 session trace to evaluate. Can be a dict or SessionTraceInput.
context: Additional context for evaluation (e.g., rubric overrides, task description).
prompt_snapshot_id: Specific snapshot to use (default: best).
graph_snapshot_id: Specific GraphSnapshot to use (default: best). Preferred for graph-first jobs.
- GraphGenGraphJudgeResponse containing structured rewards and reasoning.
RuntimeError: If job hasn’t been submitted or inference fails.
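For verifier-type jobs, a hedged sketch; the trace placeholder stands in for a real V3 session trace and the context key is illustrative:

```python
trace: dict = {}  # placeholder; a real V3 session trace (dict or SessionTraceInput)
response = job.run_verifier(
    session_trace=trace,
    context={"task_description": "Grade the assistant's final answer."},
)
print(response)  # GraphGenGraphJudgeResponse with structured rewards and reasoning
```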
run_judge
get_graph_record
See download_graph_txt() for the public export.
Args:
prompt_snapshot_id: Legacy alias for selecting a specific snapshot.
graph_snapshot_id: Specific GraphSnapshot to use (default: best).
- Graph record dictionary containing:
  - job_id: The job ID
  - snapshot_id: The snapshot ID used
  - prompt: Extracted prompt text (legacy single-node only; may be empty)
  - graph: Public graph record payload (e.g., export metadata)
  - model: Model used for this graph (optional)
RuntimeError: If job hasn’t been submitted or API call fails.
ValueError: If both prompt_snapshot_id and graph_snapshot_id are provided.
Configuration Reference
OutputConfig
Configuration for graph output extraction + validation.
This model defines how graph outputs should be extracted and validated.
It supports JSON Schema validation, multiple output formats, and
configurable extraction paths.
Attributes:
schema_: JSON Schema (draft-07) for output validation. Use alias “schema” in JSON.
format: Expected output format - “json”, “text”, “tool_calls”, or “image”.
strict: If True, validation failures fail the run; if False, log warnings and continue.
extract_from: Ordered list of dot-paths/keys to try when extracting output from final_state.
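A sketch of constructing an OutputConfig from JSON-style data via Pydantic v2’s model_validate, using the documented “schema” alias; the extraction paths are illustrative:

```python
from synth_ai.sdk.api.train.graphgen_models import OutputConfig

config = OutputConfig.model_validate({
    "schema": {  # draft-07 JSON Schema, aliased to the schema_ attribute
        "type": "object",
        "properties": {"label": {"type": "string"}},
        "required": ["label"],
    },
    "format": "json",
    "strict": False,  # log validation warnings instead of failing the run
    "extract_from": ["final_state.output", "output"],  # illustrative dot-paths
})
```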
GraphGenTaskSetMetadata
Metadata about the dataset.
Methods:
validate_select_output
validate_output_config
GraphGenRubricCriterion
A single rubric criterion for evaluation.
GraphGenRubricOutcome
Outcome-level rubric (evaluates final output).
GraphGenRubricEvents
Event-level rubric (evaluates intermediate steps).
GraphGenRubric
Rubric for evaluating task outputs.
GraphGenTask
A single task in the dataset.
Tasks have arbitrary JSON inputs and optional task-specific rubrics.
Gold outputs are stored separately and linked via task_id.
GraphGenGoldOutput
A gold/reference output.
Can be linked to a specific task via task_id, or standalone (for reference examples).
Standalone gold outputs (no task_id) are used as reference pool for contrastive judging.
GraphGenJudgeConfig
Configuration for the judge used during optimization.
GraphGenTaskSet
The complete GraphGen dataset format.
Contains tasks with arbitrary JSON inputs, gold outputs (optionally linked to tasks),
rubrics (task-specific and/or default), and judge configuration.
Methods:
validate_unique_task_ids
validate_gold_output_task_ids
v: The list of gold outputs being validated.
info: Pydantic ValidationInfo providing access to other fields via info.data.
- The validated list of gold outputs.
ValueError: If a gold output references a non-existent task ID.
validate_select_output
validate_output_config
get_task_by_id
get_task_by_index
index: Zero-based index into tasks list (0 to len(tasks)-1).
- Task at the specified index, or None if index is out of range.
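Convenience lookups on a validated taskset; the task ID is illustrative:

```python
task = taskset.get_task_by_id("t1")   # look up by task_id
first = taskset.get_task_by_index(0)  # None if the index is out of range
```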