
synth_ai.sdk.api.train.graphgen

Alpha First-class SDK API for GraphGen (Automated Design of Agentic Systems). GraphGen is a simplified “Workflows API” for prompt optimization that:
  • Uses a simple JSON dataset format (GraphGenTaskSet) instead of TOML configs
  • Auto-generates task apps from the dataset (no user-managed task apps)
  • Has built-in judge configurations (rubric, contrastive, gold_examples)
  • Wraps GEPA internally for the actual optimization
Example CLI usage:
uvx synth-ai train --type graphgen --dataset my_tasks.json --poll
Example SDK usage:
from synth_ai.sdk.api.train.graphgen import GraphGenJob
from synth_ai.sdk.api.train.graphgen_models import (
    GraphGenGoldOutput,
    GraphGenTask,
    GraphGenTaskSet,
    GraphGenTaskSetMetadata,
)

# From a dataset file
job = GraphGenJob.from_dataset("my_tasks.json")
job.submit()
result = job.stream_until_complete()
print(f"Best score: {result.get('best_score')}")

# Or programmatically
dataset = GraphGenTaskSet(
    metadata=GraphGenTaskSetMetadata(name="My Tasks"),
    tasks=[GraphGenTask(id="t1", input={"question": "What is 2+2?"})],
    gold_outputs=[GraphGenGoldOutput(output={"answer": "4"}, task_id="t1")],
)
job = GraphGenJob.from_dataset(dataset, policy_model="gpt-4o-mini", problem_spec="You are a helpful assistant.")
job.submit()

synth_ai.sdk.api.train.graphgen_models

GraphGen (Automated Design of Agentic Systems) data models. This module provides Pydantic models for defining GraphGen datasets and job configurations. GraphGen is a simplified “Workflows API” for prompt optimization that wraps GEPA with auto-generated task apps and built-in judge configurations. Example:
from synth_ai.sdk.api.train.graphgen_models import (
    GraphGenTaskSet,
    GraphGenTaskSetMetadata,
    GraphGenTask,
    GraphGenGoldOutput,
    GraphGenRubric,
    GraphGenJudgeConfig,
    GraphGenJobConfig,
)

# Create a dataset
dataset = GraphGenTaskSet(
    metadata=GraphGenTaskSetMetadata(name="My Dataset"),
    tasks=[
        GraphGenTask(id="task1", input={"question": "What is 2+2?"}),
        GraphGenTask(id="task2", input={"question": "What is the capital of France?"}),
    ],
    gold_outputs=[
        GraphGenGoldOutput(output={"answer": "4"}, task_id="task1"),
        GraphGenGoldOutput(output={"answer": "Paris"}, task_id="task2"),
    ],
    judge_config=GraphGenJudgeConfig(mode="rubric"),
)

Functions

parse_graphgen_taskset

parse_graphgen_taskset(data: Dict[str, Any]) -> GraphGenTaskSet
Parse a dictionary into an GraphGenTaskSet. Args:
  • data: Dictionary containing the taskset data (from JSON)
Returns:
  • Validated GraphGenTaskSet
Raises:
  • ValueError: If validation fails

load_graphgen_taskset

load_graphgen_taskset(path: str | Path) -> GraphGenTaskSet
Load a GraphGenTaskSet from a JSON file. Args:
  • path: Path to JSON file
Returns:
  • Validated GraphGenTaskSet
Raises:
  • FileNotFoundError: If file doesn’t exist
  • ValueError: If validation fails
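The JSON file is simply a serialized GraphGenTaskSet. As a sketch of the expected shape (field names inferred from the examples on this page, not from a schema dump), a minimal dataset file could be written and round-tripped like this:

```python
import json
import tempfile
from pathlib import Path

# Minimal taskset JSON mirroring the GraphGenTaskSet fields documented above.
taskset = {
    "metadata": {"name": "My Tasks"},
    "tasks": [{"id": "t1", "input": {"question": "What is 2+2?"}}],
    "gold_outputs": [{"output": {"answer": "4"}, "task_id": "t1"}],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_tasks.json"
    path.write_text(json.dumps(taskset, indent=2))

    # load_graphgen_taskset(path) would validate this; here we just round-trip it.
    loaded = json.loads(path.read_text())

assert loaded["tasks"][0]["id"] == "t1"
assert loaded["gold_outputs"][0]["output"]["answer"] == "4"
```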

Job API

GraphGenJobResult

Result from a GraphGen job.

GraphGenSubmitResult

Result from submitting a GraphGen job.

GraphGenJob

High-level SDK class for running GraphGen workflow optimization jobs. GraphGen (Automated Design of Agentic Systems) provides a simplified API for graph/workflow optimization that doesn’t require users to manage task apps. Key differences from PromptLearningJob:
  • Uses JSON dataset format (GraphGenTaskSet) instead of TOML configs
  • No task app management required - GraphGen builds it internally
  • Built-in judge modes (rubric, contrastive, gold_examples)
  • Graph-first: trains multi-node workflows by default (Graph-GEPA)
  • Public graph downloads are redacted .txt exports only
  • Simpler configuration with sensible defaults
Methods:

from_dataset

from_dataset(cls, dataset: str | Path | Dict[str, Any] | GraphGenTaskSet) -> GraphGenJob
Create a GraphGen job from a dataset. Args:
  • dataset: Dataset as file path, dict, or GraphGenTaskSet object
  • graph_type: Type of graph to train:
    • "policy": Maps inputs to outputs (default).
    • "verifier": Judges/scores traces (requires a verifier-compliant dataset).
    • "rlm": Recursive Language Model; handles massive contexts via tool-based search and recursive LLM calls. Requires the configured_tools parameter.
  • policy_model: Model to use for policy inference
  • rollout_budget: Total number of rollouts for optimization
  • proposer_effort: Proposer effort level ("medium" or "high"). "low" is not allowed because gpt-4.1-mini is too weak for graph generation.
  • judge_model: Override the judge model from the dataset
  • judge_provider: Override the judge provider from the dataset
  • population_size: Population size for GEPA
  • num_generations: Number of generations (auto-calculated if not specified)
  • problem_spec: Detailed problem specification for the graph proposer. Include domain-specific info such as valid output labels for classification.
  • target_llm_calls: Target number of LLM calls for the graph (1-10). Controls how many LLM nodes the graph should use. Defaults to 5.
  • configured_tools: Optional list of tool bindings for RLM graphs. Required for graph_type="rlm". Each tool should be a dict with 'name', 'kind', and 'stateful'. Example: [{'name': 'materialize_context', 'kind': 'rlm_materialize', 'stateful': True}]
  • backend_url: Backend API URL (defaults to env or production)
  • api_key: API key (defaults to SYNTH_API_KEY env var)
  • auto_start: Whether to start the job immediately
  • metadata: Additional metadata for the job
Returns:
  • GraphGenJob instance

from_job_id

from_job_id(cls, job_id: str, backend_url: Optional[str] = None, api_key: Optional[str] = None) -> GraphGenJob
Resume an existing GraphGen job by ID. Args:
  • job_id: GraphGen job ID ("graphgen_" prefix) or underlying GEPA job ID ("pl_" prefix)
  • backend_url: Backend API URL (defaults to env or production)
  • api_key: API key (defaults to SYNTH_API_KEY env var)
Returns:
  • GraphGenJob instance for the existing job
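The two ID formats can be told apart by prefix. A small illustrative helper (hypothetical, not part of the SDK; from_job_id accepts both forms directly):

```python
def classify_job_id(job_id: str) -> str:
    """Classify a job ID by the documented prefixes. Hypothetical helper."""
    if job_id.startswith("graphgen_"):
        return "graphgen"  # GraphGen job ID
    if job_id.startswith("pl_"):
        return "gepa"      # underlying GEPA job ID
    raise ValueError(f"Unrecognized job ID: {job_id!r}")

print(classify_job_id("graphgen_abc123"))  # graphgen
print(classify_job_id("pl_xyz789"))        # gepa
```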

from_graph_evolve_job_id

from_graph_evolve_job_id(cls, graph_evolve_job_id: str, backend_url: Optional[str] = None, api_key: Optional[str] = None) -> GraphGenJob
Alias for resuming a GraphGen job from a GEPA job ID.

job_id

job_id(self) -> Optional[str]
Get the GraphGen job ID (None if not yet submitted).

graph_evolve_job_id

graph_evolve_job_id(self) -> Optional[str]
Get the underlying GEPA job ID if known.

submit

submit(self) -> GraphGenSubmitResult
Submit the job to the backend. Returns:
  • GraphGenSubmitResult with job IDs and initial status
Raises:
  • RuntimeError: If job submission fails

get_status

get_status(self) -> Dict[str, Any]
Get current job status. Returns:
  • Job status dictionary containing ‘status’, ‘best_score’, etc.
Raises:
  • RuntimeError: If job hasn’t been submitted yet or API call fails.

start

start(self) -> Dict[str, Any]
Start a queued GraphGen job. This is only needed if the job was created with auto_start=False or ended up queued. Returns:
  • Updated job status dictionary.

get_events

get_events(self) -> Dict[str, Any]
Fetch events for this GraphGen job. Args:
  • since_seq: Return events with sequence number greater than this.
  • limit: Maximum number of events to return.
Returns:
  • Backend envelope: {“events”: […], “has_more”: bool, “next_seq”: int}.

get_metrics

get_metrics(self) -> Dict[str, Any]
Fetch metrics for this GraphGen job. Args:
  • name: Optional metric name filter.
  • after_step: Optional step filter.
  • limit: Maximum number of metrics to return.
  • run_id: Optional run identifier filter.
Returns:
  • Dictionary containing ‘metrics’ list.

stream_until_complete

stream_until_complete(self) -> Dict[str, Any]
Stream job events until completion using Server-Sent Events (SSE). This method connects to the backend SSE stream and processes events in real-time until the job reaches a terminal state (completed, failed, or cancelled). Events include:
  • job_started: Job execution began
  • generation_started: New generation of candidates started
  • candidate_evaluated: A candidate graph was evaluated
  • generation_completed: Generation finished
  • optimization_completed: Job finished successfully
  • job_failed: Job encountered an error
Args:
  • timeout: Maximum seconds to wait for completion
  • interval: Seconds between status checks (for SSE reconnects)
  • handlers: Optional StreamHandler instances for custom event handling. Defaults to GraphGenHandler which provides formatted CLI output.
  • on_event: Optional callback function called on each event. Receives the event dict as argument.
Returns:
  • Final job status dictionary containing ‘status’, ‘best_score’, etc.
Raises:
  • RuntimeError: If job hasn’t been submitted yet
  • TimeoutError: If timeout exceeded before job completion
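The on_event callback simply receives raw event dicts. A hypothetical handler, exercised here against sample events rather than a live SSE stream (the event payload shapes below are illustrative assumptions based on the event names above, not a backend schema):

```python
# Collect candidate scores as events arrive.
scores = []

def on_event(event: dict) -> None:
    if event.get("type") == "candidate_evaluated":
        scores.append(event.get("data", {}).get("score"))

# In real use: job.stream_until_complete(on_event=on_event)
sample_events = [
    {"type": "job_started"},
    {"type": "candidate_evaluated", "data": {"score": 0.62}},
    {"type": "candidate_evaluated", "data": {"score": 0.81}},
    {"type": "optimization_completed"},
]
for ev in sample_events:
    on_event(ev)

print(max(scores))  # 0.81
```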

download_prompt

download_prompt(self) -> str
Download the optimized prompt from a completed job. For graph-first jobs, prefer download_graph_txt(); this method is mainly useful for legacy single-node prompt workflows. Returns:
  • Optimized prompt text
Raises:
  • RuntimeError: If job hasn’t been submitted or isn’t complete

download_graph_txt

download_graph_txt(self) -> str
Download a PUBLIC (redacted) graph export for a completed job. Graph-first GraphGen jobs produce multi-node graphs. The internal graph YAML/spec is proprietary and never exposed. This helper downloads the .txt export from: GET /api/graphgen/jobs/{job_id}/graph.txt

run_inference

run_inference(self, input_data: Dict[str, Any]) -> Dict[str, Any]
Run inference with the optimized graph/workflow. Args:
  • input_data: Input data matching the task format
  • model: Override model (default: use job’s policy model)
  • prompt_snapshot_id: Legacy alias for selecting a specific snapshot.
  • graph_snapshot_id: Specific GraphSnapshot to use (default: best). Preferred for graph-first jobs. If provided, it is sent as prompt_snapshot_id for backward-compatible backend routing.
Returns:
  • Output dictionary containing ‘output’, ‘usage’, etc.
Raises:
  • RuntimeError: If job hasn’t been submitted or inference fails.
  • ValueError: If both prompt_snapshot_id and graph_snapshot_id are provided.

run_inference_output

run_inference_output(self, input_data: Dict[str, Any]) -> Any
Convenience wrapper returning only the model output.

run_verifier

run_verifier(self, session_trace: Dict[str, Any] | SessionTraceInput) -> GraphGenGraphJudgeResponse
Run a verifier graph on an execution trace. This method is specifically for graphs trained with graph_type="verifier". It accepts a V3 trace and returns structured rewards (score, reasoning, per-event rewards). Args:
  • session_trace: V3 session trace to evaluate. Can be a dict or SessionTraceInput.
  • context: Additional context for evaluation (e.g., rubric overrides, task description).
  • prompt_snapshot_id: Specific snapshot to use (default: best).
  • graph_snapshot_id: Specific GraphSnapshot to use (default: best). Preferred for graph-first jobs.
Returns:
  • GraphGenGraphJudgeResponse containing structured rewards and reasoning.
Raises:
  • RuntimeError: If job hasn’t been submitted or inference fails.

run_judge

run_judge(self, session_trace: Dict[str, Any] | SessionTraceInput) -> GraphGenGraphJudgeResponse
Deprecated: use run_verifier instead.

get_graph_record

get_graph_record(self) -> Dict[str, Any]
Get the optimized graph record (snapshot) for a completed job. Note: for graph-first jobs, this record is redacted and never includes proprietary YAML/spec. Use download_graph_txt() for the public export. Args:
  • prompt_snapshot_id: Legacy alias for selecting a specific snapshot.
  • graph_snapshot_id: Specific GraphSnapshot to use (default: best).
Returns:
  • Graph record dictionary containing:
    • job_id: The job ID
    • snapshot_id: The snapshot ID used
    • prompt: Extracted prompt text (legacy single-node only; may be empty)
    • graph: Public graph record payload (e.g., export metadata)
    • model: Model used for this graph (optional)
Raises:
  • RuntimeError: If job hasn’t been submitted or API call fails.
  • ValueError: If both prompt_snapshot_id and graph_snapshot_id are provided.

Configuration Reference

OutputConfig

Configuration for graph output extraction + validation. This model defines how graph outputs should be extracted and validated. It supports JSON Schema validation, multiple output formats, and configurable extraction paths. Attributes:
  • schema_: JSON Schema (draft-07) for output validation. Use the alias "schema" in JSON.
  • format: Expected output format: "json", "text", "tool_calls", or "image".
  • strict: If True, validation failures fail the run; if False, log warnings and continue.
  • extract_from: Ordered list of dot-paths/keys to try when extracting output from final_state.
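The extract_from paths behave like ordered dot-path lookups into final_state. A rough sketch of that extraction logic (an assumption about the semantics, not the SDK's source):

```python
from typing import Any, Optional

def extract_output(final_state: dict, extract_from: list) -> Optional[Any]:
    """Try each dot-path in order; return the first value that resolves."""
    for path in extract_from:
        node: Any = final_state
        for key in path.split("."):
            if isinstance(node, dict) and key in node:
                node = node[key]
            else:
                node = None
                break
        if node is not None:
            return node
    return None

state = {"result": {"answer": "4"}, "raw_text": "the answer is 4"}
# "output.answer" misses, "result.answer" resolves first.
print(extract_output(state, ["output.answer", "result.answer", "raw_text"]))  # prints 4
```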

GraphGenTaskSetMetadata

Metadata about the dataset. Methods:

validate_select_output

validate_select_output(cls, v: Any) -> Optional[Union[str, List[str]]]
Validate select_output is a string or list of strings.

validate_output_config

validate_output_config(cls, v: Any) -> Optional[OutputConfig]
Convert dict to OutputConfig for backward compatibility.

GraphGenRubricCriterion

A single rubric criterion for evaluation.

GraphGenRubricOutcome

Outcome-level rubric (evaluates final output).

GraphGenRubricEvents

Event-level rubric (evaluates intermediate steps).

GraphGenRubric

Rubric for evaluating task outputs.

GraphGenTask

A single task in the dataset. Tasks have arbitrary JSON inputs and optional task-specific rubrics. Gold outputs are stored separately and linked via task_id.

GraphGenGoldOutput

A gold/reference output. Can be linked to a specific task via task_id, or standalone (for reference examples). Standalone gold outputs (no task_id) are used as reference pool for contrastive judging.

GraphGenJudgeConfig

Configuration for the judge used during optimization.

GraphGenTaskSet

The complete GraphGen dataset format. Contains tasks with arbitrary JSON inputs, gold outputs (optionally linked to tasks), rubrics (task-specific and/or default), and judge configuration. Methods:

validate_unique_task_ids

validate_unique_task_ids(cls, v: List[GraphGenTask]) -> List[GraphGenTask]
Ensure all task IDs are unique.

validate_gold_output_task_ids

validate_gold_output_task_ids(cls, v: List[GraphGenGoldOutput], info: ValidationInfo) -> List[GraphGenGoldOutput]
Ensure gold output task_ids reference valid tasks. Args:
  • v: The list of gold outputs being validated.
  • info: Pydantic ValidationInfo providing access to other fields via info.data.
Returns:
  • The validated list of gold outputs.
Raises:
  • ValueError: If a gold output references a non-existent task ID.
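Taken together, the two validators above enforce that task IDs are unique and that every linked gold output points at a real task. A plain-Python sketch of those checks (illustrative, not the Pydantic implementation):

```python
def validate_taskset(tasks: list, gold_outputs: list) -> None:
    """Sketch of the uniqueness and cross-reference checks described above."""
    ids = [t["id"] for t in tasks]
    if len(ids) != len(set(ids)):
        raise ValueError("Duplicate task IDs")
    known = set(ids)
    for g in gold_outputs:
        task_id = g.get("task_id")  # standalone gold outputs have no task_id
        if task_id is not None and task_id not in known:
            raise ValueError(f"Gold output references unknown task: {task_id}")

tasks = [{"id": "t1"}, {"id": "t2"}]
golds = [{"task_id": "t1", "output": {"answer": "4"}}, {"output": {"answer": "ref"}}]
validate_taskset(tasks, golds)  # passes

try:
    validate_taskset(tasks, [{"task_id": "t9", "output": {}}])
except ValueError as e:
    print(e)  # Gold output references unknown task: t9
```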

validate_select_output

validate_select_output(cls, v: Any) -> Optional[Union[str, List[str]]]
Validate select_output is a string or list of strings.

validate_output_config

validate_output_config(cls, v: Any) -> Optional[OutputConfig]
Convert dict to OutputConfig for backward compatibility.

get_task_by_id

get_task_by_id(self, task_id: str) -> Optional[GraphGenTask]
Get a task by its ID.

get_task_by_index

get_task_by_index(self, index: int) -> Optional[GraphGenTask]
Get a task by zero-based index. Args:
  • index: Zero-based index into tasks list (0 to len(tasks)-1).
Returns:
  • Task at the specified index, or None if index is out of range.

get_gold_output_for_task

get_gold_output_for_task(self, task_id: str) -> Optional[GraphGenGoldOutput]
Get the gold output linked to a specific task.

get_standalone_gold_outputs

get_standalone_gold_outputs(self) -> List[GraphGenGoldOutput]
Get gold outputs not linked to any task (reference pool for contrastive judge).
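Standalone entries are simply those without a task_id; conceptually:

```python
gold_outputs = [
    {"task_id": "t1", "output": {"answer": "4"}},
    {"output": {"answer": "Paris"}},  # standalone: no task_id
]

# Equivalent of get_standalone_gold_outputs(): the contrastive reference pool.
standalone = [g for g in gold_outputs if g.get("task_id") is None]
print(len(standalone))  # 1
```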

EventInput

V3-compatible event input for verifier evaluation.

SessionTimeStepInput

V3-compatible session time step input.

SessionTraceInput

V3-compatible session trace input for judge evaluation.

GraphGenGraphJudgeRequest

Request for verifier graph inference.

GraphGenGraphCompletionsModelUsage

Token usage and cost for a single model in a graph completion.

EventRewardResponse

Event-level reward from verifier evaluation.

OutcomeRewardResponse

Outcome-level reward from verifier evaluation.

GraphGenGraphJudgeResponse

Response from verifier graph inference.

GraphGenGraphVerifierRequest

Alias for GraphGenGraphJudgeRequest with verifier terminology.

GraphGenGraphVerifierResponse

Alias for GraphGenGraphJudgeResponse with verifier terminology.

GraphGenJobConfig

Configuration for a GraphGen optimization job. Methods:

get_policy_provider

get_policy_provider(self) -> str
Get the policy provider (auto-detect if not specified).