
Quickstart: Graph Evolve

This guide walks you through training a multi-node LLM graph using Graph Evolve. By the end, you’ll have an optimized graph that outperforms a single prompt.
For most use cases, we recommend using the Graphs quickstart which provides a simpler interface. Use Graph Evolve directly when you need fine-grained control over evolution parameters.

Prerequisites

  • Synth AI API key (get one here)
  • Python 3.11+
  • synth-ai package installed
pip install synth-ai

Step 1: Prepare Your Dataset

Create a JSON file with your tasks and expected outputs:
{
  "tasks": [
    {
      "task_id": "q1",
      "input": {
        "question": "What is the capital of France?",
        "context": "France is a country in Western Europe."
      }
    },
    {
      "task_id": "q2",
      "input": {
        "question": "Who wrote Romeo and Juliet?",
        "context": "Romeo and Juliet is a famous tragedy."
      }
    }
  ],
  "gold_outputs": [
    {
      "task_id": "q1",
      "output": { "answer": "Paris" },
      "score": 1.0
    },
    {
      "task_id": "q2",
      "output": { "answer": "William Shakespeare" },
      "score": 1.0
    }
  ],
  "metadata": {
    "name": "simple_qa",
    "task_description": "Answer questions using the provided context"
  }
}
Save this as dataset.json.
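Before submitting, it can help to confirm that every task has a matching gold output, since a mismatched task_id would leave a task unscored. A minimal sketch (the check_dataset helper is illustrative, not part of the SDK):

```python
import json


def check_dataset(data: dict) -> int:
    """Verify tasks and gold_outputs line up by task_id; return the task count."""
    task_ids = {t["task_id"] for t in data["tasks"]}
    gold_ids = {g["task_id"] for g in data["gold_outputs"]}
    if task_ids != gold_ids:
        # ^ is symmetric difference: ids present on one side but not the other
        raise ValueError(f"mismatched task_ids: {task_ids ^ gold_ids}")
    return len(task_ids)
```

To use it, load your file and pass the parsed dict: `check_dataset(json.load(open("dataset.json")))`.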

Step 2: Create Configuration

Create a TOML configuration file:
# config.toml
[graph_optimization]
algorithm = "graph_evolve"
dataset_name = "simple_qa"

# Graph settings
graph_type = "policy"
graph_structure = "dag"
topology_guidance = "First extract relevant information from context, then formulate the answer"

# Models
allowed_policy_models = ["gpt-4o-mini"]
verifier_model = "gpt-4o-mini"
scoring_strategy = "rubric"

# Evolution
[graph_optimization.evolution]
num_generations = 3
children_per_generation = 2

[graph_optimization.proposer]
model = "gpt-4.1"

# Data splits
[graph_optimization.seeds]
train = [0, 1, 2, 3, 4]
validation = [5, 6, 7]

# Budget
[graph_optimization.limits]
max_spend_usd = 5.0
timeout_seconds = 1800

Step 3: Run Training

Submit the job with the Python SDK:
from synth_ai.sdk import GraphOptimizationJob

# Create a job from the dataset prepared in Step 1
job = GraphOptimizationJob.from_dataset(
    "dataset.json",
    policy_model="gpt-4o-mini",
    rollout_budget=100,
    proposer_effort="medium",
)
job.submit()

# Block until training finishes, streaming progress as it runs
result = job.stream_until_complete()
print(f"Best score: {result.best_score}")

Step 4: Use Your Graph

Production Inference

# Using Graph Evolve job
output = job.run_inference({
    "question": "What is the largest planet?",
    "context": "Jupiter is the largest planet in our solar system."
})
print(output)  # {"answer": "Jupiter"}
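run_inference takes one input at a time; for a small evaluation set you can simply loop over it. A sketch (the batch_infer helper is illustrative, not part of the SDK):

```python
def batch_infer(run_inference, inputs):
    """Apply a single-input inference callable to a list of inputs, in order."""
    return [run_inference(x) for x in inputs]
```

Usage with the job from Step 3: `outputs = batch_infer(job.run_inference, eval_inputs)`.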

Download for Local Use

# Get the optimized graph
graph_export = job.download_prompt()
print(graph_export)

What Happens During Training

  1. Initialization: Graph Evolve creates an initial population of graph candidates
  2. Evaluation: Each candidate is run on training seeds and scored
  3. Selection: Best candidates are selected for the next generation
  4. Mutation: LLM proposes modifications to prompts and structure
  5. Repeat: Process continues for num_generations
  6. Validation: Top candidates are evaluated on held-out validation seeds
Progress is reported per generation, for example:
Generation 1: best_score=0.65, candidates=5
Generation 2: best_score=0.72, candidates=5
Generation 3: best_score=0.81, candidates=5
Validation: final_score=0.79
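The steps above can be sketched as a generic evolutionary loop. This is a toy illustration of the algorithm's shape, not the service's actual implementation; evaluate and mutate stand in for LLM-driven scoring and proposal:

```python
import random


def evolve(initial, evaluate, mutate,
           num_generations=3, children_per_generation=2, seed=0):
    """Keep the best candidate; each generation, propose mutated children
    and select the fittest among the parent and its children."""
    rng = random.Random(seed)
    best = max(initial, key=evaluate)
    for _ in range(num_generations):
        children = [mutate(best, rng) for _ in range(children_per_generation)]
        best = max([best, *children], key=evaluate)
    return best


# Toy task: evolve a number toward the target value 10.
best = evolve(
    initial=[0.0],
    evaluate=lambda x: -abs(x - 10),
    mutate=lambda x, rng: x + rng.uniform(-1, 2),
)
```

In Graph Evolve, the candidates are graphs, evaluation is a rollout over the training seeds, and mutation is the proposer model rewriting prompts or structure.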

Tips for Better Results

1. More Training Data

More training examples generally yield better optimization:
[graph_optimization.seeds]
train = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
validation = [15, 16, 17, 18, 19]
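As the dataset grows, writing out contiguous seed lists by hand gets error-prone; you can derive the splits from the task count instead. A small helper (illustrative, not part of the SDK):

```python
def make_seed_split(num_tasks: int, validation_fraction: float = 0.25):
    """Split seed indices 0..num_tasks-1 into train and validation lists,
    reserving the last validation_fraction of seeds (at least one) for validation."""
    n_val = max(1, int(num_tasks * validation_fraction))
    seeds = list(range(num_tasks))
    return seeds[:-n_val], seeds[-n_val:]


train, validation = make_seed_split(20)
```

For 20 tasks this reproduces the split shown above: seeds 0-14 for training, 15-19 for validation.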

2. Topology Guidance

Help the proposer understand your task:
topology_guidance = """
For multi-hop reasoning questions:
1. First identify what information is needed
2. Extract relevant facts from context
3. Combine facts to form the answer
"""

3. Appropriate Structure

Match structure to task complexity:
Task                       Recommended Structure
Simple classification      single_prompt
Multi-step reasoning       dag
Routing/branching logic    conditional

4. Budget Allocation

More generations with fewer children per generation often beat a few generations with many children:
[graph_optimization.evolution]
num_generations = 5        # More iterations
children_per_generation = 2  # Fewer variants per iteration
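A quick way to reason about this trade-off is rollout count: each generation evaluates every child on every training seed, so training cost scales roughly as generations × children × train seeds (a back-of-the-envelope approximation; the service's exact accounting may differ):

```python
def approx_rollouts(num_generations: int, children_per_generation: int,
                    num_train_seeds: int) -> int:
    """Rough count of training rollouts for one evolution run."""
    return num_generations * children_per_generation * num_train_seeds


# Both settings cost ~50 rollouts on 5 train seeds, but the first
# gets five rounds of proposer feedback instead of two.
deep = approx_rollouts(5, 2, 5)
wide = approx_rollouts(2, 5, 5)
```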

Troubleshooting

Low Scores

  • Add more diverse training examples
  • Increase num_generations
  • Try different topology_guidance
  • Check that gold outputs are correct

Slow Training

  • Reduce children_per_generation
  • Use faster policy model (e.g., gpt-4o-mini)
  • Reduce training seed count

High Costs

  • Set max_spend_usd limit
  • Use max_llm_calls_per_run to limit graph complexity
  • Use cheaper models in allowed_policy_models
