GEPA (Genetic-Pareto) is a reflective evolutionary optimizer that automatically improves your prompts through LLM-guided mutations and multi-objective selection. This walkthrough covers everything from setup to retrieving your optimized prompts. Reference: Agrawal et al. (2025). “GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning.” arXiv:2507.19457

Why GEPA?

GEPA outperforms GRPO by 10% on average (up to 20%) while using up to 35x fewer rollouts. Best for:
  • Classification tasks (Banking77, intent classification)
  • Multi-hop QA (HotpotQA)
  • Instruction-following tasks
  • When you want diverse prompt variants (Pareto front)
Typical results: 60-75% baseline accuracy → 85-90%+ after 15 generations

Prerequisites

Before starting, ensure you have:
# Required environment variables in .env
GROQ_API_KEY=gsk_...          # For policy model inference
SYNTH_API_KEY=sk_...          # For backend authentication
ENVIRONMENT_API_KEY=sk_env_... # Optional - auto-generated if not set
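A minimal sketch for loading and validating these variables at startup, assuming the python-dotenv package (an extra dependency, not required by the SDK):

import os
from dotenv import load_dotenv  # assumes: pip install python-dotenv

load_dotenv()  # reads .env from the current working directory

# Fail fast if the required keys are missing
for key in ("GROQ_API_KEY", "SYNTH_API_KEY"):
    if not os.environ.get(key):
        raise RuntimeError(f"Missing required environment variable: {key}")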
Install the Synth AI SDK:
pip install synth-ai

How GEPA Works

GEPA uses evolutionary principles to explore the prompt space. Understanding the algorithm helps you configure it effectively.

The Optimization Flow

  1. Initialize
    • Split seeds into pareto_seeds and feedback_seeds
    • Evaluate baseline transformation
    • Generate initial population via proposer
    • Evaluate & add to Pareto archive
  2. Evolve (for each generation)
    • For each child:
      • Select parent (instance-wise Pareto sampling)
      • Generate feedback from parent trace
      • Mutate via proposer (LLM-guided)
      • Minibatch gating (quick eval)
      • Full Pareto evaluation (if gating passed)
      • Update archive if non-dominated
  3. Terminate
    • Budget exhausted OR
    • Generation limit OR
    • No improvement for N generations
  4. Return the best transformation by accuracy (a minimal Python sketch of this loop follows below)
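As a rough Python sketch of the loop above (illustrative only; helper names such as sample_parent, propose_mutation, and evaluate_pareto are hypothetical, not SDK APIs):

# Illustrative sketch of the GEPA loop; all helpers are hypothetical.
archive = [baseline_transformation]
evaluate_pareto(baseline_transformation, pareto_seeds)

for generation in range(num_generations):
    for _ in range(children_per_generation):
        parent = sample_parent(archive)                     # instance-wise Pareto sampling
        feedback = collect_feedback(parent, feedback_seeds)
        child = propose_mutation(parent, feedback)          # LLM-guided proposer
        if minibatch_score(child) < minibatch_score(parent):
            continue                                        # gating failed: skip full eval
        evaluate_pareto(child, pareto_seeds)
        archive = update_archive(archive, child)            # keep if non-dominated
    if budget_exhausted() or generations_without_improvement() >= patience_generations:
        break

best = max(archive, key=accuracy)  # best transformation by accuracy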

Key Components

1. Pattern-Based Transformations

GEPA represents prompt changes as transformations that can be applied to your baseline:
# A transformation replaces text in your prompt
TextTransformation(
    old_text="You are a helpful assistant.",      # Original text
    new_text="You are a banking classification expert...",  # Optimized text
    apply_to_role="system"  # Only apply to system messages
)
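A rough sketch of how such a transformation could be applied to a chat-style prompt (apply_transformation here is illustrative, not an SDK function):

# Illustrative: rewrite matching messages for the targeted role.
def apply_transformation(messages, t):
    updated = []
    for msg in messages:
        content = msg["content"]
        if msg["role"] == t.apply_to_role and t.old_text in content:
            content = content.replace(t.old_text, t.new_text)
        updated.append({**msg, "content": content})
    return updated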

2. Pareto Archive

GEPA maintains a Pareto front of non-dominated solutions, balancing multiple objectives:
  • Accuracy (primary) – Task performance
  • Tool call rate – Function calling frequency (for agentic tasks)
Solutions are kept if they’re not dominated by any other solution across all objectives.
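For intuition, the archive update can be thought of as a standard non-domination check (a sketch, not the SDK's implementation; the objective names are assumptions):

# Sketch: A dominates B if it is at least as good on every objective and strictly better on one.
def dominates(a, b, objectives=("accuracy", "tool_call_rate")):
    at_least_as_good = all(a[o] >= b[o] for o in objectives)
    strictly_better = any(a[o] > b[o] for o in objectives)
    return at_least_as_good and strictly_better

def keep_in_archive(candidate, archive):
    return not any(dominates(other, candidate) for other in archive)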

3. Instance-Wise Parent Selection

Unlike traditional selection that uses aggregate scores, GEPA counts how many individual seeds each prompt “wins” on:
# Parent selection weights prompts by per-seed wins
wins = count_seeds_where_prompt_is_best(prompt)
selection_weight = (wins + ε) ** selection_pressure
This favors prompts that excel on specific example types, encouraging specialization.
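Concretely, parent sampling could look like the following (a sketch; per_seed_scores, epsilon, and selection_pressure are illustrative names, not documented settings):

import random

# Sketch: weight each archived prompt by the number of seeds it "wins" on.
def sample_parent(archive, per_seed_scores, epsilon=1.0, selection_pressure=2.0):
    weights = []
    for prompt in archive:
        wins = sum(
            1 for seed_scores in per_seed_scores.values()
            if seed_scores[prompt] == max(seed_scores.values())
        )
        weights.append((wins + epsilon) ** selection_pressure)
    return random.choices(archive, weights=weights, k=1)[0]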

4. LLM-Guided Mutations

The proposer (meta-model) generates new prompts by analyzing:
  • Current instruction (baseline)
  • Rollout examples (input/output/feedback for each seed)
  • Trace feedback (e.g., “model under-utilizes tools”)
  • Dataset and program context
The proposer uses instruction typology to structure outputs with: input descriptions, core task, premises, heuristics, constraints, rules, and output descriptions.
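As a rough illustration, the proposer's input can be assembled along these lines (the exact meta-prompt used by the backend is not shown here; field names are examples):

# Illustrative assembly of the proposer meta-prompt.
def build_proposer_prompt(baseline, rollout_examples, trace_feedback, context):
    examples = "\n\n".join(
        f"Input: {e['input']}\nOutput: {e['output']}\nFeedback: {e['feedback']}"
        for e in rollout_examples
    )
    return (
        f"Current instruction:\n{baseline}\n\n"
        f"Rollout examples:\n{examples}\n\n"
        f"Trace feedback: {trace_feedback}\n"
        f"Context: {context}\n\n"
        "Write an improved instruction structured as: input description, core task, "
        "premises, heuristics, constraints, rules, and output description."
    )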

Step 1: Create a LocalAPI

Your LocalAPI evaluates prompts by running rollouts and returning scores. See LocalAPI Guide for details. Example Banking77 LocalAPI structure:
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class RolloutRequest(BaseModel):
    seed: int
    run_id: str
    # ... other fields

@app.post("/rollout")
async def rollout(request: RolloutRequest):
    # 1. Load example for this seed
    example = load_example(request.seed)

    # 2. Call your LLM with the prompt (interceptor handles substitution)
    prediction = await call_llm(example.query)

    # 3. Score the prediction
    correct = prediction == example.expected_label

    return {
        "metrics": {"correct": correct},
        "outcome": 1.0 if correct else 0.0
    }
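
Before deploying, you can smoke-test the endpoint locally with FastAPI's test client (a sketch; it assumes load_example and call_llm are implemented and sends only the fields shown above):

from fastapi.testclient import TestClient

client = TestClient(app)
resp = client.post("/rollout", json={"seed": 0, "run_id": "local-test"})
print(resp.status_code, resp.json())  # expect 200 and a metrics/outcome payload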

Step 2: Deploy Your LocalAPI

The Synth AI backend needs to reach your LocalAPI over the internet to send rollout requests. Use the Python SDK’s InProcessTaskApp for seamless deployment with automatic tunneling:
import asyncio

from synth_ai.sdk import InProcessTaskApp

# Import your task app
from my_task_app import app

async def main():
    async with InProcessTaskApp(app=app) as task_app:
        print(f"Task app running at: {task_app.url}")
        # task_app.url is a SynthTunnel URL like https://st.usesynth.ai/s/rt_...

asyncio.run(main())
This:
  1. Starts your LocalAPI locally
  2. Creates a SynthTunnel URL (or another tunnel backend if configured)
  3. Returns the tunnel URL via task_app.url

Verify the Deployment

Check that your LocalAPI is accessible:
curl https://st.usesynth.ai/s/rt_.../health

Alternative Tunnel Backends

SynthTunnel is the default and recommended backend. If you need a Cloudflare tunnel instead:
# Cloudflare quick tunnel (ephemeral, requires cloudflared binary)
async with InProcessTaskApp(
    app=app,
    tunnel_backend="cloudflare_quick",
) as task_app:
    print(f"URL: {task_app.url}")

# Cloudflare managed lease (stable hostname)
async with InProcessTaskApp(
    app=app,
    tunnel_backend="cloudflare_managed_lease",
) as task_app:
    print(f"Stable URL: {task_app.url}")

Step 3: Create the Configuration

Create a TOML file defining your optimization parameters. The task_app_url should match the URL from Step 2 (stored in your .env as TASK_APP_URL):
[prompt_learning]
algorithm = "gepa"
task_app_url = "https://my-company.usesynth.ai"  # From TASK_APP_URL in .env
task_app_id = "banking77"

# Training seeds (used during optimization)
evaluation_seeds = [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79]

# Validation seeds (held-out for final evaluation)
validation_seeds = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]

# Initial prompt template
[prompt_learning.initial_prompt]
messages = [
  { role = "system", content = "You are a banking intent classification assistant." },
  { role = "user", pattern = "Customer Query: {query}\n\nClassify this query into one of 77 banking intents." }
]

# GEPA-specific configuration
[prompt_learning.gepa]
num_generations = 15              # Evolutionary cycles to run
children_per_generation = 5       # Mutations per generation
pareto_set_size = 20              # Seeds for Pareto evaluation
minibatch_size = 3                # Seeds for quick gating
rollout_budget = 1000             # Total rollouts allowed
archive_size = 64                 # Max Pareto archive size

Configuration Parameters

Parameter | Description | Default | Recommended Range
num_generations | Evolutionary cycles | 10 | 5-20
children_per_generation | Mutations per generation | 5 | 3-10
pareto_set_size | Seeds for Pareto evaluation | 20 | 15-30
minibatch_size | Seeds for gating evaluation | 3 | 2-5
rollout_budget | Total rollouts allowed | 1000 | 200-2000
archive_size | Max Pareto archive size | 64 | 32-128
feedback_fraction | Fraction of seeds for feedback | 0.3 | 0.2-0.5
proposer_mode | Proposer type (synth, gepa-ai, dspy) | synth | -

Step 4: Launch the Optimization Job

import os
from synth_ai.sdk import PromptLearningClient

async def run_optimization():
    client = PromptLearningClient(api_key=os.environ["SYNTH_API_KEY"])

    # Create job from TOML config
    job = await client.create_job_from_toml("configs/prompt_learning/banking77_gepa.toml")
    print(f"Created job: {job['id']}")

    # Start the job
    await client.start_job(job["id"])

    # Poll until completion
    result = await client.poll_until_terminal(job["id"])
    print(f"Best score: {result['best_score']}")
    return result
The SDK will:
  1. Validate your TOML configuration
  2. Verify the task app is reachable
  3. Submit the job to Synth AI
  4. Poll for completion

Understanding the Output

During optimization, you’ll see progress updates:
[18:35:37]    0.0s  Status: running
[18:35:42]    5.2s  Status: running | Best: 0.500
[18:35:48]   11.4s  Status: running | Best: 0.625
[18:35:54]   17.6s  Status: running | Best: 0.750
[18:36:00]   23.8s  Status: running | Best: 0.875
...
[18:38:50]  175.9s  Status: succeeded | Best: 0.875
Your LocalAPI logs will show rollout requests:
[TASK_APP] INBOUND_ROLLOUT: run_id=prompt-learning-74-5bec8a6f seed=74
[TASK_APP] PREDICTION: expected=card_arrival predicted=card_delivery_estimate correct=False
[BANKING77_ROLLOUT] run_id=prompt-learning-74-5bec8a6f reward=0.0

Step 5: Understanding the Optimization Process

Generation-by-Generation Progress

Generation | What Happens | Expected Accuracy
0 (baseline) | Evaluate initial prompt | 60-75%
1-3 | Explore diverse mutations | 70-80%
5-10 | Convergence begins | 80-85%
10-15 | Fine-tuning best solutions | 85-90%+

How Mutations Are Generated

The proposer receives:
  1. Baseline instruction: Your current system prompt
  2. Rollout examples: Input/output pairs with feedback (correct/incorrect, error messages)
  3. Trace statistics: Tool call rate, trajectory length, etc.
  4. Feedback hints: Rule-based suggestions like “model under-utilizes tools”
It generates a new instruction following instruction typology:
[Input Description]
You will be given a customer banking query.

[Core Task Description]
Your task is to classify the query into one of 77 banking intents.

[Premises]
Banking queries often contain domain-specific terminology.
Multiple intents may seem applicable; choose the most specific.

[Heuristics]
Look for keywords indicating the customer's primary need.
Consider the emotional tone to distinguish complaints from inquiries.

[Constraints]
Avoid defaulting to generic intents when specific ones apply.

[Rules]
Output only the intent name, nothing else.

[Output Description]
Return exactly one intent from the predefined list.

Minibatch Gating

Before full evaluation, GEPA performs a quick check:
  1. Evaluate child on a small minibatch (3 seeds)
  2. Compare to parent’s score on the same seeds
  3. If child is worse → skip full evaluation (saves budget)
  4. If child is promising → proceed to full Pareto evaluation
This saves significant compute by filtering out poor mutations early.
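In pseudocode, the gate amounts to the following (a sketch; evaluate_on_seeds is a hypothetical helper):

# Sketch of minibatch gating before a full Pareto evaluation.
def passes_gate(child, parent, minibatch_seeds):
    child_score = evaluate_on_seeds(child, minibatch_seeds)    # e.g. 3 seeds
    parent_score = evaluate_on_seeds(parent, minibatch_seeds)  # same seeds for a fair comparison
    return child_score >= parent_score  # only promising children get full evaluation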

Step 6: Retrieve Optimized Prompts

After completion, fetch your results using the Python SDK:
import os
# Result-retrieval helpers used below (import path assumed; adjust to your SDK version)
from synth_ai.sdk import get_prompts, get_prompt_text, get_scoring_summary

API_KEY = os.environ["SYNTH_API_KEY"]
JOB_ID = "pl_abc123"  # From the job submission output

# Get all results
results = get_prompts(job_id=JOB_ID, api_key=API_KEY)
print(f"Best Score: {results['best_score']:.3f}")

# Get top 5 prompts from Pareto front
for rank in range(1, 6):
    prompt = get_prompt_text(job_id=JOB_ID, api_key=API_KEY, rank=rank)
    print(f"Rank {rank}: {len(prompt)} chars")
    print(prompt[:200] + "...")

# Get scoring summary
summary = get_scoring_summary(job_id=JOB_ID, api_key=API_KEY)
print(f"Train={summary['best_train_accuracy']:.3f}")
print(f"Validation={summary.get('best_validation_accuracy', 0.0):.3f}")
print(f"Candidates Tried={summary['num_candidates_tried']}")

Understanding the Pareto Front

GEPA returns multiple prompts representing different trade-offs:
Rank | Accuracy | Token Count | Trade-off
1 | 92% | 450 | Highest accuracy
2 | 90% | 280 | Good accuracy, shorter
3 | 88% | 150 | Efficient, still performant
Choose based on your latency/cost requirements.
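For example, you might pick the most accurate prompt that fits a token budget (a sketch; the accuracy and token_count fields mirror the table above and are assumptions about the result shape):

# Sketch: choose the most accurate Pareto-front prompt under a token budget.
def pick_prompt(pareto_front, max_tokens=300):
    eligible = [p for p in pareto_front if p["token_count"] <= max_tokens]
    if not eligible:
        eligible = pareto_front  # no candidate fits; fall back to the full front
    return max(eligible, key=lambda p: p["accuracy"])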

Step 7: Use the Optimized Prompt

Replace your baseline prompt with the optimized version:
# Before: baseline prompt
system_prompt = "You are a banking intent classification assistant."

# After: optimized prompt (rank 1 from GEPA)
system_prompt = get_prompt_text(job_id=JOB_ID, api_key=API_KEY, rank=1)

# Use in your application (async OpenAI client)
from openai import AsyncOpenAI

oai_client = AsyncOpenAI()
response = await oai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Customer Query: {query}"}
    ]
)

In-Process Optimization

For development and testing, run everything from a single Python script:
import asyncio
import os

from synth_ai.sdk import InProcessTaskApp, PromptLearningClient

async def main():
    # Start LocalAPI in-process (handles tunneling automatically)
    async with InProcessTaskApp(app=my_task_app) as task_app:
        # LocalAPI is now accessible via tunnel; reference this in your config's task_app_url
        task_app_url = task_app.url

        # Submit optimization job
        client = PromptLearningClient(api_key=os.environ["SYNTH_API_KEY"])
        job = await client.create_job(config=my_config)
        await client.start_job(job["id"])

        # Poll until complete
        result = await client.poll_until_terminal(job["id"])
        print(f"Best score: {result['best_score']}")

asyncio.run(main())
See In-Process Task App Walkthrough for a complete example.

Termination Conditions

GEPA stops when any condition is met:
Condition | Description | Configuration
rollout_budget | Total rollouts exhausted | rollout_budget = 1000
max_spend_usd | USD budget limit | max_spend_usd = 5.0
num_generations | Generation limit reached | num_generations = 15
patience_generations | No improvement for N generations | patience_generations = 5

Supported Models

See Supported Models for Prompt Optimization for the full list of policy models.