- GEPA: Agrawal et al. (2025). “GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning.” arXiv:2507.19457
- MIPRO: Opsahl-Ong et al. (2024). “Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs.” arXiv:2406.11695
Quick Start Checklist
1. Build a prompt evaluation task app
Define a task app that evaluates prompt performance on your task (classification accuracy, QA correctness, etc.). → Read: Task App requirements
2. Deploy and verify the service
Smoke-test locally, then deploy to Modal or your host of choice once health checks pass. → Read: Deploying task apps
3. Author the prompt optimization config
Capture algorithm choice (GEPA or MIPRO), initial prompt template, training/validation seeds, and optimization parameters in TOML. → Read: Prompt optimization configs
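A rough sketch of such a config is shown below; the section and key names are illustrative placeholders rather than the authoritative schema, so consult the config reference for the exact fields.

```toml
# Illustrative sketch only -- section and key names are placeholders, not the real schema.
[prompt_optimization]
algorithm = "gepa"                                  # or "mipro"
task_app_url = "https://my-task-app.example.com"

[prompt_optimization.prompt]
template = "You are a banking assistant. Classify the intent of: {query}"

[prompt_optimization.data]
train_seeds = [0, 1, 2, 3, 4, 5, 6, 7]
validation_seeds = [100, 101, 102, 103]

[prompt_optimization.gepa]
generations = 12
population_size = 8
mutation_model = "openai/gpt-oss-120b"
```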
4. Launch the optimization job
Run `uvx synth-ai train --config config.toml` to create the job and stream status/metrics. → Read: Launch training jobs
5. Query and evaluate results
Use the Python API or REST endpoints to retrieve optimized prompts and evaluate them on held-out validation sets. → Read: Querying results
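As a rough illustration of this step, the snippet below polls a job over REST with `requests`; the base URL, endpoint path, and response field names are assumptions made for the sketch, not the documented API.

```python
# Illustrative sketch only: the endpoint path and response fields are placeholders,
# not the documented API -- see "Querying results" for the real interface.
import os
import requests

BASE_URL = os.environ.get("SYNTH_BACKEND_URL", "https://api.example.com")
API_KEY = os.environ["SYNTH_API_KEY"]
JOB_ID = "job_123"  # returned when the job was launched

resp = requests.get(
    f"{BASE_URL}/prompt-optimization/jobs/{JOB_ID}",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
job = resp.json()

# Inspect the best prompt found so far and its validation score.
print(job["status"])
print(job["best_prompt"])
print(job["best_score"])
```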
Algorithm Overview
GEPA (Genetic-Pareto)
Best for: Broad exploration, diverse prompt variants, classification tasks
Reference: Agrawal et al. (2025)
GEPA uses evolutionary principles to explore the prompt space (a conceptual sketch follows the list below):
- Population-based search with multiple prompt variants
- LLM-guided mutations for intelligent prompt modifications
- Pareto optimization balancing performance and prompt length
- Multi-stage support for pipeline optimization
- Maintains a Pareto front of non-dominated solutions
- Supports both template mode and pattern-based transformations
- Module-aware evolution for multi-stage pipelines
- Reflective feedback from execution traces
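The sketch below shows the general shape of such a loop: mutate a population of prompts with an LLM (stubbed here), score them, and keep a Pareto front over score and prompt length. It is a conceptual illustration only, not the backend implementation.

```python
# Conceptual sketch of a GEPA-style evolutionary loop -- not the backend implementation.
import random

HINTS = ["Answer with a single label.", "Think step by step.", "Be concise."]

def mutate(prompt: str) -> str:
    # Stand-in for an LLM-guided mutation of the instruction text.
    return prompt + " " + random.choice(HINTS)

def score(prompt: str) -> float:
    # Stand-in for evaluating the prompt against the task app on training seeds.
    return random.random()

def pareto_front(population: list[str]) -> list[str]:
    # Keep prompts that are not dominated on (score, prompt length): another prompt
    # dominates if it scores at least as well and is no longer, with one strict improvement.
    scored = [(p, score(p), len(p)) for p in population]
    front = []
    for p, s, n in scored:
        dominated = any(
            s2 >= s and n2 <= n and (s2 > s or n2 < n) for _, s2, n2 in scored
        )
        if not dominated:
            front.append(p)
    return front

population = ["Classify the customer's banking intent."]
for generation in range(5):
    children = [mutate(p) for p in population for _ in range(3)]
    population = pareto_front(population + children)
    print(f"generation {generation}: {len(population)} prompts on the front")
```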
MIPRO (Multi-prompt Instruction Proposal Optimizer)
Best for: Efficient optimization, task-specific improvements, faster convergence
Reference: Opsahl-Ong et al. (2024)
MIPRO uses meta-learning to propose better instructions (a conceptual sketch follows the list below):
- Meta-LLM (e.g., GPT-4o-mini) generates instruction variants
- TPE (Tree-structured Parzen Estimator) guides Bayesian search
- Bootstrap phase collects task-specific few-shot examples from high-scoring seeds
- Reference corpus (up to 50k tokens) enriches meta-prompts
- System spec integration for constraint-aware optimization
- Program-aware instruction proposals
- Multi-stage pipeline support with LCS-based stage detection
- Token budget tracking and cost optimization
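The sketch below illustrates the propose-and-search idea using Optuna's TPE sampler to pick among candidate instructions scored on mini-batches. The candidate list, the scoring stub, and the use of Optuna are illustrative assumptions, not the backend implementation.

```python
# Conceptual sketch of MIPRO-style instruction selection -- not the backend implementation.
# Requires `pip install optuna`.
import random
import optuna

# In the real flow, a meta-LLM (e.g. gpt-4o-mini) proposes candidates from the task
# description, bootstrapped examples, and the reference corpus.
candidate_instructions = [
    "Classify the banking query into one of the 77 intents.",
    "Read the customer message and return the single best intent label.",
    "You are a banking intent classifier. Output only the label.",
]

def evaluate_on_minibatch(instruction: str) -> float:
    # Stand-in: run the task app on a small batch of seeds with this instruction
    # substituted and return the mean score.
    return random.random()

def objective(trial: optuna.Trial) -> float:
    instruction = trial.suggest_categorical("instruction", candidate_instructions)
    return evaluate_on_minibatch(instruction)

study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=20)
print(study.best_params["instruction"])
```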
Architecture: Inference Interception
🚨 Critical: Both algorithms use an interceptor pattern that ensures optimized prompts never reach task apps. All prompt modifications happen in the backend via an inference interceptor that substitutes the optimized prompt before the request reaches the LLM (a conceptual sketch follows the list below).
- Task apps remain unchanged during optimization
- Prompt optimization logic stays in the backend
- Secure, correct prompt substitution
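The sketch below shows the general idea of such an interceptor: swap the system message for the current optimized prompt before forwarding the request. The request shape and stage metadata are assumptions for illustration, not the actual backend schema.

```python
# Conceptual sketch of the interceptor pattern -- the real backend component and
# request schema are internal to the platform.
def intercept(request: dict, optimized_prompts: dict[str, str]) -> dict:
    """Substitute the optimized system prompt before the request reaches the LLM.

    `request` is an OpenAI-style chat completion payload produced by the task app;
    `optimized_prompts` maps a pipeline stage name to its current best prompt.
    """
    stage = request.get("metadata", {}).get("stage", "default")
    if stage in optimized_prompts:
        messages = request["messages"]
        # Replace the system message the task app sent with the optimized one.
        request["messages"] = [
            {"role": "system", "content": optimized_prompts[stage]}
        ] + [m for m in messages if m["role"] != "system"]
    return request
```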
Supported Models
Policy Models (Task Execution)
Both GEPA and MIPRO support policy models from:
- OpenAI: `gpt-4o`, `gpt-4o-mini`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`, `gpt-5`, `gpt-5-mini`, `gpt-5-nano`
- Groq: `gpt-oss-20b`, `gpt-oss-120b`, `llama-3.3-70b-versatile`, `qwen-32b`, `qwen3-32b`
- Google: `gemini-2.5-pro`, `gemini-2.5-pro-gt200k`, `gemini-2.5-flash`, `gemini-2.5-flash-lite`
Mutation Models (GEPA Only)
Used to generate prompt mutations:
- Common choices: `openai/gpt-oss-120b`, `llama-3.3-70b-versatile`
- Nano models are rejected (too small for generation tasks)
Meta Models (MIPRO Only)
Used to generate instruction proposals:
- Common choices: `gpt-4o-mini`, `gpt-4.1-mini` (most common default)
- Nano models are rejected (too small for generation tasks)
`gpt-5-pro` is explicitly rejected for all model types (too expensive: $120 per 1M tokens).
See Supported Models for complete details.
When to Use Each Algorithm
| Aspect | GEPA | MIPRO |
|---|---|---|
| Search Method | Genetic evolution | Meta-LLM + TPE |
| Exploration | Broad, diverse variants | Focused, efficient |
| Computational Cost | Lower (fewer LLM calls) | Higher (meta-model calls) |
| Convergence | 10-15 generations | 10-20 iterations |
| Best For | Classification, multi-hop QA | Task-specific optimization |
| Evaluation Budget | ~1000 rollouts | ~96 rollouts |
Choose GEPA when:
- You want diverse prompt variants (Pareto front)
- You have a large evaluation budget (1000+ rollouts)
- You need broad exploration of the prompt space
Choose MIPRO when:
- You want faster convergence with fewer evaluations
- You have clear task structure (can bootstrap with examples)
- You need efficient optimization (mini-batch evaluation)
Multi-Stage Pipeline Support
Both algorithms support optimizing prompts for multi-stage pipelines (e.g., Banking77 classifier → calibrator), as illustrated by the sketch after this list:
- LCS-based stage detection automatically identifies which stage is being called
- Per-stage optimization evolves separate instructions for each pipeline module
- Unified evaluation tracks end-to-end performance across all stages
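The sketch below illustrates the stage-detection idea: compare an incoming prompt against each stage's template and pick the closest match. It uses difflib's `SequenceMatcher` as a stand-in for the backend's LCS-based matching, and the stage names and templates are illustrative.

```python
# Illustrative stage detection -- difflib's SequenceMatcher (a longest-matching-
# subsequence heuristic) stands in for the backend's LCS-based matching.
from difflib import SequenceMatcher

STAGE_TEMPLATES = {
    "classifier": "Classify the banking query into one of the 77 intents.",
    "calibrator": "Given the predicted intent and its confidence, decide whether to accept it.",
}

def detect_stage(incoming_prompt: str) -> str:
    # Return the stage whose template is most similar to the incoming prompt.
    return max(
        STAGE_TEMPLATES,
        key=lambda stage: SequenceMatcher(None, STAGE_TEMPLATES[stage], incoming_prompt).ratio(),
    )

print(detect_stage("Classify the banking query: 'my card was declined'"))
```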
Next Steps
- Algorithm Comparison – Detailed comparison of GEPA vs MIPRO
- System Specifications – How specs guide optimization
- Configuration Reference – Complete parameter documentation
- Training Guide – Step-by-step training instructions
- Banking77 Example – Complete walkthrough