
Overview

GEPA (Genetic Evolution of Prompt Architectures) uses evolutionary algorithms to optimize prompts through population-based search, mutation, and Pareto optimization.

High-Level Architecture

(Diagram: GEPA High-Level Architecture)

Key Components:
  1. GEPA Optimizer - Coordinates the evolutionary loop
  2. Population Manager - Maintains prompt variants
  3. Pareto Archive - Tracks non-dominated solutions
  4. Mutation Engine - Generates prompt variations
  5. Evaluation Engine - Scores prompts via Task App
  6. Interceptor - Transforms prompts in-flight (no Task App changes)
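A minimal sketch of how these components could be wired together (class and method names are illustrative, not the actual API):

```python
class GEPAOptimizer:
    """Coordinates the evolutionary loop across the components listed above."""

    def __init__(self, population, archive, mutator, evaluator, interceptor):
        self.population = population    # Population Manager
        self.archive = archive          # Pareto Archive
        self.mutator = mutator          # Mutation Engine
        self.evaluator = evaluator      # Evaluation Engine (talks to the Task App)
        self.interceptor = interceptor  # Interceptor (in-flight substitution)

    def run(self, num_generations: int):
        for _ in range(num_generations):
            children = self.mutator.mutate(self.archive.sample_parents())
            for child in children:
                self.interceptor.register(child)  # never sent to the Task App
                self.archive.update(child, self.evaluator.evaluate(child))
        return self.archive.front()
```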

Detailed Flow Diagram

(Diagram: GEPA Detailed Flow)

Phase Breakdown

Phase 0: Pattern Validation

Backend → Start Interceptor → Fetch Baseline Messages → Validate Pattern Match

Phase 1: Population Initialization

Baseline Prompt → Generate Mutations → Create Population (20-30 variants)
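As a sketch, seeding the population might look like this, assuming a `mutate` callable that stands in for the real mutation engine (names illustrative):

```python
import random

def init_population(baseline_key: str, mutate, size: int = 25):
    """Seed the population with `size` variants derived from the baseline.

    Entries are transformation keys, not prompt text; the baseline itself
    stays in the population as a reference point.
    """
    population = [baseline_key]
    while len(population) < size:
        population.append(mutate(random.choice(population)))
    return population
```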

Phase 2: Evaluation Loop

For each generation:
  1. Mutation Phase (parallel)
     - Select parents from Pareto archive
     - Generate mutations (LLM-guided or regex)
     - Create crossover offspring

  2. Evaluation Phase (parallel with minibatch gating)
     - Evaluate candidates on pareto_seeds
     - Score: accuracy, token_count, tool_call_rate
     - Extract execution traces

  3. Archive Update (sequential)
     - Check Pareto dominance
     - Add non-dominated solutions
     - Update feedback pool
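The minibatch gating in the evaluation phase can be sketched as follows: evaluate each candidate on a small seed subset first, and spend full rollouts only on candidates that clear their parent's score (the gating rule and all names are illustrative):

```python
def gated_evaluate(candidate, parent_score, seeds, evaluate_on, minibatch_size=3):
    """Evaluate on a minibatch first; promote to the full pareto_seeds set
    only if the candidate is at least as good as its parent."""
    mini = seeds[:minibatch_size]
    mini_score = sum(evaluate_on(candidate, s) for s in mini) / len(mini)
    if mini_score < parent_score:
        return None  # gated out: no full evaluation spent on this candidate
    full = [evaluate_on(candidate, s) for s in seeds]
    return sum(full) / len(full)
```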

Phase 3: Selection & Evolution

Pareto Archive → Instance-wise Sampling → Parent Selection → Next Generation

Data Flow Diagram

(Diagram: GEPA Data Flow)

Key Data Structures

Population:
  • Collection of prompt transformations (not full prompts)
  • Each transformation modifies the baseline pattern
  • Stored as transformation keys, not prompt text
Pareto Archive:
  • Multi-objective optimization space
  • Objectives: accuracy, prompt_length, tool_call_rate
  • Maintains non-dominated solutions (Pareto front)
Evaluation Results:
  • Per-seed scores (accuracy)
  • Execution traces (for reflective feedback)
  • Tool call patterns
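An evaluation result might be represented roughly as the following dataclass (field names illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class EvalResult:
    """Per-candidate evaluation output."""
    per_seed_accuracy: dict       # seed index -> accuracy on that seed
    traces: list = field(default_factory=list)  # execution traces for feedback
    tool_call_rate: float = 0.0   # fraction of turns that invoked a tool

    @property
    def accuracy(self) -> float:
        """Mean accuracy across seeds (0.0 if nothing was evaluated)."""
        scores = self.per_seed_accuracy.values()
        return sum(scores) / len(scores) if scores else 0.0
```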

Interceptor Pattern (Critical)

(Diagram: GEPA Interceptor Pattern)

CRITICAL: GEPA never sends optimized prompts to Task Apps.
✅ CORRECT FLOW:
Backend → Register Transformation → Interceptor → Substitute → LLM

❌ WRONG FLOW:
Backend → Send Prompt → Task App (NEVER DO THIS)
Why:
  • Task Apps remain unchanged during optimization
  • Prompts stay secure in backend
  • Pattern-based transformations enable systematic exploration
  • No prompt leakage or versioning issues
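A minimal sketch of the interceptor's register-then-substitute contract (illustrative, not the real interface):

```python
class Interceptor:
    """Substitutes registered transformations into baseline messages in-flight,
    so the Task App only ever sees the baseline prompt."""

    def __init__(self):
        self._transforms = {}

    def register(self, key: str, transform):
        """Store a transformation under a key; the backend sends only the key."""
        self._transforms[key] = transform

    def apply(self, key: str, baseline_messages: list) -> list:
        """Rewrite each message at inference time; unknown keys pass through."""
        transform = self._transforms.get(key)
        if transform is None:
            return baseline_messages
        return [transform(m) for m in baseline_messages]
```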

Generation Lifecycle

Timeline Example (Generation 0 → Generation 1)

Time →
Gen 0: [Mutate 10] ──→ [Eval C0..C9] ────────────────────→ [Archive]
Gen 1:                 [Mutate 10] ──→ [Eval C0..C9] ────────→ [Archive]
Gen 2:                                 [Mutate 10] ──→ [Eval] ──→ ...
Key Insight: Mutation can start before evaluation completes (pipelining).
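This pipelining can be sketched with asyncio: mutation for generation g runs while evaluation of generation g-1 is still in flight (illustrative):

```python
import asyncio

async def pipelined(num_generations, mutate, evaluate):
    """Overlap phases: mutate generation g while generation g-1 is still
    being evaluated (archive reads would use a snapshot)."""
    eval_task = None
    for g in range(num_generations):
        candidates = await mutate(g)           # mutate generation g
        if eval_task is not None:
            await eval_task                    # finish evaluating generation g-1
        eval_task = asyncio.create_task(evaluate(g, candidates))
    if eval_task is not None:
        await eval_task                        # drain the last evaluation
```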

Mutation Types

1. LLM-Guided Mutation

  • Uses mutation model (e.g., gpt-5-mini)
  • Analyzes execution traces
  • Proposes intelligent modifications

2. Regex-Based Mutation

  • Pattern substitutions
  • Systematic transformations
  • Deterministic changes

3. Crossover

  • Combines transformations from two parents
  • Module-aware for multi-stage prompts
  • Uniform crossover per module
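Module-aware uniform crossover might look like this (illustrative sketch; each parent is treated as a mapping from module name to its transformation):

```python
import random

def uniform_crossover(parent_a: dict, parent_b: dict, rng=None):
    """For each module (prompt stage), the child inherits that module's
    transformation from one parent chosen uniformly at random."""
    rng = rng or random.Random()
    child = {}
    for module in sorted(parent_a.keys() | parent_b.keys()):
        donors = [p for p in (parent_a, parent_b) if module in p]
        child[module] = rng.choice(donors)[module]
    return child
```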

Pareto Optimization

Multi-Objective Space:
  • X-axis: Prompt Length (tokens)
  • Y-axis: Accuracy (%)
  • Z-axis: Tool Call Rate (optional)
Pareto Front:
  • Set of non-dominated solutions
  • Each solution is optimal for some trade-off
  • Archive maintains diverse solutions
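Pareto dominance and archive maintenance can be sketched as follows (objective names mirror the list above; the rule set is illustrative):

```python
def dominates(a: dict, b: dict, maximize=("accuracy",), minimize=("prompt_length",)):
    """True if `a` Pareto-dominates `b`: at least as good on every objective
    and strictly better on at least one."""
    strictly_better = False
    for obj in maximize:
        if a[obj] < b[obj]:
            return False
        if a[obj] > b[obj]:
            strictly_better = True
    for obj in minimize:
        if a[obj] > b[obj]:
            return False
        if a[obj] < b[obj]:
            strictly_better = True
    return strictly_better

def update_front(front: list, candidate: dict) -> list:
    """Add `candidate` to the archive unless dominated; drop anything it dominates."""
    if any(dominates(s, candidate) for s in front):
        return front  # dominated: archive unchanged
    return [s for s in front if not dominates(candidate, s)] + [candidate]
```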

Seed Splitting Strategy

Total Seeds (e.g., 15)
├── Pareto Seeds (10) → Used for Pareto optimization
└── Feedback Seeds (5) → Used for reflective feedback
Purpose:
  • Pareto seeds: Multi-objective scoring
  • Feedback seeds: Execution trace analysis for mutation guidance
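A sketch of the split; `feedback_fraction` here mirrors the configuration parameter of the same name (illustrative):

```python
import random

def split_seeds(seeds, feedback_fraction=1/3, rng=None):
    """Shuffle, then carve off a feedback set for trace analysis; the rest
    become the Pareto set used for multi-objective scoring."""
    rng = rng or random.Random()
    shuffled = seeds[:]
    rng.shuffle(shuffled)
    n_feedback = round(len(seeds) * feedback_fraction)
    return shuffled[n_feedback:], shuffled[:n_feedback]  # (pareto, feedback)
```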

Component Interactions

Key Interactions

  1. Optimizer ↔ Population Manager
    • Creates initial population
    • Manages generations
    • Tracks transformations
  2. Optimizer ↔ Mutation Engine
    • Requests mutations
    • Provides parent transformations
    • Receives candidate variants
  3. Optimizer ↔ Evaluation Engine
    • Submits candidates for evaluation
    • Receives scores and traces
    • Manages rollout budget
  4. Evaluation Engine ↔ Task App
    • Sends rollout requests (baseline prompts only)
    • Receives trajectories
    • Computes rewards
  5. Optimizer ↔ Interceptor
    • Registers transformations
    • Interceptor substitutes at inference time
    • No direct prompt transmission

Performance Characteristics

Throughput Analysis

Bottlenecks:
  1. LLM mutation calls (~3-5s each, parallelized)
  2. Rollout evaluation (minibatch gating)
  3. Archive updates (sequential, ~100ms)
Optimization Opportunities:
  • Pipeline mutation generation
  • Parallel evaluation with minibatch control
  • Archive snapshot for pipelining

Typical Performance

  • Initial Population: 20-30 variants
  • Generations: 10-15
  • Rollouts per Generation: ~50-100
  • Total Rollouts: ~1000-1500
  • Time per Generation: ~2-5 minutes
  • Total Optimization Time: ~20-75 minutes

Configuration Parameters

The full configuration reference is currently maintained in the research docs. Key Parameters:
  • initial_size - Initial population size
  • num_generations - Number of evolutionary generations
  • children_per_generation - Offspring per generation
  • pareto_set_size - Size of Pareto archive
  • feedback_fraction - Fraction of seeds for feedback
  • mutation.rate - Probability of mutation vs crossover
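As an illustration only (the actual file format, section names, and values may differ), a configuration block using these parameters could look like:

```toml
# Hypothetical GEPA section; key names from the list above, values illustrative
[gepa]
initial_size = 25
num_generations = 12
children_per_generation = 10
pareto_set_size = 20
feedback_fraction = 0.33

[gepa.mutation]
rate = 0.7   # probability of mutation vs. crossover
```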

Next Steps