Skip to main content

Overview

GEPA (Genetic Evolution of Prompt Architectures) is Synth AI’s core prompt optimization algorithm. It uses population-based evolutionary search to improve prompts through guided mutations, crossover, and multi-objective selection.

References

  • GEPA: Agrawal et al. (2025). “GEPA: Reflective Prompt Evolution.” arXiv:2507.19457

GEPA (Genetic Evolution of Prompt Architectures)

GEPA outperforms GRPO by 10% on average and by up to 20%, while using up to 35x fewer rollouts.

How It Works

GEPA uses evolutionary principles inspired by genetic algorithms:
  1. Population Initialization
    • Starts with baseline prompt + random mutations
    • Creates initial population of 20-30 prompt variants
  2. Evaluation
    • Evaluates each prompt variant on training seeds
    • Tracks multiple objectives: accuracy, token count, tool call rate
  3. Selection (Pareto Front)
    • Maintains non-dominated solutions
    • Balances performance vs. prompt length
    • Keeps top-K solutions in Pareto archive
  4. Variation
    • Mutation: LLM-guided or regex-based prompt modifications
    • Crossover: Combines two parent prompts to create offspring
  5. Evolution Loop
    • Repeats for 10-15 generations
    • Population evolves toward better solutions

Key Features

  • Pareto Optimization: Maintains diverse solutions balancing multiple objectives
  • LLM-Guided Mutations: Uses mutation models (e.g., gpt-oss-120b) for intelligent modifications
  • Pattern Mode: Supports transformation-based mutations for systematic changes
  • Multi-Stage Support: Module-aware evolution for pipeline optimization
  • Reflective Feedback: Analyzes execution traces to guide mutations
  • Hosted Verifier Integration: Optional verifier-based evaluation for quality-aware optimization

Typical Results

  • Baseline: 60-75% accuracy
  • After 5 generations: 75-80% accuracy
  • After 10 generations: 80-85% accuracy
  • After 15 generations: 85-90%+ accuracy

Best For

  • Classification tasks (Banking77, intent classification)
  • Multi-hop QA (HotpotQA)
  • Tasks requiring diverse prompt variants
  • Large evaluation budgets (1000+ rollouts)

Architecture: Inference Interception

GEPA uses the interceptor pattern: Key Benefits:
  • Task apps remain unchanged during optimization
  • Prompt optimization logic stays in backend
  • Secure, correct prompt substitution
  • No prompt leakage to task apps

Supported Models

See Supported Models for Prompt Optimization for the full list of policy models.

Multi-Stage Pipeline Support

GEPA supports optimizing prompts for multi-stage pipelines:

GEPA Multi-Stage

  • Module-aware evolution: Each pipeline module gets its own gene
  • Module selection: Mutations target specific modules
  • Uniform crossover: Combines parent genes per module
  • Aggregated scoring: Sum of module lengths for Pareto optimization

Next Steps