1. Build a prompt evaluation LocalAPI
Use the TaskAppConfig interface to describe dataset splits, rubrics, and rollout handlers. Build in any language – implement the OpenAPI contract in your preferred language. → Create a prompt evaluation LocalAPI | Polyglot examples2. Author the prompt optimization config
Capture the GEPA algorithm choice, initial prompt template, training/validation seeds, and optimization parameters in TOML. → Read: Prompt optimization configs3. Query and evaluate results
Use the Python API or REST endpoints to retrieve optimized prompts and evaluate them on held-out validation sets.→ Read: Querying results
Algorithm Overview
GEPA (Genetic Evolution of Prompt Architectures)
Best for: Broad exploration, diverse prompt variants, classification tasksReference: Agrawal et al. (2025) GEPA uses evolutionary principles to explore the prompt space:
- Population-based search with multiple prompt variants
- LLM-guided mutations for intelligent prompt modifications
- Pareto optimization balancing performance and prompt length
- Multi-stage support for pipeline optimization
- Maintains a Pareto front of non-dominated solutions
- Supports both template mode and pattern-based transformations
- Module-aware evolution for multi-stage pipelines
- Reflective feedback from execution traces
- Hosted verifier integration for quality-aware optimization
Architecture: Inference Interception
GEPA does call your task app’s/rollout endpoint — but optimized prompts never appear in the rollout payload. Instead, the backend registers each candidate with an inference interceptor and passes your task app a policy_config.inference_url. When your task app makes LLM calls through that URL, the interceptor substitutes the candidate prompt before forwarding to the model.
- No prompt leakage: your task app never sees the optimized prompt text
- Task apps remain unchanged: just route LLM calls through
policy_config.inference_url - Traces captured: the interceptor records execution traces for reflective feedback
- Stored artifacts: traces and artifacts can be reused for reflection across generations
Production-Ready: Works with Your Code
GEPA works with your production code via HTTP-based serverless endpoints. Build LocalAPI in any language (Rust, Go, TypeScript, Zig, Python, or any language that can serve HTTP). See Polyglot LocalAPI for examples and the OpenAPI contract.Supported Models
See Supported Models for Prompt Optimization for the full list of policy models.Multi-Stage Pipeline Support
GEPA supports optimizing prompts for multi-stage pipelines (e.g., Banking77 classifier → calibrator):- LCS-based stage detection automatically identifies which stage is being called
- Per-stage optimization evolves separate instructions for each pipeline module
- Unified evaluation tracks end-to-end performance across all stages
Next Steps
- GEPA Algorithm Details – How GEPA works under the hood
- System Specifications – How specs guide optimization
- Configuration Reference – Complete parameter documentation
- Training Guide – Step-by-step training instructions
- Prompt Optimization Cookbook – Complete walkthrough