1. Build a prompt evaluation Container
Use the ContainerConfig interface to describe dataset splits, rubrics, and rollout handlers. Build in any language by implementing the OpenAPI contract. → Create a prompt evaluation Container | Polyglot examples

2. Author the prompt optimization config
Capture the GEPA algorithm choice, initial prompt template, training/validation seeds, and optimization parameters in TOML. → Read: Prompt optimization configs

3. Query and evaluate results
Use the Python API or REST endpoints to retrieve optimized prompts and evaluate them on held-out validation sets. → Read: Querying results
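The Container from step 1 boils down to an HTTP service exposing a rollout endpoint. A minimal Python sketch follows; the endpoint path, request fields, and reward schema are assumptions for illustration (the real shape comes from the OpenAPI contract):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_rollout(request: dict) -> dict:
    """Toy rollout: reward 1.0 if the expected label appears in the text.

    A real container would call the policy model via
    policy_config.inference_url; this schema is illustrative only.
    """
    sample = request.get("sample", {})
    text = sample.get("text", "")
    reward = 1.0 if sample.get("label", "") in text else 0.0
    return {"reward": reward}

class RolloutHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/rollout":  # assumed endpoint path
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(run_rollout(request)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), RolloutHandler).serve_forever()
```

The same handler could be written in any language that serves HTTP; only the contract matters.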
Algorithm Overview
GEPA (Genetic-Pareto)
Best for: Broad exploration, diverse prompt variants, classification tasks
Reference: Agrawal et al. (2025)

GEPA uses evolutionary principles to explore the prompt space:
- Population-based search with multiple prompt variants
- LLM-guided mutations for intelligent prompt modifications
- Pareto optimization balancing performance and prompt length
- Multi-stage support for pipeline optimization
- Maintains a Pareto front of non-dominated solutions
- Supports both template mode and pattern-based transformations
- Module-aware evolution for multi-stage pipelines
- Reflective feedback from execution traces
- Hosted verifier integration for quality-aware optimization
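The Pareto bookkeeping from the list above can be sketched in a few lines. The two objectives here (validation score, prompt length) are an illustrative pairing of the "performance and prompt length" trade-off, not the exact objectives the backend uses:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    prompt: str
    score: float   # validation score; higher is better
    length: int    # prompt length in tokens; lower is better

def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    return (a.score >= b.score and a.length <= b.length
            and (a.score > b.score or a.length < b.length))

def pareto_front(population: list[Candidate]) -> list[Candidate]:
    """Keep only non-dominated candidates."""
    return [c for c in population
            if not any(dominates(other, c) for other in population)]

pop = [
    Candidate("A", 0.90, 120),
    Candidate("B", 0.85, 80),
    Candidate("C", 0.80, 150),  # worse score AND longer than A
]
print([c.prompt for c in pareto_front(pop)])  # -> ['A', 'B']
```

A shorter-but-weaker prompt (B) survives alongside a stronger-but-longer one (A); only candidates beaten on both axes (C) are dropped, which is what keeps the population diverse across generations.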
Architecture: Inference Interception
GEPA does call your container's /rollout endpoint, but optimized prompts never appear in the rollout payload. Instead, the backend registers each candidate prompt with an inference interceptor and passes your container a policy_config.inference_url. When your container makes LLM calls through that URL, the interceptor substitutes the candidate prompt before forwarding the request to the model.
- No prompt leakage: your container never sees the optimized prompt text
- Containers remain unchanged: just route LLM calls through policy_config.inference_url
- Traces captured: the interceptor records execution traces for reflective feedback
- Stored artifacts: traces and artifacts can be reused for reflection across generations
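From the container's side, interception means nothing more than pointing LLM calls at policy_config.inference_url. A sketch of building such a request, assuming an OpenAI-style chat endpoint (the path and payload shape are assumptions, not the documented API):

```python
import json

def build_policy_request(inference_url: str, user_input: str) -> tuple[str, bytes]:
    """Build a chat request aimed at the interceptor URL.

    The container sends only its fixed task message; the interceptor
    substitutes the candidate system prompt before forwarding, so no
    optimized prompt text ever passes through this code.
    """
    url = f"{inference_url}/chat/completions"  # assumed endpoint path
    body = json.dumps(
        {"messages": [{"role": "user", "content": user_input}]}
    ).encode()
    return url, body

url, _ = build_policy_request(
    "https://interceptor.example/v1",  # from policy_config.inference_url
    "Classify this query: 'my card was declined'",
)
print(url)  # -> https://interceptor.example/v1/chat/completions
```

Swapping a hard-coded model URL for the injected one is typically the only change an existing container needs.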
Production-Ready: Works with Your Code
GEPA works with your production code via HTTP-based serverless endpoints. Build your Container in any language that can serve HTTP (Rust, Go, TypeScript, Zig, Python, and so on). See Polyglot Container for examples and the OpenAPI contract.

Supported Models
See Supported Models for Prompt Optimization for the full list of policy models.

Multi-Stage Pipeline Support
GEPA supports optimizing prompts for multi-stage pipelines (e.g., Banking77 classifier → calibrator):
- LCS-based stage detection automatically identifies which stage is being called
- Per-stage optimization evolves separate instructions for each pipeline module
- Unified evaluation tracks end-to-end performance across all stages
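Stage detection can be sketched as matching each intercepted prompt against the known base template of every stage. This sketch uses difflib's SequenceMatcher as a stand-in for LCS-style similarity, and the templates are made up for illustration:

```python
from difflib import SequenceMatcher

# Illustrative templates for a Banking77 classifier -> calibrator pipeline;
# a real deployment would use each stage's actual base prompt.
STAGE_TEMPLATES = {
    "classifier": "Classify the following banking query into one of 77 intents:",
    "calibrator": "Given the predicted intent and its confidence, decide whether to keep or revise it:",
}

def detect_stage(intercepted_prompt: str) -> str:
    """Attribute an intercepted LLM call to its most similar stage template."""
    return max(
        STAGE_TEMPLATES,
        key=lambda stage: SequenceMatcher(
            None, STAGE_TEMPLATES[stage], intercepted_prompt
        ).ratio(),
    )

print(detect_stage(
    "Classify the following banking query into one of 77 intents: 'lost my card'"
))  # -> classifier
```

Once a call is attributed to a stage, the interceptor can substitute that stage's own evolved instruction, which is what makes per-stage optimization possible without the container labeling its calls.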
Next Steps
- GEPA Algorithm Details – How GEPA works under the hood
- System Specifications – How specs guide optimization
- Configuration Reference – Complete parameter documentation
- Training Guide – Step-by-step training instructions
- Prompt Optimization Cookbook – Complete walkthrough