Querying Results (Python SDK)
Optimization Modes
GEPA supports three optimization modes; artifact structure varies by mode:

| Mode | Artifact shape | Typical use |
|---|---|---|
| prompt | messages with role/pattern/content | LLM system prompts |
| context | Similar to prompt; retrieval/context config | RAG, context injection |
| optimize_anything | dsl_config, candidate_code, solver_code, etc. | Code, configs, arbitrary artifacts |
Use list_candidates_typed(include="artifact_payload") for mode-agnostic access; each PolicyCandidate carries candidate_content (pre-extracted text) and artifact_kind (e.g. prompt, dsl_config).
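As a hedged sketch of mode-agnostic access, the stand-in PolicyCandidate below models only the documented fields (candidate_content, artifact_kind, artifact_payload); real instances come from list_candidates_typed, and the sample payloads are illustrative:

```python
from dataclasses import dataclass
from typing import Any

# Illustrative stand-in for the SDK's PolicyCandidate; only the documented
# fields are modeled here. Real objects come from list_candidates_typed.
@dataclass
class PolicyCandidate:
    candidate_content: str            # pre-extracted text, same across all modes
    artifact_kind: str                # e.g. "prompt", "dsl_config"
    artifact_payload: dict[str, Any]  # full mode-specific artifact

def texts_by_kind(candidates: list[PolicyCandidate]) -> dict[str, str]:
    # Mode-agnostic: read candidate_content instead of branching on payload shape.
    return {c.artifact_kind: c.candidate_content for c in candidates}

candidates = [
    PolicyCandidate(
        "You are a helpful assistant.",
        "prompt",
        {"messages": [{"role": "system", "content": "You are a helpful assistant."}]},
    ),
    PolicyCandidate("retrieval_top_k: 5", "dsl_config", {"dsl_config": "retrieval_top_k: 5"}),
]
print(texts_by_kind(candidates))
```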
Understanding Results
Score Types
Prompt learning jobs track two types of scores:

- prompt_best_train_score: best accuracy on training seeds (used during optimization)
- prompt_best_validation_score: best accuracy on validation seeds (held-out evaluation)
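As a quick illustration, comparing the two scores flags possible overfitting to training seeds. The surrounding dict is a hypothetical job-status payload; only the two score field names come from the SDK:

```python
# Hypothetical job-status payload; the dict shape is an assumption,
# the two score field names are the documented ones.
job = {
    "prompt_best_train_score": 0.88,
    "prompt_best_validation_score": 0.81,
}

# A large train/validation gap suggests the best candidate is tuned
# too tightly to the training seeds.
gap = job["prompt_best_train_score"] - job["prompt_best_validation_score"]
if gap > 0.10:
    print("Large train/validation gap: candidate may be overfit to training seeds.")
else:
    print(f"train {job['prompt_best_train_score']:.2f}, "
          f"validation {job['prompt_best_validation_score']:.2f} (gap {gap:.2f})")
```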
Pareto Front
GEPA maintains a Pareto front of candidates that balances objectives (e.g. accuracy, token count, tool-call rate). The artifact structure depends on the optimization mode (prompt, context, optimize_anything). Query multiple ranks via list_candidates_typed; each PolicyCandidate has candidate_content (pre-extracted text) and artifact_payload (the full artifact).
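The front itself can be sketched as a plain dominance check over two objectives. The objective names and candidate dicts below are illustrative; the real front is what list_candidates_typed returns:

```python
# Minimal Pareto-front computation over (accuracy, token count).
# Candidate dicts and objective names are illustrative, not the SDK's types.
def pareto_front(candidates: list[dict]) -> list[dict]:
    """Keep candidates not dominated by another (>= accuracy, <= tokens, one strict)."""
    front = []
    for c in candidates:
        dominated = any(
            o["accuracy"] >= c["accuracy"] and o["tokens"] <= c["tokens"]
            and (o["accuracy"] > c["accuracy"] or o["tokens"] < c["tokens"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

candidates = [
    {"id": "a", "accuracy": 0.90, "tokens": 400},
    {"id": "b", "accuracy": 0.85, "tokens": 250},
    {"id": "c", "accuracy": 0.80, "tokens": 500},  # dominated by "a"
]
print([c["id"] for c in pareto_front(candidates)])  # -> ['a', 'b']
```

Candidates "a" and "b" trade accuracy against token count, so both stay on the front; "c" is worse than "a" on both objectives and drops out.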
Validation Evaluation
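After optimization you will typically re-score candidates on held-out seeds. As a self-contained sketch of that loop (run_seed is a stand-in for SDK or HTTP calls into your container; the seed list and pass/fail scoring are assumptions):

```python
# Hedged sketch of held-out evaluation: replay validation seeds and
# aggregate accuracy. run_seed is a deterministic stand-in; real code
# would call into your container via the SDK or direct HTTP.
def run_seed(prompt: str, seed: int) -> bool:
    # Stand-in "evaluation" so the sketch is runnable offline.
    return (seed + len(prompt)) % 3 != 0

def evaluate(prompt: str, validation_seeds: list[int]) -> float:
    passed = sum(run_seed(prompt, s) for s in validation_seeds)
    return passed / len(validation_seeds)

score = evaluate("You are a helpful assistant.", list(range(10)))
print(f"validation accuracy: {score:.2f}")
```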
After optimization, run held-out seeds against your container using the SDK or direct HTTP calls.

Expected Performance
GEPA typically improves accuracy over generations:

| Generation | Typical Accuracy | Notes |
|---|---|---|
| 1 (baseline) | 60-75% | Initial random/baseline prompts |
| 5 | 75-80% | Early optimization gains |
| 10 | 80-85% | Convergence begins |
| 15 (final) | 85-90%+ | Optimized prompts on Pareto front |
Next Steps
- Prompt Optimization Cookbook – Complete GEPA walkthrough
- Configuration Reference – All algorithm parameters