Overview - Synth AI

Prompt optimization uses evolutionary algorithms to automatically improve prompts for classification, reasoning, and instruction-following tasks. Works with any language – build LocalAPI in Rust, Go, TypeScript, Zig, Python, or any language that can serve HTTP. See Polyglot LocalAPI for examples and the OpenAPI contract. Synth AI uses GEPA: Agrawal et al. (2025). “GEPA: Reflective Prompt Evolution.” arXiv:2507.19457

1. Build a prompt evaluation LocalAPI

Use the TaskAppConfig interface to describe dataset splits, rubrics, and rollout handlers. Build in any language – implement the OpenAPI contract in your preferred language. → Create a prompt evaluation LocalAPI | Polyglot examples

2. Author the prompt optimization config

Capture the GEPA algorithm choice, initial prompt template, training/validation seeds, and optimization parameters in TOML. → Read: Prompt optimization configs

3. Query and evaluate results

Use the Python API or REST endpoints to retrieve optimized prompts and evaluate them on held-out validation sets.
→ Read: Querying results

Algorithm Overview

GEPA (Genetic Evolution of Prompt Architectures)

Best for: Broad exploration, diverse prompt variants, classification tasks
Reference: Agrawal et al. (2025) GEPA uses evolutionary principles to explore the prompt space:

Population-based search with multiple prompt variants
LLM-guided mutations for intelligent prompt modifications
Pareto optimization balancing performance and prompt length
Multi-stage support for pipeline optimization

Typical results: Improves accuracy from 60-75% (baseline) to 85-90%+ over 15 generations Key features:

Maintains a Pareto front of non-dominated solutions
Supports both template mode and pattern-based transformations
Module-aware evolution for multi-stage pipelines
Reflective feedback from execution traces
Hosted verifier integration for quality-aware optimization

Architecture: Inference Interception

GEPA does call your task app’s /rollout endpoint — but optimized prompts never appear in the rollout payload. Instead, the backend registers each candidate with an inference interceptor and passes your task app a policy_config.inference_url. When your task app makes LLM calls through that URL, the interceptor substitutes the candidate prompt before forwarding to the model.

GEPA evaluation flow:

Backend ──proposes candidate──▶ Interceptor (registers prompt)
Backend ──/rollout──▶ Task App
Task App ──LLM call via inference_url──▶ Interceptor ──substitutes prompt──▶ LLM
Task App ◀──response──────────────────── LLM
Backend  ◀──metrics/reward────────────── Task App

This separation ensures:

No prompt leakage: your task app never sees the optimized prompt text
Task apps remain unchanged: just route LLM calls through policy_config.inference_url
Traces captured: the interceptor records execution traces for reflective feedback
Stored artifacts: traces and artifacts can be reused for reflection across generations

Production-Ready: Works with Your Code

GEPA works with your production code via HTTP-based serverless endpoints. Build LocalAPI in any language (Rust, Go, TypeScript, Zig, Python, or any language that can serve HTTP). See Polyglot LocalAPI for examples and the OpenAPI contract.

Supported Models

See Supported Models for Prompt Optimization for the full list of policy models.

Multi-Stage Pipeline Support

GEPA supports optimizing prompts for multi-stage pipelines (e.g., Banking77 classifier → calibrator):

LCS-based stage detection automatically identifies which stage is being called
Per-stage optimization evolves separate instructions for each pipeline module
Unified evaluation tracks end-to-end performance across all stages

Next Steps

GEPA Algorithm Details – How GEPA works under the hood
System Specifications – How specs guide optimization
Configuration Reference – Complete parameter documentation
Training Guide – Step-by-step training instructions
Prompt Optimization Cookbook – Complete walkthrough

​1. Build a prompt evaluation LocalAPI

​2. Author the prompt optimization config

​3. Query and evaluate results

​Algorithm Overview

​GEPA (Genetic Evolution of Prompt Architectures)

​Architecture: Inference Interception

​Production-Ready: Works with Your Code

​Supported Models

​Multi-Stage Pipeline Support

​Next Steps