After a prompt optimization job completes, use the Python SDK helpers (the only supported interface) to fetch optimized prompts, inspect Pareto fronts, and re-run validation seeds against your container. GEPA supports multiple optimization modes: prompt (LLM prompts), context (retrieval/context), and optimize_anything (code, DSL configs, etc.). Result structure varies by mode; use the extraction patterns below for mode-agnostic access.

Querying Results (Python SDK)

import os
from synth_ai import SynthClient

JOB_ID = "offline_job_id_here"

client = SynthClient(api_key=os.environ["SYNTH_API_KEY"])
job = client.optimization.offline.get(job_id=JOB_ID)

print("status:", job.status())
print("events:", job.events(limit=100))
print("artifacts:", job.artifacts())

Optimization Modes

GEPA supports three optimization modes; artifact structure varies by mode:
  • prompt: messages with role/pattern/content. Typical use: LLM system prompts.
  • context: similar to prompt, with retrieval/context config. Typical use: RAG, context injection.
  • optimize_anything: dsl_config, candidate_code, solver_code, etc. Typical use: code, configs, arbitrary artifacts.
Use list_candidates_typed(include="artifact_payload") for mode-agnostic access; each PolicyCandidate has candidate_content (pre-extracted text) and artifact_kind (e.g. prompt, dsl_config).
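As a sketch of the mode-agnostic pattern, the helper below prefers the pre-extracted candidate_content and falls back to the raw payload. The candidate dicts are illustrative stand-ins, not the SDK's exact PolicyCandidate shape.

```python
# Hypothetical candidate payloads; the field names (candidate_content,
# artifact_kind, artifact_payload) follow the docs above, but the exact
# shapes here are assumptions for illustration.
candidates = [
    {"artifact_kind": "prompt",
     "candidate_content": "You are a careful assistant.",
     "artifact_payload": {"messages": [{"role": "system", "content": "..."}]}},
    {"artifact_kind": "dsl_config",
     "candidate_content": "solver:\n  max_depth: 4",
     "artifact_payload": {"dsl_config": {"solver": {"max_depth": 4}}}},
]

def extract_text(candidate: dict) -> str:
    """Mode-agnostic: prefer pre-extracted text, fall back to the raw payload."""
    text = candidate.get("candidate_content")
    if text:
        return text
    return str(candidate.get("artifact_payload", ""))

for cand in candidates:
    print(cand["artifact_kind"], "->", extract_text(cand)[:40])
```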

Understanding Results

Score Types

Prompt learning jobs track two types of scores:
  • prompt_best_train_score: Best accuracy on training seeds (used during optimization)
  • prompt_best_validation_score: Best accuracy on validation seeds (held-out evaluation)
The validation score provides an unbiased estimate of generalization performance.
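The gap between the two scores is a quick overfitting signal; a minimal illustration (the score values here are made up):

```python
def generalization_gap(train_score: float, validation_score: float) -> float:
    """Positive gap suggests the prompt overfits the training seeds."""
    return train_score - validation_score

# Example: strong train accuracy but weaker held-out accuracy
gap = generalization_gap(train_score=0.90, validation_score=0.82)
print(f"gap: {gap:.2f}")  # a large gap warrants more validation seeds
```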

Pareto Front

GEPA maintains a Pareto front of candidates balancing objectives (e.g. accuracy, token count, tool call rate). The structure depends on optimization mode (prompt, context, optimize_anything). Query multiple ranks via list_candidates_typed; each PolicyCandidate has candidate_content (pre-extracted text) and artifact_payload (full artifact):
# Canonical v1 currently exposes job status/events/artifacts for evaluation.
# Use `job.artifacts()` plus your verifier outputs to inspect winners.
print(job.artifacts())
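To make the Pareto idea concrete, here is a minimal dominance filter over (accuracy, prompt tokens) pairs. The tuples are illustrative values, not an SDK type:

```python
def pareto_front(candidates):
    """Keep candidates not dominated by any other.

    One candidate dominates another if it has >= accuracy and <= tokens,
    with at least one strict inequality.
    """
    front = []
    for i, (acc_i, tok_i) in enumerate(candidates):
        dominated = any(
            acc_j >= acc_i and tok_j <= tok_i and (acc_j > acc_i or tok_j < tok_i)
            for j, (acc_j, tok_j) in enumerate(candidates) if j != i
        )
        if not dominated:
            front.append((acc_i, tok_i))
    return front

# (accuracy, prompt tokens): the 0.80/900 candidate is dominated by 0.85/600
print(pareto_front([(0.85, 600), (0.80, 900), (0.90, 1200)]))
```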

Validation Evaluation

After optimization, run held-out seeds against your container using the SDK or direct HTTP calls:
import asyncio

import httpx

# container_url, environment_api_key, and optimized_prompt come from your
# deployment config and the optimization results queried above.
async def validate_seed(seed: int) -> float:
    # Send a rollout directly to your container
    async with httpx.AsyncClient() as client:
        response = await client.post(
            f"{container_url}/rollout",
            json={
                "trace_correlation_id": f"validation-{seed}",
                "env": {"seed": seed},  # Held-out seed
                "policy": {"config": {"prompt_template": optimized_prompt}},
            },
            headers={"X-API-Key": environment_api_key},
        )
        result = response.json()
        print(f"Reward: {result['reward_info']['outcome_reward']}")
        return result["reward_info"]["outcome_reward"]

asyncio.run(validate_seed(100))
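Aggregating held-out rollouts into a single validation score can be as simple as averaging outcome_reward (the response shape follows the snippet above; the sample results are made up):

```python
def mean_validation_reward(results: list[dict]) -> float:
    """Average outcome_reward across held-out rollouts (0.0 if empty)."""
    rewards = [r["reward_info"]["outcome_reward"] for r in results]
    return sum(rewards) / len(rewards) if rewards else 0.0

# Illustrative rollout responses for three held-out seeds
results = [
    {"reward_info": {"outcome_reward": 1.0}},
    {"reward_info": {"outcome_reward": 0.0}},
    {"reward_info": {"outcome_reward": 1.0}},
]
print(f"validation reward: {mean_validation_reward(results):.2f}")  # 0.67
```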

Expected Performance

GEPA typically improves accuracy over generations:
  • Generation 1 (baseline): 60-75% typical accuracy (initial random/baseline prompts)
  • Generation 5: 75-80% (early optimization gains)
  • Generation 10: 80-85% (convergence begins)
  • Generation 15 (final): 85-90%+ (optimized prompts on the Pareto front)

Next Steps