Prompt optimization using GEPA.

What GEPA needs from your task app

GEPA only works if the task app exposes the dataset and schema through /task_info, and routes each rollout to the correct example based on seed.

Required data in task_info

Your task app must return:
  • Dataset identifiers and available splits
  • Input/output schema (field names and types)
  • Any metadata needed to interpret examples
  • A stable mapping between seed and examples
Minimal example:
{
  "id": "banking77",
  "splits": ["train", "validation"],
  "input_schema": {"text": "string"},
  "output_schema": {"label": "string"}
}
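A minimal sketch of a /task_info handler that returns this shape (FastAPI and the in-memory DATASET below are illustrative assumptions; use whatever framework and data loading your task app already has):
from fastapi import FastAPI

app = FastAPI()

# Illustrative in-memory dataset keyed by split.
DATASET = {
    "train": [{"text": "When will my card arrive?", "label": "card_arrival"}],
    "validation": [{"text": "How do I top up?", "label": "top_up"}],
}

@app.get("/task_info")
def task_info():
    return {
        "id": "banking77",
        "splits": list(DATASET.keys()),
        "input_schema": {"text": "string"},
        "output_schema": {"label": "string"},
    }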

Include actual tasks or a dataset registry

Based on the existing task app examples, GEPA works best when the task app either:
  • Returns an explicit task list in task_info (each task includes seed + inputs), or
  • Exposes a dataset registry and uses seed to sample deterministically at rollout time.
If you return explicit tasks, the expected shape looks like:
{
  "tasks": [
    { "seed": 0, "input": { "text": "..." } },
    { "seed": 1, "input": { "text": "..." } }
  ]
}

Seed routing (how GEPA selects examples)

Each rollout includes a seed. The task app must map that seed to a deterministic example from the dataset split used for the job (train/validation/test).
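A minimal sketch of such a mapping (modulo indexing is just one deterministic choice; any stable seed-to-example function works):
def load_example(dataset: dict, split: str, seed: int) -> dict:
    """Map a rollout seed to a fixed example from the given split."""
    examples = dataset[split]
    # Modulo indexing stays deterministic even when the job uses
    # more seeds than the split has examples.
    return examples[seed % len(examples)]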

Train vs validation seeds

GEPA expects you to provide train/validation seeds in the job config, and the task app must honor them consistently:
  • Train seeds drive candidate scoring
  • Validation seeds drive Pareto selection
If your task app uses multiple splits, use env_name or split metadata to ensure seeds map to the correct split.
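One possible way to do that, assuming the rollout request carries an env_name or split hint (the field names below are illustrative, not part of the contract):
def resolve_split(request: dict) -> str:
    # env_name / split are illustrative field names; use whatever your
    # task app actually receives to separate train from validation rollouts.
    env_name = request.get("env_name", "")
    if "validation" in env_name:
        return "validation"
    return request.get("split", "train")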

Rollout input contract (data path)

GEPA sends:
{
  "seed": 42,
  "inputs": { "text": "..." },
  "policy_config": { "inference_url": "...", "model": "..." }
}
Your task app must:
  1. Load the example corresponding to seed
  2. Run the prompt against the example
  3. Return metrics.mean_return as the reward (a minimal sketch follows)
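A minimal rollout handler sketch following that contract, reusing load_example from the seed-routing sketch above; run_prompt and score stand in for your own inference and grading logic:
async def rollout(request: dict) -> dict:
    seed = request["seed"]
    policy = request["policy_config"]

    # 1. Load the example corresponding to the seed.
    example = load_example(DATASET, split="train", seed=seed)

    # 2. Run the prompt against the example with the supplied policy.
    completion = await run_prompt(
        inference_url=policy["inference_url"],
        model=policy["model"],
        inputs=example,
    )

    # 3. Return the reward GEPA reads from metrics.mean_return.
    return {"metrics": {"mean_return": score(completion, example)}}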

Multimodal datasets (images, files)

If your dataset includes images or files, include them in inputs or in the task list with stable placeholders (e.g. {{image}}) and resolve them at rollout time. Use base64-encoded data URLs for reproducibility when needed.
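For example, a local image can be embedded as a base64 data URL with the standard library (the image field name is whatever your schema uses):
import base64
from pathlib import Path

def to_data_url(path: str, mime: str = "image/png") -> str:
    # Encode a local file as a data URL so rollouts are reproducible.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# e.g. inputs = {"text": "...", "image": to_data_url("example.png")}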

Dataset prep checklist

  • Deterministic seed -> example mapping
  • Clear split boundaries (train/validation/test)
  • input_schema/output_schema aligned with your prompt format
  • Task app returns metrics.mean_return for every rollout

SDK

import asyncio
import os

from synth_ai.sdk import PromptLearningClient

async def run_optimization():
    client = PromptLearningClient(api_key=os.environ["SYNTH_API_KEY"])

    # Create and start the GEPA job from a TOML config
    job = await client.create_job_from_toml("gepa.toml")
    await client.start_job(job["id"])

    # Poll until the job reaches a terminal state
    result = await client.poll_until_terminal(job["id"])
    print(f"Best prompt: {result['best_prompt']}")
    print(f"Best score: {result['best_score']}")

asyncio.run(run_optimization())
Requires a LocalAPI to evaluate prompts.

Pattern Discovery

Use pattern discovery to derive prompt_learning.initial_prompt directly from traces.
  1. Run an eval job with the same task app and policy settings you will use for prompt learning.
  2. Discover patterns from the eval traces:
    import os
    from synth_ai.sdk import PromptLearningClient

    client = PromptLearningClient(api_key=os.environ["SYNTH_API_KEY"])
    patterns = await client.discover_patterns(job_id="eval_XXXX")
    print(patterns.to_toml())  # Copy the emitted snippet into your config

  3. Copy the emitted TOML snippet into your prompt learning config under prompt_learning.initial_prompt (an illustrative shape is shown below).
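The pasted snippet typically looks something like this; the exact fields come from patterns.to_toml(), so treat this shape as illustrative:
[prompt_learning]
initial_prompt = """
You are a banking support classifier.
Given the customer message, respond with exactly one intent label.
"""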

Flow B (experimental): auto-discover at runtime

Omit prompt_learning.initial_prompt and set:
[prompt_learning]
auto_discover_patterns = true
Warning: Auto-discovery is experimental. It runs a validation rollout, infers patterns from traces, and proceeds with optimization. Noisy or highly multi-call traces may still fail.

Interpreting discovery output

  • Patterns are ranked by support_count and match_rate.
  • warnings includes normalization or multi-pattern ambiguity notes.
  • If multiple patterns are returned, prefer the one with the highest support and match rate (see the sketch below).
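Assuming each returned pattern exposes support_count and match_rate (attribute names taken from the ranking described above; the actual object shape may differ), that selection is a one-liner:
# Prefer the pattern with the highest support, then match rate.
best = max(patterns, key=lambda p: (p.support_count, p.match_rate))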

Config Reference

Configuration references for GEPA are maintained in research-only docs.