GEPA optimizes prompts by evaluating many candidate prompts against your task app and selecting the best performers.

When to use

  • You have a task app with multiple examples
  • You want to optimize a prompt against a reward signal
  • You can deterministically map seed → example
  • You’re optimizing offline on a fixed dataset (GEPA is the offline method)

What you need

  1. A task app that implements /health, /task_info, and /rollout
  2. A dataset with deterministic seed mapping
  3. A GEPA config (TOML) with train + validation seeds
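
The three endpoints above can be sketched as plain Python functions. This is an illustrative sketch, not the SDK's actual server interface: a real task app serves these over HTTP, and the `DATASET` list and placeholder reward here are stand-ins.

```python
# Sketch of the three task-app endpoints as plain functions.
# A real task app exposes these over HTTP; DATASET and the fixed
# reward below are illustrative placeholders.

DATASET = [
    {"input": {"text": "example a"}, "label": "A"},
    {"input": {"text": "example b"}, "label": "B"},
]

def health() -> dict:
    return {"status": "ok"}

def task_info() -> dict:
    return {
        "dataset": {"split": "train", "size": len(DATASET)},
        "input_schema": {"text": "string"},
        "output_schema": {"label": "string"},
    }

def rollout(seed: int) -> dict:
    # Deterministic seed -> example mapping: the same seed always
    # selects the same example.
    example = DATASET[seed % len(DATASET)]
    reward = 1.0  # placeholder; a real app scores the model's output
    return {"example": example, "metrics": {"mean_return": reward}}
```

The key property GEPA relies on is the deterministic mapping in `rollout`: evaluating seed 0 twice must hit the same example both times.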

Dataset and task app requirements

Based on the demo task apps, GEPA expects:
  • task_info includes dataset split + schema and either a task list or dataset registry
  • Rollout uses seed to select the example
  • For multimodal data, inputs can be base64 data URLs, referenced in the prompt via wildcards
Example task list shape (multimodal pattern):
{
  "tasks": [
    { "seed": 0, "input": { "image": "data:image/jpeg;base64,..." } },
    { "seed": 1, "input": { "image": "data:image/jpeg;base64,..." } }
  ],
  "input_schema": { "image": "string" },
  "output_schema": { "label": "string" }
}
Example prompt wildcards:
[[prompt_learning.initial_prompt.messages]]
role = "user"
pattern = "{{image}}"

[prompt_learning.initial_prompt.wildcards]
image = "REQUIRED"
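
The wildcard substitution works roughly as sketched below. The `fill_wildcards` helper is hypothetical (the backend performs this substitution for you); it only shows how a `{{image}}` pattern gets filled with the per-seed input.

```python
# Hypothetical sketch of wildcard filling: the backend replaces each
# {{name}} placeholder in a message pattern with the matching input
# value for the current seed.

def fill_wildcards(pattern: str, values: dict) -> str:
    for key, value in values.items():
        pattern = pattern.replace("{{" + key + "}}", value)
    return pattern

message = fill_wildcards("{{image}}", {"image": "data:image/jpeg;base64,..."})
```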

Basic workflow

  1. Run your task app (local tunnel or deployed to Synth)
  2. Create a GEPA config
  3. Submit a prompt learning job
  4. Poll until complete and read the best prompt

Minimal example

import asyncio
import os

from synth_ai.sdk import PromptLearningClient

async def main() -> None:
    client = PromptLearningClient(api_key=os.environ["SYNTH_API_KEY"])
    job = await client.create_job_from_toml("gepa.toml")
    await client.start_job(job["id"])
    result = await client.poll_until_terminal(job["id"])
    print(result["best_prompt"])

asyncio.run(main())

Config essentials

Your config must include:
  • prompt_learning.task_app_url
  • prompt_learning.task_app_api_key (ENVIRONMENT_API_KEY)
  • prompt_learning.initial_prompt
  • prompt_learning.gepa.evaluation.seeds
  • prompt_learning.gepa.evaluation.validation_seeds
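
Putting those keys together, a `gepa.toml` might look like the sketch below. The URL, prompt text, and seed ranges are placeholders; only the key paths come from the list above.

```toml
# Illustrative gepa.toml sketch; values are placeholders.
[prompt_learning]
task_app_url = "https://your-task-app.example.com"
task_app_api_key = "YOUR_ENVIRONMENT_API_KEY"

[[prompt_learning.initial_prompt.messages]]
role = "system"
pattern = "You are a helpful classifier."

[prompt_learning.gepa.evaluation]
seeds = [0, 1, 2, 3]
validation_seeds = [4, 5]
```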

Task app tips

  • Use seed to select examples deterministically
  • Return metrics.mean_return for each rollout
  • Route LLM calls through policy_config.inference_url
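
Routing through `policy_config.inference_url` means building the LLM request against the URL the backend hands you rather than a hardcoded endpoint. The sketch below assumes an OpenAI-compatible chat completions path and a `model` key in `policy_config`; both are assumptions, not documented fields beyond `inference_url` itself.

```python
# Sketch: point the policy's LLM call at the per-rollout inference URL
# from policy_config. Assumes an OpenAI-compatible endpoint; the
# "model" key and default value are illustrative.

def build_llm_request(policy_config: dict, prompt: str) -> tuple[str, dict]:
    url = policy_config["inference_url"].rstrip("/") + "/chat/completions"
    payload = {
        "model": policy_config.get("model", "placeholder-model"),
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, payload

url, payload = build_llm_request(
    {"inference_url": "https://proxy.example.com/v1"}, "Classify this."
)
```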

Special pattern: Daytona sandboxes (coding agents)

For coding-agent tasks (EngineBench), the task app provisions Daytona sandboxes per rollout, runs the agent inside the sandbox, executes tests, and converts pass rate to reward. GEPA mutates system prompts, AGENTS.md, and skills files. This pattern is best when:
  • Tasks require isolated repos or build environments
  • You want a reproducible, per-rollout filesystem
  • Rewards come from tests or harness scripts
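
The pass-rate-to-reward conversion described above can be sketched as a one-liner over the harness results. The list-of-booleans input is an assumption about how test outcomes are collected; the conversion itself is just the fraction of passing tests.

```python
# Sketch: convert sandbox test results into a scalar reward in [0, 1].
# The list-of-bools representation of test outcomes is illustrative.

def pass_rate_reward(results: list[bool]) -> float:
    if not results:
        return 0.0  # no tests ran: no signal, zero reward
    return sum(results) / len(results)

reward = pass_rate_reward([True, True, False, True])
```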

Next steps

  • Task app overview: /sdk/localapi/overview
  • Dataset setup: /sdk/jobs/prompt-learning