GEPA optimizes prompts by evaluating many candidate prompts against your task app and selecting the best performers.

When to use

  • You have a task app with multiple examples
  • You want to optimize a prompt against a reward signal
  • You can deterministically map seed → example
  • You’re optimizing offline on a fixed dataset (GEPA is the offline method)

What you need

  1. A task app that implements /health, /task_info, and /rollout
  2. A dataset with deterministic seed mapping
  3. A GEPA config (TOML) with train + validation seeds
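
The three endpoints above can be sketched as plain Python functions. This is an illustrative sketch, not the SDK's actual server interface: a real task app serves these over HTTP, and the `DATASET` list and placeholder reward here are stand-ins.

```python
# Sketch of the three task-app endpoints as plain functions.
# A real task app exposes these over HTTP; DATASET and the fixed
# reward below are illustrative placeholders.

DATASET = [
    {"input": {"text": "example a"}, "label": "A"},
    {"input": {"text": "example b"}, "label": "B"},
]

def health() -> dict:
    return {"status": "ok"}

def task_info() -> dict:
    return {
        "dataset": {"split": "train", "size": len(DATASET)},
        "input_schema": {"text": "string"},
        "output_schema": {"label": "string"},
    }

def rollout(seed: int) -> dict:
    # Deterministic seed -> example mapping: the same seed always
    # selects the same example.
    example = DATASET[seed % len(DATASET)]
    reward = 1.0  # placeholder; a real app scores the model's output
    return {"example": example, "metrics": {"mean_return": reward}}
```

The key property GEPA relies on is the deterministic mapping in `rollout`: evaluating seed 0 twice must hit the same example both times.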

Dataset and task app requirements

Based on the demo task apps, GEPA expects:
  • task_info includes dataset split + schema and either a task list or dataset registry
  • Rollout uses seed to select the example
  • For multimodal data, inputs can be base64 data URLs, referenced in the prompt via wildcards
Example task list shape (multimodal pattern):
{
  "tasks": [
    { "seed": 0, "input": { "image": "data:image/jpeg;base64,..." } },
    { "seed": 1, "input": { "image": "data:image/jpeg;base64,..." } }
  ],
  "input_schema": { "image": "string" },
  "output_schema": { "label": "string" }
}
Example prompt wildcards:
[[prompt_learning.initial_prompt.messages]]
role = "user"
pattern = "{{image}}"

[prompt_learning.initial_prompt.wildcards]
image = "REQUIRED"
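
The wildcard substitution works roughly as sketched below. The `fill_wildcards` helper is hypothetical (the backend performs this substitution for you); it only shows how a `{{image}}` pattern gets filled with the per-seed input.

```python
# Hypothetical sketch of wildcard filling: the backend replaces each
# {{name}} placeholder in a message pattern with the matching input
# value for the current seed.

def fill_wildcards(pattern: str, values: dict) -> str:
    for key, value in values.items():
        pattern = pattern.replace("{{" + key + "}}", value)
    return pattern

message = fill_wildcards("{{image}}", {"image": "data:image/jpeg;base64,..."})
```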

Basic workflow

  1. Run your task app (local tunnel or deployed to Synth)
  2. Create a GEPA config
  3. Submit a prompt learning job
  4. Poll until complete and read the best prompt

Minimal example

import asyncio
import os

from synth_ai.sdk import PromptLearningClient

async def main() -> None:
    client = PromptLearningClient(api_key=os.environ["SYNTH_API_KEY"])
    job = await client.create_job_from_toml("gepa.toml")
    await client.start_job(job["id"])
    result = await client.poll_until_terminal(job["id"])
    print(result["best_prompt"])

asyncio.run(main())

Config essentials

Your config must include:
  • prompt_learning.task_app_url
  • prompt_learning.task_app_api_key (ENVIRONMENT_API_KEY)
  • prompt_learning.initial_prompt
  • prompt_learning.gepa.evaluation.seeds
  • prompt_learning.gepa.evaluation.validation_seeds
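
Putting those keys together, a `gepa.toml` might look like the sketch below. The URL, prompt text, and seed ranges are placeholders; only the key paths come from the list above.

```toml
# Illustrative gepa.toml sketch; values are placeholders.
[prompt_learning]
task_app_url = "https://your-task-app.example.com"
task_app_api_key = "YOUR_ENVIRONMENT_API_KEY"

[[prompt_learning.initial_prompt.messages]]
role = "system"
pattern = "You are a helpful classifier."

[prompt_learning.gepa.evaluation]
seeds = [0, 1, 2, 3]
validation_seeds = [4, 5]
```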

Task app tips

  • Use seed to select examples deterministically
  • Return metrics.mean_return for each rollout
  • Route LLM calls through policy_config.inference_url
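
Routing through `policy_config.inference_url` means building the LLM request against the URL the backend hands you rather than a hardcoded endpoint. The sketch below assumes an OpenAI-compatible chat completions path and a `model` key in `policy_config`; both are assumptions, not documented fields beyond `inference_url` itself.

```python
# Sketch: point the policy's LLM call at the per-rollout inference URL
# from policy_config. Assumes an OpenAI-compatible endpoint; the
# "model" key and default value are illustrative.

def build_llm_request(policy_config: dict, prompt: str) -> tuple[str, dict]:
    url = policy_config["inference_url"].rstrip("/") + "/chat/completions"
    payload = {
        "model": policy_config.get("model", "placeholder-model"),
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, payload

url, payload = build_llm_request(
    {"inference_url": "https://proxy.example.com/v1"}, "Classify this."
)
```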

Special pattern: Daytona sandboxes (coding agents)

For coding-agent tasks (EngineBench), the task app provisions Daytona sandboxes per rollout, runs the agent inside the sandbox, executes tests, and converts pass rate to reward. GEPA mutates system prompts, AGENTS.md, and skills files. This pattern is best when:
  • Tasks require isolated repos or build environments
  • You want a reproducible, per-rollout filesystem
  • Rewards come from tests or harness scripts
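
The pass-rate-to-reward conversion described above can be sketched as a one-liner over the harness results. The list-of-booleans input is an assumption about how test outcomes are collected; the conversion itself is just the fraction of passing tests.

```python
# Sketch: convert sandbox test results into a scalar reward in [0, 1].
# The list-of-bools representation of test outcomes is illustrative.

def pass_rate_reward(results: list[bool]) -> float:
    if not results:
        return 0.0  # no tests ran: no signal, zero reward
    return sum(results) / len(results)

reward = pass_rate_reward([True, True, False, True])
```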

Next steps

  • Task app overview: /sdk/localapi/overview
  • Dataset setup: /sdk/jobs/prompt-learning