When to use
- You have a task app with multiple examples
- You want to optimize a prompt against a reward signal
- You can deterministically map `seed` → example
- You’re optimizing offline on a fixed dataset (GEPA is the offline method)
What you need
- A task app that implements `/health`, `/task_info`, and `/rollout`
- A dataset with deterministic seed mapping
- A GEPA config (TOML) with train + validation seeds
From demos: exact dataset + task app needs
From task app examples, GEPA expects:
- `task_info` includes dataset split + schema and either a task list or dataset registry
- Rollout uses `seed` to select the example
- For multimodal data, inputs can be base64 data URLs and referenced by wildcards
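For illustration only, a `task_info` payload along those lines might look like the sketch below; the exact field names and nesting come from the task app SDK, so treat every key here as an assumption rather than the real schema.

```python
# Illustrative task_info shape -- field names are assumptions, not the SDK schema.
TASK_INFO = {
    "dataset": {
        "split": "train",
        "schema": {"input": "str", "answer": "str"},  # per-example fields
    },
    # Either an explicit task list (seed -> example id) or a dataset registry entry
    "tasks": [
        {"seed": 0, "id": "example-0"},
        {"seed": 1, "id": "example-1"},
    ],
}
```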
Basic workflow
- Run your task app (local tunnel or deployed to Synth)
- Create a GEPA config
- Submit a prompt learning job
- Poll until complete and read the best prompt
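A hedged sketch of steps 3–4 follows; `submit_prompt_learning_job` and `get_job` are placeholders for whatever the SDK or CLI actually provides, and the status and result field names are assumptions.

```python
import time

def submit_prompt_learning_job(config_path: str) -> str:
    """Placeholder for the real submit call (SDK or CLI) -- returns a job id."""
    raise NotImplementedError

def get_job(job_id: str) -> dict:
    """Placeholder for the real status call -- returns job status and results."""
    raise NotImplementedError

job_id = submit_prompt_learning_job("gepa_config.toml")
while True:
    job = get_job(job_id)
    if job["status"] in ("succeeded", "failed"):
        break
    time.sleep(30)                     # poll until the GEPA run finishes

if job["status"] == "succeeded":
    print(job["best_prompt"])          # read back the optimized prompt
```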
Minimal example
Config essentials
Your config must include:
- `prompt_learning.task_app_url`
- `prompt_learning.task_app_api_key` (ENVIRONMENT_API_KEY)
- `prompt_learning.initial_prompt`
- `prompt_learning.gepa.evaluation.seeds`
- `prompt_learning.gepa.evaluation.validation_seeds`
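A minimal sketch of such a config, assuming a standard TOML layout; the values and any structure beyond the keys listed above are illustrative placeholders, not the exact schema.

```toml
# Minimal GEPA prompt-learning config -- values are placeholders; anything
# beyond the keys listed above is an assumption about the schema.
[prompt_learning]
task_app_url = "https://your-task-app.example.com"
task_app_api_key = "..."  # typically your ENVIRONMENT_API_KEY
initial_prompt = "You are a helpful assistant. Answer the question."

[prompt_learning.gepa.evaluation]
seeds = [0, 1, 2, 3, 4, 5, 6, 7]   # training examples
validation_seeds = [8, 9, 10, 11]  # held-out examples for scoring candidates
```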
Task app tips
- Use `seed` to select examples deterministically
- Return `metrics.mean_return` for each rollout
- Route LLM calls through `policy_config.inference_url`
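Putting those tips together, here is a minimal sketch of a rollout handler, assuming a FastAPI task app and an OpenAI-style chat endpoint behind the provided inference URL; everything beyond `seed`, `policy_config.inference_url`, `metrics.mean_return`, and the `/health` and `/rollout` routes (request parsing, dataset shape, scoring) is an assumption.

```python
from fastapi import FastAPI, Request
import httpx

app = FastAPI()

# Tiny in-memory dataset; in a real task app this comes from your dataset registry.
DATASET = [
    {"question": "2 + 2 = ?", "answer": "4"},
    {"question": "Capital of France?", "answer": "Paris"},
]


@app.get("/health")
def health():
    return {"status": "ok"}


@app.post("/rollout")
async def rollout(request: Request):
    body = await request.json()
    seed = body["seed"]
    example = DATASET[seed % len(DATASET)]  # deterministic seed -> example mapping

    # Route the LLM call through the inference URL GEPA supplies, so the
    # candidate prompt being evaluated is the one the model actually sees.
    inference_url = body["policy_config"]["inference_url"]
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            inference_url,
            json={"messages": [{"role": "user", "content": example["question"]}]},
        )
    completion = resp.json()["choices"][0]["message"]["content"]  # OpenAI-style shape

    reward = 1.0 if example["answer"].lower() in completion.lower() else 0.0
    return {"metrics": {"mean_return": reward}}
```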
Special pattern: Daytona sandboxes (coding agents)
For coding-agent tasks (EngineBench), the task app provisions Daytona sandboxes per rollout, runs the agent inside the sandbox, executes tests, and converts the pass rate to a reward (see the sketch after the list below). GEPA mutates system prompts, `AGENTS.md`, and skills files.
This pattern is best when:
- Tasks require isolated repos or build environments
- You want a reproducible, per-rollout filesystem
- Rewards come from tests or harness scripts
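A shape-only sketch of that per-rollout flow; `create_sandbox`, `write_files`, `run_agent`, and `run_tests` are hypothetical stand-ins for the Daytona client and the agent/test harness, not their real APIs.

```python
def rollout_in_sandbox(task, prompt_files: dict[str, str]) -> float:
    """One rollout: fresh sandbox, run the agent, score by test pass rate."""
    sandbox = create_sandbox(repo=task.repo_url)      # hypothetical: isolated repo per rollout
    try:
        write_files(sandbox, prompt_files)            # system prompt, AGENTS.md, skills files
        run_agent(sandbox, task.instructions)         # coding agent works inside the sandbox
        passed, total = run_tests(sandbox, task.test_command)
        return passed / total if total else 0.0       # pass rate becomes the reward
    finally:
        sandbox.delete()                              # hypothetical cleanup call
```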
Next steps
- Task app overview: /sdk/localapi/overview
- Dataset setup: /sdk/jobs/prompt-learning