## What GEPA needs from your task app
GEPA only works if the task app exposes the dataset and schema through `/task_info` and routes each rollout to the correct example based on `seed`.
### Required data in `task_info`
Your task app must return:
- Dataset identifiers and available splits
- Input/output schema (field names and types)
- Any metadata needed to interpret examples
- A stable mapping between `seed` and examples
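For concreteness, here is a minimal sketch of such a payload. The `input_schema`/`output_schema` and `tasks` fields come from this page; the other field names and values are illustrative, not a fixed GEPA schema.

```python
# Illustrative /task_info payload; names other than "seed", "tasks",
# "input_schema", and "output_schema" are assumptions.
TASK_INFO = {
    "dataset": {"name": "my-qa-dataset", "splits": ["train", "validation", "test"]},
    "input_schema": {"question": "str"},
    "output_schema": {"answer": "str"},
    "metadata": {"answer_format": "single short phrase"},
    # Explicit seed -> example mapping (one of the two options below).
    "tasks": [
        {"seed": 0, "inputs": {"question": "What is the capital of France?"}},
        {"seed": 1, "inputs": {"question": "What is 2 + 2?"}},
    ],
}
```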
### Include actual tasks or a dataset registry
Based on the task app examples, GEPA works best when the task app either:

- Returns an explicit task list in `task_info` (each task includes `seed` + inputs), or
- Exposes a dataset registry and uses `seed` to sample deterministically at rollout time (see the sketch below).
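A minimal sketch of the registry option, assuming a JSONL file where each example carries a `split` field; the file layout and helper names are illustrative:

```python
import json
import random

def load_split(path: str, split: str) -> list[dict]:
    """Load one split's examples from a JSONL file (hypothetical layout)."""
    with open(path) as f:
        return [ex for ex in map(json.loads, f) if ex.get("split") == split]

def sample_example(path: str, split: str, seed: int) -> dict:
    """Deterministic draw: the same seed always yields the same example."""
    examples = load_split(path, split)
    return examples[random.Random(seed).randrange(len(examples))]
```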
### Seed routing (how GEPA selects examples)
Each rollout includes a `seed`. The task app must map that seed to a deterministic example from the dataset split used for the job (train/validation/test).
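As a sketch, the simplest stable mapping is modulo indexing into the split; any deterministic scheme works, as long as it never changes within a job:

```python
def example_for_seed(examples: list[dict], seed: int) -> dict:
    # Modulo indexing gives a stable seed -> example mapping; swap in any
    # deterministic scheme, but keep it fixed for the duration of a job.
    return examples[seed % len(examples)]
```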
### Train vs validation seeds
GEPA expects you to provide train/validation seeds in the job config, and the task app must honor them consistently:

- Train seeds drive candidate scoring
- Validation seeds drive Pareto selection

Use `env_name` or split metadata to ensure seeds map to the correct split.
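A hedged sketch of split routing, assuming the job config supplies the train seed set and that rollouts may carry a `split` hint in their env config; both assumptions, not a fixed GEPA contract:

```python
def resolve_split(seed: int, env_config: dict, train_seeds: set[int]) -> str:
    """Route a rollout seed to its split, preferring explicit metadata."""
    if "split" in env_config:          # hypothetical per-rollout hint
        return env_config["split"]
    return "train" if seed in train_seeds else "validation"
```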
### Rollout input contract (data path)
GEPA sends a rollout request containing the `seed`. Your task app must:

- Load the example corresponding to `seed`
- Run the prompt against the example
- Return `metrics.mean_return` as the reward
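A minimal sketch of this data path; `run_prompt` and `score` are hypothetical stand-ins for your model call and task-specific grading:

```python
def run_prompt(prompt: str, example: dict) -> str:
    raise NotImplementedError("call your policy/model here")  # hypothetical

def score(completion: str, example: dict) -> float:
    raise NotImplementedError("grade the completion against the example")  # hypothetical

def handle_rollout(request: dict, examples: list[dict]) -> dict:
    example = examples[request["seed"] % len(examples)]   # deterministic lookup
    completion = run_prompt(request["prompt"], example)
    # GEPA reads metrics.mean_return as the reward for this rollout.
    return {"metrics": {"mean_return": score(completion, example)}}
```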
### Multimodal datasets (images, files)
If your dataset includes images or files, include them in `inputs` or in the task list with stable placeholders (e.g. `{{image}}`) and resolve them at rollout time. Use base64-encoded data URLs for reproducibility when needed.
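A small sketch of placeholder resolution using only the standard library; helper names are illustrative:

```python
import base64
import mimetypes

def to_data_url(path: str) -> str:
    """Encode a local file as a base64 data URL for reproducible rollouts."""
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as f:
        payload = base64.b64encode(f.read()).decode()
    return f"data:{mime};base64,{payload}"

def resolve_placeholders(prompt: str, files: dict[str, str]) -> str:
    """Replace stable placeholders such as {{image}} with data URLs."""
    for name, path in files.items():
        prompt = prompt.replace("{{" + name + "}}", to_data_url(path))
    return prompt
```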
### Dataset prep checklist
- Deterministic `seed -> example` mapping
- Clear split boundaries (train/validation/test)
- `input_schema`/`output_schema` aligned with your prompt format
- Task app returns `metrics.mean_return` for every rollout
## Pattern Discovery
Use pattern discovery to derive `prompt_learning.initial_prompt` directly from traces.
### Flow A (recommended): eval -> discover -> configure
1. Run an eval job with the same task app and policy settings you will use for prompt learning.
2. Discover patterns from the eval traces.
3. Copy the emitted TOML snippet into your prompt learning config under `prompt_learning.initial_prompt`.
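For orientation only, the placement looks roughly like this, assuming `initial_prompt` is a string field; the actual snippet is whatever the discovery step emits:

```toml
# Placement sketch only; paste the emitted snippet, do not hand-write it.
[prompt_learning]
initial_prompt = """
<discovered pattern text emitted by the discovery step>
"""
```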
### Flow B (experimental): auto-discover at runtime
Omit `prompt_learning.initial_prompt` and set the auto-discovery option in the job config.
Warning: Auto-discovery is experimental. It runs a validation rollout, infers patterns from traces, and proceeds with optimization. Noisy or highly multi-call traces may still fail.
### Interpreting discovery output
- Patterns are ranked by `support_count` and `match_rate`.
- `warnings` includes normalization or multi-pattern ambiguity notes.
- If multiple patterns are returned, prefer the one with the highest support and match rate.
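Based only on the field names above, selection can be as simple as the following; the surrounding result structure is an assumption:

```python
def best_pattern(patterns: list[dict]) -> dict:
    """Prefer the pattern with the highest support_count, breaking ties by match_rate."""
    return max(patterns, key=lambda p: (p["support_count"], p["match_rate"]))
```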