Prompt optimization using GEPA.

What GEPA needs from your task app

GEPA only works if the task app exposes the dataset and schema through /task_info, and routes each rollout to the correct example based on seed.

Required data in task_info

Your task app must return:
  • Dataset identifiers and available splits
  • Input/output schema (field names and types)
  • Any metadata needed to interpret examples
  • A stable mapping between seed and examples
Minimal example:
{
  "id": "banking77",
  "splits": ["train", "validation"],
  "input_schema": {"text": "string"},
  "output_schema": {"label": "string"}
}
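A minimal sketch of a /task_info handler that returns this shape (FastAPI and the in-memory DATASET below are illustrative assumptions; use whatever framework and data loading your task app already has):
from fastapi import FastAPI

app = FastAPI()

# Illustrative in-memory dataset keyed by split.
DATASET = {
    "train": [{"text": "When will my card arrive?", "label": "card_arrival"}],
    "validation": [{"text": "How do I top up?", "label": "top_up"}],
}

@app.get("/task_info")
def task_info():
    return {
        "id": "banking77",
        "splits": list(DATASET.keys()),
        "input_schema": {"text": "string"},
        "output_schema": {"label": "string"},
    }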

Include actual tasks or a dataset registry

Based on the existing task app examples, GEPA works best when the task app either:
  • Returns an explicit task list in task_info (each task includes seed + inputs), or
  • Exposes a dataset registry and uses seed to sample deterministically at rollout time.
If you return explicit tasks, the expected shape looks like:
{
  "tasks": [
    { "seed": 0, "input": { "text": "..." } },
    { "seed": 1, "input": { "text": "..." } }
  ]
}

Seed routing (how GEPA selects examples)

Each rollout includes a seed. The task app must map that seed to a deterministic example from the dataset split used for the job (train/validation/test).
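A minimal sketch of such a mapping (modulo indexing is just one deterministic choice; any stable seed-to-example function works):
def load_example(dataset: dict, split: str, seed: int) -> dict:
    """Map a rollout seed to a fixed example from the given split."""
    examples = dataset[split]
    # Modulo indexing stays deterministic even when the job uses
    # more seeds than the split has examples.
    return examples[seed % len(examples)]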

Train vs validation seeds

GEPA expects you to provide train/validation seeds in the job config, and the task app must honor them consistently:
  • Train seeds drive candidate scoring
  • Validation seeds drive Pareto selection
If your task app uses multiple splits, use env_name or split metadata to ensure seeds map to the correct split.
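One possible way to do that, assuming the rollout request carries an env_name or split hint (the field names below are illustrative, not part of the contract):
def resolve_split(request: dict) -> str:
    # env_name / split are illustrative field names; use whatever your
    # task app actually receives to separate train from validation rollouts.
    env_name = request.get("env_name", "")
    if "validation" in env_name:
        return "validation"
    return request.get("split", "train")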

Rollout input contract (data path)

GEPA sends:
{
  "seed": 42,
  "inputs": { "text": "..." },
  "policy_config": { "inference_url": "...", "model": "..." }
}
Your task app must:
  1. Load the example corresponding to seed
  2. Run the prompt against the example
  3. Return metrics.mean_return as the reward (a minimal sketch follows)
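A minimal rollout handler sketch following that contract, reusing load_example from the seed-routing sketch above; run_prompt and score stand in for your own inference and grading logic:
async def rollout(request: dict) -> dict:
    seed = request["seed"]
    policy = request["policy_config"]

    # 1. Load the example corresponding to the seed.
    example = load_example(DATASET, split="train", seed=seed)

    # 2. Run the prompt against the example with the supplied policy.
    completion = await run_prompt(
        inference_url=policy["inference_url"],
        model=policy["model"],
        inputs=example,
    )

    # 3. Return the reward GEPA reads from metrics.mean_return.
    return {"metrics": {"mean_return": score(completion, example)}}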

Multimodal datasets (images, files)

If your dataset includes images or files, include them in inputs or in the task list with stable placeholders (e.g. {{image}}) and resolve them at rollout time. Use base64-encoded data URLs for reproducibility when needed.
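For example, a local image can be embedded as a base64 data URL with the standard library (the image field name is whatever your schema uses):
import base64
from pathlib import Path

def to_data_url(path: str, mime: str = "image/png") -> str:
    # Encode a local file as a data URL so rollouts are reproducible.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# e.g. inputs = {"text": "...", "image": to_data_url("example.png")}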

Dataset prep checklist

  • Deterministic seed -> example mapping
  • Clear split boundaries (train/validation/test)
  • input_schema/output_schema aligned with your prompt format
  • Task app returns metrics.mean_return for every rollout

SDK

import asyncio
import os

from synth_ai.sdk import PromptLearningClient

async def run_optimization():
    client = PromptLearningClient(api_key=os.environ["SYNTH_API_KEY"])

    # Create and start the GEPA job from a TOML config
    job = await client.create_job_from_toml("gepa.toml")
    await client.start_job(job["id"])

    # Poll until the job reaches a terminal state
    result = await client.poll_until_terminal(job["id"])
    print(f"Best prompt: {result['best_prompt']}")
    print(f"Best score: {result['best_score']}")

asyncio.run(run_optimization())
Requires a LocalAPI to evaluate prompts.

Pattern Discovery

Use pattern discovery to derive prompt_learning.initial_prompt directly from traces.
  1. Run an eval job with the same task app and policy settings you will use for prompt learning.
  2. Discover patterns from the eval traces:
    import os
    from synth_ai.sdk import PromptLearningClient

    client = PromptLearningClient(api_key=os.environ["SYNTH_API_KEY"])
    patterns = await client.discover_patterns(job_id="eval_XXXX")
    print(patterns.to_toml())  # Copy the emitted snippet into your config

  3. Copy the emitted TOML snippet into your prompt learning config under prompt_learning.initial_prompt (an illustrative shape is shown below).
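The pasted snippet typically looks something like this; the exact fields come from patterns.to_toml(), so treat this shape as illustrative:
[prompt_learning]
initial_prompt = """
You are a banking support classifier.
Given the customer message, respond with exactly one intent label.
"""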

Flow B (experimental): auto-discover at runtime

Omit prompt_learning.initial_prompt and set:
[prompt_learning]
auto_discover_patterns = true
Warning: Auto-discovery is experimental. It runs a validation rollout, infers patterns from traces, and proceeds with optimization. Noisy or highly multi-call traces may still fail.

Interpreting discovery output

  • Patterns are ranked by support_count and match_rate.
  • warnings includes normalization or multi-pattern ambiguity notes.
  • If multiple patterns are returned, prefer the one with the highest support and match rate (see the sketch below).
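Assuming each returned pattern exposes support_count and match_rate (attribute names taken from the ranking described above; the actual object shape may differ), that selection is a one-liner:
# Prefer the pattern with the highest support, then match rate.
best = max(patterns, key=lambda p: (p.support_count, p.match_rate))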

Config Reference

Configuration references for GEPA are maintained in research-only docs.