Online prompt optimization (GEPA and MIPRO) supports two ways to serve the best prompt at inference time:
| Pattern | How it works | When to use |
| --- | --- | --- |
| A: Proxy-based | Runtime calls a Synth proxy URL; Synth performs live candidate selection and injects the prompt | Simplest integration; you want Synth to own selection |
| B: Retrieval + JIT apply | Runtime fetches candidates via APIs, picks the best, and applies it just-in-time before each request | You control selection logic; you call the LLM directly |
Both patterns work with GEPA and MIPRO online sessions. The backend proposes new candidates and tracks rewards the same way; only the serving path differs.

Pattern A: Proxy-based serving

Your rollout loop sends LLM requests to a Synth proxy URL. The proxy:
  1. Selects the current best candidate (or assigns one for exploration)
  2. Injects the candidate prompt as the system message
  3. Forwards the request to your configured LLM provider
  4. Returns the response with headers (x-gepa-rollout-id, x-gepa-candidate-id) for reward attribution
Flow:
Your app → POST {proxy_url}/chat/completions → Synth proxy → LLM provider → response
                                               (Synth selects the candidate and injects the prompt)
SDK usage:
session = client.optimization.online.create(kind="gepa_online", config_path="gepa.toml")
urls = session.get_prompt_urls()
# Call urls["chat_completions_url"] for each LLM request
See the GEPA Online Banking77 example and the GEPA Online API reference for full details.
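The proxy endpoint is OpenAI-compatible, so any HTTP client can call it. A minimal standard-library sketch follows; the `call_proxy` and `extract_attribution` helpers and the `urllib` wiring are ours, while the attribution header names are the ones documented above:

```python
import json
import urllib.request

# Header names documented for the Synth proxy response.
ATTRIBUTION_HEADERS = ("x-gepa-rollout-id", "x-gepa-candidate-id")

def extract_attribution(headers) -> dict:
    """Pull the reward-attribution IDs out of a proxy response's headers."""
    lowered = {str(k).lower(): v for k, v in dict(headers).items()}
    return {name: lowered.get(name) for name in ATTRIBUTION_HEADERS}

def call_proxy(chat_completions_url: str, api_key: str, user_input: str):
    """POST an OpenAI-style chat request. The proxy selects a candidate and
    injects the system prompt, so we send only the user turn."""
    body = json.dumps({"messages": [{"role": "user", "content": user_input}]})
    req = urllib.request.Request(
        chat_completions_url,
        data=body.encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp), extract_attribution(resp.headers)
```

Keep the returned attribution dict alongside the response so you can report the reward for the right rollout and candidate later.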

Pattern B: Retrieval + JIT apply

Your rollout loop fetches candidates from Synth, picks the best (or your own logic), and applies the prompt yourself before calling the LLM directly. Flow:
  1. Create an online session (same as Pattern A)
  2. Poll session state for best_candidate_id (or list candidates and choose)
  3. Fetch the candidate payload via session.get_candidate(candidate_id) or session.list_candidates()
  4. Extract prompt text from the candidate (candidate_content, artifact_payload, or nested fields)
  5. Call your LLM with that prompt as the system message
  6. Report rewards via session.update_reward(...) with candidate_id and rollout_id
When to use Pattern B:
  • You call the LLM directly (no proxy in the path)
  • You want custom selection logic (e.g., A/B by user segment, fallback rules)
  • You need to cache or preload candidates
  • Your infra cannot route through a Synth proxy
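The custom-selection case (e.g., A/B by user segment with fallback rules) can be sketched as a pure function over the candidate summaries that `list_candidates()` returns. The segment-hashing scheme and the `explore_fraction` knob are illustrative assumptions, not part of the SDK:

```python
import hashlib

def pick_candidate(candidates, best_candidate_id, user_id, explore_fraction=0.1):
    """Route a small, deterministic slice of users to an exploration
    candidate; serve everyone else the backend's current best."""
    by_id = {c["candidate_id"]: c for c in candidates}
    # Hash the user ID into 100 buckets so each user always sees the same arm.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    if candidates and bucket < explore_fraction * 100:
        # Exploration arm: the least-tried candidate (fewest rollouts).
        return min(candidates, key=lambda c: c.get("rollout_count", 0))
    # Exploit arm: backend's best, falling back to highest avg_reward
    # if best_candidate_id is missing from the fetched list.
    return by_id.get(best_candidate_id) or max(
        candidates, key=lambda c: c.get("avg_reward", 0.0)
    )
```

Because the bucketing is deterministic, a given user keeps the same prompt across requests, which keeps per-candidate reward statistics clean.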

Retrieval APIs

List candidates for a session

GET /api/v1/systems/{system_id}/candidates

For online sessions, system_id is the session_id returned when you create the session.

Query params: job_id, algorithm, mode, status, limit, cursor, sort, include.

Response:
{
  "items": [
    {
      "candidate_id": "cand_abc123",
      "candidate_content": "You are a helpful assistant...",
      "avg_reward": 0.85,
      "rollout_count": 42
    }
  ],
  "next_cursor": "..."
}
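Following next_cursor until it comes back empty can be factored into a small generator. Here `fetch_page` stands in for whatever HTTP call you make to the endpoint above (its name and signature are ours); the `limit`/`cursor` params and the `items`/`next_cursor` fields match the response shape shown:

```python
def iter_candidates(fetch_page, limit=50):
    """Yield every candidate for a session, following next_cursor to the end.

    fetch_page(limit=..., cursor=...) must return one decoded response page
    shaped like {"items": [...], "next_cursor": "..." | None}.
    """
    cursor = None
    while True:
        page = fetch_page(limit=limit, cursor=cursor)
        yield from page.get("items", [])
        cursor = page.get("next_cursor")
        if not cursor:  # absent, None, or empty string all mean "last page"
            break
```

Passing the HTTP call in as a function keeps the cursor loop independent of your client library and easy to test.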

Get a single candidate

GET /api/v1/candidates/{candidate_id}
GET /api/v1/offline/jobs/{job_id}/candidates/{candidate_id}
Returns the full candidate payload including candidate_content, artifact_payload, or nested prompt structures.
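Because the prompt may live in candidate_content, artifact_payload, or a nested structure, a defensive extractor is useful. A sketch; the nested key names it probes ("prompt", "content", "text") are illustrative guesses, not a guaranteed schema:

```python
def extract_prompt_text(candidate: dict):
    """Return the prompt string from a candidate payload, trying the
    documented locations in order; None if nothing usable is found."""
    direct = candidate.get("candidate_content")
    if isinstance(direct, str):
        return direct
    payload = candidate.get("artifact_payload")
    if isinstance(payload, str):
        return payload
    if isinstance(payload, dict):
        # Probe a few plausible nested field names (assumed, not guaranteed).
        for key in ("prompt", "content", "text"):
            if isinstance(payload.get(key), str):
                return payload[key]
    return None
```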

Session state (best candidate)

GET /api/v1/online/sessions/{session_id}

Returns live state including:
  • best_candidate_id — the backend’s current best
  • best_reward / best_objective_value — associated score
  • candidates — summary list with candidate_id, avg_reward, rollout_count
Use best_candidate_id to fetch the prompt via get_candidate() when using Pattern B.
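A common Pattern B serving shape is to poll session state and refetch the prompt only when best_candidate_id changes. `BestPromptCache` below is a sketch of ours, not an SDK class; `get_state` and `get_candidate` stand in for `session.get_status()` and `session.get_candidate()` so the caching logic stays testable:

```python
class BestPromptCache:
    """Serve the current best prompt, refetching only on candidate change."""

    def __init__(self, get_state, get_candidate):
        self._get_state = get_state        # e.g. session.get_status
        self._get_candidate = get_candidate  # e.g. session.get_candidate
        self._cached_id = None
        self._cached_prompt = None

    def current_prompt(self):
        """Return (candidate_id, prompt_text) for the backend's current best."""
        best_id = self._get_state().get("best_candidate_id") or "baseline"
        if best_id != self._cached_id:
            # Best changed (or first call): fetch the new candidate's prompt.
            candidate = self._get_candidate(best_id)
            self._cached_prompt = candidate.get("candidate_content")
            self._cached_id = best_id
        return self._cached_id, self._cached_prompt
```

This keeps the per-request cost to one lightweight state poll, with the heavier candidate fetch amortized across requests.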

SDK usage (Pattern B)

GEPA online

import os

from synth_ai import SynthClient

client = SynthClient(api_key=os.environ["SYNTH_API_KEY"])
session = client.optimization.online.create(
    kind="gepa_online",
    config_path="gepa.toml",
)

# Option 1: Use best from session state
state = session.get_status()
best_id = state.get("best_candidate_id") or "baseline"
candidate = session.get_candidate(best_id)
prompt_text = candidate.get("candidate_content") or candidate.get("artifact_payload")

# Option 2: List and pick (e.g., by your own logic)
page = session.list_candidates(limit=10)
items = page.get("items", [])
best = max(items, key=lambda x: x.get("avg_reward", 0))
best_id = best["candidate_id"]
candidate = session.get_candidate(best_id)
prompt_text = candidate.get("candidate_content") or candidate.get("artifact_payload")

# Apply prompt and call your LLM
messages = [{"role": "system", "content": prompt_text}, {"role": "user", "content": user_input}]
response = your_llm_client.chat(messages)

# Report reward (use rollout_id from your own tracking)
session.update_reward(
    reward_info={"score": 0.9},
    rollout_id="my_rollout_123",
    candidate_id=best_id,
)

MIPRO online

MiproOnlineSession exposes the same retrieval methods: list_candidates(), list_candidates_async(), get_candidate(), get_candidate_async(). Use session_id as the system identifier when calling the REST APIs directly.

Reward attribution

For Pattern B, you must track rollout_id and candidate_id yourself and pass them to update_reward() so the backend can attribute rewards correctly. The backend uses this to update per-candidate statistics and propose new candidates.
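A minimal sketch of that bookkeeping, assuming an in-memory map is acceptable (use durable storage if rollouts outlive the process). `RolloutTracker` and the rollout-ID format are ours; `update_reward(...)` is the SDK call shown above:

```python
import uuid

class RolloutTracker:
    """Remember which candidate served each rollout so rewards can be
    attributed to the right candidate when they arrive later."""

    def __init__(self):
        self._served = {}  # rollout_id -> candidate_id

    def start_rollout(self, candidate_id: str) -> str:
        """Record that this rollout was served with candidate_id."""
        rollout_id = f"rollout_{uuid.uuid4().hex[:8]}"  # assumed ID format
        self._served[rollout_id] = candidate_id
        return rollout_id

    def report(self, session, rollout_id: str, score: float):
        """Send the reward with both IDs so the backend can attribute it."""
        candidate_id = self._served.pop(rollout_id)
        session.update_reward(
            reward_info={"score": score},
            rollout_id=rollout_id,
            candidate_id=candidate_id,
        )
```

Call `start_rollout(...)` when you apply a prompt, and `report(...)` once the rollout's reward is known; the pop also guards against double-reporting the same rollout.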

Next steps