MIPRO optimizes prompts by proposing instruction variants and evaluating them against traces and rewards collected from your container rollouts.

At a glance

MIPRO's continual learning loop runs online: your rollout loop calls the proxy URL, the proxy substitutes the latest prompt+context policy, you report rewards via update_reward, and the learner updates the policy.

When to use

  • You want iterative prompt refinement with proposal stages
  • You have reliable traces and a stable container
  • You’re optimizing online from production data (MIPRO is the online method)

What you need

  1. A container with multiple examples
  2. Traces from eval or prior rollouts
  3. A MIPRO online job (backend returns a proxy URL)
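
The create call in the minimal example below points at a mipro.toml. A sketch of what such a config might contain is shown here; the field names mirror the demo's CLI flags (--train-size, --val-size, --min-proposal-rollouts) and are illustrative, not a confirmed schema.

```toml
# Hypothetical mipro.toml sketch -- field names mirror the demo's
# CLI flags and are illustrative, not an authoritative schema.
[job]
kind = "mipro_online"

[data]
train_size = 50            # seed range for training rollouts
val_size = 20              # seed range for validation rollouts

[proposals]
min_proposal_rollouts = 10 # rollouts required before new proposals appear
```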

Basic workflow

  1. Run eval to collect traces
  2. Start a MIPRO online job
  3. Evaluate candidates without leaking prompts to the container

From demos: exact online loop (Banking77)

From the mipro_banking77 demo, the online loop looks like:
  1. Start a local container and health-check it
  2. Create a MIPRO online job on the backend
  3. Receive a proxy URL for prompt substitution
  4. For each rollout, call {proxy_url}/{rollout_id}/chat/completions
  5. Compute reward locally and POST status updates:
    • status=reward with reward value
    • status=done when rollout finishes
The demo uses:
  • --train-size and --val-size to define seed ranges
  • --min-proposal-rollouts to control when new proposals appear
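
The demo loop above can be sketched in a few lines. The {proxy_url}/{rollout_id}/chat/completions pattern and the status=reward / status=done updates come from the steps listed; the status endpoint path, request payload shapes, and the score_fn helper are assumptions for illustration, not the demo's exact code.

```python
import requests


def completions_url(proxy_url: str, rollout_id: str) -> str:
    """Build the per-rollout chat-completions endpoint (pattern from the demo)."""
    return f"{proxy_url.rstrip('/')}/{rollout_id}/chat/completions"


def run_rollout(proxy_url: str, rollout_id: str, messages, score_fn):
    """One rollout: call the proxy, score locally, then POST status updates.

    The status URL and payload field names below are illustrative assumptions.
    """
    # The proxy substitutes the latest prompt+context policy before forwarding,
    # so the container never sees the candidate prompt itself.
    resp = requests.post(
        completions_url(proxy_url, rollout_id),
        json={"messages": messages},
        timeout=60,
    )
    resp.raise_for_status()
    completion = resp.json()

    # Compute the reward locally, then report it and close out the rollout.
    reward = score_fn(completion)
    status_url = f"{proxy_url.rstrip('/')}/{rollout_id}/status"  # hypothetical path
    requests.post(status_url, json={"status": "reward", "reward": reward}, timeout=30)
    requests.post(status_url, json={"status": "done"}, timeout=30)
    return reward
```

Keeping the reward computation local (step 5) is what lets the learner treat every rollout uniformly, regardless of how the score was produced.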

Minimal example

import os
from synth_ai import SynthClient

# Authenticate with your Synth API key
client = SynthClient(api_key=os.environ["SYNTH_API_KEY"])

# Start a MIPRO online job; the backend returns proxy URLs for prompt substitution
session = client.optimization.online.create(
    kind="mipro_online",
    config_path="mipro.toml",
)

urls = session.get_prompt_urls()
print(urls["online_url"])

# In your rollout loop, call the proxy URL, compute reward, then report it:
session.update_reward(
    reward_info={"score": 0.85},
    rollout_id="rollout_123",
)

Key constraints

  • MIPRO must not leak prompts to the container
  • Use the interceptor for all LLM calls
  • Keep rewards and traces consistent across rollouts (stable container + stable scoring)

Reward inputs & verifiers

Continual learning ingests every reward through the same update_reward endpoint, so you can mix rollout-based scorers with any upstream signal that can post a status=reward update. That flexibility means the MIPRO job can learn from:
  • Verifier-based rewards. RLM v1/v2 verifiers (or any verifier that makes the same API call) can score completions and send the reward payload just like a container rollout, so the learner treats verifier scores the same as a trace-derived reward.
  • External signal sources. User feedback, automated checks, monitoring bots, or other workflows can push reward updates once they compute a signal from user behavior, QA results, or heuristics.
  • Mixed signal/verifier setups. Nothing stops you from blending the above: run container rollouts for prompt diversity, simultaneously feed verifier grades, and layer in automated signal updates; the learner merely consumes whatever reward metadata arrives.
Make sure the workflow that computes the signal can identify the rollout_id and call session.update_reward (or the equivalent REST endpoint) with the score and status values (e.g., reward, done) that the backend expects. The learner is agnostic to whether a reward originated from the container, a verifier, or another signal source.
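
Concretely, any signal source only needs to produce a rollout_id plus a score and post them in the same shape. A minimal sketch, where the payload structure follows the reward_info={"score": ...} form used in the minimal example but is otherwise an assumption, not the documented schema:

```python
def make_reward_update(rollout_id: str, score: float) -> dict:
    """Build a status=reward update payload (shape is illustrative)."""
    return {
        "rollout_id": rollout_id,
        "status": "reward",
        "reward_info": {"score": score},
    }


def make_done_update(rollout_id: str) -> dict:
    """Mark the rollout finished once all signals for it are in."""
    return {"rollout_id": rollout_id, "status": "done"}


# A verifier grade, a user-feedback signal, and an automated heuristic check
# can all flow through the same payload -- the learner only sees the reward
# metadata, not where it came from.
verifier_update = make_reward_update("rollout_123", 0.85)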

Tradeoff vs GEPA

  • MIPRO is less efficient than GEPA but can learn continuously from production rollouts and fresh data.

Next steps

  • Container overview: /sdk/container/overview
  • Traces and rubrics: /sdk/tracing/v3-traces and /sdk/tracing/rubrics
  • MIPRO SDK reference: /sdk/jobs/prompt-optimization/mipro