Note: This page is auto-generated from SDK validation code. Parameters and types are extracted automatically and will update when the code changes.

MIPRO Online (Multi-prompt Instruction Proposal Optimizer) is an algorithm for optimizing prompts through systematic instruction proposal and evaluation, run in online mode: you drive rollouts locally while the backend provides prompt candidates through proxy URLs.

Endpoint: POST /api/policy-optimization/online/jobs
Authentication: Bearer token via Authorization: Bearer $SYNTH_API_KEY

Overview

In online mode:
  • You control rollouts: Drive the rollout loop locally
  • No tunneling required: Backend never calls your task app
  • Real-time evolution: Prompts evolve as rewards are reported
  • Proxy URL: Backend provides a proxy URL that selects prompt candidates for each LLM call

Request

{
  "policy_optimization": {
    "algorithm": "mipro",
    "task_app_url": "https://your-task-app.example.com",
    "policy": {
      "model": "gpt-4o-mini",
      "provider": "openai",
      "temperature": 0.0,
      "max_completion_tokens": 256
    },
    "mipro": {
      "mode": "online",
      "bootstrap_train_seeds": [0, 1, 2, 3, 4],
      "val_seeds": [100, 101, 102],
      "online_pool": [0, 1, 2, 3, 4],
      "online_proposer_mode": "inline",
      "online_proposer_min_rollouts": 20,
      "online_rollouts_per_candidate": 10,
      "proposer": {
        "mode": "instruction_only",
        "model": "gpt-4o-mini",
        "provider": "openai",
        "temperature": 0.7,
        "max_tokens": 512
      }
    }
  }
}
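
For example, the job can be submitted with a plain HTTP call. The sketch below assumes the request body above has been saved as request.json and uses a placeholder BASE_URL for your backend host; only the path, method, and Authorization header come from this page.

import json
import os

import requests

BASE_URL = "https://your-synth-backend.example.com"  # placeholder backend host

with open("request.json") as f:
    payload = json.load(f)  # the request body shown above

resp = requests.post(
    f"{BASE_URL}/api/policy-optimization/online/jobs",
    headers={"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"},
    json=payload,
)
resp.raise_for_status()
job = resp.json()
print(job["job_id"], job["status"])  # e.g. pl_abc123 running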

Parameters

Parameter | Type | Required | Description
mipro.mode | string | Yes | Must be "online"
mipro.bootstrap_train_seeds | array[int] | Yes | Initial training seeds for the bootstrap phase
mipro.val_seeds | array[int] | Yes | Validation seeds for evaluation
mipro.online_pool | array[int] | Yes | Pool of seeds for online optimization
mipro.online_proposer_min_rollouts | int | Yes | Minimum rollouts before generating new proposals
mipro.online_proposer_mode | string | Yes | Proposer mode: "inline" (proposals generated during optimization)
mipro.online_rollouts_per_candidate | int | Yes | Number of rollouts per candidate before switching
mipro.proposer | object | Yes | Proposer configuration for generating prompt proposals
mipro.proposer.max_tokens | int | No | Maximum tokens for proposer output (default: 512)
mipro.proposer.mode | string | Yes | Proposer generation mode: "instruction_only"
mipro.proposer.model | string | Yes | Model for generating proposals
mipro.proposer.provider | string | Yes | Provider for the proposer model
mipro.proposer.temperature | float | No | Temperature for proposer generation (default: 0.7)
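
To make the online-specific fields concrete, the mipro block from the Request section can be written as an annotated Python dict. The comments only paraphrase the descriptions above; the exact scheduling of proposals and candidates is determined by the backend.

mipro_config = {
    "mode": "online",
    "bootstrap_train_seeds": [0, 1, 2, 3, 4],  # seeds for the initial bootstrap phase
    "val_seeds": [100, 101, 102],              # seeds used to evaluate candidates
    "online_pool": [0, 1, 2, 3, 4],            # seeds you draw from while driving rollouts
    "online_proposer_mode": "inline",          # proposals are generated during optimization
    "online_proposer_min_rollouts": 20,        # no new proposals until 20 rollouts have been reported
    "online_rollouts_per_candidate": 10,       # each candidate is served for 10 rollouts before switching
    "proposer": {                              # configuration of the proposal-generating model
        "mode": "instruction_only",
        "model": "gpt-4o-mini",
        "provider": "openai",
        "temperature": 0.7,                    # default: 0.7
        "max_tokens": 512,                     # default: 512
    },
}

Read literally, these values mean the first freshly generated proposals can only arrive after roughly two candidates have each been served for their 10 rollouts.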

Workflow

  1. Create job: Submit MIPRO job with mode: "online"
  2. Get proxy URL: Backend returns a proxy URL endpoint (via MiproOnlineSession)
  3. Run rollouts: For each rollout (see the sketch after this list):
    • Call the proxy URL with your task input
    • The proxy selects the best prompt candidate
    • Execute the LLM call with the selected prompt
    • Report the reward back to the backend using MiproOnlineSession.update_reward()
  4. Automatic evolution: The backend generates new proposals based on the reported rewards
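
A minimal sketch of the rollout loop in step 3, under stated assumptions: MiproOnlineSession and update_reward() are named on this page, but the session.proxy_url attribute, the exact update_reward() signature, and the load_task/score callables are illustrative placeholders rather than the SDK's documented interface. The sketch also assumes the proxy URL can be used as the base URL of an OpenAI-compatible client.

from openai import OpenAI


def run_online_rollouts(session, seeds, load_task, score):
    # Point an OpenAI-compatible client at the backend-provided proxy URL so
    # that each completion is served with the proxy's current prompt candidate.
    client = OpenAI(base_url=session.proxy_url, api_key="proxy-placeholder")

    for seed in seeds:  # e.g. the online_pool seeds from the request
        task_input = load_task(seed)  # your task app's input for this seed
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": task_input}],
            temperature=0.0,
            max_completion_tokens=256,
        )
        reward = score(seed, response.choices[0].message.content)  # your reward function
        # Report the reward so the backend can evolve new proposals.
        session.update_reward(reward)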

Response

{
  "job_id": "pl_abc123",
  "status": "running"
}

Polling for Completion

Use GET /api/policy-optimization/online/jobs/{job_id} to check status:
{
  "job_id": "pl_abc123",
  "status": "succeeded",
  "best_score": 0.875
}
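
A minimal polling sketch against this endpoint, reusing the placeholder BASE_URL from the submission example above. Only "running" and "succeeded" are shown on this page, so the loop simply treats any status other than "running" as terminal.

import os
import time

import requests

BASE_URL = "https://your-synth-backend.example.com"  # placeholder backend host
HEADERS = {"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"}


def wait_for_job(job_id, poll_seconds=15):
    while True:
        resp = requests.get(
            f"{BASE_URL}/api/policy-optimization/online/jobs/{job_id}",
            headers=HEADERS,
        )
        resp.raise_for_status()
        job = resp.json()
        if job["status"] != "running":  # "succeeded" is the documented success status
            return job
        time.sleep(poll_seconds)


job = wait_for_job("pl_abc123")
print(job["status"], job.get("best_score"))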

Notes

  • No tunneling required: Backend never calls your task app, so no public URL needed
  • You control rollouts: Drive the rollout loop locally in your code
  • Real-time evolution: Prompts evolve as rewards are reported
  • Proposer API key: Automatically resolved from backend environment (OPENAI_API_KEY or PROD_OPENAI_API_KEY)
  • Session management: Use MiproOnlineSession SDK class for managing online sessions and reporting rewards

See Also