Managed Research runs Synth’s optimization and evaluation APIs — GEPA, MIPRO, dataset assembly, harness builds — as an overnight service against your actual codebase. You connect a repo, point it at your traces or dataset, and wake up to experiment results, an optimized prompt or policy, and a proof bundle. It’s the same Synth AI SDK you’d run yourself, orchestrated by agents that know how to build harnesses, run trial matrices, score outputs, and ship artifacts without you babysitting it.

What it runs

  • Prompt and policy optimization — baseline → GEPA/MIPRO → holdout against your labeled dataset, with before/after scores and the winning candidate as a PR (see the sketch after this list)
  • Evaluation loops — nightly runs against a versioned harness and dataset; structured scoring on every run
  • Dataset and eval assembly — agents build dataset splits and verifiers from your traces and repo
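
The first loop is the core of it. Here is a minimal sketch of its shape in plain Python; score, run_optimizer, the verifier, and the dataset splits are illustrative stand-ins, not the actual Synth AI SDK calls.

from typing import Callable

Example = dict                                   # e.g. {"input": ..., "expected": ...}
Verifier = Callable[[str, Example], bool]        # does this prompt pass this example?

def score(prompt: str, dataset: list[Example], verify: Verifier) -> float:
    """Fraction of examples the prompt passes under the verifier."""
    return sum(verify(prompt, ex) for ex in dataset) / len(dataset)

def optimize(prompt, train, holdout, verify, run_optimizer):
    before = score(prompt, holdout, verify)      # baseline on the held-out split
    candidate = run_optimizer(prompt, train)     # GEPA or MIPRO, seeing only the train split
    after = score(candidate, holdout, verify)    # before/after comparison on the same holdout
    winner = candidate if after > before else prompt
    return winner, {"before": before, "after": after}

The point of the shape: the optimizer only ever sees the train split, and both the before and after numbers come from the holdout, so the reported improvement isn't something the optimizer could overfit to.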

How it works

You trigger a run. A Codex orchestrator agent claims it, provisions your repo into a workspace, and uses dispatch_worker to spin up Codex worker agents in isolated Daytona sandboxes. Workers read their task instructions and the project config, run the optimization (GEPA or MIPRO via the pre-deployed run_gepa.py), and emit artifacts as they go. When it's done, you get a morning_summary, experiment_results, and a proof_bundle_manifest linking everything.
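
Roughly, a worker's half of that looks like the sketch below. Only run_gepa.py and the artifact names come from the flow; the config keys, script flags, and file layout are assumptions made for the illustration.

import json
import subprocess
from pathlib import Path

workspace = Path("/workspace")

# synth_ai.policy_optimization config read from the provisioned repo
# (the exact file name and format are assumed here, not documented above)
config = json.loads((workspace / "task" / "config.json").read_text())

# Run the pre-deployed optimizer script; these flag names are illustrative
subprocess.run(
    ["python", "run_gepa.py",
     "--dataset", config["dataset"],
     "--optimizer", config.get("optimizer", "gepa")],
    cwd=workspace,
    check=True,
)

# Emit artifacts as the run produces them; in the real flow these land in
# the artifact store rather than a local directory
artifacts = workspace / "artifacts"
for name in ("gepa_results", "optimized_prompt", "experiment_results"):
    print(f"emit {name}: {(artifacts / f'{name}.json').exists()}")
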
trigger run
        ↓
orchestrator (Codex agent)
  ─ claims run, provisions repo into workspace
  ─ dispatch_worker → worker agent (Codex in Daytona sandbox)
        ↓
worker reads task instructions + synth_ai.policy_optimization config
  ─ runs GEPA or MIPRO via run_gepa.py
  ─ emits artifacts (gepa_results, optimized_prompt, experiment_results, ...)
        ↓
deliverables in artifact store
  ─ morning_summary         (what ran, scores, spend)
  ─ experiment_results      (trial matrix outputs)
  ─ proof_bundle_manifest   (config + artifacts + traces)
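
The deliverables are plain artifacts you can script against. Here's a sketch of reading them once a run lands; the file names, paths, and fields are assumed for illustration rather than taken from the real manifest schema.

import json
from pathlib import Path

run_dir = Path("artifacts/my-run")   # hypothetical local copy of the run's artifact store

# morning_summary: what ran, scores, spend (keys assumed from the description above)
summary = json.loads((run_dir / "morning_summary.json").read_text())
print(summary["what_ran"], summary["scores"], summary["spend"])

# proof_bundle_manifest links config + artifacts + traces for the run
manifest = json.loads((run_dir / "proof_bundle_manifest.json").read_text())
for entry in manifest["artifacts"]:
    print(entry["name"], entry["path"])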