Package your coding environment as a Dockerfile, deploy it to Harbor, and run Codex or Claude Code against it — all from one SDK call or CLI command. Harbor builds a Daytona snapshot from your image so each rollout gets a fresh, isolated sandbox that provisions in seconds.
Architecture
What happens under the hood:
1. You upload a Dockerfile + build context to Harbor
2. Harbor builds a Daytona snapshot (cached after the first build)
3. Each rollout provisions a fresh sandbox from the snapshot (~3s)
4. Your chosen coding agent runs inside the sandbox
5. Tests execute; results and LLM traces are returned
Prerequisites
- Python 3.11+
- uv or pip
- API keys:
  - SYNTH_API_KEY — your Synth platform key
  - OPENAI_API_KEY — for the Codex agent (or ANTHROPIC_API_KEY for Claude Code)
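A quick sanity check that the keys are exported before you start (purely illustrative; not part of the SDK):

import os

# Swap OPENAI_API_KEY for ANTHROPIC_API_KEY if you are using Claude Code.
for key in ("SYNTH_API_KEY", "OPENAI_API_KEY"):
    if not os.environ.get(key):
        raise SystemExit(f"Missing required environment variable: {key}")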
1. Install the SDK
pip install synth-ai
# or
uv add synth-ai
2. Write your Dockerfile
Create a Dockerfile for your coding task environment. This is the container image your agent will work inside.
FROM python:3.11-slim
# Install system dependencies
RUN apt-get update && apt-get install -y git curl && rm -rf /var/lib/apt/lists/*
# Copy your project code
WORKDIR /workspace
COPY . .
# Install project dependencies
RUN pip install -r requirements.txt
# Default entrypoint for Harbor rollouts
CMD [ "run_rollout" , "--input" , "/tmp/rollout.json" , "--output" , "/tmp/result.json" ]
For a Rust project:
FROM rust:1.82-slim
# Install system dependencies
RUN apt-get update && apt-get install -y git curl pkg-config && rm -rf /var/lib/apt/lists/*
WORKDIR /workspace
COPY . .
# Pre-build to warm the cargo cache; failures are tolerated so the image still builds
RUN cargo build --release 2>/dev/null || true
CMD ["run_rollout", "--input", "/tmp/rollout.json", "--output", "/tmp/result.json"]
Harbor automatically injects LLM API keys via the interceptor. Do not bake OPENAI_API_KEY or ANTHROPIC_API_KEY into your Dockerfile — the SDK will reject it.
3. Upload and build the deployment
Python SDK
from synth_ai.sdk.harbor import HarborBuildSpec, upload_harbor_deployment
spec = HarborBuildSpec(
    name="my-coding-task-v1",
    dockerfile_path="./Dockerfile",
    context_dir=".",
    entrypoint="run_rollout --input /tmp/rollout.json --output /tmp/result.json",
    limits={
        "timeout_s": 600,
        "cpu_cores": 4,
        "memory_mb": 8192,
    },
    env_vars={
        "RUST_BACKTRACE": "1",
    },
    metadata={
        "agent_type": "codex",
        "project": "my-project",
    },
)
# Upload and wait for the image to build
result = upload_harbor_deployment(spec, wait_for_ready=True)
print(f"Deployment ready: {result.deployment_id}")
print(f"Name: {result.name}")
print(f"Status: {result.status}")
CLI
synth harbor upload \
--name my-coding-task-v1 \
--dockerfile ./Dockerfile \
--context . \
--wait
The first build takes 2-10 minutes depending on your image size. Subsequent rollouts reuse the cached Daytona snapshot and provision in ~3 seconds.
4. Run agent rollouts
CLI — Codex with GPT-4.1
# Run 10 rollouts with Codex
synth harbor run my-coding-task-v1 \
--seeds 10 \
--model gpt-4.1-mini \
--timeout 300
# Run specific seeds
synth harbor run my-coding-task-v1 \
--seed 0 --seed 5 --seed 10 \
--model gpt-4.1 \
--timeout 600
Python SDK — Codex
import os

import httpx

SYNTH_API_KEY = os.environ["SYNTH_API_KEY"]
BACKEND_URL = "https://api.usesynth.ai"
DEPLOYMENT = "my-coding-task-v1"

headers = {
    "Authorization": f"Bearer {SYNTH_API_KEY}",
    "Content-Type": "application/json",
}
# Run a single rollout
response = httpx.post(
    f"{BACKEND_URL}/api/harbor/deployments/{DEPLOYMENT}/rollout",
    json={
        "trace_correlation_id": "my-run-001-s0",
        "env": {
            "seed": 0,
            "env_name": "harbor",
            "config": {},
        },
        "policy": {
            "config": {
                "model": "gpt-4.1-mini",
                "inference_url": "https://api.openai.com/v1",
                "provider": "openai",
            },
        },
    },
    headers=headers,
    timeout=600.0,
)

result = response.json()
reward = result.get("reward_info", {}).get("outcome_reward", 0.0)
print(f"Reward: {reward}")
Python SDK — Claude Code
response = httpx.post(
    f"{BACKEND_URL}/api/harbor/deployments/{DEPLOYMENT}/rollout",
    json={
        "trace_correlation_id": "claude-run-001-s0",
        "env": {
            "seed": 0,
            "env_name": "harbor",
            "config": {},
        },
        "policy": {
            "config": {
                "model": "claude-sonnet-4-20250514",
                "provider": "anthropic",
            },
        },
    },
    headers=headers,
    timeout=600.0,
)
5. Run multiple rollouts in batch
import concurrent.futures

seeds = list(range(10))

def run_seed(seed: int) -> dict:
    resp = httpx.post(
        f"{BACKEND_URL}/api/harbor/deployments/{DEPLOYMENT}/rollout",
        json={
            "trace_correlation_id": f"batch-{DEPLOYMENT[:8]}-s{seed}",
            "env": {"seed": seed, "env_name": "harbor", "config": {}},
            "policy": {
                "config": {
                    "model": "gpt-4.1-mini",
                    "provider": "openai",
                },
            },
        },
        headers=headers,
        timeout=600.0,
    )
    data = resp.json()
    reward = data.get("reward_info", {}).get("outcome_reward", 0.0)
    return {"seed": seed, "reward": reward}

with concurrent.futures.ThreadPoolExecutor(max_workers=6) as pool:
    futures = [pool.submit(run_seed, s) for s in seeds]
    results = [f.result() for f in concurrent.futures.as_completed(futures)]

rewards = [r["reward"] for r in results]
print(f"Mean reward: {sum(rewards) / len(rewards):.3f}")
print(f"Results: {sorted(results, key=lambda r: r['seed'])}")
6. Check deployment status
CLI
synth harbor list
synth harbor status my-coding-task-v1
Python SDK
from synth_ai.sdk.harbor import HarborDeploymentUploader

uploader = HarborDeploymentUploader()
status = uploader.get_deployment_status("my-coding-task-v1")
print(f"Status: {status['status']}")
print(f"Snapshot: {status.get('snapshot_id')}")
Agent Comparison
| Agent | Model | Best For | CLI Example |
| --- | --- | --- | --- |
| Codex | gpt-4.1-mini, gpt-4.1 | Fast iteration, OpenAI models | synth harbor run DEPLOY --model gpt-4.1-mini |
| Claude Code | claude-sonnet-4-* | Complex reasoning, Anthropic models | synth harbor run DEPLOY --model claude-sonnet-4-20250514 |
| OpenCode | gpt-4.1-mini, claude-sonnet-4-* | Multi-provider flexibility | synth harbor run DEPLOY --model gpt-4.1-mini |
Deployment Lifecycle
| Status | Meaning |
| --- | --- |
| pending | Deployment created, build not started |
| building | Daytona snapshot is being built from your Dockerfile |
| ready | Snapshot cached, rollouts can run (~3s provisioning) |
| failed | Build failed — check logs, fix Dockerfile, re-trigger |
Troubleshooting
- Build fails — Check that your Dockerfile builds locally first: docker build -t test .
- “LLM API key in env_vars” — Remove OPENAI_API_KEY / ANTHROPIC_API_KEY from your Dockerfile and env_vars. Harbor injects these automatically via the interceptor.
- Rollout timeouts — Increase timeout_s in limits and the --timeout CLI flag. Complex coding tasks may need 600s+.
- Slow first rollout — The first rollout after a build provisions the Daytona snapshot. Subsequent rollouts reuse the cached snapshot (~3s).
- Agent can’t find files — Make sure the WORKDIR in your Dockerfile matches where the agent expects to find code.
Next Steps
- Optimize instructions with GEPA: Use Harbor deployments as the execution backend for coding agent optimization to evolve AGENTS.md and skills files.
- Add custom evaluation: Write test suites that return pass/fail for automated reward scoring.
- Use Container Pools: For pre-provisioned pools of containers, use SynthClient(...).pools and client.pools.{harbor,openenv,horizons,arbitrary}.
Ready to get started?
- Get Started: Sign up and start running coding agents today.
- Schedule Demo: See Synth in action with a personalized walkthrough.