Package your coding environment as a Dockerfile, deploy it to Harbor, and run Codex or Claude Code against it — all from one SDK call or CLI command. Harbor builds a Daytona snapshot from your image so each rollout gets a fresh, isolated sandbox that provisions in seconds.
## Architecture
What happens under the hood:
- You upload a Dockerfile + build context to Harbor
- Harbor builds a Daytona snapshot (cached after first build)
- Each rollout provisions a fresh sandbox from the snapshot (~3s)
- Your chosen coding agent runs inside the sandbox
- Tests execute, results and LLM traces are returned
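In SDK terms, the whole flow reduces to two operations: upload once, then one rollout call per seed. A condensed sketch (full working versions appear in steps 3 and 4; the optional spec fields are omitted here on the assumption that they have sensible defaults):

```python
from synth_ai.sdk.harbor import HarborBuildSpec, upload_harbor_deployment

# Build once: Harbor turns the Dockerfile + context into a cached Daytona snapshot.
spec = HarborBuildSpec(
    name="my-coding-task-v1",
    dockerfile_path="./Dockerfile",
    context_dir=".",
)
deployment = upload_harbor_deployment(spec, wait_for_ready=True)

# Roll out many times: each call provisions a fresh sandbox from the snapshot
# via POST /api/harbor/deployments/{name}/rollout (see step 4).
```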
## Prerequisites
- Python 3.11+
- `uv` or `pip`
- API keys:
  - `SYNTH_API_KEY`: your Synth platform key
  - `OPENAI_API_KEY`: for the Codex agent (or `ANTHROPIC_API_KEY` for Claude Code)
## 1. Install the SDK
```bash
pip install synth-ai
# or
uv add synth-ai
```
## 2. Write your Dockerfile
Create a Dockerfile for your coding task environment. This is the container image your agent will work inside.
```dockerfile
FROM python:3.11-slim

# Install system dependencies
RUN apt-get update && apt-get install -y git curl && rm -rf /var/lib/apt/lists/*

# Copy your project code
WORKDIR /workspace
COPY . .

# Install project dependencies
RUN pip install -r requirements.txt

# Default entrypoint for Harbor rollouts
CMD ["run_rollout", "--input", "/tmp/rollout.json", "--output", "/tmp/result.json"]
```
For a Rust project:
```dockerfile
FROM rust:1.82-slim

RUN apt-get update && apt-get install -y git curl pkg-config && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace
COPY . .

# Pre-warm the build cache; `|| true` keeps the image build from failing
# when the task intentionally ships broken code for the agent to fix.
RUN cargo build --release 2>/dev/null || true

CMD ["run_rollout", "--input", "/tmp/rollout.json", "--output", "/tmp/result.json"]
```
Harbor automatically injects LLM API keys via the interceptor. Do not bake `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` into your Dockerfile; the SDK will reject the upload.
## 3. Upload and build the deployment
### Python SDK
```python
from synth_ai.sdk.harbor import HarborBuildSpec, upload_harbor_deployment

spec = HarborBuildSpec(
    name="my-coding-task-v1",
    dockerfile_path="./Dockerfile",
    context_dir=".",
    entrypoint="run_rollout --input /tmp/rollout.json --output /tmp/result.json",
    limits={
        "timeout_s": 600,
        "cpu_cores": 4,
        "memory_mb": 8192,
    },
    env_vars={
        "RUST_BACKTRACE": "1",
    },
    metadata={
        "agent_type": "codex",
        "project": "my-project",
    },
)

# Upload and wait for the image to build
result = upload_harbor_deployment(spec, wait_for_ready=True)
print(f"Deployment ready: {result.deployment_id}")
print(f"Name: {result.name}")
print(f"Status: {result.status}")
```
### CLI
```bash
synth harbor upload \
  --name my-coding-task-v1 \
  --dockerfile ./Dockerfile \
  --context . \
  --wait
```
The first build takes 2-10 minutes depending on your image size. Subsequent rollouts reuse the cached Daytona snapshot and provision in ~3 seconds.
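If you'd rather not block inside `upload_harbor_deployment`, pass `wait_for_ready=False` and poll the build yourself with the status helper from step 6; a minimal sketch:

```python
import time

from synth_ai.sdk.harbor import HarborDeploymentUploader

uploader = HarborDeploymentUploader()

# Poll until the snapshot build settles (statuses listed under Deployment Lifecycle).
while True:
    status = uploader.get_deployment_status("my-coding-task-v1")
    if status["status"] in ("ready", "failed"):
        break
    time.sleep(15)  # first builds take 2-10 minutes

print(f"Final status: {status['status']}")
```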
## 4. Run agent rollouts
### CLI: Codex with GPT-4.1
```bash
# Run 10 rollouts with Codex
synth harbor run my-coding-task-v1 \
  --seeds 10 \
  --model gpt-4.1-mini \
  --timeout 300

# Run specific seeds
synth harbor run my-coding-task-v1 \
  --seed 0 --seed 5 --seed 10 \
  --model gpt-4.1 \
  --timeout 600
```
### Python SDK: Codex
```python
import httpx
import os

SYNTH_API_KEY = os.environ["SYNTH_API_KEY"]
BACKEND_URL = "https://api.usesynth.ai"
DEPLOYMENT = "my-coding-task-v1"

headers = {
    "Authorization": f"Bearer {SYNTH_API_KEY}",
    "Content-Type": "application/json",
}

# Run a single rollout
response = httpx.post(
    f"{BACKEND_URL}/api/harbor/deployments/{DEPLOYMENT}/rollout",
    json={
        "run_id": "my-run-001",
        "trace_correlation_id": "my-run-001-s0",
        "env": {
            "seed": 0,
            "env_name": "harbor",
            "config": {},
        },
        "policy": {
            "config": {
                "model": "gpt-4.1-mini",
                "inference_url": "https://api.openai.com/v1",
                "provider": "openai",
            },
        },
    },
    headers=headers,
    timeout=600.0,
)

result = response.json()
reward = result.get("reward_info", {}).get("outcome_reward", 0.0)
print(f"Reward: {reward}")
```
### Python SDK: Claude Code
```python
response = httpx.post(
    f"{BACKEND_URL}/api/harbor/deployments/{DEPLOYMENT}/rollout",
    json={
        "run_id": "claude-run-001",
        "trace_correlation_id": "claude-run-001-s0",
        "env": {
            "seed": 0,
            "env_name": "harbor",
            "config": {},
        },
        "policy": {
            "config": {
                "model": "claude-sonnet-4-20250514",
                "provider": "anthropic",
            },
        },
    },
    headers=headers,
    timeout=600.0,
)
```
## 5. Run multiple rollouts in batch
```python
import concurrent.futures

seeds = list(range(10))

def run_seed(seed: int) -> dict:
    resp = httpx.post(
        f"{BACKEND_URL}/api/harbor/deployments/{DEPLOYMENT}/rollout",
        json={
            "run_id": f"batch-{DEPLOYMENT[:8]}",
            "trace_correlation_id": f"batch-{DEPLOYMENT[:8]}-s{seed}",
            "env": {"seed": seed, "env_name": "harbor", "config": {}},
            "policy": {
                "config": {
                    "model": "gpt-4.1-mini",
                    "provider": "openai",
                },
            },
        },
        headers=headers,
        timeout=600.0,
    )
    data = resp.json()
    reward = data.get("reward_info", {}).get("outcome_reward", 0.0)
    return {"seed": seed, "reward": reward}

with concurrent.futures.ThreadPoolExecutor(max_workers=6) as pool:
    futures = [pool.submit(run_seed, s) for s in seeds]
    results = [f.result() for f in concurrent.futures.as_completed(futures)]

rewards = [r["reward"] for r in results]
print(f"Mean reward: {sum(rewards) / len(rewards):.3f}")
print(f"Results: {sorted(results, key=lambda r: r['seed'])}")
```
## 6. Check deployment status
### CLI
```bash
synth harbor list
synth harbor status my-coding-task-v1
```
### Python SDK
```python
from synth_ai.sdk.harbor import HarborDeploymentUploader

uploader = HarborDeploymentUploader()
status = uploader.get_deployment_status("my-coding-task-v1")
print(f"Status: {status['status']}")
print(f"Snapshot: {status.get('snapshot_id')}")
```
## Agent Comparison
| Agent | Model | Best For | CLI Example |
|---|---|---|---|
| Codex | `gpt-4.1-mini`, `gpt-4.1` | Fast iteration, OpenAI models | `synth harbor run DEPLOY --model gpt-4.1-mini` |
| Claude Code | `claude-sonnet-4-*` | Complex reasoning, Anthropic models | `synth harbor run DEPLOY --model claude-sonnet-4-20250514` |
| OpenCode | `gpt-4.1-mini`, `claude-sonnet-4-*` | Multi-provider flexibility | `synth harbor run DEPLOY --model gpt-4.1-mini` |
## Deployment Lifecycle
| Status | Meaning |
|---|---|
| `pending` | Deployment created, build not started |
| `building` | Daytona snapshot is being built from your Dockerfile |
| `ready` | Snapshot cached, rollouts can run (~3s provisioning) |
| `failed` | Build failed; check logs, fix the Dockerfile, re-trigger |
## Troubleshooting
- **Build fails**: check that your Dockerfile builds locally first with `docker build -t test .`
- **"LLM API key in env_vars"**: remove `OPENAI_API_KEY` / `ANTHROPIC_API_KEY` from your Dockerfile and `env_vars`. Harbor injects these automatically via the interceptor.
- **Rollout timeouts**: increase `timeout_s` in `limits` and the `--timeout` CLI flag. Complex coding tasks may need 600s+.
- **Slow first rollout**: the first rollout after a build provisions the Daytona snapshot. Subsequent rollouts reuse the cached snapshot (~3s).
- **Agent can't find files**: make sure the `WORKDIR` in your Dockerfile matches where the agent expects to find code.
## Next Steps
- Optimize instructions with GEPA: Use Harbor deployments as the execution backend for coding agent optimization to evolve AGENTS.md and skills files.
- Add custom evaluation: Write test suites that return pass/fail for automated reward scoring.
- Use Environment Pools: For pre-provisioned pools of containers, see the Environment Pools SDK reference.