Package your coding environment as a Dockerfile, deploy it to Harbor, and run Codex or Claude Code against it — all from one SDK call or CLI command. Harbor builds a Daytona snapshot from your image so each rollout gets a fresh, isolated sandbox that provisions in seconds.

Architecture

What happens under the hood:
  1. You upload a Dockerfile + build context to Harbor
  2. Harbor builds a Daytona snapshot (cached after first build)
  3. Each rollout provisions a fresh sandbox from the snapshot (~3s)
  4. Your chosen coding agent runs inside the sandbox
  5. Tests execute; results and LLM traces are returned

Prerequisites

  • Python 3.11+
  • uv or pip
  • API keys:
    • SYNTH_API_KEY — your Synth platform key
    • OPENAI_API_KEY — for Codex agent (or ANTHROPIC_API_KEY for Claude Code)

1. Install the SDK

pip install synth-ai
# or
uv add synth-ai

2. Write your Dockerfile

Create a Dockerfile for your coding task environment. This defines the container image your agent will work inside.
Dockerfile
FROM python:3.11-slim

# Install system dependencies
RUN apt-get update && apt-get install -y git curl && rm -rf /var/lib/apt/lists/*

# Copy your project code
WORKDIR /workspace
COPY . .

# Install project dependencies
RUN pip install -r requirements.txt

# Default entrypoint for Harbor rollouts
CMD ["run_rollout", "--input", "/tmp/rollout.json", "--output", "/tmp/result.json"]
For a Rust project:
Dockerfile
FROM rust:1.82-slim

RUN apt-get update && apt-get install -y git curl pkg-config && rm -rf /var/lib/apt/lists/*

WORKDIR /workspace
COPY . .

# Pre-compile to warm the build cache; ignore failures so the image still builds
RUN cargo build --release 2>/dev/null || true

CMD ["run_rollout", "--input", "/tmp/rollout.json", "--output", "/tmp/result.json"]
Harbor automatically injects LLM API keys via the interceptor. Do not bake OPENAI_API_KEY or ANTHROPIC_API_KEY into your Dockerfile — the SDK will reject it.
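
The run_rollout entrypoint referenced in the CMD is a script you provide: Harbor writes the rollout input to /tmp/rollout.json and collects the result from /tmp/result.json. Below is a minimal, hypothetical sketch of such a script; the exact input/output schema is an assumption modeled on the rollout payloads in step 4, so adapt the field names to your task.
run_rollout (Python)
import json
import subprocess

def main() -> None:
    # Read the rollout input from the path given in the CMD above.
    with open("/tmp/rollout.json") as f:
        rollout = json.load(f)
    seed = rollout.get("env", {}).get("seed", 0)  # e.g. select a task variant

    # Run the project's test suite and map pass/fail to a scalar reward.
    proc = subprocess.run(["pytest", "-q"], cwd="/workspace")
    reward = 1.0 if proc.returncode == 0 else 0.0

    # Write the result where Harbor expects it.
    with open("/tmp/result.json", "w") as f:
        json.dump({"seed": seed, "reward_info": {"outcome_reward": reward}}, f)

if __name__ == "__main__":
    main()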

3. Upload and build the deployment

Python SDK

from synth_ai.sdk.harbor import HarborBuildSpec, upload_harbor_deployment

spec = HarborBuildSpec(
    name="my-coding-task-v1",
    dockerfile_path="./Dockerfile",
    context_dir=".",
    entrypoint="run_rollout --input /tmp/rollout.json --output /tmp/result.json",
    limits={
        "timeout_s": 600,
        "cpu_cores": 4,
        "memory_mb": 8192,
    },
    env_vars={
        "RUST_BACKTRACE": "1",
    },
    metadata={
        "agent_type": "codex",
        "project": "my-project",
    },
)

# Upload and wait for the image to build
result = upload_harbor_deployment(spec, wait_for_ready=True)
print(f"Deployment ready: {result.deployment_id}")
print(f"Name: {result.name}")
print(f"Status: {result.status}")

CLI

synth harbor upload \
  --name my-coding-task-v1 \
  --dockerfile ./Dockerfile \
  --context . \
  --wait
The first build takes 2-10 minutes depending on your image size. Subsequent rollouts reuse the cached Daytona snapshot and provision in ~3 seconds.
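
If you would rather not block on --wait or wait_for_ready=True, you can poll the build yourself. A minimal sketch using the get_deployment_status call from step 6; the status values follow the Deployment Lifecycle table below.

import time

from synth_ai.sdk.harbor import HarborDeploymentUploader

uploader = HarborDeploymentUploader()
while True:
    status = uploader.get_deployment_status("my-coding-task-v1")["status"]
    if status == "ready":
        break
    if status == "failed":
        raise RuntimeError("Harbor build failed; check the build logs and fix the Dockerfile")
    time.sleep(15)  # first builds take 2-10 minutes, so poll sparingly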

4. Run agent rollouts

CLI — Codex with GPT-4.1

# Run 10 rollouts with Codex
synth harbor run my-coding-task-v1 \
  --seeds 10 \
  --model gpt-4.1-mini \
  --timeout 300
# Run specific seeds
synth harbor run my-coding-task-v1 \
  --seed 0 --seed 5 --seed 10 \
  --model gpt-4.1 \
  --timeout 600

Python SDK — Codex

import httpx
import os

SYNTH_API_KEY = os.environ["SYNTH_API_KEY"]
BACKEND_URL = "https://api.usesynth.ai"
DEPLOYMENT = "my-coding-task-v1"

headers = {
    "Authorization": f"Bearer {SYNTH_API_KEY}",
    "Content-Type": "application/json",
}

# Run a single rollout
response = httpx.post(
    f"{BACKEND_URL}/api/harbor/deployments/{DEPLOYMENT}/rollout",
    json={
        "run_id": "my-run-001",
        "trace_correlation_id": "my-run-001-s0",
        "env": {
            "seed": 0,
            "env_name": "harbor",
            "config": {},
        },
        "policy": {
            "config": {
                "model": "gpt-4.1-mini",
                "inference_url": "https://api.openai.com/v1",
                "provider": "openai",
            },
        },
    },
    headers=headers,
    timeout=600.0,
)

result = response.json()
reward = result.get("reward_info", {}).get("outcome_reward", 0.0)
print(f"Reward: {reward}")

Python SDK — Claude Code

response = httpx.post(
    f"{BACKEND_URL}/api/harbor/deployments/{DEPLOYMENT}/rollout",
    json={
        "run_id": "claude-run-001",
        "trace_correlation_id": "claude-run-001-s0",
        "env": {
            "seed": 0,
            "env_name": "harbor",
            "config": {},
        },
        "policy": {
            "config": {
                "model": "claude-sonnet-4-20250514",
                "provider": "anthropic",
            },
        },
    },
    headers=headers,
    timeout=600.0,
)
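
The response shape matches the Codex rollout, so extract the reward the same way. It is also worth surfacing HTTP errors before parsing; the sketch below uses httpx's standard raise_for_status.

response.raise_for_status()  # fail fast on 4xx/5xx instead of parsing an error body
result = response.json()
reward = result.get("reward_info", {}).get("outcome_reward", 0.0)
print(f"Reward: {reward}")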

5. Run multiple rollouts in batch

import concurrent.futures

seeds = list(range(10))

def run_seed(seed: int) -> dict:
    resp = httpx.post(
        f"{BACKEND_URL}/api/harbor/deployments/{DEPLOYMENT}/rollout",
        json={
            "run_id": f"batch-{DEPLOYMENT[:8]}",
            "trace_correlation_id": f"batch-{DEPLOYMENT[:8]}-s{seed}",
            "env": {"seed": seed, "env_name": "harbor", "config": {}},
            "policy": {
                "config": {
                    "model": "gpt-4.1-mini",
                    "provider": "openai",
                },
            },
        },
        headers=headers,
        timeout=600.0,
    )
    data = resp.json()
    reward = data.get("reward_info", {}).get("outcome_reward", 0.0)
    return {"seed": seed, "reward": reward}

with concurrent.futures.ThreadPoolExecutor(max_workers=6) as pool:
    futures = [pool.submit(run_seed, s) for s in seeds]
    results = [f.result() for f in concurrent.futures.as_completed(futures)]

rewards = [r["reward"] for r in results]
print(f"Mean reward: {sum(rewards) / len(rewards):.3f}")
print(f"Results: {sorted(results, key=lambda r: r['seed'])}")

6. Check deployment status

CLI

synth harbor list
synth harbor status my-coding-task-v1

Python SDK

from synth_ai.sdk.harbor import HarborDeploymentUploader

uploader = HarborDeploymentUploader()
status = uploader.get_deployment_status("my-coding-task-v1")
print(f"Status: {status['status']}")
print(f"Snapshot: {status.get('snapshot_id')}")

Agent Comparison

| Agent | Model | Best For | CLI Example |
| --- | --- | --- | --- |
| Codex | gpt-4.1-mini, gpt-4.1 | Fast iteration, OpenAI models | synth harbor run DEPLOY --model gpt-4.1-mini |
| Claude Code | claude-sonnet-4-* | Complex reasoning, Anthropic models | synth harbor run DEPLOY --model claude-sonnet-4-20250514 |
| OpenCode | gpt-4.1-mini, claude-sonnet-4-* | Multi-provider flexibility | synth harbor run DEPLOY --model gpt-4.1-mini |

Deployment Lifecycle

| Status | Meaning |
| --- | --- |
| pending | Deployment created, build not started |
| building | Daytona snapshot is being built from your Dockerfile |
| ready | Snapshot cached, rollouts can run (~3s provisioning) |
| failed | Build failed — check logs, fix Dockerfile, re-trigger |

Troubleshooting

  • Build fails — Check your Dockerfile builds locally first: docker build -t test .
  • “LLM API key in env_vars” — Remove OPENAI_API_KEY / ANTHROPIC_API_KEY from your Dockerfile and env_vars. Harbor injects these automatically via the interceptor.
  • Rollout timeouts — Increase timeout_s in limits and the --timeout CLI flag. Complex coding tasks may need 600s+.
  • Slow first rollout — The first rollout after a build provisions the Daytona snapshot. Subsequent rollouts reuse the cached snapshot (~3s).
  • Agent can’t find files — Make sure your WORKDIR in the Dockerfile matches where the agent expects to find code.

Next Steps

  • Optimize instructions with GEPA: Use Harbor deployments as the execution backend for coding agent optimization to evolve AGENTS.md and skills files.
  • Add custom evaluation: Write test suites that return pass/fail for automated reward scoring.
  • Use Environment Pools: For pre-provisioned pools of containers, see the Environment Pools SDK reference.
