POST /api/eval/jobs

Create a new evaluation job to run your policy against a dataset.
The SDK provides EvalJob.create() to submit the job; calling job.wait() on the returned job polls until completion and returns the parsed results. See the example below.
task_app_url (string, required): URL of your task app (LocalAPI endpoint).
task_app_api_key (string): Environment API key for task app authentication.
app_id (string, required): Application identifier for grouping jobs.
env_name (string, required): Environment name (dataset split or config).
seeds (array, required): List of seed indices to evaluate.
policy (object, required): Model configuration for the policy.
env_config (object): Additional environment configuration passed to the task app.
max_concurrent (integer, default: 5): Maximum number of concurrent rollouts.
timeout (number, default: 600.0): Job timeout in seconds.

import asyncio

from synth_ai.sdk.eval import EvalJob


async def main() -> None:
    # Submit the eval job
    job = await EvalJob.create(
        task_app_url="http://localhost:8103",
        app_id="banking77",
        seeds=[0, 1, 2, 3, 4],
        policy={"model": "gpt-4o", "provider": "openai"},
    )

    # Poll until complete
    result = await job.wait()
    print(f"Mean reward: {result.summary.mean_reward}")


# await only works inside a coroutine, so run main() on an event loop
asyncio.run(main())
Response
{
  "job_id": "eval-abc123",
  "status": "running"
}
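For callers that do not use the SDK, the request body can be assembled from the documented fields. The following is a minimal sketch, not the SDK's implementation: the helper names build_eval_job_payload and build_create_request are hypothetical, base_url is a placeholder for wherever the API is hosted, and no authentication header is set because this page does not document one.

```python
import json
from typing import Any, Optional
from urllib import request


def build_eval_job_payload(
    task_app_url: str,
    app_id: str,
    env_name: str,
    seeds: list[int],
    policy: dict[str, Any],
    task_app_api_key: Optional[str] = None,
    env_config: Optional[dict[str, Any]] = None,
    max_concurrent: int = 5,      # documented default
    timeout: float = 600.0,       # documented default, in seconds
) -> dict[str, Any]:
    """Assemble the JSON body for POST /api/eval/jobs from the documented fields."""
    payload: dict[str, Any] = {
        "task_app_url": task_app_url,
        "app_id": app_id,
        "env_name": env_name,
        "seeds": list(seeds),
        "policy": policy,
        "max_concurrent": max_concurrent,
        "timeout": timeout,
    }
    # Optional fields are only included when the caller supplies them.
    if task_app_api_key is not None:
        payload["task_app_api_key"] = task_app_api_key
    if env_config is not None:
        payload["env_config"] = env_config
    return payload


def build_create_request(base_url: str, payload: dict[str, Any]) -> request.Request:
    """Build (but do not send) the POST request. base_url is an assumption."""
    return request.Request(
        url=f"{base_url.rstrip('/')}/api/eval/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request object can then be sent with urllib.request.urlopen(req) once any authentication your deployment requires has been added to the headers.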

Get Job Status

Fetch the current status of an eval job.

GET /api/eval/jobs/{job_id}

Returns the job status and timestamps, plus summary results once the job has completed.
Response
{
  "job_id": "eval-abc123",
  "status": "completed",
  "error": null,
  "created_at": "2025-01-15T10:00:00Z",
  "started_at": "2025-01-15T10:00:01Z",
  "completed_at": "2025-01-15T10:05:30Z",
  "results": {
    "mean_reward": 0.92,
    "total_tokens": 750,
    "total_cost_usd": 0.01
  }
}
Status      Description
running     Job is currently executing
completed   Job finished successfully
failed      Job encountered an error
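The status values above suggest a simple polling loop: keep fetching the job until it reports completed or failed. The sketch below assumes only those two statuses are terminal. The function wait_for_job and its parameters are hypothetical, and the HTTP call is passed in as fetch_status so the loop itself stays independent of any client library.

```python
import time
from typing import Any, Callable

# Statuses after which the job will not change again (per the table above).
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[], dict[str, Any]],
    poll_interval: float = 2.0,
    max_wait: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch_status until the job is terminal or max_wait seconds have been slept."""
    waited = 0.0
    while True:
        body = fetch_status()  # expected to return the parsed GET /api/eval/jobs/{job_id} body
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if waited >= max_wait:
            raise TimeoutError(f"job still {body.get('status')!r} after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval
```

A caller would check the returned body's status field and, for a failed job, inspect its error field.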

Get Job Results

Fetch detailed results for a completed eval job.

GET /api/eval/jobs/{job_id}/results

Returns per-seed results with traces and cost breakdown.
Response
{
  "job_id": "eval-abc123",
  "status": "completed",
  "summary": {
    "mean_reward": 0.92,
    "total_tokens": 750,
    "total_cost_usd": 0.01,
    "num_seeds": 5,
    "num_successful": 5,
    "num_failed": 0
  },
  "results": [
    {
      "seed": 0,
      "reward": 0.95,
      "latency_ms": 1234.5,
      "tokens": 150,
      "cost_usd": 0.002,
      "trace_id": "trace-abc"
    }
  ]
}
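The per-seed entries can be aggregated on the client, for example to cross-check the summary or to find the weakest seed. This is a sketch under assumptions: summarize_results is a hypothetical helper, every entry is assumed to carry reward, tokens, and cost_usd as in the sample above, and the page does not state how the server computes its own summary or what a failed seed's entry looks like.

```python
from typing import Any


def summarize_results(results: list[dict[str, Any]]) -> dict[str, Any]:
    """Recompute aggregate figures from the per-seed entries of a results response."""
    if not results:
        raise ValueError("results is empty")
    rewards = [entry["reward"] for entry in results]
    return {
        "num_seeds": len(results),
        "mean_reward": sum(rewards) / len(rewards),
        "total_tokens": sum(entry["tokens"] for entry in results),
        "total_cost_usd": sum(entry["cost_usd"] for entry in results),
        # Seed with the lowest reward, useful for picking a trace to inspect.
        "worst_seed": min(results, key=lambda entry: entry["reward"])["seed"],
    }
```

The worst_seed value can be matched to its trace_id to locate the corresponding trace.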

Download Traces

GET /api/eval/jobs/{job_id}/traces

Downloads the job's traces as a ZIP archive of JSON trace files.
Response content type: application/zip
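Once the ZIP body has been downloaded, the traces can be read in memory without unpacking to disk. The sketch below assumes the archive's trace files end in .json. The member naming scheme is not documented here, so the hypothetical helper load_traces simply keys each trace by its member name.

```python
import io
import json
import zipfile
from typing import Any


def load_traces(zip_bytes: bytes) -> dict[str, Any]:
    """Parse every .json member of a traces ZIP into a dict keyed by member name."""
    traces: dict[str, Any] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Skip directories and any non-JSON members.
            if name.endswith(".json"):
                traces[name] = json.loads(archive.read(name))
    return traces
```

Here zip_bytes is the raw response body of GET /api/eval/jobs/{job_id}/traces.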