Create Eval Job

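import asyncio

from synth_ai.sdk.eval import EvalJob


async def main():
    # Create the eval job
    job = await EvalJob.create(
        task_app_url="http://localhost:8103",
        app_id="banking77",
        seeds=[0, 1, 2, 3, 4],
        policy={"model": "gpt-4o", "provider": "openai"},
    )

    # Poll until complete
    result = await job.wait()
    print(f"Mean reward: {result.summary.mean_reward}")


# await is only valid inside a coroutine, so run main() on an event loop
asyncio.run(main())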

Response

{
  "job_id": "eval-abc123",
  "status": "running"
}

POST /api/eval/jobs

Create a new evaluation job to run your policy against a dataset.

The SDK wraps this endpoint in EvalJob.create(). The returned job's wait() method polls until the job completes and parses the results. See the example above.

Parameter	Type	Required	Default	Description
`task_app_url`	string	yes	-	URL of your task app (LocalAPI endpoint).
`task_app_api_key`	string	no	-	Environment API key for task app authentication.
`app_id`	string	yes	-	Application identifier for grouping jobs.
`env_name`	string	yes	-	Environment name (dataset split or config).
`seeds`	array	yes	-	List of seed indices to evaluate.
`policy`	object	yes	-	Model configuration for the policy.
`policy.model`	string	yes	-	Model name (e.g., gpt-4o, claude-sonnet-4-20250514).
`policy.provider`	string	yes	-	Provider name (openai, anthropic, google).
`env_config`	object	no	-	Additional environment configuration passed to task app.
`max_concurrent`	integer	no	5	Maximum concurrent rollouts.
`timeout`	number	no	600.0	Job timeout in seconds.
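
The fields above presumably map onto the JSON body sent to POST /api/eval/jobs. The following is a minimal sketch of assembling and checking such a body with the standard library only. The helper name build_eval_request and the env_name value are illustrative assumptions and not part of the SDK, and the defaults are filled in client-side here only to make them visible.

```python
import json

# Required fields, per the parameter list above.
REQUIRED_FIELDS = ("task_app_url", "app_id", "env_name", "seeds", "policy")
REQUIRED_POLICY_FIELDS = ("model", "provider")


def build_eval_request(**fields):
    """Hypothetical helper: check required fields and apply documented defaults."""
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")

    policy = fields["policy"]
    missing_policy = [name for name in REQUIRED_POLICY_FIELDS if name not in policy]
    if missing_policy:
        raise ValueError(f"missing required policy fields: {', '.join(missing_policy)}")

    # Documented defaults; explicit values passed by the caller take precedence.
    body = {"max_concurrent": 5, "timeout": 600.0}
    body.update(fields)
    return body


body = build_eval_request(
    task_app_url="http://localhost:8103",
    app_id="banking77",
    env_name="banking77",  # assumed value, for illustration only
    seeds=[0, 1, 2, 3, 4],
    policy={"model": "gpt-4o", "provider": "openai"},
)
print(json.dumps(body, indent=2))
```

Validating before sending keeps a missing env_name or policy.provider from surfacing only as a server-side error.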

Get Job Status

Fetch the current status of an eval job.

GET /api/eval/jobs/{job_id}

Returns job status, timestamps, and summary results if completed.

Response

{
  "job_id": "eval-abc123",
  "status": "completed",
  "error": null,
  "created_at": "2025-01-15T10:00:00Z",
  "started_at": "2025-01-15T10:00:01Z",
  "completed_at": "2025-01-15T10:05:30Z",
  "results": {
    "mean_reward": 0.92,
    "total_tokens": 750,
    "total_cost_usd": 0.01
  }
}

Status	Description
`running`	Job is currently executing
`completed`	Job finished successfully
`failed`	Job encountered an error
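
Because `completed` and `failed` are the terminal statuses in the table above, a client without the SDK can poll the status endpoint until one of them appears. The following is a rough sketch of such a loop. The function wait_for_job and its fetch_status argument are hypothetical names: fetch_status stands for any caller-supplied function that performs GET /api/eval/jobs/{job_id} and returns the decoded JSON body.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(fetch_status, job_id, poll_interval=2.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_status(job_id) until the job reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']!r} after {timeout}s")
        sleep(poll_interval)


# Usage with a stub standing in for the real HTTP call.
_responses = iter([
    {"job_id": "eval-abc123", "status": "running"},
    {"job_id": "eval-abc123", "status": "completed", "results": {"mean_reward": 0.92}},
])
final = wait_for_job(lambda job_id: next(_responses), "eval-abc123", sleep=lambda seconds: None)
print(final["status"])
```

Passing the fetch and sleep functions in as arguments keeps the loop independent of any particular HTTP client and easy to exercise without a running server.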

Get Job Results

Fetch detailed results for a completed eval job.

GET /api/eval/jobs/{job_id}/results

Returns per-seed results with traces and cost breakdown.

Response

{
  "job_id": "eval-abc123",
  "status": "completed",
  "summary": {
    "mean_reward": 0.92,
    "total_tokens": 750,
    "total_cost_usd": 0.01,
    "num_seeds": 5,
    "num_successful": 5,
    "num_failed": 0
  },
  "results": [
    {
      "seed": 0,
      "reward": 0.95,
      "latency_ms": 1234.5,
      "tokens": 150,
      "cost_usd": 0.002,
      "trace_id": "trace-abc"
    }
  ]
}
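
As a sketch of how the per-seed entries relate to the summary block, the aggregates can be recomputed from the results array. The function name summarize is an assumption for illustration, and the second entry in the sample data is invented so that there is more than one seed to aggregate.

```python
from statistics import fmean


def summarize(results):
    """Recompute aggregate figures from the per-seed entries of a results response."""
    return {
        "mean_reward": fmean([r["reward"] for r in results]),
        "total_tokens": sum(r["tokens"] for r in results),
        "total_cost_usd": sum(r["cost_usd"] for r in results),
        "num_seeds": len(results),
    }


per_seed = [
    # First entry copied from the response above.
    {"seed": 0, "reward": 0.95, "latency_ms": 1234.5, "tokens": 150,
     "cost_usd": 0.002, "trace_id": "trace-abc"},
    # Second entry is made up for illustration.
    {"seed": 1, "reward": 0.85, "latency_ms": 1100.0, "tokens": 140,
     "cost_usd": 0.002, "trace_id": "trace-def"},
]

summary = summarize(per_seed)
# The lowest-reward seed is usually the first trace worth inspecting.
worst = min(per_seed, key=lambda r: r["reward"])
print(summary["num_seeds"], summary["total_tokens"], worst["trace_id"])
```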

Download Traces

GET /api/eval/jobs/{job_id}/traces

Download traces as a ZIP file containing JSON trace files.

Response: application/zip
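
Once the response body has been downloaded, the archive can be unpacked in memory with the standard library. This is a minimal sketch: load_traces is a hypothetical helper, and the file naming inside the archive (one trace-id.json member per trace) is an assumption, since the layout of the ZIP is not specified here.

```python
import io
import json
import zipfile


def load_traces(zip_bytes):
    """Return {member name: parsed JSON} for every .json member in the archive."""
    traces = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(".json"):
                traces[name] = json.loads(archive.read(name))
    return traces


# Build a small in-memory archive standing in for the HTTP response body.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("trace-abc.json", json.dumps({"trace_id": "trace-abc"}))

traces = load_traces(buffer.getvalue())
print(sorted(traces))
```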
