Overview
The eval API routes LLM calls through the inference interceptor, which:
- Captures traces automatically via correlation IDs
- Stores traces in the trace store (Redis/Turso)
- Enables token usage extraction and cost calculation
Base path: `/api/eval`
Authentication: Bearer token in the `Authorization` header
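A minimal sketch of building an authenticated request with only the Python standard library. The host in `BASE_URL`, the `API_KEY` value, and the `eval_request` helper are hypothetical placeholders; only the `/api/eval` base path and the Bearer scheme come from this page. The request is built but not sent.

```python
import urllib.request
from typing import Optional

# Hypothetical values: replace with your backend host and API key.
BASE_URL = "https://backend.example.com/api/eval"
API_KEY = "sk-example"


def eval_request(path: str, method: str = "GET", body: Optional[bytes] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request against the eval API."""
    req = urllib.request.Request(BASE_URL + path, data=body, method=method)
    # Bearer token in the Authorization header, as the API requires.
    req.add_header("Authorization", f"Bearer {API_KEY}")
    if body is not None:
        req.add_header("Content-Type", "application/json")
    return req


req = eval_request("/jobs/job_123")
print(req.full_url)                     # https://backend.example.com/api/eval/jobs/job_123
print(req.get_header("Authorization"))  # Bearer sk-example
```

Pass the resulting request to `urllib.request.urlopen` to actually send it.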
Endpoints
Create Eval Job
Create a new eval job and start execution.
Endpoint: `POST /api/eval/jobs`
Request:
Status codes:
- 200/201: Job created successfully
- 400: Invalid request (missing required fields)
- 401: Authentication failed
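The full request schema is not reproduced here, but the Error Handling section names three required fields: `task_app_url`, `seeds`, and `policy.model`. The sketch below assembles a body with just those fields and pre-checks them client-side to avoid a 400. The `missing_fields` helper, the example URL, and the model ID are hypothetical, and treating each seed as one rollout is an assumption.

```python
import json


def missing_fields(payload: dict) -> list:
    """Client-side pre-check for the fields whose absence yields 400 Bad Request."""
    missing = [key for key in ("task_app_url", "seeds") if key not in payload]
    policy = payload.get("policy")
    if not isinstance(policy, dict) or "model" not in policy:
        missing.append("policy.model")
    return missing


payload = {
    "task_app_url": "https://task-app.example.com",  # hypothetical task app URL
    "seeds": [0, 1, 2],                              # one rollout per seed (assumed)
    "policy": {"model": "example-model"},            # hypothetical model ID
}
assert missing_fields(payload) == []
body = json.dumps(payload).encode("utf-8")  # JSON body for POST /api/eval/jobs
```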
Get Job Status
Fetch the current status of an eval job.
Endpoint: `GET /api/eval/jobs/{job_id}`
Response:
Status values:
- running: Job is currently executing
- completed: Job finished successfully
- failed: Job encountered an error
Status codes:
- 200: Success
- 401: Authentication failed
- 404: Job not found
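Of the three documented status values, `completed` and `failed` are terminal, so a client can stop polling once it sees either. This sketch assumes the response body carries the value under a `status` key (not confirmed on this page) and rejects any value outside the documented three so that an unexpected state surfaces instead of causing an endless poll.

```python
DOCUMENTED_STATUSES = {"running", "completed", "failed"}
TERMINAL_STATUSES = {"completed", "failed"}


def is_terminal(status_response: dict) -> bool:
    # Assumption: the job state lives under a "status" key in the response body.
    status = status_response.get("status")
    if status not in DOCUMENTED_STATUSES:
        raise ValueError(f"unexpected job status: {status!r}")
    return status in TERMINAL_STATUSES
```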
Get Job Results
Fetch detailed results for a completed eval job.
Endpoint: `GET /api/eval/jobs/{job_id}/results`
Response:
Status codes:
- 200: Success
- 401: Authentication failed
- 404: Job not found
Download Traces
Download traces for a completed eval job as a ZIP file.
Endpoint: `GET /api/eval/jobs/{job_id}/traces`
Response:
- Content-Type: `application/zip`
- Body: ZIP file containing trace JSON files
Status codes:
- 200: Success
- 401: Authentication failed
- 404: Job not found
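Because the traces endpoint returns a ZIP of JSON files, a client can parse it entirely in memory with the standard `zipfile` module. This is a sketch: the member names inside the archive (such as `seed_0.json` in the usage below) and the trace contents are made up for illustration.

```python
import io
import json
import zipfile


def load_traces(zip_bytes: bytes) -> dict:
    """Parse every .json member of a traces ZIP into a {filename: trace} dict."""
    traces = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(".json"):
                traces[name] = json.loads(archive.read(name))
    return traces


# Illustration only: build a tiny archive in memory instead of downloading one.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("seed_0.json", json.dumps({"cid": "abc"}))  # hypothetical file name and content
print(load_traces(buf.getvalue()))  # {'seed_0.json': {'cid': 'abc'}}
```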
Trace Capture Flow
- Job Creation: Backend generates correlation IDs for each rollout
- Rollout Execution: Backend calls the task app with the correlation ID in the inference URL (`?cid=...`)
- Interceptor Capture: Task app calls the LLM via the interceptor, which captures the trace
- Trace Storage: Interceptor stores trace in trace store (Redis/Turso)
- Trace Hydration: Backend hydrates traces from store for cost calculation
- Result Assembly: Backend combines rollout results with trace data
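The correlation ID is what ties a rollout to its trace: the backend attaches it to the inference URL as `?cid=...`, and the interceptor reads it back when the task app calls the LLM. The sketch below shows both sides of that handoff using `urllib.parse`. The helper names, the interceptor URL, and the use of `uuid4` for IDs are assumptions; the real ID format is not documented here.

```python
import uuid
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit


def new_correlation_id() -> str:
    # Assumption: any unique string works; the backend's real ID format may differ.
    return uuid.uuid4().hex


def with_correlation_id(inference_url: str, cid: str) -> str:
    """Return inference_url with cid=<cid> set, preserving other query parameters."""
    parts = urlsplit(inference_url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "cid"]
    query.append(("cid", cid))
    return urlunsplit(parts._replace(query=urlencode(query)))


def correlation_id_from(url: str) -> str:
    """Interceptor side: read the cid back out of the request URL."""
    return parse_qs(urlsplit(url).query)["cid"][0]


url = with_correlation_id("https://interceptor.example.com/v1/chat/completions", "abc123")
print(url)  # https://interceptor.example.com/v1/chat/completions?cid=abc123
```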
Implementation
Service: `monorepo/backend/app/routes/eval/job_service.py`
Key Methods:
- `EvalJobService.create_job()`: Creates the eval job and starts execution
- `EvalJobService._execute_seed()`: Executes a single rollout with trace capture
- `EvalJobService._calculate_metrics()`: Calculates scores, tokens, and costs
Routes: `monorepo/backend/app/routes/eval/routes.py`
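To illustrate the token and cost part of what `_calculate_metrics()` is described as doing, here is a sketch that sums token usage across hydrated traces and prices it. It is not the backend's implementation: the trace shape (an `llm_calls` list whose entries carry an OpenAI-style `usage` object) and the per-million-token prices are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    prompt_tokens: int = 0
    completion_tokens: int = 0


def total_usage(traces: list) -> Usage:
    """Sum token usage over all LLM calls in all traces (trace shape is assumed)."""
    usage = Usage()
    for trace in traces:
        for call in trace.get("llm_calls", []):  # hypothetical key
            u = call.get("usage", {})
            usage.prompt_tokens += u.get("prompt_tokens", 0)
            usage.completion_tokens += u.get("completion_tokens", 0)
    return usage


def cost_usd(usage: Usage, input_per_mtok: float, output_per_mtok: float) -> float:
    """Price usage given USD per million input and output tokens (hypothetical rates)."""
    return (usage.prompt_tokens * input_per_mtok + usage.completion_tokens * output_per_mtok) / 1_000_000


traces = [
    {"llm_calls": [{"usage": {"prompt_tokens": 1200, "completion_tokens": 300}}]},
    {"llm_calls": [{"usage": {"prompt_tokens": 800, "completion_tokens": 200}}]},
]
usage = total_usage(traces)  # Usage(prompt_tokens=2000, completion_tokens=500)
```

With illustrative rates of 0.5 and 2.0 USD per million tokens, `cost_usd(usage, 0.5, 2.0)` comes to 0.002 USD.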
CLI Integration
The `synth-ai eval` command uses this API when `--backend` is provided:
- Creates a job via `POST /api/eval/jobs`
- Polls status via `GET /api/eval/jobs/{job_id}` until completed
- Fetches results via `GET /api/eval/jobs/{job_id}/results`
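The create, poll, and fetch sequence above can be sketched as a single loop. This is not the CLI's source: `request_json` is a hypothetical transport (method, path relative to `/api/eval`, JSON body) that returns decoded JSON, the `job_id` and `status` response keys are assumed, and the poll interval and timeout are arbitrary defaults. `sleep` and `clock` are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Optional

# Hypothetical transport: (method, path under /api/eval, JSON body or None) -> decoded JSON.
RequestJSON = Callable[[str, str, Optional[dict]], dict]


def run_eval(request_json: RequestJSON, payload: dict, poll_interval: float = 2.0,
             timeout: float = 3600.0, sleep=time.sleep, clock=time.monotonic) -> dict:
    """Create a job, poll its status until terminal, then fetch its results."""
    job_id = request_json("POST", "/jobs", payload)["job_id"]  # "job_id" key is assumed
    deadline = clock() + timeout
    while True:
        status = request_json("GET", f"/jobs/{job_id}", None)["status"]  # "status" key is assumed
        if status == "completed":
            return request_json("GET", f"/jobs/{job_id}/results", None)
        if status == "failed":
            raise RuntimeError(f"eval job {job_id} failed")
        if clock() >= deadline:
            raise TimeoutError(f"eval job {job_id} still {status!r} after {timeout}s")
        sleep(poll_interval)
```

Injecting the transport keeps the polling logic independent of any HTTP client, so the same loop works with `urllib`, `requests`, or a fake in tests.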
Error Handling
Common Errors:
- 400 Bad Request: Missing required fields (`task_app_url`, `seeds`, `policy.model`)
- 401 Unauthorized: Invalid or missing API key
- 404 Not Found: Job ID doesn't exist or belongs to a different org
- 500 Internal Server Error: Backend error during execution
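One way a client might surface these errors is to map each documented status code to a readable hint. The `EvalAPIError` class, `raise_for_status`, and `call` helpers below are hypothetical; `call` wraps the standard `urllib.request.urlopen`, which raises `urllib.error.HTTPError` for 4xx and 5xx responses.

```python
import json
import urllib.request
from urllib.error import HTTPError

ERROR_HINTS = {
    400: "missing required fields (task_app_url, seeds, policy.model)",
    401: "invalid or missing API key",
    404: "job ID does not exist or belongs to a different org",
    500: "backend error during execution",
}


class EvalAPIError(Exception):
    def __init__(self, status: int, detail: str = ""):
        self.status = status
        hint = ERROR_HINTS.get(status, "unexpected error")
        super().__init__(f"{status}: {hint}" + (f" ({detail})" if detail else ""))


def raise_for_status(status: int, detail: str = "") -> None:
    """Raise EvalAPIError for any 4xx/5xx status; do nothing otherwise."""
    if status >= 400:
        raise EvalAPIError(status, detail)


def call(req: urllib.request.Request) -> dict:
    """Send a request and decode JSON, converting HTTP errors into EvalAPIError."""
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except HTTPError as err:
        raise EvalAPIError(err.code, err.read().decode(errors="replace")) from err
```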
Rate Limiting
Rate limiting is currently deferred for eval jobs (TODO).
See Also
- CLI Eval Documentation — CLI command usage
- Inference Interceptor — Trace capture mechanism
- Trace Hydration — Trace reconstruction from envelopes