synth_ai.cli.commands.eval.runner
Eval runner for executing rollouts against task apps.
This module provides two execution modes:
- Backend Mode (Default): Routes through the backend interceptor for trace/usage capture
  - Creates eval job via POST /api/eval/jobs
  - Polls job status until completion
  - Fetches detailed results with token costs and traces
  - Requires backend_url and backend_api_key (or SYNTH_BASE_URL/SYNTH_API_KEY env vars)
- Direct Mode: Calls task apps directly (legacy, no usage tracking)
  - Makes direct HTTP requests to the task app /rollout endpoint
  - No trace capture or usage tracking
  - Simpler but limited functionality
synth_ai.cli.commands.eval.config: Configuration loading
monorepo/backend/app/routes/eval/job_service.py: Backend eval job service
Functions
run_eval
- Backend mode: Used if backend_url and backend_api_key are provided (or SYNTH_BASE_URL/SYNTH_API_KEY env vars are set)
- Direct mode: Used otherwise (calls task app directly)
config: Evaluation configuration including task app URL, seeds, policy config, etc.
- List of EvalResult objects, one per seed, sorted by seed number.
ValueError: If required configuration is missing (task_app_url, seeds, etc.)
RuntimeError: If backend job creation or polling fails
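The mode selection described for run_eval can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: `select_eval_mode` is a hypothetical helper, and the rule that explicit arguments take precedence over the environment variables is an assumption.

```python
import os
from typing import Mapping, Optional


def select_eval_mode(
    backend_url: Optional[str] = None,
    backend_api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return "backend" or "direct" (hypothetical helper, not part of synth_ai)."""
    env = os.environ if env is None else env
    # Assumption: explicit arguments win; otherwise fall back to the documented env vars.
    url = backend_url or env.get("SYNTH_BASE_URL")
    key = backend_api_key or env.get("SYNTH_API_KEY")
    # Backend mode needs both a URL and a key; anything else means direct mode.
    return "backend" if url and key else "direct"
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.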
run_eval_direct
Calls the task app's /rollout endpoint directly.
This mode does NOT capture traces or track token usage via the backend interceptor.
Use Cases:
- Quick local testing without backend setup
- Legacy workflows that don’t need trace capture
- Simple evaluations without cost tracking
- No trace capture (traces must be returned by task app if needed)
- No token cost calculation (unless task app provides it)
- No backend interceptor for LLM call tracking
config: Evaluation configuration. Must include task_app_url and seeds.
- List of EvalResult objects, one per seed.
ValueError: If task_app_url or seeds are missing.
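The required-field check for direct mode might look something like the sketch below. `DirectEvalConfig` and `rollout_url` are hypothetical names used only for illustration; the real configuration object lives in synth_ai.cli.commands.eval.config and carries more fields than the two shown here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DirectEvalConfig:
    """Hypothetical stand-in for the real eval config; only two fields shown."""

    task_app_url: Optional[str] = None
    seeds: List[int] = field(default_factory=list)


def rollout_url(config: DirectEvalConfig) -> str:
    """Validate the config and return the task app's /rollout URL."""
    if not config.task_app_url:
        raise ValueError("task_app_url is required for direct mode")
    if not config.seeds:
        raise ValueError("seeds is required for direct mode")
    # Strip a trailing slash so the joined URL has exactly one separator.
    return config.task_app_url.rstrip("/") + "/rollout"
```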
run_eval_via_backend
- Routes LLM calls through the inference interceptor
- Captures traces and token usage automatically
- Calculates costs based on model pricing
- Provides detailed results with timing and metrics
- POST /api/eval/jobs: Create eval job
- Poll GET /api/eval/jobs/{job_id}: Check job status until completed
- GET /api/eval/jobs/{job_id}/results: Fetch detailed results
- Automatic trace capture via interceptor
- Token usage tracking and cost calculation
- Centralized job management and monitoring
- Support for async job execution
config: Evaluation configuration including task app URL, seeds, policy config.
backend_url: Backend API base URL (e.g., “http://localhost:8000”)
api_key: Backend API key for authentication (Bearer token)
- List of EvalResult objects with detailed metrics including tokens, costs, traces.
ValueError: If required configuration is missing.
RuntimeError: If job creation, polling, or result fetching fails.
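The create, poll, and fetch flow described for run_eval_via_backend can be sketched as below. This is an illustration under stated assumptions rather than the actual implementation: `run_job` and the `Transport` callable are hypothetical, the response field names `job_id` and `status` are assumed, and `"failed"` is an assumed terminal status (only "completed" appears in the description above). A real transport would prepend backend_url to each path and send the API key as a Bearer token.

```python
import time
from typing import Any, Callable, Dict, Optional

# A transport takes (method, path, json_body) and returns the decoded JSON response.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def run_job(
    request: Transport,
    payload: Dict[str, Any],
    poll_interval: float = 1.0,
    max_polls: int = 600,
) -> Dict[str, Any]:
    """Create an eval job, poll until it completes, then fetch its results."""
    # Step 1: create the eval job.
    created = request("POST", "/api/eval/jobs", payload)
    job_id = created.get("job_id")  # assumed field name
    if not job_id:
        raise RuntimeError("job creation failed: no job_id in response")

    # Step 2: poll the job status until it reaches a terminal state.
    for _ in range(max_polls):
        status = request("GET", f"/api/eval/jobs/{job_id}", None).get("status")
        if status == "completed":
            break
        if status == "failed":  # assumed terminal status
            raise RuntimeError(f"eval job {job_id} failed")
        time.sleep(poll_interval)
    else:
        # The loop ran out of attempts without seeing "completed".
        raise RuntimeError(f"eval job {job_id} did not complete in time")

    # Step 3: fetch the detailed results.
    return request("GET", f"/api/eval/jobs/{job_id}/results", None)
```

Injecting the transport keeps the sketch independent of any particular HTTP client and lets the polling logic be exercised with a fake in tests.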