`synth_ai.cli.commands.eval.core`

Alpha

Eval command CLI entry point for task app rollouts. This module provides the Click command-line interface for the synth-ai eval command.

Command Overview: The eval command executes rollouts against a task app and summarizes results. It supports two execution modes:

Direct Mode: Calls task app directly (no backend required)
Backend Mode: Routes through backend for trace capture and cost tracking

Usage:

# Basic usage with config file
python -m synth_ai.cli eval --config banking77_eval.toml --url http://localhost:8103

# With backend for trace capture
python -m synth_ai.cli eval --config banking77_eval.toml --url http://localhost:8103 --backend http://localhost:8000

# Override seeds from command line
python -m synth_ai.cli eval --config banking77_eval.toml --url http://localhost:8103 --seeds 0,1,2,3,4
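
The `--seeds` flag takes a comma-separated string. The sketch below shows one way such a value could be turned into a list of integers. `parse_seeds` is a hypothetical helper written for illustration; it is not confirmed to exist in synth_ai, and the real parsing may differ.

```python
def parse_seeds(raw: str) -> list[int]:
    """Parse a comma-separated seed string such as "0,1,2,3,4".

    Hypothetical helper for illustration only.
    """
    # Skip empty fragments so a trailing comma does not raise ValueError.
    return [int(part) for part in raw.split(",") if part.strip()]


print(parse_seeds("0,1,2,3,4"))  # [0, 1, 2, 3, 4]
```

Non-numeric fragments raise `ValueError` here, which is probably the behavior a CLI wants so that a malformed seed list fails early.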

Configuration: Settings can come from three sources:

TOML config file (--config)
Command-line arguments (override config)
Environment variables (for API keys, etc.)

Config file format:

[eval]
app_id = "banking77"
url = "http://localhost:8103"
env_name = "banking77"
seeds = [0, 1, 2, 3, 4]

[eval.policy_config]
model = "gpt-4"
provider = "openai"

Output:

Prints results table to stdout
Optionally writes report to --output-txt
Optionally writes JSON to --output-json
Optionally saves traces to --traces-dir
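
The output behavior above can be sketched as follows: always print a table, and write files only when a path was supplied. The row fields (`seed`, `score`), the table layout, and `write_reports` itself are assumptions for illustration; the real report schema is not specified in this section.

```python
import json
import tempfile
from pathlib import Path


def write_reports(rows, output_txt=None, output_json=None):
    """Print a plain table and optionally persist text and JSON copies.

    Hypothetical helper; the field names are illustrative.
    """
    lines = ["seed  score"]
    lines += [f"{r['seed']:<4}  {r['score']:.2f}" for r in rows]
    table = "\n".join(lines)
    print(table)  # results always go to stdout
    if output_txt:  # corresponds to --output-txt
        Path(output_txt).write_text(table + "\n", encoding="utf-8")
    if output_json:  # corresponds to --output-json
        Path(output_json).write_text(json.dumps(rows, indent=2), encoding="utf-8")
    return table


# Made-up rows purely to exercise the helper.
rows = [{"seed": 0, "score": 1.0}, {"seed": 1, "score": 0.5}]
out_dir = Path(tempfile.mkdtemp())
table = write_reports(rows, output_json=out_dir / "results.json")
```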

See Also:

synth_ai.cli.commands.eval.runner: Evaluation execution logic
synth_ai.cli.commands.eval.config: Configuration loading
monorepo/docs/cli/eval.mdx: Full CLI documentation

Functions

`eval_command`

eval_command(app_id: str | None, model: str, config_path: str, trace_db: str, metadata: tuple[str, ...], seeds: str, url: str, backend: str, env_file: str, ops: str, return_trace: bool, concurrency: str, seed_set: str, wait: bool, poll: str, output_path: str, traces_dir: str, output_txt: str, output_json: str) -> None

Execute evaluation rollouts against a task app. This is the main CLI entry point for the synth-ai eval command.

Execution Modes:

Direct Mode: If --backend is not provided, calls task app directly
Backend Mode: If --backend is provided, creates eval job on backend
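
The mode rule above reduces to a single check on the `--backend` value. A minimal sketch, assuming an empty or missing value means the flag was not provided; `select_mode` is a hypothetical name and not a documented synth_ai function.

```python
from typing import Optional


def select_mode(backend: Optional[str]) -> str:
    """Return "backend" when a backend URL is given, otherwise "direct".

    Hypothetical helper for illustration only.
    """
    return "backend" if backend else "direct"


print(select_mode(None))                     # direct
print(select_mode("http://localhost:8000"))  # backend
```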

Arguments:

app_id: Task app identifier (optional, can be in config)
model: Model name to override config (optional)
config_path: Path to TOML config file (optional)
url: Task app URL (required if not in config)
backend: Backend URL for trace capture (optional)
seeds: Comma-separated seed list (e.g., "0,1,2,3")
concurrency: Number of parallel rollouts (default: 1)
return_trace: Whether to include traces in response
traces_dir: Directory to save trace files
output_txt: Path to write text report
output_json: Path to write JSON report

Example:

python -m synth_ai.cli eval --config banking77_eval.toml --url http://localhost:8103 --backend http://localhost:8000 --seeds 0,1,2,3,4 --concurrency 5 --output-json results.json --traces-dir traces/
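
When the command is driven from a script, it can be easier to assemble the argument list programmatically than to concatenate a shell string. The sketch below builds an argv list using only the flags shown in the example above. `build_eval_argv` is a hypothetical helper, not part of synth_ai.

```python
import shlex
import sys


def build_eval_argv(config, url, backend=None, seeds=None, concurrency=None):
    """Assemble an argv list for `python -m synth_ai.cli eval`.

    Hypothetical helper; only flags that appear in this page are used.
    """
    argv = [sys.executable, "-m", "synth_ai.cli", "eval",
            "--config", config, "--url", url]
    if backend:
        argv += ["--backend", backend]
    if seeds:
        # The CLI expects seeds as one comma-separated value.
        argv += ["--seeds", ",".join(str(s) for s in seeds)]
    if concurrency:
        argv += ["--concurrency", str(concurrency)]
    return argv


argv = build_eval_argv(
    "banking77_eval.toml",
    "http://localhost:8103",
    backend="http://localhost:8000",
    seeds=[0, 1, 2, 3, 4],
    concurrency=5,
)
print(shlex.join(argv[1:]))  # shell-quoted form, interpreter path omitted
```

The resulting list can be passed to `subprocess.run(argv, check=True)`; using a list rather than a shell string avoids quoting problems in paths and URLs.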

See Also:

synth_ai.cli.commands.eval.runner.run_eval(): Execution logic
synth_ai.cli.commands.eval.config.resolve_eval_config(): Config resolution
