Eval

synth-ai eval

Execute evaluation rollouts against a task app. This is the main CLI entry point for the synth-ai eval command.

Execution Modes:

Direct Mode: If --backend is not provided, the command calls the task app directly.
Backend Mode: If --backend is provided, the command creates an eval job on the backend.
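
The rule above comes down to one branch on whether a backend URL was supplied. The following is a minimal sketch of that rule only. `choose_mode` and the mode labels "direct" and "backend" are hypothetical names for illustration and are not taken from synth-ai's source. Treating an empty string like a missing flag is also an assumption.

```python
from typing import Optional


def choose_mode(backend: Optional[str]) -> str:
    """Return "backend" when a backend URL is given, otherwise "direct".

    Hypothetical helper: it mirrors the documented rule, not synth-ai's code.
    """
    # Assumption: an empty or whitespace-only value counts as "not provided".
    if backend and backend.strip():
        return "backend"
    return "direct"


if __name__ == "__main__":
    print(choose_mode(None))
    print(choose_mode("http://localhost:8000"))
```
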

Arguments:

app_id: Task app identifier (optional, can be in config)
model: Model name to override config (optional)
config_path: Path to TOML config file (optional)
url: Task app URL (required if not in config)
backend: Backend URL for trace capture (optional)
seeds: Comma-separated seed list (e.g., "0,1,2,3")
concurrency: Number of parallel rollouts (default: 1)
return_trace: Whether to include traces in response
traces_dir: Directory to save trace files
output_txt: Path to write text report
output_json: Path to write JSON report

Example:

    python -m synth_ai.cli eval \
        --config banking77_eval.toml \
        --url http://localhost:8103 \
        --backend http://localhost:8000 \
        --seeds 0,1,2,3,4 \
        --concurrency 5 \
        --output-json results.json \
        --traces-dir traces/
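
The example passes a comma-separated seed list and a concurrency limit. As a rough illustration of what those two values mean, the sketch below parses such a seed string and runs one rollout per seed with a bounded number in flight. It is not the synth-ai runner: `parse_seeds`, `run_rollouts`, and `fake_rollout` are hypothetical names, and the rollout here is a placeholder that only echoes its seed.

```python
import asyncio
from typing import Awaitable, Callable, List


def parse_seeds(raw: str) -> List[int]:
    """Turn a comma-separated string such as "0,1,2,3" into a list of ints."""
    return [int(part) for part in raw.split(",") if part.strip()]


async def run_rollouts(
    seeds: List[int],
    concurrency: int,
    rollout: Callable[[int], Awaitable[dict]],
) -> List[dict]:
    """Run one rollout per seed with at most `concurrency` running at once."""
    limit = asyncio.Semaphore(concurrency)

    async def guarded(seed: int) -> dict:
        # The semaphore blocks here once `concurrency` rollouts are active.
        async with limit:
            return await rollout(seed)

    # gather preserves input order, so results line up with seeds.
    return await asyncio.gather(*(guarded(s) for s in seeds))


async def fake_rollout(seed: int) -> dict:
    """Placeholder for a real task-app call (hypothetical); echoes the seed."""
    await asyncio.sleep(0)
    return {"seed": seed}


if __name__ == "__main__":
    seeds = parse_seeds("0,1,2,3,4")
    print(asyncio.run(run_rollouts(seeds, 5, fake_rollout)))
```

With a concurrency of 1 (the documented default) the rollouts in this sketch run strictly one after another.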

See Also:

synth_ai.cli.commands.eval.runner.run_eval(): Execution logic
synth_ai.cli.commands.eval.config.resolve_eval_config(): Config resolution


This documentation is auto-generated from source code docstrings.

Arguments

APP_ID (optional)

Options

Option	Type	Default	Description
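`--model`	TEXT	""	Model name to override config
`--config`	TEXT	""	Path to TOML config file
`--trace-db`	TEXT	""	-
`--metadata`	TEXT	-	-
`--seeds`	TEXT	""	Comma-separated seed list (e.g., "0,1,2,3")
`--url`	TEXT	""	Task app URL (required if not in config)
`--backend`	TEXT	""	Backend URL for trace capture; selects Backend Mode when provided
`--env-file`	TEXT	""	-
`--ops`	TEXT	""	-
`--return-trace`	flag	false	Whether to include traces in response
`--concurrency`	TEXT	""	Number of parallel rollouts (default: 1 per the command docstring)
`--seed-set`	Choice(seeds, validation_seeds, test_pool)	seeds	-
`--wait`	flag	false	-
`--poll`	TEXT	""	-
`--output`	TEXT	""	-
`--traces-dir`	TEXT	""	Directory to save trace files
`--output-txt`	TEXT	""	Path to write text report
`--output-json`	TEXT	""	Path to write JSON report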
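
When the command is launched from a script, the options in the table can be assembled into an argument list rather than a hand-written shell string. The sketch below shows one way to do that, under stated assumptions: `build_eval_command` is a hypothetical helper, not part of synth-ai, and it assumes that flag-type options (`--return-trace`, `--wait`) take no value while every other option takes exactly one.

```python
import shlex
from typing import Dict, List, Union

OptionValue = Union[str, int, bool]


def build_eval_command(options: Dict[str, OptionValue]) -> List[str]:
    """Build an argv list for `python -m synth_ai.cli eval`.

    Keys are option names without the leading dashes, e.g. "output-json".
    Hypothetical helper for illustration only.
    """
    argv = ["python", "-m", "synth_ai.cli", "eval"]
    for name, value in options.items():
        flag = f"--{name}"
        if isinstance(value, bool):
            # Boolean entries model flags: emit the flag only when True.
            if value:
                argv.append(flag)
        else:
            argv.extend([flag, str(value)])
    return argv


if __name__ == "__main__":
    argv = build_eval_command(
        {
            "config": "banking77_eval.toml",
            "url": "http://localhost:8103",
            "seeds": "0,1,2,3,4",
            "concurrency": 5,
            "return-trace": True,
        }
    )
    # shlex.join renders the list as a shell-safe string for logging.
    print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)`; passing a list avoids shell quoting issues with URLs and paths.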
