
synth_ai.cli.commands.eval.core

Alpha

Eval command CLI entry point for task app rollouts. This module provides the Click command-line interface for the synth-ai eval command.

Command Overview: The eval command executes rollouts against a task app and summarizes the results. It supports two execution modes:
  1. Direct Mode: Calls task app directly (no backend required)
  2. Backend Mode: Routes through backend for trace capture and cost tracking
Usage:
# Basic usage with config file
python -m synth_ai.cli eval --config banking77_eval.toml --url http://localhost:8103

# With backend for trace capture
python -m synth_ai.cli eval --config banking77_eval.toml --url http://localhost:8103 --backend http://localhost:8000

# Override seeds from command line
python -m synth_ai.cli eval --config banking77_eval.toml --url http://localhost:8103 --seeds 0,1,2,3,4
Configuration: Settings can come from the following sources:
  • TOML config file (--config)
  • Command-line arguments (override config)
  • Environment variables (for API keys, etc.)
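The precedence described above (command-line arguments override the config file, with secrets read from the environment) can be sketched roughly as follows. This is an illustration only, not the module's actual resolution logic: `merge_settings` and the `EXAMPLE_API_KEY` variable name are hypothetical, and treating `None` as "flag not passed" is an assumption.

```python
import os
from typing import Any


def merge_settings(file_cfg: dict[str, Any], cli_overrides: dict[str, Any]) -> dict[str, Any]:
    """Overlay CLI values on config-file values (hypothetical helper).

    A CLI value of None is assumed to mean the flag was not passed,
    so the config-file value is kept.
    """
    merged = dict(file_cfg)
    for key, value in cli_overrides.items():
        if value is not None:
            merged[key] = value
    return merged


# Values as they might be read from the [eval] table of a config file.
file_cfg = {"app_id": "banking77", "url": "http://localhost:8103", "seeds": [0, 1, 2, 3, 4]}
# Values as they might arrive from the command line.
cli_overrides = {"url": None, "seeds": [0, 1], "backend": "http://localhost:8000"}

settings = merge_settings(file_cfg, cli_overrides)

# Secrets stay out of the config file; EXAMPLE_API_KEY is a placeholder name.
api_key = os.environ.get("EXAMPLE_API_KEY")
```

Here `settings["seeds"]` comes from the command line, while `settings["url"]` falls back to the config file because no override was given.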
Config file format:
[eval]
app_id = "banking77"
url = "http://localhost:8103"
env_name = "banking77"
seeds = [0, 1, 2, 3, 4]

[eval.policy_config]
model = "gpt-4"
provider = "openai"
Output:
  • Prints results table to stdout
  • Optionally writes report to --output-txt
  • Optionally writes JSON to --output-json
  • Optionally saves traces to --traces-dir
See Also:
  • synth_ai.cli.commands.eval.runner: Evaluation execution logic
  • synth_ai.cli.commands.eval.config: Configuration loading
  • monorepo/docs/cli/eval.mdx: Full CLI documentation

Functions

eval_command

eval_command(app_id: str | None, model: str, config_path: str, trace_db: str, metadata: tuple[str, ...], seeds: str, url: str, backend: str, env_file: str, ops: str, return_trace: bool, concurrency: str, seed_set: str, wait: bool, poll: str, output_path: str, traces_dir: str, output_txt: str, output_json: str) -> None
Execute evaluation rollouts against a task app. This is the main CLI entry point for the synth-ai eval command. Execution Modes:
  • Direct Mode: If --backend is not provided, calls task app directly
  • Backend Mode: If --backend is provided, creates eval job on backend
Arguments:
  • app_id: Task app identifier (optional, can be in config)
  • model: Model name to override config (optional)
  • config_path: Path to TOML config file (optional)
  • url: Task app URL (required if not in config)
  • backend: Backend URL for trace capture (optional)
  • seeds: Comma-separated seed list (e.g., "0,1,2,3")
  • concurrency: Number of parallel rollouts (default: 1)
  • return_trace: Whether to include traces in response
  • traces_dir: Directory to save trace files
  • output_txt: Path to write text report
  • output_json: Path to write JSON report
Example:
python -m synth_ai.cli eval --config banking77_eval.toml --url http://localhost:8103 --backend http://localhost:8000 --seeds 0,1,2,3,4 --concurrency 5 --output-json results.json --traces-dir traces/
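The `seeds` parameter is typed as a string in the signature, so a value such as `0,1,2,3,4` has to be split into integers at some point. One way this could look is sketched below; `parse_seeds` is a hypothetical helper, and skipping empty entries is an assumption rather than documented behavior.

```python
def parse_seeds(raw: str) -> list[int]:
    """Split a comma-separated seed string into integers.

    Empty entries (for example from a trailing comma) are skipped;
    a non-numeric entry raises ValueError from int().
    """
    return [int(part) for part in raw.split(",") if part.strip()]


seeds = parse_seeds("0,1,2,3,4")
print(seeds)
```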
See Also:
  • synth_ai.cli.commands.eval.runner.run_eval(): Execution logic
  • synth_ai.cli.commands.eval.config.resolve_eval_config(): Config resolution