
Eval

synth-ai eval

Execute evaluation rollouts against a task app. This is the main CLI entry point for the synth-ai eval command.

Execution Modes:
  • Direct Mode: If --backend is not provided, calls the task app directly.
  • Backend Mode: If --backend is provided, creates an eval job on the backend.
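The mode rule in the list above comes down to one check on --backend. The following is a minimal Python sketch of that rule, not the synth-ai implementation: the helper name `select_mode` and its return value are hypothetical. It assumes the task app URL has already been resolved from --url or the config, since the docstring marks that URL as required unless the config supplies it.

```python
from typing import Optional, Tuple


def select_mode(url: Optional[str], backend: Optional[str]) -> Tuple[str, str]:
    """Hypothetical helper: return (mode, URL the CLI would talk to first)."""
    if not url:
        # The task app URL must come from --url or the config.
        raise ValueError("a task app URL is required (via --url or the config)")
    if backend:
        # Backend Mode: an eval job is created on the backend service.
        return "backend", backend
    # Direct Mode: rollouts are sent straight to the task app.
    return "direct", url


print(select_mode("http://localhost:8103", None))
# ('direct', 'http://localhost:8103')
print(select_mode("http://localhost:8103", "http://localhost:8000"))
# ('backend', 'http://localhost:8000')
```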
Arguments:

  • app_id: Task app identifier (optional; can be set in the config)
  • model: Model name that overrides the config (optional)
  • config_path: Path to the TOML config file (optional)
  • url: Task app URL (required if not set in the config)
  • backend: Backend URL for trace capture (optional)
  • seeds: Comma-separated seed list (e.g., "0,1,2,3")
  • concurrency: Number of parallel rollouts (default: 1)
  • return_trace: Whether to include traces in the response
  • traces_dir: Directory to save trace files
  • output_txt: Path to write a text report
  • output_json: Path to write a JSON report

Example:

    python -m synth_ai.cli eval \
      --config banking77_eval.toml \
      --url http://localhost:8103 \
      --backend http://localhost:8000 \
      --seeds 0,1,2,3,4 \
      --concurrency 5 \
      --output-json results.json \
      --traces-dir traces/
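The example command passes --seeds 0,1,2,3,4 with --concurrency 5. A common way to honor a concurrency limit over a seed list is a semaphore around each rollout. The asyncio sketch below illustrates that pattern under stated assumptions: each seed gets one rollout, and `parse_seeds`, `run_bounded`, and `fake_rollout` are hypothetical stand-ins, not synth-ai functions. The real runner, run_eval(), may schedule work differently.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


def parse_seeds(raw: str) -> List[int]:
    """Parse a comma-separated seed list such as "0,1,2,3" into integers."""
    return [int(part) for part in raw.split(",") if part.strip()]


async def run_bounded(
    seeds: List[int],
    concurrency: int,
    rollout: Callable[[int], Awaitable[float]],
) -> Dict[int, float]:
    """Run one rollout per seed, with at most `concurrency` active at once."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(seed: int) -> float:
        async with semaphore:  # waits while `concurrency` rollouts are active
            return await rollout(seed)

    results = await asyncio.gather(*(guarded(seed) for seed in seeds))
    return dict(zip(seeds, results))


async def fake_rollout(seed: int) -> float:
    # Hypothetical placeholder for a task app call; returns a dummy score.
    await asyncio.sleep(0.01)
    return seed / 10


if __name__ == "__main__":
    scores = asyncio.run(run_bounded(parse_seeds("0,1,2,3,4"), 5, fake_rollout))
    print(scores)
```

Because `asyncio.gather` preserves input order, each score lines up with its seed even when rollouts finish out of order.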
See Also:
  • synth_ai.cli.commands.eval.runner.run_eval(): Execution logic
  • synth_ai.cli.commands.eval.config.resolve_eval_config(): Config resolution

This documentation is auto-generated from source code docstrings.

Arguments

  • APP_ID (optional): Task app identifier; can also be set in the config

Options

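| Option | Type | Default | Description |
|---|---|---|---|
| --model | TEXT | "" | Model name that overrides the config |
| --config | TEXT | "" | Path to the TOML config file |
| --trace-db | TEXT | "" | - |
| --metadata | TEXT | - | - |
| --seeds | TEXT | "" | Comma-separated seed list (e.g., 0,1,2,3) |
| --url | TEXT | "" | Task app URL (required if not set in the config) |
| --backend | TEXT | "" | Backend URL for trace capture; when set, an eval job is created on the backend |
| --env-file | TEXT | "" | - |
| --ops | TEXT | "" | - |
| --return-trace | flag | false | Include traces in the response |
| --concurrency | TEXT | "" | Number of parallel rollouts (1 if unset) |
| --seed-set | Choice(seeds, validation_seeds, test_pool) | seeds | - |
| --wait | flag | false | - |
| --poll | TEXT | "" | - |
| --output | TEXT | "" | - |
| --traces-dir | TEXT | "" | Directory to save trace files |
| --output-txt | TEXT | "" | Path to write a text report |
| --output-json | TEXT | "" | Path to write a JSON report |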