Goal
Give the worker the benchmark command, budget, target metric, and review criteria. Require evidence for every claimed improvement.Launch
Prompt details to include
- benchmark command
- baseline metric and target metric
- allowed files and forbidden files
- maximum runtime or spend
- required report format
Expected evidence
- baseline and candidate metrics
- changed files
- command output
- artifact manifest
- final report with reproducibility notes
Failure notes
Userunbook="heavy" when the benchmark requires longer multi-actor work. Use directed_effort when the target metric and command are known.