Howie (email agent)
What it validates:
- End-to-end orchestration for a repo-backed harness
- Spend metering and budget accounting
- Morning summary and proof artifact generation
- Completed run with non-zero spend and downloadable deliverables.
Crafter (GEPA vs MIPRO)
What it validates:
- Research scenario orchestration
- Policy optimization flows and trial matrix behavior
- Algorithm-specific artifact generation
- Completed optimization run with experiment outputs and synthesis artifacts.
MintlifyBench (docs generation)
What it validates:
- Content generation from SDK repo + company URL
- Worker compliance with strict output contracts
- Scoring pipeline for generated docs quality
- Mintlify docs output (.mdx + mint.json) and associated scored artifacts.
Choosing a scenario
- Start with Howie for baseline platform validation.
- Use MintlifyBench for content-generation and docs-focused partners.
- Use Crafter for optimization-focused partners and research workloads.
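The guidance above can be read as a simple lookup from a partner's primary focus to a starting scenario. The sketch below is illustrative only: the focus labels, the `SCENARIO_BY_FOCUS` table, and the `choose_scenario` helper are hypothetical names, not part of the platform, and falling back to Howie for unrecognized input is an assumption based on its role as the baseline.

```python
# Hypothetical mapping from a partner's primary focus to a starting scenario.
# The focus keys are illustrative labels, not platform identifiers.
SCENARIO_BY_FOCUS = {
    "baseline": "Howie",
    "content-generation": "MintlifyBench",
    "docs": "MintlifyBench",
    "optimization": "Crafter",
    "research": "Crafter",
}


def choose_scenario(focus: str) -> str:
    """Return the suggested scenario for a focus label.

    Unrecognized labels fall back to Howie (an assumption: it is the
    baseline platform validation scenario).
    """
    return SCENARIO_BY_FOCUS.get(focus.strip().lower(), "Howie")


print(choose_scenario("docs"))      # MintlifyBench
print(choose_scenario("Research"))  # Crafter
```

A partner whose work spans several areas could still start with Howie and add the more specialized scenario afterwards.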