The Task
Implement Pokemon TCG card effects in Rust by replacing todo!() stubs with working code. The agent is evaluated on test pass rate.
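To make the task concrete, here is a minimal sketch of what filling in one stub might look like. The `Pokemon` struct, its fields, and the `venom_strike` attack are hypothetical names invented for illustration; EngineBench's actual types and card effects are not shown in this guide and will differ.

```rust
// Hypothetical types for illustration only; not EngineBench's actual API.
#[derive(Debug, Clone, PartialEq)]
pub struct Pokemon {
    pub hp: u32,
    pub damage: u32,
    pub poisoned: bool,
}

impl Pokemon {
    /// True once accumulated damage meets or exceeds HP.
    pub fn is_knocked_out(&self) -> bool {
        self.damage >= self.hp
    }
}

// A stub, as the agent might receive it, would look like this:
//
//     pub fn venom_strike(target: &mut Pokemon) {
//         todo!()
//     }
//
// `todo!()` panics when reached, so any test that calls the stub fails.

/// One possible implementation of the (hypothetical) effect:
/// deal 20 damage and poison the target.
pub fn venom_strike(target: &mut Pokemon) {
    // saturating_add avoids overflow on extreme damage values.
    target.damage = target.damage.saturating_add(20);
    target.poisoned = true;
}
```

A test for such an effect would typically build a `Pokemon`, call the function, and assert on the resulting `damage` and `poisoned` fields. The pass rate rewards effects whose behavior matches what those assertions expect, not code that merely compiles.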
Architecture
1. Install and Setup
2. Configure GEPA
enginebench_gepa.toml
3. Enable Unified Optimization (Optional)
Optimize AGENTS.md and skills files together:
enginebench_gepa_unified.toml
4. Run Optimization
Results
| Metric | Baseline | Optimized |
|---|---|---|
| Test pass rate | 45% | 78% |
| Compilation success | 72% | 94% |
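For reading the table, the sketch below shows one way to compute a pass rate and the percentage-point gain between two runs. It assumes pass rate means passed tests divided by total tests; the function names and the test counts in the usage note are hypothetical, not taken from EngineBench.

```rust
/// Pass rate as a whole percent, rounded down.
/// Returns 0 when there are no tests, to avoid dividing by zero.
pub fn pass_rate_pct(passed: u32, total: u32) -> u32 {
    if total == 0 {
        0
    } else {
        // Widen to u64 so the multiplication cannot overflow.
        ((passed as u64 * 100) / total as u64) as u32
    }
}

/// Percentage-point gain between two rates given as whole percents.
/// Negative when the optimized run is worse than the baseline.
pub fn point_gain(baseline_pct: u32, optimized_pct: u32) -> i64 {
    optimized_pct as i64 - baseline_pct as i64
}
```

With the figures reported above, `point_gain(45, 78)` gives 33 points for test pass rate and `point_gain(72, 94)` gives 22 points for compilation success. As a made-up example of the first function, 39 passing tests out of 50 would yield `pass_rate_pct(39, 50) == 78`.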
What Gets Optimized
GEPA evolves instruction content while preserving structure:
| Component | What Gets Optimized |
|---|---|
| System Prompt | Core instructions passed to the LLM |
| AGENTS.md | Startup instructions for Codex/OpenCode |
| Skills Files | .codex/skills.yaml, .opencode/skills.yaml |
Supported Agents
| Agent | CLI | Instructions Source |
|---|---|---|
| Codex | codex exec | AGENTS.md, .codex/skills.yaml |
| OpenCode | opencode | AGENTS.md, .opencode/skills.yaml |
| Claude Code | claude | AGENTS.md |
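The mapping in this table can be captured as a small lookup type. This is only an illustrative sketch: the `Agent` enum and its methods are hypothetical and are not part of GEPA or EngineBench; the strings are copied from the table above.

```rust
/// Hypothetical enum mirroring the Supported Agents table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Agent {
    Codex,
    OpenCode,
    ClaudeCode,
}

impl Agent {
    /// CLI command listed for the agent.
    pub fn cli(self) -> &'static str {
        match self {
            Agent::Codex => "codex exec",
            Agent::OpenCode => "opencode",
            Agent::ClaudeCode => "claude",
        }
    }

    /// Files the agent reads its instructions from.
    pub fn instruction_sources(self) -> &'static [&'static str] {
        match self {
            Agent::Codex => &["AGENTS.md", ".codex/skills.yaml"],
            Agent::OpenCode => &["AGENTS.md", ".opencode/skills.yaml"],
            Agent::ClaudeCode => &["AGENTS.md"],
        }
    }
}
```

Laid out this way, it is easy to see that AGENTS.md is the one source shared by all three agents, while the skills files are specific to Codex and OpenCode.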
Next Steps
- Coding Agent Cookbook - Full walkthrough with example outputs
- EngineBench Demo - Source code
- GEPA Reference - Complete API documentation