
Overview

References:
  • GEPA: Agrawal et al. (2025). “GEPA: Reflective Prompt Evolution.” arXiv:2507.19457
System specifications (specs) are structured JSON documents that define task principles, rules, policies, and constraints for prompt optimization. GEPA uses specs to guide prompt generation with domain-specific knowledge.

What Are Specs?

Specs are JSON files that encode:
  • Principles: High-level guidelines for the task
  • Rules: Specific policies with priorities (0-10)
  • Constraints: Must/must-not/should directives
  • Examples: Good and bad examples for each rule
  • Glossary: Domain-specific terminology
  • Interfaces: Input/output formats and capabilities

Example Spec Structure

{
  "metadata": {
    "id": "spec.banking77_pipeline.v1",
    "title": "Banking77 Two-Stage Classification Pipeline Specification",
    "version": "1.0.0",
    "scope": "banking-intent-classification-pipeline"
  },
  "principles": [
    {
      "id": "P-clarity",
      "text": "Prioritize immediate-action intents over informational queries when multiple interpretations are possible.",
      "rationale": "Customers with urgent issues need immediate assistance."
    }
  ],
  "rules": [
    {
      "id": "R-card-disambiguation",
      "title": "Disambiguate card arrival vs. card payment issues",
      "priority": 10,
      "constraints": {
        "must": [
          "Classify as 'declined_card_payment' if payment-related keywords are present",
          "Classify as 'card_arrival' if delivery-related keywords are present"
        ],
        "must_not": [
          "Assume 'card' always refers to physical delivery"
        ]
      },
      "examples": [
        {
          "kind": "good",
          "prompt": "My card was declined at the store",
          "response": "declined_card_payment",
          "description": "Payment keyword indicates payment issue"
        }
      ]
    }
  ],
  "glossary": [
    {
      "term": "disambiguate",
      "definition": "To distinguish between multiple plausible interpretations",
      "aliases": ["clarify", "distinguish"]
    }
  ]
}

How Specs Are Used

GEPA: Spec-Guided Mutations

When GEPA uses proposer_type = "spec", the spec is included in mutation prompts:
[prompt_learning.gepa]
proposer_type = "spec"  # Use spec mode
spec_path = "examples/containers/banking77_pipeline/banking77_pipeline_spec.json"
spec_max_tokens = 5000
spec_include_examples = true
spec_priority_threshold = 8  # Only include high-priority rules (8+)
How it works:
  1. Spec Loading: GEPA loads the spec JSON file at initialization
  2. Context Serialization: The spec is converted to a compact Markdown format (up to spec_max_tokens)
  3. Mutation Prompts: The spec context is injected into LLM-guided mutation prompts
  4. Rule Filtering: Only rules with priority >= spec_priority_threshold are included
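The loading, serialization, and filtering steps above can be sketched in Python as follows. This is a minimal illustration assuming the spec layout shown earlier; the function names `load_spec` and `serialize_spec` and the exact Markdown layout are hypothetical, not GEPA's actual serializer, and token budgeting is left out here.

```python
import json
from pathlib import Path


def load_spec(path: str) -> dict:
    """Step 1: read and parse the spec JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def serialize_spec(spec: dict, priority_threshold: int = 0,
                   include_examples: bool = True) -> str:
    """Steps 2 and 4: render principles and filtered rules as compact Markdown.

    The Markdown layout is an illustrative assumption, not GEPA's format.
    """
    lines = ["### Principles"]
    for p in spec.get("principles", []):
        lines.append(f"- [{p['id']}] {p['text']}")

    # Step 4: keep rules at or above the threshold, highest priority first.
    rules = [r for r in spec.get("rules", [])
             if r.get("priority", 0) >= priority_threshold]
    rules.sort(key=lambda r: r.get("priority", 0), reverse=True)

    lines.append("### Rules")
    for r in rules:
        lines.append(f"- [{r['id']}] (priority {r.get('priority', 0)}) "
                     f"{r.get('title', '')}")
        for kind, items in r.get("constraints", {}).items():
            for item in items:
                lines.append(f"  - {kind.upper()}: {item}")
        if include_examples:
            for ex in r.get("examples", []):
                lines.append(f"  - {ex['kind']} example: "
                             f"{ex['prompt']!r} -> {ex['response']}")
    return "\n".join(lines)
```

In this sketch, `serialize_spec(load_spec(spec_path), priority_threshold=8)` would produce the text that fills the `{spec_context}` slot of the mutation prompt.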
Mutation Prompt Structure:
You are a prompt engineering expert. Improve the instruction text (DSPy-style).

Requirements:
- Preserve placeholders (e.g., {{query}}) and tool names
- Be precise, action-oriented, and unambiguous
- Keep guidance concise; avoid fluff

Current instruction:
{classifier_instruction}

Feedback (hints to address):
{feedback_text}

## System Specification
(Task principles, rules, and policies from spec document)
{spec_context}

Output: 1-3 bullet snippets (1-2 sentences each) that replace/augment the instruction.
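A minimal sketch of how such a prompt could be assembled, assuming the three named slots are filled by plain substitution. `build_mutation_prompt` and the abbreviated `TEMPLATE` are hypothetical. A single-pass regex substitution leaves other braces (such as the literal `{{query}}`) untouched and never re-scans substituted text for slots.

```python
import re

# Abbreviated copy of the mutation prompt shown above.
TEMPLATE = """You are a prompt engineering expert. Improve the instruction text (DSPy-style).

Requirements:
- Preserve placeholders (e.g., {{query}}) and tool names

Current instruction:
{classifier_instruction}

Feedback (hints to address):
{feedback_text}

## System Specification
(Task principles, rules, and policies from spec document)
{spec_context}

Output: 1-3 bullet snippets (1-2 sentences each) that replace/augment the instruction."""

SLOTS = re.compile(r"\{(classifier_instruction|feedback_text|spec_context)\}")


def build_mutation_prompt(template: str, **values: str) -> str:
    """Fill the three named slots in one pass over the template.

    A function replacement inserts each value literally, and text inside
    the inserted values is not scanned again for slot names.
    """
    return SLOTS.sub(lambda m: values[m.group(1)], template)
```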

Configuration Parameters

spec_path (Required)

Path to the spec JSON file (relative to config file or absolute).
spec_path = "examples/containers/banking77_pipeline/banking77_pipeline_spec.json"
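The relative-or-absolute rule can be sketched with `pathlib`, assuming (as stated above) that a relative `spec_path` is resolved against the directory containing the config file. `resolve_spec_path` and the config filename in the usage are hypothetical, not part of GEPA.

```python
from pathlib import Path


def resolve_spec_path(config_file: str, spec_path: str) -> Path:
    """Return spec_path as-is if absolute, else relative to the config's directory."""
    candidate = Path(spec_path)
    if candidate.is_absolute():
        return candidate
    # absolute() anchors a relative config path at the current working
    # directory without resolving symlinks.
    return Path(config_file).absolute().parent / candidate
```

For example, with a hypothetical config at `/srv/project/configs/gepa.toml`, the relative value `specs/spec.json` would resolve to `/srv/project/configs/specs/spec.json`.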

spec_max_tokens (Default: 5000)

Maximum tokens for spec context in prompts. The serializer will:
  1. Start with high-priority rules (priority >= 7)
  2. Remove examples if still too long
  3. Remove glossary if still too long
  4. Increase priority threshold if still too long
spec_max_tokens = 5000  # Default
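The fallback order above can be sketched as a loop that tries progressively smaller renderings until one fits. Assumptions: tokens are estimated at roughly four characters per token (real tokenizers differ), the threshold is raised one step at a time, and `fit_to_budget`, `estimate_tokens`, and `toy_render` are hypothetical names rather than GEPA internals.

```python
from typing import Callable

# render(spec, threshold, include_examples, include_glossary) -> text
Renderer = Callable[[dict, int, bool, bool], str]


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token; real tokenizers differ.
    return (len(text) + 3) // 4


def fit_to_budget(spec: dict, max_tokens: int, render: Renderer,
                  start_threshold: int = 7) -> str:
    """Trim the rendered spec in the documented order until it fits max_tokens."""
    threshold = start_threshold  # 1. start with priority >= 7
    include_examples = True
    include_glossary = True
    while True:
        text = render(spec, threshold, include_examples, include_glossary)
        if estimate_tokens(text) <= max_tokens:
            return text
        if include_examples:
            include_examples = False   # 2. remove examples
        elif include_glossary:
            include_glossary = False   # 3. remove glossary
        elif threshold < 10:
            threshold += 1             # 4. raise the priority threshold
        else:
            return text                # nothing left to trim; still over budget


def toy_render(spec: dict, threshold: int,
               include_examples: bool, include_glossary: bool) -> str:
    """Hypothetical renderer: one line per rule title, example, and term."""
    parts = []
    for rule in spec.get("rules", []):
        if rule["priority"] >= threshold:
            parts.append(rule["title"])
            if include_examples:
                parts.extend(e["prompt"] for e in rule.get("examples", []))
    if include_glossary:
        parts.extend(g["term"] for g in spec.get("glossary", []))
    return "\n".join(parts)
```

Passing `start_threshold` equal to a configured `spec_priority_threshold` would let the loop begin from that stricter level instead of 7.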

spec_include_examples (Default: true)

Whether to include rule examples in the spec context.
spec_include_examples = true  # Include good/bad examples

spec_priority_threshold (Optional)

Only include rules with priority >= threshold. A higher threshold keeps fewer rules, but only the most important ones.
spec_priority_threshold = 8  # Only include priority 8+ rules
Priority Guidelines:
  • 10: Critical rules (must always be followed)
  • 9: High-priority rules (important for accuracy)
  • 8: Medium-high priority (recommended)
  • 7: Medium priority (helpful)
  • <7: Lower priority (may be filtered out)

Spec Format Details

Principles

High-level guidelines that apply across all rules:
{
  "id": "P-clarity",
  "text": "Prioritize immediate-action intents over informational queries",
  "rationale": "Customers with urgent issues need immediate assistance."
}

Rules

Specific policies with priorities and constraints:
{
  "id": "R-card-disambiguation",
  "title": "Disambiguate card arrival vs. card payment issues",
  "priority": 10,
  "rationale": "Queries mentioning 'card' can refer to physical delivery or payment problems.",
  "constraints": {
    "must": [
      "Classify as 'declined_card_payment' if payment-related keywords are present"
    ],
    "must_not": [
      "Assume 'card' always refers to physical delivery"
    ],
    "should": [
      "Consider the query_analyzer's complexity assessment"
    ]
  },
  "examples": [
    {
      "kind": "good",
      "prompt": "My card was declined",
      "response": "declined_card_payment",
      "description": "Payment keyword indicates payment issue"
    },
    {
      "kind": "bad",
      "prompt": "My card isn't working",
      "response": "card_arrival",
      "description": "WRONG: Should be card_not_working"
    }
  ]
}
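To work with rules in code, it can help to parse them into typed objects. A minimal sketch using Python dataclasses, assuming only the fields shown above (unknown example keys would raise a `TypeError`); the class names `Rule` and `Example` are hypothetical and not part of GEPA.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    kind: str                 # "good" or "bad"
    prompt: str
    response: str
    description: str = ""     # optional in the spec


@dataclass
class Rule:
    id: str
    title: str
    priority: int
    rationale: str = ""
    constraints: dict[str, list[str]] = field(default_factory=dict)
    examples: list[Example] = field(default_factory=list)

    @classmethod
    def from_dict(cls, d: dict) -> "Rule":
        """Build a Rule from one entry of the spec's "rules" array."""
        return cls(
            id=d["id"],
            title=d["title"],
            priority=int(d["priority"]),
            rationale=d.get("rationale", ""),
            constraints={k: list(v) for k, v in d.get("constraints", {}).items()},
            examples=[Example(**e) for e in d.get("examples", [])],
        )

    def bad_examples(self) -> list[Example]:
        """Counter-examples the optimizer should steer away from."""
        return [e for e in self.examples if e.kind == "bad"]
```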

Constraint Types

  • must: Required behaviors (always enforced)
  • must_not: Prohibited behaviors (never allowed)
  • should: Recommended behaviors (preferred when possible)
  • should_not: Discouraged behaviors (avoid when possible)
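A lightweight pre-flight check can confirm that a rule uses only these four constraint types and a priority in the documented 0 to 10 range. This is a hypothetical validator you might run before pointing GEPA at a spec; `validate_rule` is not a GEPA API.

```python
ALLOWED_CONSTRAINTS = {"must", "must_not", "should", "should_not"}
ALLOWED_EXAMPLE_KINDS = {"good", "bad"}


def validate_rule(rule: dict) -> list[str]:
    """Return human-readable problems; an empty list means the rule looks valid."""
    problems: list[str] = []
    rid = rule.get("id", "<missing id>")

    priority = rule.get("priority")
    # bool is a subclass of int in Python, so reject it explicitly.
    if (isinstance(priority, bool) or not isinstance(priority, int)
            or not 0 <= priority <= 10):
        problems.append(f"{rid}: priority must be an integer 0-10, got {priority!r}")

    for kind, items in rule.get("constraints", {}).items():
        if kind not in ALLOWED_CONSTRAINTS:
            problems.append(f"{rid}: unknown constraint type {kind!r}")
        elif not (isinstance(items, list) and all(isinstance(s, str) for s in items)):
            problems.append(f"{rid}: '{kind}' must be a list of strings")

    for ex in rule.get("examples", []):
        if ex.get("kind") not in ALLOWED_EXAMPLE_KINDS:
            problems.append(f"{rid}: example kind must be 'good' or 'bad', "
                            f"got {ex.get('kind')!r}")
    return problems
```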

Benefits of Using Specs

1. Domain Knowledge Injection

Specs encode expert knowledge about the task:
  • Edge cases and disambiguation rules
  • Domain-specific terminology
  • Priority-based policies

2. Constraint-Aware Optimization

GEPA respects spec constraints:
  • Mutations are proposed with the spec's must/must_not directives in the prompt, so new instructions are steered to follow them

3. Faster Convergence

For tasks with well-defined domain rules, spec-guided optimization typically:
  • Converges faster (fewer generations/iterations)
  • Produces more accurate prompts
  • Handles edge cases better

4. Consistency

Specs ensure:
  • Consistent terminology across prompts
  • Alignment with domain requirements
  • Compliance with business rules

Example: Banking77 Pipeline Spec

Location: examples/containers/banking77_pipeline/banking77_pipeline_spec.json
Key Rules:
  • R-card-disambiguation (Priority 10): Distinguish card delivery vs. payment issues
  • R-urgency-signals (Priority 10): Handle urgent queries (lost cards, fraud)
  • R-balance-transfer (Priority 9): Disambiguate balance update scenarios
  • R-stage-coordination (Priority 8): Coordinate between analyzer and classifier stages
Usage in Config:
[prompt_learning.gepa]
proposer_type = "spec"
spec_path = "examples/containers/banking77_pipeline/banking77_pipeline_spec.json"
spec_max_tokens = 5000
spec_include_examples = true
spec_priority_threshold = 8
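With spec_priority_threshold = 8, all four key rules listed above clear the bar. The short Python check below shows how raising the threshold narrows the set. Only the rule IDs and priorities come from this page; `included_rules` is a hypothetical helper, and the actual spec file may contain further lower-priority rules not listed here.

```python
BANKING77_RULES = {
    "R-card-disambiguation": 10,
    "R-urgency-signals": 10,
    "R-balance-transfer": 9,
    "R-stage-coordination": 8,
}


def included_rules(rules: dict[str, int], threshold: int) -> list[str]:
    """IDs of rules with priority >= threshold, in their original order."""
    return [rid for rid, prio in rules.items() if prio >= threshold]


for t in (8, 9, 10):
    print(t, included_rules(BANKING77_RULES, t))
```

At 8 all four rules are kept; at 9 R-stage-coordination drops out; at 10 only the two priority-10 rules remain.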

When to Use Specs

Use specs when:
  • ✅ You have domain expertise to encode
  • ✅ Task has complex edge cases or disambiguation rules
  • ✅ You want faster convergence
  • ✅ Consistency with business rules is critical
  • ✅ Multi-stage pipelines need coordination rules
Skip specs when:
  • ❌ Task is simple and straightforward
  • ❌ No domain-specific rules or constraints
  • ❌ You want maximum exploration (specs may constrain search)

Creating a Spec

Step 1: Define Principles

Start with high-level guidelines:
{
  "principles": [
    {
      "id": "P-clarity",
      "text": "Prioritize immediate-action intents over informational queries",
      "rationale": "Urgent issues need immediate assistance."
    }
  ]
}

Step 2: Add Rules

Define specific policies with priorities:
{
  "rules": [
    {
      "id": "R-card-disambiguation",
      "title": "Disambiguate card arrival vs. card payment issues",
      "priority": 10,
      "constraints": {
        "must": [
          "Classify as 'declined_card_payment' if payment keywords are present"
        ]
      },
      "examples": [
        {
          "kind": "good",
          "prompt": "My card was declined",
          "response": "declined_card_payment"
        }
      ]
    }
  ]
}

Step 3: Add Glossary

Define domain-specific terms:
{
  "glossary": [
    {
      "term": "disambiguate",
      "definition": "To distinguish between multiple plausible interpretations",
      "aliases": ["clarify", "distinguish"]
    }
  ]
}
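Steps 1 to 3 each produce a separate fragment; a small script can merge them into the single file that spec_path points to. This is a sketch under stated assumptions: `merge_spec` and `write_spec` are hypothetical helpers, and the metadata values in the usage are placeholders modeled on the earlier example.

```python
import json
from pathlib import Path


def merge_spec(metadata: dict, *fragments: dict) -> dict:
    """Combine fragments such as {"principles": [...]} into one spec dict.

    Lists for the same section (e.g. two "rules" fragments) are concatenated.
    """
    spec: dict = {"metadata": dict(metadata)}
    for fragment in fragments:
        for section, entries in fragment.items():
            spec.setdefault(section, []).extend(entries)
    return spec


def write_spec(path: str, spec: dict) -> None:
    """Write the spec as indented UTF-8 JSON."""
    Path(path).write_text(json.dumps(spec, indent=2, ensure_ascii=False) + "\n",
                          encoding="utf-8")


# Placeholder fragments mirroring Steps 1 and 3.
principles = {"principles": [{"id": "P-clarity",
                              "text": "Prioritize immediate-action intents over informational queries"}]}
glossary = {"glossary": [{"term": "disambiguate",
                          "definition": "To distinguish between multiple plausible interpretations"}]}
spec = merge_spec({"id": "spec.my_task.v1", "version": "1.0.0"}, principles, glossary)
```

Calling `write_spec("path/to/your/spec.json", spec)` would then produce the file referenced in Step 4.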

Step 4: Reference in Config

Point to the spec file:
[prompt_learning.gepa]
proposer_type = "spec"
spec_path = "path/to/your/spec.json"
spec_max_tokens = 5000
spec_priority_threshold = 8

Best Practices

  1. Start with High-Priority Rules: Focus on critical constraints first (priority 8+)
  2. Include Examples: Good and bad examples help the optimizer understand intent
  3. Use Clear Constraints: Be specific with must/must_not directives
  4. Test Token Limits: Ensure spec_max_tokens fits in your model’s context window
  5. Filter by Priority: Use spec_priority_threshold to focus on important rules
  6. Update Regularly: Keep specs in sync with task requirements
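Best practice 4 comes down to simple arithmetic: the spec budget, the rest of the mutation prompt, and room for the model's reply must together fit in the context window. A sketch with hypothetical function names; the token counts used below are placeholders, not recommendations.

```python
def spec_budget_fits(context_window: int, spec_max_tokens: int,
                     base_prompt_tokens: int, reserved_output_tokens: int) -> bool:
    """True if base prompt + spec context + reserved reply fit in the window."""
    return base_prompt_tokens + spec_max_tokens + reserved_output_tokens <= context_window


def max_spec_tokens(context_window: int, base_prompt_tokens: int,
                    reserved_output_tokens: int) -> int:
    """Largest spec_max_tokens that still fits (0 if nothing is left)."""
    return max(0, context_window - base_prompt_tokens - reserved_output_tokens)
```

For example, with an 8,192-token window, 1,500 tokens for the rest of the prompt, and 1,000 reserved for the reply, up to 5,692 tokens remain, so the default of 5000 fits; with a 4,096-token window the spec budget would need to drop to 1,596.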

Comparison: DSPy vs Spec Mode

| Aspect | DSPy Mode | Spec Mode |
|---|---|---|
| Guidance | Generic prompt engineering principles | Domain-specific rules and constraints |
| Convergence | Slower (broader exploration) | Faster (focused search) |
| Accuracy | Good for general tasks | Better for domain-specific tasks |
| Setup | No additional files | Requires spec JSON file |
| Best For | Simple tasks, exploration | Complex tasks, edge cases |

Next Steps