Complete guide to optimizing prompts for Banking77 intent classification using GEPA.

Overview

Banking77 is an intent classification task: each customer query is assigned one of 77 banking-related intents. GEPA typically improves accuracy from a 60-75% baseline to 85-90%+ over 15 generations.

Prerequisites

# Install dependencies
uv pip install -e .

# Set API keys
export SYNTH_API_KEY="your-backend-api-key"
export GROQ_API_KEY="gsk_your_groq_key"
export ENVIRONMENT_API_KEY="$(python -c 'import secrets; print(secrets.token_urlsafe(32))')"
Where to get API keys:
  • GROQ_API_KEY: Get from https://console.groq.com/keys
  • SYNTH_API_KEY: Get from your backend admin or .env.dev file
  • ENVIRONMENT_API_KEY: Generate a random secure token (command above)
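
Before deploying, it can help to confirm that all three variables are actually exported in the current shell. The following is a minimal sketch that assumes only the variable names listed above; `missing_keys` is a hypothetical helper, not part of synth-ai.

```python
import os

# Variable names taken from the prerequisites above.
REQUIRED_KEYS = ("SYNTH_API_KEY", "GROQ_API_KEY", "ENVIRONMENT_API_KEY")


def missing_keys(env=None):
    """Return the required variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
print(missing_keys({"GROQ_API_KEY": "gsk_your_groq_key"}))
```

Calling `missing_keys()` with no argument checks the real environment; an empty list means all three keys are set.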

Step 1: Deploy Task App

Option A: Using helper script (recommended)
# Terminal 1
./examples/blog_posts/gepa/deploy_banking77_task_app.sh
Option B: Using CLI
uvx synth-ai deploy banking77 --runtime uvicorn --port 8102
Option C: Deploy to Modal
uvx synth-ai deploy banking77 --runtime modal --name banking77-gepa --env-file .env
Verify the task app is running:
curl -H "X-API-Key: $ENVIRONMENT_API_KEY" http://127.0.0.1:8102/health
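
The same check can be scripted, for example to wait for the task app before starting a run. This sketch mirrors the curl command using only the Python standard library. Treating HTTP 200 as "healthy" is an assumption, since the response body of `/health` is not described here, and both function names are hypothetical.

```python
import urllib.request


def build_health_request(base_url, api_key):
    """Build the same GET /health request as the curl command above."""
    url = base_url.rstrip("/") + "/health"
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


def task_app_is_healthy(base_url, api_key, timeout=5.0):
    """Return True if /health answers with HTTP 200 (assumed success signal)."""
    request = build_health_request(base_url, api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses: connection refused,
        # timeouts, and non-2xx responses all count as "not healthy" here.
        return False
```

For the local deployment above, the call would be `task_app_is_healthy("http://127.0.0.1:8102", os.environ["ENVIRONMENT_API_KEY"])` after `import os`.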

Step 2: Create Config

Create banking77_gepa.toml:
[prompt_learning]
algorithm = "gepa"
task_app_url = "http://127.0.0.1:8102"
task_app_id = "banking77"

# Training seeds (30 seeds from train pool)
evaluation_seeds = [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79]

# Validation seeds (50 seeds from validation pool - not in training)
validation_seeds = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]

[prompt_learning.initial_prompt]
messages = [
  { role = "system", content = "You are a banking intent classification assistant." },
  { role = "user", pattern = "Customer Query: {query}\n\nClassify this query into one of 77 banking intents." }
]

[prompt_learning.gepa]
initial_population_size = 20    # Starting population of prompts
num_generations = 15            # Number of evolutionary cycles
mutation_rate = 0.3             # Probability of mutation
crossover_rate = 0.5            # Probability of crossover
rollout_budget = 1000           # Total rollouts across all generations
max_concurrent_rollouts = 20    # Parallel rollout limit
pareto_set_size = 20            # Size of Pareto front

Step 3: Run Optimization

Option A: Using helper script (recommended)
# Terminal 2
./examples/blog_posts/gepa/run_gepa_banking77.sh
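Option B: Using CLI directly (the command below uses the repository's example config; to run the file created in Step 2, pass its path to --config instead)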
uvx synth-ai train \
  --config examples/blog_posts/gepa/configs/banking77_gepa_local.toml \
  --backend http://localhost:8000 \
  --poll

Step 4: Monitor Progress

You’ll see real-time output:
🧬 Running GEPA on Banking77
=============================
✅ Backend URL: http://localhost:8000
✅ Task app is healthy

🚀 Starting GEPA training...

proposal[0] train_accuracy=0.65 len=120 tool_rate=0.95 N=30
  🔄 TRANSFORMATION:
    [SYSTEM]: Classify customer banking queries into intents...

Generation 1/15: Best reward=0.75 (75% accuracy)
Generation 2/15: Best reward=0.82 (82% accuracy)
...
✅ GEPA training complete!
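
If the output is captured to a file, the per-generation progress can be extracted for plotting or for a quick convergence check. This is a minimal sketch that assumes the `Generation N/M: Best reward=X` line format shown in the sample output; the format may differ between versions, and `best_rewards` is a hypothetical helper.

```python
import re

# Matches lines such as "Generation 2/15: Best reward=0.82 (82% accuracy)".
GENERATION_LINE = re.compile(
    r"Generation (?P<gen>\d+)/(?P<total>\d+): Best reward=(?P<reward>\d+(?:\.\d+)?)"
)


def best_rewards(log_text):
    """Return (generation, best_reward) pairs in the order they were logged."""
    return [
        (int(m["gen"]), float(m["reward"]))
        for m in GENERATION_LINE.finditer(log_text)
    ]


# The two generation lines from the sample output above.
LOG = """\
Generation 1/15: Best reward=0.75 (75% accuracy)
Generation 2/15: Best reward=0.82 (82% accuracy)
"""

print(best_rewards(LOG))  # [(1, 0.75), (2, 0.82)]
```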

Step 5: Query Results

from synth_ai.learning import get_prompt_text, get_scoring_summary

# Get best prompt
best_prompt = get_prompt_text(
    job_id="pl_abc123",
    base_url="http://localhost:8000",
    api_key="sk_...",
    rank=1
)

# Get scoring summary
summary = get_scoring_summary(
    job_id="pl_abc123",
    base_url="http://localhost:8000",
    api_key="sk_..."
)

print(f"Best Train Accuracy: {summary['best_train_accuracy']:.3f}")
print(f"Best Validation Accuracy: {summary['best_validation_accuracy']:.3f}")
print(f"Mean Train Accuracy: {summary['mean_train_accuracy']:.3f}")
print(f"Candidates Tried: {summary['num_candidates_tried']}")
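
Because the validation seeds are held out from training, the difference between train and validation accuracy is a useful overfitting signal. The sketch below computes that gap from a summary dictionary. It assumes the two keys printed above hold floats, and the example values are placeholders for illustration, not real results.

```python
def generalization_gap(summary):
    """Best train accuracy minus best validation accuracy."""
    return summary["best_train_accuracy"] - summary["best_validation_accuracy"]


# Placeholder values for illustration only, not real results.
example_summary = {
    "best_train_accuracy": 0.90,
    "best_validation_accuracy": 0.85,
}

gap = generalization_gap(example_summary)
print(f"Train/validation gap: {gap:.3f}")
```

A large positive gap suggests the best prompt is tuned to the 30 training seeds and may not carry over to unseen queries.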

Expected Results

| Generation | Typical Accuracy | Notes |
| --- | --- | --- |
| 1 (baseline) | 60-75% | Initial random/baseline prompts |
| 5 | 75-80% | Early optimization gains |
| 10 | 80-85% | Convergence begins |
| 15 (final) | 85-90%+ | Optimized prompts on Pareto front |

Troubleshooting

❌ “Banking77 task app is not running”

Solution: Start the task app first:
./examples/blog_posts/gepa/deploy_banking77_task_app.sh

❌ “Cannot connect to backend”

Solution: Verify that the backend is running:
curl http://localhost:8000/api/health

❌ “GROQ_API_KEY environment variable is required”

Solution: Export your Groq API key:
export GROQ_API_KEY="gsk_your_key_here"

❌ Pattern validation failed

Solution: Ensure the user message in your config’s initial_prompt.messages contains the {query} placeholder:
[[prompt_learning.initial_prompt.messages]]
role = "user"
pattern = "Customer Query: {query}\n\nClassify this query."
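
A pattern can be checked locally before submitting a job. This sketch uses `string.Formatter` from the standard library to list the `{name}` fields in a pattern and confirm that `query` is among them. It is only a local pre-check under the assumption that the backend looks for a `{query}` field; the backend's actual validation rules are not documented here, and both helper names are hypothetical.

```python
from string import Formatter


def placeholders(pattern):
    """Return the set of {name} fields used in a prompt pattern."""
    return {field for _, field, _, _ in Formatter().parse(pattern) if field}


def has_query_placeholder(pattern):
    """True if the pattern contains a {query} field."""
    return "query" in placeholders(pattern)


print(has_query_placeholder("Customer Query: {query}\n\nClassify this query."))  # True
print(has_query_placeholder("Classify this query."))  # False
```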

Helper Scripts

| Script | Purpose |
| --- | --- |
| deploy_banking77_task_app.sh | Start Banking77 task app locally |
| run_gepa_banking77.sh | Run GEPA optimization with validation checks |
| test_gepa_local.sh | Quick test script for local setup |
| verify_banking77_setup.sh | Comprehensive setup verification |
| query_prompts_example.py | Example script for querying results |

Next Steps