Banking77 GEPA Examples: Single-Stage vs Multi-Stage

Overview

Banking77 has both single-stage and multi-stage (pipeline) variants. This guide explains the differences and when to use each.

Single-Stage Banking77

Task: Direct intent classification (77 banking intents)
Config: banking77_gepa_local.toml
Task App: banking77 (single-stage classifier)

Key Characteristics

Single prompt template with system and user messages
One LLM call per query
Pattern-based prompt optimization
Direct classification from query to intent

Configuration Structure

[prompt_learning]
algorithm = "gepa"
task_app_url = "https://synth-laboratories-dev--synth-banking77-web-web.modal.run"
task_app_id = "banking77"

[prompt_learning.initial_prompt]
id = "banking77_pattern"
name = "Banking77 Classification Pattern"

[[prompt_learning.initial_prompt.messages]]
role = "system"
pattern = "You are an expert banking assistant. \n\n**Available Banking Intents:**\n{available_intents}\n\n**Task:**\nCall the `banking77_classify` tool with the `intent` parameter set to ONE of the intent labels listed above that best matches the customer query."

[[prompt_learning.initial_prompt.messages]]
role = "user"
pattern = "Customer Query: {query}\n\nClassify this query by calling the tool with the correct intent label from the list above."

[prompt_learning.initial_prompt.wildcards]
query = "REQUIRED"
available_intents = "OPTIONAL"

[prompt_learning.gepa]
env_name = "banking77"
# No modules section - single-stage optimization

Example Config Files

banking77_gepa_local.toml - Local development config
- Pattern-based prompt with wildcards
- 30 training seeds, 50 validation seeds
- Budget: 100 rollouts, 3 generations
banking77_gepa_test.toml - Test/production config
- Similar structure, optimized for testing
- 30 training seeds, 20 validation seeds
- Budget: 1500 rollouts, 10 generations

Typical Results

Baseline: 60-75% accuracy
After optimization: 85-90%+ accuracy
Generations: 3-10 are typically sufficient

Multi-Stage Banking77 Pipeline

Task: Two-stage pipeline (classifier → calibrator OR query_analyzer → classifier)
Config: banking77_pipeline_gepa_local.toml
Task App: banking77-pipeline (multi-stage pipeline)

Key Characteristics

Multiple prompt templates (one per stage)
Sequential LLM calls (stage 1 → stage 2)
Per-stage optimization with module constraints
Pipeline-level evaluation (end-to-end performance)
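The sequential-call and end-to-end evaluation ideas fit in a few lines of Python. Everything here is hypothetical and for illustration only:
- The two stages are keyword stubs standing in for LLM calls.
- `run_pipeline`, `end_to_end_accuracy` and the three-example dataset are made up.

The point is only that each stage receives the previous stage's output, and that accuracy is measured on the final label.

```python
from typing import Callable, Optional

# Hypothetical stage signature: (original query, previous stage output) -> output.
Stage = Callable[[str, Optional[str]], Optional[str]]

def run_pipeline(stages: list[Stage], query: str) -> Optional[str]:
    """Call each stage in order; stage N receives stage N-1's output."""
    output: Optional[str] = None
    for stage in stages:
        output = stage(query, output)
    return output

def end_to_end_accuracy(stages: list[Stage], dataset: list[tuple[str, str]]) -> float:
    """Score only the final pipeline output against the gold label."""
    correct = sum(run_pipeline(stages, q) == gold for q, gold in dataset)
    return correct / len(dataset)

# Keyword stubs standing in for the two LLM calls.
def classifier(query: str, _prev: Optional[str]) -> Optional[str]:
    return "card_arrival" if "card" in query.lower() else "exchange_rate"

def calibrator(query: str, suggestion: Optional[str]) -> Optional[str]:
    # Override the upstream label only when the query clearly points elsewhere.
    if "stolen" in query.lower():
        return "lost_or_stolen_card"
    return suggestion  # otherwise confirm the classifier's label

data = [
    ("When will my card arrive?", "card_arrival"),
    ("My card was stolen", "lost_or_stolen_card"),
    ("What is the exchange rate?", "exchange_rate"),
]
print(end_to_end_accuracy([classifier, calibrator], data))  # 1.0
print(round(end_to_end_accuracy([classifier], data), 2))    # 0.67
```

With these toy stubs, the classifier alone gets 2 of 3 labels right and the two-stage pipeline gets all 3. End-to-end scoring is designed to capture this kind of difference.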

Two Pipeline Variants

Variant 1: Classifier → Calibrator

Stages:

Classifier: Initial intent classification
Calibrator: Review and refine the classifier’s suggestion

Config: banking77_pipeline_gepa_local.toml

[prompt_learning.initial_prompt.metadata]
pipeline_modules = [
  { name = "classifier", instruction_text = "You are an expert banking assistant. Classify the customer query into one of the known Banking77 intents. Always return the label using the `banking77_classify` tool.", few_shots = [] },
  { name = "calibrator", instruction_text = "You refine intent predictions from an upstream classifier. Review the suggested intent alongside the original query. If the suggestion is valid, confirm it. Otherwise, choose the closest Banking77 intent. Always respond via the `banking77_classify` tool with the final label.", few_shots = [] }
]

[[prompt_learning.gepa.modules]]
module_id = "classifier"
max_instruction_slots = 3
max_tokens = 1024
allowed_tools = ["banking77_classify"]

[[prompt_learning.gepa.modules]]
module_id = "calibrator"
max_instruction_slots = 3
max_tokens = 1024
allowed_tools = ["banking77_classify"]
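The calibrator's instruction describes a simple contract: keep a valid suggestion, otherwise pick the closest Banking77 intent. This sketch replaces the LLM with `difflib` string similarity, purely to illustrate that contract.

In the actual pipeline, the calibrator is an LLM stage that answers via `banking77_classify`. The `calibrate` function and the four-label subset are illustrative only.

```python
import difflib

# Illustrative subset of Banking77 label names (the full set has 77).
LABELS = ["activate_my_card", "card_arrival", "exchange_rate", "lost_or_stolen_card"]

def calibrate(suggestion: str, labels: list[str] = LABELS) -> str:
    """Confirm a valid suggestion; otherwise snap to the most similar label.

    This is a string-similarity stand-in for the LLM calibrator, not its logic.
    """
    if suggestion in labels:
        return suggestion
    # cutoff=0.0 lets every label qualify, so the best-scoring one is always returned.
    return difflib.get_close_matches(suggestion, labels, n=1, cutoff=0.0)[0]

print(calibrate("card_arrival"))  # card_arrival (confirmed)
print(calibrate("card_arival"))   # card_arrival (corrected)
```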

Variant 2: Query Analyzer → Classifier

Stages:

Query Analyzer: Extract key information from the query
Classifier: Classify the query based on the extracted information

Config: banking77_pipeline_gepa_test.toml

[prompt_learning.initial_prompt.metadata]
pipeline_modules = [
  { name = "query_analyzer", instruction_text = "You analyze customer banking queries to extract key information. Identify the main topic and any specific details mentioned.", few_shots = [] },
  { name = "classifier", instruction_text = "You are an expert banking assistant that classifies customer queries into banking intents. Given a customer message, respond with exactly one intent label from the provided list using the `banking77_classify` tool.", few_shots = [] }
]

[[prompt_learning.gepa.modules]]
module_id = "query_analyzer"
max_instruction_slots = 2
max_tokens = 512
allowed_tools = []  # Analyzer doesn't use tools

[[prompt_learning.gepa.modules]]
module_id = "classifier"
max_instruction_slots = 3
max_tokens = 1024
allowed_tools = ["banking77_classify"]
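This minimal sketch shows what the per-module constraints could mean when a candidate prompt or stage output is checked. It assumes:
- `max_instruction_slots` caps the number of instruction segments a module's prompt may hold.
- `allowed_tools` lists the only tools a stage may call. An empty list, as for `query_analyzer`, therefore forbids tool calls.

`max_tokens` is carried along but not enforced, because counting tokens would require a tokenizer. `ModuleConstraints` and `violations` are hypothetical names, not synth-ai APIs.

```python
from dataclasses import dataclass, field

@dataclass
class ModuleConstraints:
    # Mirrors the [[prompt_learning.gepa.modules]] fields shown above.
    module_id: str
    max_instruction_slots: int
    max_tokens: int  # not checked below; would need a tokenizer
    allowed_tools: list[str] = field(default_factory=list)

def violations(mod: ModuleConstraints, instructions: list[str], tool_calls: list[str]) -> list[str]:
    """Return human-readable constraint violations for one stage (empty list means OK)."""
    problems = []
    if len(instructions) > mod.max_instruction_slots:
        problems.append(
            f"{mod.module_id}: {len(instructions)} instructions > {mod.max_instruction_slots} slots"
        )
    for tool in tool_calls:
        if tool not in mod.allowed_tools:
            problems.append(f"{mod.module_id}: tool {tool!r} not allowed")
    return problems

analyzer = ModuleConstraints("query_analyzer", 2, 512, [])
classifier = ModuleConstraints("classifier", 3, 1024, ["banking77_classify"])

print(violations(classifier, ["Classify the query."], ["banking77_classify"]))  # []
print(len(violations(analyzer, ["a", "b", "c"], ["banking77_classify"])))       # 2
```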

Configuration Structure

[prompt_learning]
algorithm = "gepa"
task_app_url = "https://synth-laboratories-dev--synth-banking77-pipeline-web-web.modal.run"
task_app_id = "banking77-pipeline"

[prompt_learning.initial_prompt]
id = "banking77_pipeline_baseline"
name = "Banking77 Pipeline Baseline"

[[prompt_learning.initial_prompt.messages]]
role = "system"
pattern = "Pipeline placeholder message. Actual stage instructions are provided via metadata.pipeline_modules."
order = 0

[prompt_learning.initial_prompt.metadata]
pipeline_modules = [
  { name = "classifier", instruction_text = "...", few_shots = [] },
  { name = "calibrator", instruction_text = "...", few_shots = [] }
]

# Multi-stage module configuration
[[prompt_learning.gepa.modules]]
module_id = "classifier"
max_instruction_slots = 3
max_tokens = 1024
allowed_tools = ["banking77_classify"]

[[prompt_learning.gepa.modules]]
module_id = "calibrator"
max_instruction_slots = 3
max_tokens = 1024
allowed_tools = ["banking77_classify"]

Example Config Files

banking77_pipeline_gepa_local.toml - Local development (classifier → calibrator)
- Spec-based proposer (proposer_type = "spec")
- System spec integration
- Budget: 500 rollouts, 5 generations
banking77_pipeline_gepa_test.toml - Test/production (query_analyzer → classifier)
- DSPy proposer (proposer_type = "dspy")
- Budget: 2000 rollouts, 10 generations
banking77_pipeline_gepa.toml - Alternative config (classifier → calibrator)
- DSPy proposer
- Budget: 500 rollouts, 8 generations

Key Differences

| Aspect | Single-Stage | Multi-Stage |
|---|---|---|
| Task App | `banking77` | `banking77-pipeline` |
| LLM Calls | 1 per query | 2+ per query (sequential) |
| Prompt Structure | Single template | Multiple templates (one per stage) |
| Configuration | No `modules` section | `[[prompt_learning.gepa.modules]]` required |
| Initial Prompt | Direct messages | Placeholder + `metadata.pipeline_modules` |
| Optimization | Single prompt evolution | Per-stage prompt evolution |
| Evaluation | Direct accuracy | End-to-end pipeline accuracy |
| Complexity | Simpler | More complex (module constraints) |

Configuration Differences

Single-Stage: Initial Prompt

[prompt_learning.initial_prompt]
id = "banking77_pattern"
name = "Banking77 Classification Pattern"

[[prompt_learning.initial_prompt.messages]]
role = "system"
pattern = "You are an expert banking assistant..."

[[prompt_learning.initial_prompt.messages]]
role = "user"
pattern = "Customer Query: {query}..."

[prompt_learning.initial_prompt.wildcards]
query = "REQUIRED"
available_intents = "OPTIONAL"

# No modules section

Multi-Stage: Initial Prompt

[prompt_learning.initial_prompt]
id = "banking77_pipeline_baseline"
name = "Banking77 Pipeline Baseline"

[[prompt_learning.initial_prompt.messages]]
role = "system"
pattern = "Pipeline placeholder message. Actual stage instructions are provided via metadata.pipeline_modules."
order = 0

[prompt_learning.initial_prompt.metadata]
pipeline_modules = [
  { name = "classifier", instruction_text = "...", few_shots = [] },
  { name = "calibrator", instruction_text = "...", few_shots = [] }
]

# Module configuration required
[[prompt_learning.gepa.modules]]
module_id = "classifier"
max_instruction_slots = 3
max_tokens = 1024
allowed_tools = ["banking77_classify"]

[[prompt_learning.gepa.modules]]
module_id = "calibrator"
max_instruction_slots = 3
max_tokens = 1024
allowed_tools = ["banking77_classify"]

When to Use Each

Use Single-Stage When:

✅ You have a straightforward classification task
✅ You want simpler configuration and faster optimization
✅ A single LLM call is sufficient
✅ You’re prototyping or testing

Use Multi-Stage When:

✅ You need sequential processing (analyze → classify)
✅ You want refinement/calibration stages
✅ Different stages have different constraints (tools, tokens)
✅ You need per-stage optimization control

Example Config Files Location

Single-Stage Configs

synth-ai/examples/blog_posts/gepa/configs/banking77_gepa_local.toml
synth-ai/examples/blog_posts/gepa/configs/banking77_gepa_test.toml

Multi-Stage Configs

synth-ai/examples/blog_posts/gepa/configs/banking77_pipeline_gepa_local.toml (classifier → calibrator)
synth-ai/examples/blog_posts/gepa/configs/banking77_pipeline_gepa_test.toml (query_analyzer → classifier)
synth-ai/examples/gepa/banking77_pipeline_gepa.toml (classifier → calibrator)

Running the Examples

Single-Stage

uvx synth-ai train \
  --config examples/blog_posts/gepa/configs/banking77_gepa_local.toml \
  --backend http://localhost:8000 \
  --poll

Multi-Stage

uvx synth-ai train \
  --config examples/blog_posts/gepa/configs/banking77_pipeline_gepa_local.toml \
  --backend http://localhost:8000 \
  --poll

Next Steps

Banking77 Single-Stage Guide – Complete walkthrough
Configuration Reference – All parameters explained
Algorithm Comparison – GEPA vs MIPRO
