
Overview

Banking77 comes in two variants: single-stage and multi-stage (pipeline). This guide explains how they differ and when to use each.

Single-Stage Banking77

Task: Direct intent classification (77 banking intents)
Config: banking77_gepa_local.toml
Task App: banking77 (single-stage classifier)

Key Characteristics

  • Single prompt template with system and user messages
  • One LLM call per query
  • Pattern-based prompt optimization
  • Direct classification from query to intent

Configuration Structure

[prompt_learning]
algorithm = "gepa"
task_app_url = "https://synth-laboratories-dev--synth-banking77-web-web.modal.run"
task_app_id = "banking77"

[prompt_learning.initial_prompt]
id = "banking77_pattern"
name = "Banking77 Classification Pattern"

[[prompt_learning.initial_prompt.messages]]
role = "system"
pattern = "You are an expert banking assistant. \n\n**Available Banking Intents:**\n{available_intents}\n\n**Task:**\nCall the `banking77_classify` tool with the `intent` parameter set to ONE of the intent labels listed above that best matches the customer query."

[[prompt_learning.initial_prompt.messages]]
role = "user"
pattern = "Customer Query: {query}\n\nClassify this query by calling the tool with the correct intent label from the list above."

[prompt_learning.initial_prompt.wildcards]
query = "REQUIRED"
available_intents = "OPTIONAL"

[prompt_learning.gepa]
env_name = "banking77"
# No modules section - single-stage optimization
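To make the wildcard mechanism concrete, here is a minimal Python sketch of how a pattern containing `{query}` and `{available_intents}` could be rendered. It assumes plain `str.format`-style substitution and that an absent OPTIONAL wildcard becomes an empty string; `render_pattern` is a hypothetical helper, and the actual task app may substitute wildcards differently.

```python
# Hypothetical illustration of wildcard substitution; not the synth-ai implementation.
WILDCARDS = {"query": "REQUIRED", "available_intents": "OPTIONAL"}


def render_pattern(pattern: str, values: dict[str, str]) -> str:
    """Fill {wildcards} in a pattern, enforcing the REQUIRED ones."""
    missing = [
        name
        for name, kind in WILDCARDS.items()
        if kind == "REQUIRED" and name not in values
    ]
    if missing:
        raise KeyError(f"missing required wildcards: {missing}")
    # Assumption: OPTIONAL wildcards default to an empty string when absent.
    filled = {name: values.get(name, "") for name in WILDCARDS}
    return pattern.format(**filled)


user_pattern = "Customer Query: {query}\n\nClassify this query by calling the tool."
rendered = render_pattern(user_pattern, {"query": "Where is my card?"})
print(rendered)
```

Omitting `query` raises a `KeyError` in this sketch, which mirrors the REQUIRED marker in the config.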

Example Config Files

  1. banking77_gepa_local.toml - Local development config
    • Pattern-based prompt with wildcards
    • 30 training seeds, 50 validation seeds
    • Budget: 100 rollouts, 3 generations
  2. banking77_gepa_test.toml - Test/production config
    • Similar structure, optimized for testing
    • 30 training seeds, 20 validation seeds
    • Budget: 1500 rollouts, 10 generations

Typical Results

  • Baseline: 60-75% accuracy
  • After optimization: 85-90%+ accuracy
  • Generations: 3-10 are typically sufficient

Multi-Stage Banking77 Pipeline

Task: Two-stage pipeline (classifier → calibrator, or query_analyzer → classifier)
Config: banking77_pipeline_gepa_local.toml
Task App: banking77-pipeline (multi-stage pipeline)

Key Characteristics

  • Multiple prompt templates (one per stage)
  • Sequential LLM calls (stage 1 → stage 2)
  • Per-stage optimization with module constraints
  • Pipeline-level evaluation (end-to-end performance)
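The sequential flow described above can be sketched in a few lines of Python. The stage functions here are stubs standing in for real LLM calls, and `run_pipeline` is a hypothetical helper, not part of the task app; the point is only that each stage sees the original query plus the previous stage's output, and that the final stage's output is what gets scored.

```python
from typing import Callable, Optional

# A stage receives the original query and the previous stage's output
# (None for the first stage) and returns its own output.
Stage = Callable[[str, Optional[str]], str]


def run_pipeline(query: str, stages: list[Stage]) -> str:
    """Call each stage in order, feeding it the previous stage's output."""
    previous: Optional[str] = None
    for stage in stages:
        previous = stage(query, previous)
    if previous is None:
        raise ValueError("pipeline needs at least one stage")
    return previous


# Stub stages (hypothetical behavior, no LLM involved).
def classifier(query: str, _previous: Optional[str]) -> str:
    return "card_arrival" if "card" in query.lower() else "unknown"


def calibrator(query: str, suggestion: Optional[str]) -> str:
    # Override the upstream suggestion when the query clearly says "lost".
    if "lost" in query.lower():
        return "lost_or_stolen_card"
    return suggestion or "unknown"


final_intent = run_pipeline("I lost my card", [classifier, calibrator])
print(final_intent)
```

End-to-end evaluation then compares `final_intent` with the gold label, regardless of what the intermediate stage produced.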

Two Pipeline Variants

Variant 1: Classifier → Calibrator

Stages:
  1. Classifier: Initial intent classification
  2. Calibrator: Review and refine the classifier’s suggestion
Config: banking77_pipeline_gepa_local.toml
[prompt_learning.initial_prompt.metadata]
pipeline_modules = [
  { name = "classifier", instruction_text = "You are an expert banking assistant. Classify the customer query into one of the known Banking77 intents. Always return the label using the `banking77_classify` tool.", few_shots = [] },
  { name = "calibrator", instruction_text = "You refine intent predictions from an upstream classifier. Review the suggested intent alongside the original query. If the suggestion is valid, confirm it. Otherwise, choose the closest Banking77 intent. Always respond via the `banking77_classify` tool with the final label.", few_shots = [] }
]

[[prompt_learning.gepa.modules]]
module_id = "classifier"
max_instruction_slots = 3
max_tokens = 1024
allowed_tools = ["banking77_classify"]

[[prompt_learning.gepa.modules]]
module_id = "calibrator"
max_instruction_slots = 3
max_tokens = 1024
allowed_tools = ["banking77_classify"]

Variant 2: Query Analyzer → Classifier

Stages:
  1. Query Analyzer: Extract key information from query
  2. Classifier: Classify based on analyzed information
Config: banking77_pipeline_gepa_test.toml
[prompt_learning.initial_prompt.metadata]
pipeline_modules = [
  { name = "query_analyzer", instruction_text = "You analyze customer banking queries to extract key information. Identify the main topic and any specific details mentioned.", few_shots = [] },
  { name = "classifier", instruction_text = "You are an expert banking assistant that classifies customer queries into banking intents. Given a customer message, respond with exactly one intent label from the provided list using the `banking77_classify` tool.", few_shots = [] }
]

[[prompt_learning.gepa.modules]]
module_id = "query_analyzer"
max_instruction_slots = 2
max_tokens = 512
allowed_tools = []  # Analyzer doesn't use tools

[[prompt_learning.gepa.modules]]
module_id = "classifier"
max_instruction_slots = 3
max_tokens = 1024
allowed_tools = ["banking77_classify"]
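As a rough illustration of what the per-module `allowed_tools` constraint expresses, the sketch below checks a proposed tool call against the Variant 2 settings. The `MODULES` dictionary copies the values from the config above; `tool_call_allowed` is a hypothetical helper, and how the optimizer actually enforces the constraint is an assumption.

```python
# Values copied from the Variant 2 module config; the check itself is hypothetical.
MODULES = {
    "query_analyzer": {"max_tokens": 512, "allowed_tools": []},
    "classifier": {"max_tokens": 1024, "allowed_tools": ["banking77_classify"]},
}


def tool_call_allowed(module_id: str, tool_name: str) -> bool:
    """Return True if the given module is permitted to call the tool."""
    return tool_name in MODULES[module_id]["allowed_tools"]


print(tool_call_allowed("classifier", "banking77_classify"))
print(tool_call_allowed("query_analyzer", "banking77_classify"))
```

With these settings only the classifier stage may emit the final `banking77_classify` call; the analyzer is limited to plain text output.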

Configuration Structure

[prompt_learning]
algorithm = "gepa"
task_app_url = "https://synth-laboratories-dev--synth-banking77-pipeline-web-web.modal.run"
task_app_id = "banking77-pipeline"

[prompt_learning.initial_prompt]
id = "banking77_pipeline_baseline"
name = "Banking77 Pipeline Baseline"

[[prompt_learning.initial_prompt.messages]]
role = "system"
pattern = "Pipeline placeholder message. Actual stage instructions are provided via metadata.pipeline_modules."
order = 0

[prompt_learning.initial_prompt.metadata]
pipeline_modules = [
  { name = "classifier", instruction_text = "...", few_shots = [] },
  { name = "calibrator", instruction_text = "...", few_shots = [] }
]

# Multi-stage module configuration
[[prompt_learning.gepa.modules]]
module_id = "classifier"
max_instruction_slots = 3
max_tokens = 1024
allowed_tools = ["banking77_classify"]

[[prompt_learning.gepa.modules]]
module_id = "calibrator"
max_instruction_slots = 3
max_tokens = 1024
allowed_tools = ["banking77_classify"]

Example Config Files

  1. banking77_pipeline_gepa_local.toml - Local development (classifier → calibrator)
    • Spec-based proposer (proposer_type = "spec")
    • System spec integration
    • Budget: 500 rollouts, 5 generations
  2. banking77_pipeline_gepa_test.toml - Test/production (query_analyzer → classifier)
    • DSPy proposer (proposer_type = "dspy")
    • Budget: 2000 rollouts, 10 generations
  3. banking77_pipeline_gepa.toml - Alternative config (classifier → calibrator)
    • DSPy proposer
    • Budget: 500 rollouts, 8 generations

Key Differences

| Aspect | Single-Stage | Multi-Stage |
| --- | --- | --- |
| Task App | banking77 | banking77-pipeline |
| LLM Calls | 1 per query | 2+ per query (sequential) |
| Prompt Structure | Single template | Multiple templates (one per stage) |
| Configuration | No modules section | [[prompt_learning.gepa.modules]] required |
| Initial Prompt | Direct messages | Placeholder + metadata.pipeline_modules |
| Optimization | Single prompt evolution | Per-stage prompt evolution |
| Evaluation | Direct accuracy | End-to-end pipeline accuracy |
| Complexity | Simpler | More complex (module constraints) |

Configuration Differences

Single-Stage: Initial Prompt

[prompt_learning.initial_prompt]
id = "banking77_pattern"
name = "Banking77 Classification Pattern"

[[prompt_learning.initial_prompt.messages]]
role = "system"
pattern = "You are an expert banking assistant..."

[[prompt_learning.initial_prompt.messages]]
role = "user"
pattern = "Customer Query: {query}..."

[prompt_learning.initial_prompt.wildcards]
query = "REQUIRED"
available_intents = "OPTIONAL"

# No modules section

Multi-Stage: Initial Prompt

[prompt_learning.initial_prompt]
id = "banking77_pipeline_baseline"
name = "Banking77 Pipeline Baseline"

[[prompt_learning.initial_prompt.messages]]
role = "system"
pattern = "Pipeline placeholder message. Actual stage instructions are provided via metadata.pipeline_modules."
order = 0

[prompt_learning.initial_prompt.metadata]
pipeline_modules = [
  { name = "classifier", instruction_text = "...", few_shots = [] },
  { name = "calibrator", instruction_text = "...", few_shots = [] }
]

# Module configuration required
[[prompt_learning.gepa.modules]]
module_id = "classifier"
max_instruction_slots = 3
max_tokens = 1024
allowed_tools = ["banking77_classify"]

[[prompt_learning.gepa.modules]]
module_id = "calibrator"
max_instruction_slots = 3
max_tokens = 1024
allowed_tools = ["banking77_classify"]
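The structural difference between the two configs comes down to whether any `[[prompt_learning.gepa.modules]]` blocks are present. A small Python sketch of that check, operating on an already-parsed config dictionary (`is_multi_stage` is a hypothetical helper, not a synth-ai function):

```python
def is_multi_stage(cfg: dict) -> bool:
    """True if the parsed config defines at least one GEPA module block."""
    gepa = cfg.get("prompt_learning", {}).get("gepa", {})
    return bool(gepa.get("modules"))


# Parsed shapes corresponding to the two configs above (trimmed).
single_stage_cfg = {"prompt_learning": {"gepa": {"env_name": "banking77"}}}
multi_stage_cfg = {
    "prompt_learning": {
        "gepa": {
            "modules": [
                {"module_id": "classifier"},
                {"module_id": "calibrator"},
            ]
        }
    }
}

print(is_multi_stage(single_stage_cfg))
print(is_multi_stage(multi_stage_cfg))
```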

When to Use Each

Use Single-Stage When:

  • ✅ You have a straightforward classification task
  • ✅ You want simpler configuration and faster optimization
  • ✅ A single LLM call is sufficient
  • ✅ You’re prototyping or testing

Use Multi-Stage When:

  • ✅ You need sequential processing (analyze → classify)
  • ✅ You want refinement/calibration stages
  • ✅ Different stages have different constraints (tools, tokens)
  • ✅ You need per-stage optimization control

Example Config File Locations

Single-Stage Configs

  • synth-ai/examples/blog_posts/gepa/configs/banking77_gepa_local.toml
  • synth-ai/examples/blog_posts/gepa/configs/banking77_gepa_test.toml

Multi-Stage Configs

  • synth-ai/examples/blog_posts/gepa/configs/banking77_pipeline_gepa_local.toml (classifier → calibrator)
  • synth-ai/examples/blog_posts/gepa/configs/banking77_pipeline_gepa_test.toml (query_analyzer → classifier)
  • synth-ai/examples/gepa/banking77_pipeline_gepa.toml (classifier → calibrator)

Running the Examples

The config paths in the commands below are relative to the synth-ai repository root.

Single-Stage

uvx synth-ai train \
  --config examples/blog_posts/gepa/configs/banking77_gepa_local.toml \
  --backend http://localhost:8000 \
  --poll

Multi-Stage

uvx synth-ai train \
  --config examples/blog_posts/gepa/configs/banking77_pipeline_gepa_local.toml \
  --backend http://localhost:8000 \
  --poll
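
If you launch runs from a script rather than a shell, the same command can be assembled as an argument list. This sketch uses only the flags shown above (`--config`, `--backend`, `--poll`); `train_command` is a hypothetical helper, and the default backend URL is simply the local one from the examples.

```python
import shlex


def train_command(config_path: str, backend: str = "http://localhost:8000") -> list[str]:
    """Build the argv list for the train command shown above."""
    return [
        "uvx", "synth-ai", "train",
        "--config", config_path,
        "--backend", backend,
        "--poll",
    ]


cmd = train_command("examples/blog_posts/gepa/configs/banking77_pipeline_gepa_local.toml")
# Print the shell-quoted form for inspection before running it.
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the backend is reachable.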

Next Steps