
Overview

Task Apps are HTTP services that evaluate prompts for Synth AI’s optimization algorithms. They implement a simple contract: receive rollout requests, execute episodes, and return rewards.

High-Level Architecture

Key Components:
  1. Task App - Your HTTP service (any language)
  2. Synth AI Backend - Coordinates optimization
  3. GEPA Optimizer - Evolutionary search engine
  4. Interceptor - Prompt transformation layer
  5. LLM Provider - Inference endpoint

Task App Contract

Required Endpoints

GET /health
  • Liveness probe (unauthenticated OK)
  • Returns: {"healthy": true}
GET /task_info
  • Dataset metadata (authenticated)
  • Returns: Task description, seeds, rubric, inference mode
POST /rollout
  • Execute one episode (authenticated)
  • Input: run_id, env.seed, policy.config
  • Returns: Trajectories, metrics, rewards
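The three endpoints above can be sketched as plain handler functions, independent of any particular HTTP framework. This is an illustrative sketch, not the SDK's API: `EXPECTED_API_KEY` stands in for your `ENVIRONMENT_API_KEY` value, and the reward computation is a placeholder.

```python
EXPECTED_API_KEY = "secret-key"  # hypothetical ENVIRONMENT_API_KEY value

def handle_health() -> dict:
    # Liveness probe: unauthenticated and cheap.
    return {"healthy": True}

def handle_task_info(api_key: str) -> dict:
    if api_key != EXPECTED_API_KEY:
        raise PermissionError("invalid X-API-Key")
    # Dataset metadata: description, seeds, rubric, inference mode.
    return {
        "task": "banking77-classification",
        "seeds": list(range(100)),
        "rubric": "1.0 if the predicted label matches the expected label",
        "inference_mode": "tool_call",
    }

def handle_rollout(api_key: str, request: dict) -> dict:
    if api_key != EXPECTED_API_KEY:
        raise PermissionError("invalid X-API-Key")
    seed = request["env"]["seed"]
    # A real episode would call request["policy"]["config"]["inference_url"]
    # here; the reward below is a placeholder for the real comparison.
    reward = 1.0
    return {
        "run_id": request["run_id"],
        "trajectories": [{"env_id": f"task::train::{seed}", "steps": [], "length": 0}],
        "metrics": {"episode_rewards": [reward], "reward_mean": reward},
        "aborted": False,
    }
```

Routing these through FastAPI, Express, or any other framework is a thin layer on top; the contract lives entirely in the request and response shapes.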

Request Flow

Rollout Request Structure

{
  "run_id": "unique-run-id",
  "env": {
    "seed": 0,
    "config": {}
  },
  "policy": {
    "config": {
      "model": "gpt-4o-mini",
      "inference_url": "https://interceptor-url/...",
      "prompt_template": {...}  // Baseline only, not optimized
    }
  }
}
Key Point: Task App receives baseline prompts only. Optimized prompts are substituted by the Interceptor.

Response Flow

Rollout Response Structure

{
  "run_id": "unique-run-id",
  "trajectories": [{
    "env_id": "task::train::0",
    "policy_id": "policy-1",
    "steps": [{
      "obs": {"query": "...", "index": 0},
      "tool_calls": [...],
      "reward": 1.0,
      "done": true,
      "info": {"expected": "...", "predicted": "...", "correct": true}
    }],
    "length": 1,
    "inference_url": "..."
  }],
  "metrics": {
    "episode_rewards": [1.0],
    "reward_mean": 1.0,
    "num_steps": 1,
    "num_episodes": 1,
    "outcome_score": 1.0
  },
  "aborted": false,
  "ops_executed": 1
}
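A helper that assembles the metrics block above from per-episode rewards might look like the following sketch. The field names follow the response structure shown; the assumption that `outcome_score` mirrors `reward_mean` is illustrative, not a contract guarantee.

```python
def build_metrics(episode_rewards: list[float], num_steps: int) -> dict:
    # Aggregate per-episode rewards into the metrics block the backend expects.
    mean = sum(episode_rewards) / len(episode_rewards) if episode_rewards else 0.0
    return {
        "episode_rewards": episode_rewards,
        "reward_mean": mean,
        "num_steps": num_steps,
        "num_episodes": len(episode_rewards),
        "outcome_score": mean,  # assumption: outcome mirrors mean reward here
    }
```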

Interceptor Pattern (Critical)

How It Works

Step 1: Task App Receives Baseline
Synth AI Backend → Task App
  POST /rollout
  {
    "policy": {
      "config": {
        "inference_url": "https://interceptor/v1/trial-123",
        "prompt_template": {...}  // Baseline prompt
      }
    }
  }
Step 2: Task App Calls LLM
Task App → Interceptor
  POST /chat/completions
  {
    "model": "gpt-4o-mini",
    "messages": [...]  // Baseline messages
  }
Step 3: Interceptor Substitutes
Interceptor:
  1. Receives baseline messages
  2. Looks up registered transformation for trial-123
  3. Applies transformation to messages
  4. Forwards to actual LLM provider
Step 4: LLM Response
LLM Provider → Interceptor → Task App
  {
    "choices": [{
      "message": {...},
      "tool_calls": [...]
    }]
  }
Key Benefits:
  • ✅ Task App never sees optimized prompts
  • ✅ Prompts stay secure in backend
  • ✅ No Task App code changes needed
  • ✅ Pattern-based transformations
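From the Task App's side, steps 1-2 amount to building a standard chat-completions request against whatever `inference_url` arrives in the policy config. A minimal sketch of that request construction (the actual HTTP send is omitted; `build_llm_request` is a hypothetical helper name):

```python
import json

def build_llm_request(policy_config: dict, messages: list[dict]) -> tuple[str, bytes]:
    # The Task App always sends its *baseline* messages; the Interceptor
    # behind inference_url substitutes the optimized prompt for this trial.
    url = policy_config["inference_url"].rstrip("/") + "/chat/completions"
    body = json.dumps({"model": policy_config["model"], "messages": messages}).encode()
    return url, body
```

The Task App never needs to know whether `inference_url` points at an Interceptor or directly at a provider; the request shape is identical either way.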

GEPA Optimization Flow

Complete Optimization Cycle

Phase 1: Job Submission
User → Synth AI Backend
  POST /prompt-learning/online/jobs
  {
    "algorithm": "gepa",
    "config_body": {
      "task_app_url": "https://your-task-app.com",
      "task_app_api_key": "...",
      ...
    }
  }
Phase 2: Pattern Validation
Backend:
  1. Start Interceptor
  2. Fetch baseline messages from Task App
  3. Validate pattern matches initial_template
Phase 3: Population Initialization
Backend:
  1. Create baseline transformation
  2. Generate mutations
  3. Initialize population (20-30 variants)
Phase 4: Evaluation Loop
For each generation:
  For each candidate:
    1. Register transformation with Interceptor
    2. Backend → Task App: Rollout request (baseline)
    3. Task App → Interceptor: LLM call
    4. Interceptor: Substitute optimized prompt
    5. LLM Provider → Interceptor → Task App: Response
    6. Task App → Backend: Trajectory with reward
    7. Backend: Update Pareto archive
Phase 5: Selection & Mutation
Backend:
  1. Select parents from Pareto archive
  2. Generate mutations/crossover
  3. Next generation
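The Pareto-archive update in phase 4, step 7 can be sketched with standard dominance logic: a candidate enters the archive only if nothing already there dominates it, and it evicts anything it dominates. This is a simplified illustration of the general technique, not GEPA's actual implementation.

```python
def dominates(a: list[float], b: list[float]) -> bool:
    # a dominates b if it is >= on every objective and > on at least one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def update_archive(archive: list[dict], candidate: dict) -> list[dict]:
    # Reject the candidate if any archived entry dominates it;
    # otherwise keep it and evict the entries it dominates.
    if any(dominates(e["scores"], candidate["scores"]) for e in archive):
        return archive
    kept = [e for e in archive if not dominates(candidate["scores"], e["scores"])]
    return kept + [candidate]
```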

Deployment Architectures

Option 1: Embedded Task App

Your Application
├── Task App Logic (in-process)
└── Synth AI SDK Integration
    └── InProcessTaskApp
Use Case: Python applications, quick prototyping

Option 2: Standalone Task App

Your Server
└── Task App (HTTP service)
    ├── Any language (Rust, Go, TypeScript, Python)
    └── Exposed via tunnel or direct URL
Use Case: Production deployments, polyglot implementations

Option 3: Cloud-Deployed Task App

Cloud Platform (Render, Fly.io, etc.)
└── Task App (HTTP service)
    └── Public HTTPS URL
Use Case: Production, scalable deployments

Authentication Flow

Two Separate Auth Flows

1. Task App Authentication (X-API-Key)
Synth AI Backend → Task App
  Headers: X-API-Key: <ENVIRONMENT_API_KEY>

Purpose: Authenticate Synth AI to your Task App
2. LLM Provider Authentication (Authorization: Bearer)
Task App → LLM Provider
  Headers: Authorization: Bearer <LLM_API_KEY>

Purpose: Authenticate Task App to LLM
Important: These flows are separate; the Task App manages LLM authentication internally, and provider keys never pass through the backend.
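The separation is visible in the headers each hop sends. A trivial sketch (helper names are illustrative):

```python
def backend_to_task_app_headers(environment_api_key: str) -> dict:
    # Synth AI Backend authenticating to *your* Task App.
    return {"X-API-Key": environment_api_key}

def task_app_to_llm_headers(llm_api_key: str) -> dict:
    # Task App authenticating to the LLM provider; managed internally,
    # never shared with the backend.
    return {"Authorization": f"Bearer {llm_api_key}"}
```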

Data Flow: Complete Example

Banking77 Classification Example

1. Job Submission
import os
from synth_ai.sdk import PromptLearningClient

client = PromptLearningClient(api_key=os.environ["SYNTH_API_KEY"])

# Create job from config (run inside an async context)
job = await client.create_job(config={
    "algorithm": "gepa",
    "task_app_url": "https://my-task-app.com",
    "task_app_api_key": "secret-key"
})
await client.start_job(job["id"])
2. Backend Validates Pattern
Backend → Task App: GET /task_info?seed=0
Task App → Backend: Task metadata

Backend → Task App: POST /rollout (baseline)
Task App → Interceptor: LLM call
Interceptor → LLM: Baseline prompt (pass-through; nothing optimized yet)
LLM → Task App: Response
Task App → Backend: Reward (baseline score)
3. GEPA Optimization
For each generation:
  For each candidate:
    Backend registers transformation
    Backend → Task App: Rollout (baseline)
    Task App → Interceptor: LLM call
    Interceptor substitutes optimized prompt
    Task App computes reward
    Backend updates archive
4. Job Completion
Backend → User: Job status = "succeeded"
Metadata includes:
  - Best prompt transformation
  - Best score
  - Pareto archive

Error Handling

Common Scenarios

Task App Unreachable:
  • Backend retries with exponential backoff
  • Job fails after max retries
LLM Call Failure:
  • Task App returns 502 Bad Gateway
  • Backend marks rollout as failed
  • Continues with other candidates
Invalid Response Format:
  • Backend validates response structure
  • Marks rollout as failed if invalid
Timeout:
  • Task App should respond within timeout_seconds
  • Backend cancels long-running rollouts
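The retry-with-exponential-backoff behavior described for an unreachable Task App follows a standard pattern; a generic sketch (not the backend's actual code):

```python
import time

def with_retries(fn, max_retries: int = 3, base_delay: float = 0.5):
    # Retry a flaky call with exponential backoff (0.5s, 1s, 2s, ...);
    # re-raise once max_retries is exhausted, which fails the job.
    for attempt in range(max_retries):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```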

Performance Considerations

Throughput

Bottlenecks:
  1. LLM inference latency (~1-3s per rollout)
  2. Network latency (Task App ↔ Backend)
  3. Task App processing time
Optimization:
  • Parallel rollouts (max_concurrent)
  • Minibatch gating (GEPA)
  • Efficient Task App implementation
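The `max_concurrent` knob mentioned above maps naturally onto a semaphore that bounds in-flight rollouts. A generic asyncio sketch (function names are illustrative):

```python
import asyncio

async def run_rollouts(seeds: list[int], rollout_fn, max_concurrent: int = 8) -> list:
    # Bound in-flight rollouts so LLM latency overlaps without
    # overwhelming the Task App or the provider.
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(seed: int):
        async with sem:
            return await rollout_fn(seed)

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(bounded(s) for s in seeds))
```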

Scalability

Task App:
  • Stateless design (scales horizontally)
  • Efficient dataset loading
  • Connection pooling for LLM calls
Backend:
  • Handles multiple jobs concurrently
  • Manages Interceptor instances
  • Efficient archive updates

Security Considerations

API Keys:
  • ENVIRONMENT_API_KEY - Task App authentication
  • SYNTH_API_KEY - Backend authentication
  • LLM_API_KEY - LLM provider authentication
Network Security:
  • HTTPS for all connections
  • Tunnel options for local development
  • API key validation
Prompt Security:
  • Prompts never sent to Task Apps
  • Transformations registered securely
  • No prompt leakage

Next Steps