Prompt learning task apps look just like RL/SFT task apps from the API’s perspective: they expose the standard FastAPI endpoints documented in synth_ai/task/server.py, require X-API-Key authentication, and emit traces/events for downstream tooling. The difference is how the app is used:
  • The prompt-learning CLI (synth_ai/train/cli.py) never uses the app during training; instead, the backend orchestrator repeatedly calls the /rollout endpoint (and related routes) to evaluate prompt candidates.
  • The SDK/CLI health-checks the task app (check_task_app_health) before submitting jobs, so your app must respond quickly to /health and /task_info with the expected schema.
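Before pointing a job at your app, you can replicate that health check with a plain HTTP client. A minimal sketch using httpx: the TASK_APP_URL environment variable is a placeholder, while the /health path, X-API-Key header, and healthy field come from the endpoint contract below.

```python
import os

import httpx

TASK_APP_URL = os.environ["TASK_APP_URL"]      # your deployed task app
API_KEY = os.environ["ENVIRONMENT_API_KEY"]    # same key the SDK sends


def check_health() -> None:
    """Rough equivalent of the SDK's pre-submit health check."""
    resp = httpx.get(
        f"{TASK_APP_URL}/health",
        headers={"X-API-Key": API_KEY},
        timeout=10.0,
    )
    resp.raise_for_status()
    payload = resp.json()
    assert payload.get("healthy") is True, payload


if __name__ == "__main__":
    check_health()
```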

Core Responsibilities

  1. Expose the standard endpoints (root, /health, /info, /task_info, /rollout) exactly as implemented by TaskAppConfig. Prompt-learning configs reference these endpoints via task_app_url.
  2. Provide rich TaskInfo metadata describing the environment, dataset splits, and capabilities so the optimizer knows what dataset to pull from and how to score candidates.
  3. Emit traces + events through the tracing_v3 pipeline (set TASKAPP_TRACING_ENABLED, TASKAPP_SFT_OUTPUT_DIR, and the relevant DB environment variables). The CLI fetches prompt-learning events (prompt.learning.*) after jobs complete, so emit those events/metrics during /rollout.
  4. Support automated rollouts: /rollout should process a RolloutRequest and return RolloutResponse with consistent pipeline_metadata.inference_url + per-step info.meta.inference_url (same requirement as RL; see synth_ai.task.validators).
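
To make responsibility 4 concrete, here is a sketch of the response shape with the inference URL mirrored everywhere the validators check. Plain dicts stand in for the real RolloutRequest/RolloutResponse models, whose exact field names this sketch assumes rather than confirms.

```python
def build_rollout_response(trajectories: list[dict], inference_url: str, cid: str) -> dict:
    """Sketch: mirror one inference URL everywhere the validators look."""
    url_with_cid = f"{inference_url}?cid={cid}"
    for traj in trajectories:
        for step in traj["steps"]:
            # Every step must mirror the pipeline-level inference URL.
            step.setdefault("info", {}).setdefault("meta", {})["inference_url"] = url_with_cid
    return {
        "trajectories": trajectories,
        "metrics": {},    # task-specific scores go here
        "trace": {},      # v3 trace payload
        "pipeline_metadata": {"inference_url": url_with_cid},
    }
```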

Authentication

Same as other task apps: every protected route enforces X-API-Key / Authorization: Bearer headers. Use require_api_key_dependency from synth_ai.task.server.
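
If you add custom routes on top of the generated ones, wire them through the same dependency. A sketch assuming require_api_key_dependency plugs in as an ordinary FastAPI dependency; verify its actual signature in synth_ai.task.server.

```python
from fastapi import Depends, FastAPI

from synth_ai.task.server import require_api_key_dependency

app = FastAPI()


@app.get("/custom/metrics")
def custom_metrics(_auth: None = Depends(require_api_key_dependency)) -> dict:
    # Reached only after the X-API-Key / Authorization: Bearer check passes.
    return {"ok": True}
```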

Endpoint Contract Summary

(All implemented automatically when you create a TaskAppConfig; include custom logic in your config factory if needed.)
  • / – basic liveness probe: { "status": "ok", "service": "<task-app-id>" }
  • /health – verifies the API key and returns:
    {
      "healthy": true,
      "auth": { "required": true, "expected_prefix": "sk-..." }
    }
    
  • /info – returns TaskInfo metadata: { "service": { "task": {...} }, "dataset": {...}, "rubrics": {...}, "inference": {...}, "limits": {...} }
  • /task_info – without seeds: { "taskset": {...} }; with seeds: TaskInfo or list of TaskInfo describing each requested instance.
  • /rollout – accepts RolloutRequest (from prompt-learning orchestrator) and returns RolloutResponse with trajectories, metrics, trace, pipeline_metadata. Ensure pipeline_metadata.inference_url has a ?cid= token and every trajectory.steps[*].info.meta.inference_url mirrors it.
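
To compare the two /task_info shapes, query the endpoint with and without seeds. In this sketch the seeds query-parameter name and its comma-separated format are assumptions; adjust them to match your app's wiring.

```python
import os

import httpx

TASK_APP_URL = os.environ["TASK_APP_URL"]
HEADERS = {"X-API-Key": os.environ["ENVIRONMENT_API_KEY"]}

# Without seeds: {"taskset": {...}}.
taskset = httpx.get(f"{TASK_APP_URL}/task_info", headers=HEADERS).json()

# With seeds: one TaskInfo per requested instance. The "seeds" parameter
# name and format are assumptions, not confirmed API.
infos = httpx.get(
    f"{TASK_APP_URL}/task_info",
    headers=HEADERS,
    params={"seeds": "0,1,2"},
).json()
```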

Tracing and Events

  • Set TASKAPP_TRACING_ENABLED=1 when hosting the app so v3 traces capture every rollout.
  • TASKAPP_SFT_OUTPUT_DIR (or SFT_OUTPUT_DIR) controls where, if anywhere, the app writes raw JSONL. Prompt learning mainly relies on the trace DB, but following the same pattern as RL/SFT simplifies reuse.
  • Emit prompt-learning-specific events (e.g., prompt.learning.progress, prompt.learning.final.results) via your app logic so the CLI can build summaries.
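A sketch of where those emissions sit in your app logic. emit_event is a hypothetical stand-in for your tracing_v3 event writer; only the prompt.learning.* event names come from the contract above.

```python
def emit_event(event_type: str, data: dict) -> None:
    """Hypothetical stand-in: wire this to your tracing_v3 event writer."""
    raise NotImplementedError


def report_progress(candidate_id: str, score: float) -> None:
    # Emitted per evaluation so the CLI can build progress summaries.
    emit_event("prompt.learning.progress", {"candidate": candidate_id, "score": score})


def report_final(results: dict) -> None:
    # Emitted once per job so the CLI can fetch final results.
    emit_event("prompt.learning.final.results", results)
```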

Hosting on Modal

Prompt learning often runs long evaluations, so hosting on Modal is common:
  • Provide a Modal entry script (same format as RL/SFT) and register secrets (ENVIRONMENT_API_KEY, vendor keys) so uvx synth-ai deploy --runtime modal or the CLI’s automatic deploy path can boot the app.
  • Ensure the Modal deployment exposes stable task_app_url values; the prompt-learning config references this URL directly.
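
A minimal Modal entry-script sketch. The image contents, secret names, and the build_task_app factory are assumptions to adapt; only the ENVIRONMENT_API_KEY requirement comes from the list above.

```python
import modal

app = modal.App("prompt-learning-task-app")

image = modal.Image.debian_slim().pip_install("synth-ai", "fastapi", "uvicorn")


@app.function(
    image=image,
    secrets=[
        modal.Secret.from_name("environment-api-key"),  # provides ENVIRONMENT_API_KEY
        modal.Secret.from_name("vendor-keys"),          # e.g. OPENAI_API_KEY
    ],
)
@modal.asgi_app()
def fastapi_app():
    # build_task_app is a hypothetical factory that returns the FastAPI
    # app constructed from your TaskAppConfig.
    from my_task_app import build_task_app

    return build_task_app()
```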

Best Practices

  • Deterministic seeds: implement /task_info so specific seeds map to reproducible task instances (same as RL); GEPA/MIPRO depend on consistent evaluation across candidates (see the sketch at the end of this section).
  • Comprehensive metadata: fill dataset, rubric, limits, and task_metadata fields so the optimizer can display context and filter results.
  • Robust error handling: /rollout should gracefully handle invalid prompts or tool calls and report meaningful metrics.details so you can debug candidate failures.
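
For the deterministic-seeds practice, derive every instance attribute from the seed alone rather than from global randomness, so the same seed yields the same task across processes and redeploys. A generic sketch with a placeholder dataset:

```python
import random

DATASET = ["task-a", "task-b", "task-c"]  # placeholder instance pool


def task_instance_for_seed(seed: int) -> dict:
    """Same seed -> same instance, across processes and redeploys."""
    rng = random.Random(seed)  # seed-local RNG; never touch global state
    example = DATASET[seed % len(DATASET)]
    return {
        "seed": seed,
        "example": example,
        "shuffle_order": rng.sample(range(4), 4),  # any per-instance randomness
    }
```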