Filter traced rollout sessions by score thresholds, metadata, and timestamps, then export them to JSONL format ready for supervised fine-tuning (SFT).

Usage

uvx synth-ai filter --config filter.toml
Reads traces from a database, applies filters, and exports matching sessions as conversation pairs (user/assistant messages) in JSONL format.

Quick Start

Create a filter config (filter.toml):
[filter]
db = "traces/v3/synth_ai.db"          # Input trace database
output = "datasets/filtered.jsonl"     # Output JSONL file
min_official_score = 0.8               # Only high-scoring traces
limit = 1000                           # Max examples to export
Then run:
uvx synth-ai filter --config filter.toml

Configuration

Basic Options

[filter]
db = "traces/v3/synth_ai.db"              # SQLite or Turso URL
output = "datasets/filtered.jsonl"         # Output path
limit = 1000                               # Max examples (optional)

Score Filtering

[filter]
min_official_score = 0.8     # Minimum task reward/outcome
max_official_score = 1.0     # Maximum task reward/outcome

Judge Score Filtering

[filter.min_judge_scores]
accuracy = 0.9               # Judge "accuracy" >= 0.9
coherence = 0.7              # Judge "coherence" >= 0.7

[filter.max_judge_scores]
verbosity = 0.5              # Judge "verbosity" <= 0.5

Metadata Filtering

[filter]
splits = ["train", "val"]              # Only these splits
task_ids = ["task_123", "task_456"]    # Only these task IDs  
models = ["gpt-4o-mini", "gpt-4o"]     # Only these models

Timestamp Filtering

[filter]
min_created_at = "2024-01-01T00:00:00Z"    # ISO 8601 or Unix timestamp
max_created_at = "2024-12-31T23:59:59Z"    # ISO 8601 or Unix timestamp

Complete Example

[filter]
# Input/output
db = "traces/v3/rollouts.db"
output = "datasets/high_quality.jsonl"
limit = 5000

# Score thresholds
min_official_score = 0.75

# Metadata filters
splits = ["train"]
models = ["gpt-4o-mini"]

# Time range
min_created_at = "2024-10-01"
max_created_at = "2024-10-31"

# Judge requirements (sub-tables must come after the top-level [filter] keys,
# otherwise later keys would bind to the sub-table instead of [filter])
[filter.min_judge_scores]
accuracy = 0.8
helpfulness = 0.7

[filter.max_judge_scores]
verbosity = 0.6
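
All of these criteria must hold at once for a session to be exported. The sketch below illustrates that AND-combination for the complete example above; it is an illustration under assumed semantics, not the tool's actual implementation, and the Session type and its fields are hypothetical stand-ins for the trace records.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Session:
    # Hypothetical stand-in for a traced rollout session.
    official_score: float
    judge_scores: dict[str, float]
    split: str
    model: str
    created_at: datetime

def matches(s: Session) -> bool:
    # Mirrors the complete example above: every active filter must pass.
    # Assumption: a missing judge score fails its threshold; the tool's
    # actual handling of missing scores may differ.
    return (
        s.official_score >= 0.75                          # min_official_score
        and s.judge_scores.get("accuracy", 0.0) >= 0.8    # [filter.min_judge_scores]
        and s.judge_scores.get("helpfulness", 0.0) >= 0.7
        and s.judge_scores.get("verbosity", 1.0) <= 0.6   # [filter.max_judge_scores]
        and s.split in {"train"}                          # splits
        and s.model in {"gpt-4o-mini"}                    # models
        and datetime(2024, 10, 1) <= s.created_at         # min_created_at
        and s.created_at <= datetime(2024, 10, 31, 23, 59, 59)  # max_created_at
    )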

Output Format

The exported JSONL contains one example per line:
{
  "messages": [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "2+2 equals 4."}
  ],
  "metadata": {
    "session_id": "abc123",
    "env_name": "math",
    "policy_name": "solver",
    "seed": 42,
    "total_reward": 1.0,
    "model": "gpt-4o-mini",
    "created_at": "2024-10-15T10:30:00Z"
  }
}
Each example represents a single user-assistant exchange extracted from the traced session.
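
To spot-check an export before training, a few lines of Python suffice. This sketch assumes only the JSONL layout shown above and the output path from the Quick Start config; adjust the path to your own output setting.

import json

# Load the exported dataset and print a quick summary plus the first
# few examples. The path matches the Quick Start config.
with open("datasets/filtered.jsonl") as f:
    examples = [json.loads(line) for line in f]

print(f"{len(examples)} examples")
for ex in examples[:3]:
    user, assistant = ex["messages"]  # one user/assistant pair per example
    meta = ex["metadata"]
    print(f"reward={meta['total_reward']} model={meta['model']}")
    print(f"  user: {user['content'][:60]!r}")
    print(f"  assistant: {assistant['content'][:60]!r}")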

CLI Options

--config PATH    # Required: Path to filter TOML config

Examples

High-Quality Dataset

[filter]
db = "traces/v3/synth_ai.db"
output = "datasets/high_quality.jsonl"
min_official_score = 0.9
limit = 1000

Failed Examples (for analysis)

[filter]
db = "traces/v3/synth_ai.db"
output = "datasets/failures.jsonl"
max_official_score = 0.3
limit = 500

Specific Model Performance

[filter]
db = "traces/v3/synth_ai.db"
output = "datasets/gpt4_traces.jsonl"
models = ["gpt-4o"]
splits = ["val"]

Recent High-Scoring Traces

[filter]
db = "traces/v3/synth_ai.db"
output = "datasets/recent_good.jsonl"
min_official_score = 0.8
min_created_at = "2024-10-01"
limit = 2000

Workflow

Typical data collection → training pipeline:
# 1. Deploy task app and collect traces
uvx synth-ai deploy my-app --runtime uvicorn --trace traces/v3

# 2. Run evaluation with judges
uvx synth-ai eval --config eval.toml --trace-db traces/v3/synth_ai.db

# 3. Filter high-quality traces
uvx synth-ai filter --config filter.toml

# 4. Train on filtered dataset
uvx synth-ai train --config sft.toml --dataset datasets/filtered.jsonl

Troubleshooting

"No traces found in database"

  • Verify the db path is correct
  • Check that traces were actually stored (run eval/smoke with --trace-db)
  • Ensure the database file exists and is readable

"No sessions matched the provided filters"

  • Relax score thresholds (min_official_score)
  • Check metadata filters match your data
  • Remove timestamp constraints
  • Verify judge names match those used during eval

"TOML parser not available"

  • Use Python 3.11+ (has built-in tomllib)
  • Or install: pip install tomli

Empty messages in output

  • Check that traces include message data
  • Verify tracing was enabled during rollouts
  • Some task apps may only have prompt/completion (still exported); the scan below finds empty entries
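
A minimal scan for that last case, assuming the output layout shown under Output Format:

import json

# Report exported examples whose user or assistant content is empty.
with open("datasets/filtered.jsonl") as f:
    for lineno, line in enumerate(f, start=1):
        ex = json.loads(line)
        if any(not m["content"].strip() for m in ex["messages"]):
            session = ex["metadata"].get("session_id", "?")
            print(f"line {lineno}: empty message in session {session}")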

Tips

  • Start permissive: Begin with no filters, then tighten based on data quality
  • Inspect first: Export a small limit (e.g., 100) and review before generating large datasets
  • Multiple filters: Create several configs for different dataset slices (easy/hard, success/failure)
  • Combine datasets: Merge JSONL files with cat filtered_*.jsonl > combined.jsonl, or use the dedup sketch below when slices can overlap
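
If the same session can satisfy more than one config, a plain cat will duplicate it in the combined file. The sketch below merges slices and drops exact duplicates; the glob pattern and output path are illustrative, not tool conventions.

import glob
import json

# Merge filtered exports, keeping each unique conversation once.
# Uniqueness is judged on the serialized "messages" list, so the first
# occurrence wins when metadata differs across slices.
seen = set()
with open("combined.jsonl", "w") as out:
    for path in sorted(glob.glob("filtered_*.jsonl")):
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                key = json.dumps(json.loads(line)["messages"], sort_keys=True)
                if key not in seen:
                    seen.add(key)
                    # Guard against a missing trailing newline in a source file.
                    out.write(line if line.endswith("\n") else line + "\n")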

Next Steps