Filter traced rollout sessions by score thresholds, metadata, and timestamps, then export them to JSONL format ready for supervised fine-tuning (SFT).

Usage

uvx synth-ai filter --config filter.toml
Reads traces from a database, applies filters, and exports matching sessions as conversation pairs (user/assistant messages) in JSONL format.

Quick Start

Create a filter config (filter.toml):
[filter]
db = "traces/v3/synth_ai.db"          # Input trace database
output = "datasets/filtered.jsonl"     # Output JSONL file
min_official_score = 0.8               # Only high-scoring traces
limit = 1000                           # Max examples to export
Then run:
uvx synth-ai filter --config filter.toml

Configuration

Basic Options

[filter]
db = "traces/v3/synth_ai.db"              # SQLite or Turso URL
output = "datasets/filtered.jsonl"         # Output path
limit = 1000                               # Max examples (optional)

Score Filtering

[filter]
min_official_score = 0.8     # Minimum task reward/outcome
max_official_score = 1.0     # Maximum task reward/outcome

Judge Score Filtering

[filter.min_judge_scores]
accuracy = 0.9               # Judge "accuracy" >= 0.9
coherence = 0.7              # Judge "coherence" >= 0.7

[filter.max_judge_scores]
verbosity = 0.5              # Judge "verbosity" <= 0.5

Metadata Filtering

[filter]
splits = ["train", "val"]              # Only these splits
task_ids = ["task_123", "task_456"]    # Only these task IDs  
models = ["gpt-4o-mini", "gpt-4o"]     # Only these models

Timestamp Filtering

[filter]
min_created_at = "2024-01-01T00:00:00Z"    # ISO 8601 or Unix timestamp
max_created_at = "2024-12-31T23:59:59Z"    # ISO 8601 or Unix timestamp

Complete Example

[filter]
# Input/output
db = "traces/v3/rollouts.db"
output = "datasets/high_quality.jsonl"
limit = 5000

# Score thresholds
min_official_score = 0.75

# Metadata filters
splits = ["train"]
models = ["gpt-4o-mini"]

# Time range
min_created_at = "2024-10-01"
max_created_at = "2024-10-31"

# Judge requirements (sub-tables must come after the top-level [filter] keys,
# otherwise later keys would bind to the sub-table instead of [filter])
[filter.min_judge_scores]
accuracy = 0.8
helpfulness = 0.7

[filter.max_judge_scores]
verbosity = 0.6
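
All of these criteria must hold at once for a session to be exported. The sketch below illustrates that AND-combination for the complete example above; it is an illustration under assumed semantics, not the tool's actual implementation, and the Session type and its fields are hypothetical stand-ins for the trace records.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Session:
    # Hypothetical stand-in for a traced rollout session.
    official_score: float
    judge_scores: dict[str, float]
    split: str
    model: str
    created_at: datetime

def matches(s: Session) -> bool:
    # Mirrors the complete example above: every active filter must pass.
    # Assumption: a missing judge score fails its threshold; the tool's
    # actual handling of missing scores may differ.
    return (
        s.official_score >= 0.75                          # min_official_score
        and s.judge_scores.get("accuracy", 0.0) >= 0.8    # [filter.min_judge_scores]
        and s.judge_scores.get("helpfulness", 0.0) >= 0.7
        and s.judge_scores.get("verbosity", 1.0) <= 0.6   # [filter.max_judge_scores]
        and s.split in {"train"}                          # splits
        and s.model in {"gpt-4o-mini"}                    # models
        and datetime(2024, 10, 1) <= s.created_at         # min_created_at
        and s.created_at <= datetime(2024, 10, 31, 23, 59, 59)  # max_created_at
    )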

Output Format

The exported JSONL contains one example per line:
{
  "messages": [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "2+2 equals 4."}
  ],
  "metadata": {
    "session_id": "abc123",
    "env_name": "math",
    "policy_name": "solver",
    "seed": 42,
    "total_reward": 1.0,
    "model": "gpt-4o-mini",
    "created_at": "2024-10-15T10:30:00Z"
  }
}
Each example represents a single user-assistant exchange extracted from the traced session.
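
To spot-check an export before training, a few lines of Python suffice. This sketch assumes only the JSONL layout shown above and the output path from the Quick Start config; adjust the path to your own output setting.

import json

# Load the exported dataset and print a quick summary plus the first
# few examples. The path matches the Quick Start config.
with open("datasets/filtered.jsonl") as f:
    examples = [json.loads(line) for line in f]

print(f"{len(examples)} examples")
for ex in examples[:3]:
    user, assistant = ex["messages"]  # one user/assistant pair per example
    meta = ex["metadata"]
    print(f"reward={meta['total_reward']} model={meta['model']}")
    print(f"  user: {user['content'][:60]!r}")
    print(f"  assistant: {assistant['content'][:60]!r}")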

CLI Options

--config PATH    # Required: Path to filter TOML config

Examples

High-Quality Dataset

[filter]
db = "traces/v3/synth_ai.db"
output = "datasets/high_quality.jsonl"
min_official_score = 0.9
limit = 1000

Failed Examples (for analysis)

[filter]
db = "traces/v3/synth_ai.db"
output = "datasets/failures.jsonl"
max_official_score = 0.3
limit = 500

Specific Model Performance

[filter]
db = "traces/v3/synth_ai.db"
output = "datasets/gpt4_traces.jsonl"
models = ["gpt-4o"]
splits = ["val"]

Recent High-Scoring Traces

[filter]
db = "traces/v3/synth_ai.db"
output = "datasets/recent_good.jsonl"
min_official_score = 0.8
min_created_at = "2024-10-01"
limit = 2000

Workflow

Typical data collection → training pipeline:
# 1. Deploy task app and collect traces
uvx synth-ai deploy my-app --runtime uvicorn --trace traces/v3

# 2. Run evaluation with judges
uvx synth-ai eval --config eval.toml --trace-db traces/v3/synth_ai.db

# 3. Filter high-quality traces
uvx synth-ai filter --config filter.toml

# 4. Train on filtered dataset
uvx synth-ai train --config sft.toml --dataset datasets/filtered.jsonl

Troubleshooting

"No traces found in database"

  • Verify the db path is correct
  • Check that traces were actually stored (run eval/smoke with --trace-db)
  • Ensure the database file exists and is readable

"No sessions matched the provided filters"

  • Relax score thresholds (min_official_score)
  • Check metadata filters match your data
  • Remove timestamp constraints
  • Verify judge names match those used during eval

"TOML parser not available"

  • Use Python 3.11+ (has built-in tomllib)
  • Or install: pip install tomli

Empty messages in output

  • Check that traces include message data
  • Verify tracing was enabled during rollouts
  • Some task apps may only have prompt/completion (still exported); the scan below finds empty entries
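
A minimal scan for that last case, assuming the output layout shown under Output Format:

import json

# Report exported examples whose user or assistant content is empty.
with open("datasets/filtered.jsonl") as f:
    for lineno, line in enumerate(f, start=1):
        ex = json.loads(line)
        if any(not m["content"].strip() for m in ex["messages"]):
            session = ex["metadata"].get("session_id", "?")
            print(f"line {lineno}: empty message in session {session}")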

Tips

  • Start permissive: Begin with no filters, then tighten based on data quality
  • Inspect first: Export a small limit (e.g., 100) and review before generating large datasets
  • Multiple filters: Create several configs for different dataset slices (easy/hard, success/failure)
  • Combine datasets: Merge JSONL files with cat filtered_*.jsonl > combined.jsonl, or use the dedup sketch below when slices can overlap
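
If the same session can satisfy more than one config, a plain cat will duplicate it in the combined file. The sketch below merges slices and drops exact duplicates; the glob pattern and output path are illustrative, not tool conventions.

import glob
import json

# Merge filtered exports, keeping each unique conversation once.
# Uniqueness is judged on the serialized "messages" list, so the first
# occurrence wins when metadata differs across slices.
seen = set()
with open("combined.jsonl", "w") as out:
    for path in sorted(glob.glob("filtered_*.jsonl")):
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                key = json.dumps(json.loads(line)["messages"], sort_keys=True)
                if key not in seen:
                    seen.add(key)
                    # Guard against a missing trailing newline in a source file.
                    out.write(line if line.endswith("\n") else line + "\n")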

Next Steps