What is RLM?

RLM (Recursive Language Model) graphs handle massive context (1M+ tokens) that’s too large to fit in prompts. Instead of interpolating huge documents directly into LLM calls, RLM graphs:
  1. Materialize context to a searchable store
  2. Search via fast local tools (~1ms grep/search)
  3. Extract relevant snippets for LLM processing
  4. Synthesize the final answer from found information
This pattern enables working with entire codebases, document corpora, or datasets that would exceed any model’s context window.
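
To make the pattern concrete, here is a minimal Python sketch of the four steps. The call_llm callable and the hard-coded search pattern are hypothetical; real RLM graphs perform these steps with the auto-added tools described below.

import re

def rlm_answer(question, documents, call_llm):
    # 1. Materialize: write the huge context to a searchable local file
    with open("docs.txt", "w") as f:
        f.write(documents)

    # 2. Search: fast local regex scan instead of prompting with megabytes of text
    with open("docs.txt") as f:
        matches = [line for line in f if re.search(r"revenue|quarterly.*results", line)]

    # 3. Extract: keep only a prompt-sized slice of relevant snippets
    snippets = "".join(matches[:20])

    # 4. Synthesize: one LLM call over the snippets, not the full corpus
    return call_llm(f"Question: {question}\n\nRelevant sections:\n{snippets}")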

When to use RLM

| Scenario | Use RLM? |
| --- | --- |
| Context < 100K tokens | No - use graph_type: "policy" |
| Context 100K-500K tokens | Maybe - depends on model limits |
| Context > 500K tokens | Yes - use graph_type: "rlm" |
| RAG over large corpus | Yes |
| Codebase analysis | Yes |
| Multi-document QA | Yes |
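
If you want to encode this decision in client code, a hypothetical helper might look like the sketch below. The ~4 characters per token estimate matches the "4M characters (~1M tokens)" rule of thumb used later on this page.

def pick_graph_type(context_chars: int) -> str:
    # Hypothetical helper encoding the thresholds from the table above.
    tokens = context_chars // 4  # rough estimate: ~4 characters per token
    if tokens < 100_000:
        return "policy"  # fits in prompts; no RLM needed
    if tokens <= 500_000:
        # gray zone: may fit, depending on your model's context limit
        return "policy"
    return "rlm"  # too large for any prompt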

Quick start

curl -X POST $HOST/api/graphgen/jobs \
  -H "Authorization: Bearer $SYNTH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "dataset": {
      "metadata": { "name": "financial-qa" },
      "tasks": [
        {
          "id": "q1",
          "input": {
            "question": "What was Q3 revenue?",
            "documents": "<4MB of financial reports>"
          }
        }
      ],
      "gold_outputs": [
        { "task_id": "q1", "output": { "answer": "$4.2B" } }
      ],
      "judge_config": { "mode": "rubric" }
    },
    "graph_type": "rlm",
    "policy_models": ["gpt-4o-mini"],
    "rollout_budget": 100
  }'
That’s it. The system:
  • Auto-detects that the documents field is too large for prompts
  • Auto-adds RLM tools (materialize_context, local_grep, etc.)
  • Auto-configures the proposer to use tool-based search patterns
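
The same request in Python (assuming the requests package) reads the large field from disk rather than passing megabytes through a shell argument. The file name is illustrative; everything else mirrors the curl call above.

import os
import requests

documents = open("financial_reports.txt").read()  # illustrative file, e.g. ~4MB of text

job = {
    "dataset": {
        "metadata": {"name": "financial-qa"},
        "tasks": [
            {"id": "q1", "input": {"question": "What was Q3 revenue?",
                                   "documents": documents}}
        ],
        "gold_outputs": [{"task_id": "q1", "output": {"answer": "$4.2B"}}],
        "judge_config": {"mode": "rubric"},
    },
    "graph_type": "rlm",
    "policy_models": ["gpt-4o-mini"],
    "rollout_budget": 100,
}

resp = requests.post(
    f"{os.environ['HOST']}/api/graphgen/jobs",
    headers={"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"},
    json=job,
)
resp.raise_for_status()
print(resp.json())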

Auto-added tools

When graph_type: "rlm", these tools are automatically available:
| Tool | Latency | Description |
| --- | --- | --- |
| materialize_context | ~1ms | Store input fields for searching |
| local_grep | ~1ms | Regex search on materialized content |
| local_search | ~1ms | Substring search |
| query_lm | ~100ms | Sub-LM calls for processing chunks |
| codex_exec | ~500ms | Shell execution (complex operations) |
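
To make the latency figures concrete, a local_grep-style call amounts to a regex scan over an already-materialized file. The sketch below is an illustration of those semantics in Python, not the tool's actual implementation.

import re

def local_grep(pattern: str, file: str, max_matches: int = 20) -> list[str]:
    # Illustration only: scan a materialized file line by line with a regex.
    # No LLM call and no sandbox round-trip, which is why it runs in ~1ms.
    regex = re.compile(pattern)
    matches = []
    with open(file) as f:
        for line in f:
            if regex.search(line):
                matches.append(line.rstrip("\n"))
                if len(matches) >= max_matches:
                    break
    return matches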

How it works

1. Materialize

# Generated graph stores context first
- name: store_docs
  tool: materialize_context
  args:
    field_name: documents
    filename: docs.txt

2. Search

# Fast local search (~1ms)
- name: find_revenue
  tool: local_grep
  args:
    pattern: "revenue|quarterly.*results"
    file: docs.txt
    max_matches: 20

3. Process

# LLM processes found snippets (not the full 4MB)
- name: answer
  model: gpt-4o-mini
  input:
    question: "{{input.question}}"
    relevant_sections: "{{find_revenue.matches}}"

Performance

Local RLM tools run roughly 5,000x faster than the equivalent sandbox operations in aggregate (up to ~10,000x for individual operations):
| Operation | Sandbox | Local RLM |
| --- | --- | --- |
| Write 45KB file | 300ms | 0.03ms |
| Grep file | 400ms | 0.1ms |
| Line count | 350ms | 0.05ms |
| Total | 1050ms | 0.2ms |

Dataset format

Large context fields are auto-detected, but you can explicitly mark them:
{
  "metadata": { "name": "my-rlm-task" },
  "input_fields": [
    { "name": "question", "type": "text" },
    { "name": "documents", "type": "context" }
  ],
  "tasks": [...]
}
Fields larger than 4M characters (~1M tokens) are automatically treated as context fields.
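
As a sketch, the auto-detection rule described above reduces to a length threshold. The 4M-character figure is from this page; the exact server-side logic may differ.

CONTEXT_THRESHOLD_CHARS = 4_000_000  # ~1M tokens at ~4 characters per token

def infer_field_type(value: str) -> str:
    # Fields above the threshold are treated as context fields (searchable),
    # everything else as ordinary text fields (interpolated into prompts).
    return "context" if len(value) > CONTEXT_THRESHOLD_CHARS else "text"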

Inference

Inference works the same as regular graphs:
curl -X POST $HOST/api/graphgen/graph/completions \
  -H "Authorization: Bearer $SYNTH_API_KEY" \
  -d '{
    "job_id": "graphgen_XXXX",
    "input": {
      "question": "What were operating expenses?",
      "documents": "<4MB of new documents>"
    }
  }'
The optimized graph handles materialization and searching automatically.
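
The same inference call in Python (assuming requests); the job_id placeholder and document file name are illustrative:

import os
import requests

resp = requests.post(
    f"{os.environ['HOST']}/api/graphgen/graph/completions",
    headers={"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"},
    json={
        "job_id": "graphgen_XXXX",
        "input": {
            "question": "What were operating expenses?",
            "documents": open("new_reports.txt").read(),  # large context, sent as-is
        },
    },
)
resp.raise_for_status()
print(resp.json())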

RLM as a Pattern

RLM is available as both a graph_type and as a pattern. This distinction matters:
  • graph_type: "rlm" - The graph’s primary purpose is RLM-style search
  • patterns.required: ["rlm"] - Apply the RLM pattern to ANY graph type
Use patterns when you want RLM-style search in a verifier or specialized policy:

RLM Verifier

Train a verifier that uses tool-based search to analyze large traces:
curl -X POST $HOST/api/graphgen/jobs \
  -H "Authorization: Bearer $SYNTH_API_KEY" \
  -d '{
    "dataset": {...},
    "graph_type": "verifier",
    "patterns": {
      "required": ["rlm"]
    },
    "policy_models": ["gpt-4o-mini"],
    "rollout_budget": 100
  }'
This gives you a verifier that:
  • Auto-gets RLM tools (materialize_context, local_grep, etc.)
  • Uses tool-based search instead of stuffing traces into prompts
  • Outputs a score (0.0-1.0) like any verifier
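
As a usage sketch, you might gate on the verifier's score at inference time. This page does not show the completions response shape or the verifier's input field names, so the "trace" field and the "output" -> "score" path below are assumptions; check the API reference for the actual schema.

import os
import requests

resp = requests.post(
    f"{os.environ['HOST']}/api/graphgen/graph/completions",
    headers={"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"},
    json={"job_id": "graphgen_XXXX", "input": {"trace": open("trace.txt").read()}},
)
# Response path is an assumption - verifiers output a score in [0.0, 1.0].
score = float(resp.json()["output"]["score"])
print("pass" if score >= 0.8 else "fail")  # threshold is application-specific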

Pattern Options

{
  "patterns": {
    "required": ["rlm"],           // MUST use RLM pattern
    "optional": ["map_reduce"],    // May also use map-reduce
    "prefer": []                   // Preferences
  }
}
Available patterns:
  • rlm - Tool-based search for massive context
  • map_reduce - Parallel processing for lists (common for verifiers)
  • single_shot - Single LLM call
  • chain_of_thought - Multi-step reasoning
  • digest_combine - Two-stage: digest then combine

Next steps

  • Judging: Learn how scoring works in product/workflows/judging
  • API Reference: See full API at sdk/graphs/inference