What is RLM?

RLM (Recursive Language Model) graphs handle massive context (1M+ tokens) that’s too large to fit in prompts. Instead of interpolating huge documents directly into LLM calls, RLM graphs:
  1. Materialize context to a searchable store
  2. Search via fast local tools (~1ms grep/search)
  3. Extract relevant snippets for LLM processing
  4. Synthesize the final answer from found information
This pattern enables working with entire codebases, document corpora, or datasets that would exceed any model’s context window.
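
To make the pattern concrete, here is a minimal Python sketch of the four steps. The call_llm callable and the hard-coded search pattern are hypothetical; real RLM graphs perform these steps with the auto-added tools described below.

import re

def rlm_answer(question, documents, call_llm):
    # 1. Materialize: write the huge context to a searchable local file
    with open("docs.txt", "w") as f:
        f.write(documents)

    # 2. Search: fast local regex scan instead of prompting with megabytes of text
    with open("docs.txt") as f:
        matches = [line for line in f if re.search(r"revenue|quarterly.*results", line)]

    # 3. Extract: keep only a prompt-sized slice of relevant snippets
    snippets = "".join(matches[:20])

    # 4. Synthesize: one LLM call over the snippets, not the full corpus
    return call_llm(f"Question: {question}\n\nRelevant sections:\n{snippets}")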

When to use RLM

| Scenario | Use RLM? |
| --- | --- |
| Context < 100K tokens | No - use graph_type: "policy" |
| Context 100K-500K tokens | Maybe - depends on model limits |
| Context > 500K tokens | Yes - use graph_type: "rlm" |
| RAG over large corpus | Yes |
| Codebase analysis | Yes |
| Multi-document QA | Yes |
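
If you want to encode this decision in client code, a hypothetical helper might look like the sketch below. The ~4 characters per token estimate matches the "4M characters (~1M tokens)" rule of thumb used later on this page.

def pick_graph_type(context_chars: int) -> str:
    # Hypothetical helper encoding the thresholds from the table above.
    tokens = context_chars // 4  # rough estimate: ~4 characters per token
    if tokens < 100_000:
        return "policy"  # fits in prompts; no RLM needed
    if tokens <= 500_000:
        # gray zone: may fit, depending on your model's context limit
        return "policy"
    return "rlm"  # too large for any prompt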

Quick start

curl -X POST $HOST/api/graphgen/jobs \
  -H "Authorization: Bearer $SYNTH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "dataset": {
      "metadata": { "name": "financial-qa" },
      "tasks": [
        {
          "id": "q1",
          "input": {
            "question": "What was Q3 revenue?",
            "documents": "<4MB of financial reports>"
          }
        }
      ],
      "gold_outputs": [
        { "task_id": "q1", "output": { "answer": "$4.2B" } }
      ],
      "judge_config": { "mode": "rubric" }
    },
    "graph_type": "rlm",
    "policy_models": ["gpt-4o-mini"],
    "rollout_budget": 100
  }'
That’s it. The system:
  • Auto-detects that the documents field is too large for prompts
  • Auto-adds RLM tools (materialize_context, local_grep, etc.)
  • Auto-configures the proposer to use tool-based search patterns
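
The same request in Python (assuming the requests package) reads the large field from disk rather than passing megabytes through a shell argument. The file name is illustrative; everything else mirrors the curl call above.

import os
import requests

documents = open("financial_reports.txt").read()  # illustrative file, e.g. ~4MB of text

job = {
    "dataset": {
        "metadata": {"name": "financial-qa"},
        "tasks": [
            {"id": "q1", "input": {"question": "What was Q3 revenue?",
                                   "documents": documents}}
        ],
        "gold_outputs": [{"task_id": "q1", "output": {"answer": "$4.2B"}}],
        "judge_config": {"mode": "rubric"},
    },
    "graph_type": "rlm",
    "policy_models": ["gpt-4o-mini"],
    "rollout_budget": 100,
}

resp = requests.post(
    f"{os.environ['HOST']}/api/graphgen/jobs",
    headers={"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"},
    json=job,
)
resp.raise_for_status()
print(resp.json())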

Auto-added tools

When graph_type: "rlm", these tools are automatically available:
| Tool | Latency | Description |
| --- | --- | --- |
| materialize_context | ~1ms | Store input fields for searching |
| local_grep | ~1ms | Regex search on materialized content |
| local_search | ~1ms | Substring search |
| query_lm | ~100ms | Sub-LM calls for processing chunks |
| codex_exec | ~500ms | Shell execution (complex operations) |
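
To make the latency figures concrete, a local_grep-style call amounts to a regex scan over an already-materialized file. The sketch below is an illustration of those semantics in Python, not the tool's actual implementation.

import re

def local_grep(pattern: str, file: str, max_matches: int = 20) -> list[str]:
    # Illustration only: scan a materialized file line by line with a regex.
    # No LLM call and no sandbox round-trip, which is why it runs in ~1ms.
    regex = re.compile(pattern)
    matches = []
    with open(file) as f:
        for line in f:
            if regex.search(line):
                matches.append(line.rstrip("\n"))
                if len(matches) >= max_matches:
                    break
    return matches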

How it works

1. Materialize

# Generated graph stores context first
- name: store_docs
  tool: materialize_context
  args:
    field_name: documents
    filename: docs.txt

2. Search

# Fast local search (~1ms)
- name: find_revenue
  tool: local_grep
  args:
    pattern: "revenue|quarterly.*results"
    file: docs.txt
    max_matches: 20

3. Process

# LLM processes found snippets (not the full 4MB)
- name: answer
  model: gpt-4o-mini
  input:
    question: "{{input.question}}"
    relevant_sections: "{{find_revenue.matches}}"

Performance

Local RLM tools run roughly 5,000x faster than the equivalent sandbox operations in aggregate (up to ~10,000x for individual operations):
| Operation | Sandbox | Local RLM |
| --- | --- | --- |
| Write 45KB file | 300ms | 0.03ms |
| Grep file | 400ms | 0.1ms |
| Line count | 350ms | 0.05ms |
| Total | 1050ms | 0.2ms |

Dataset format

Large context fields are auto-detected, but you can explicitly mark them:
{
  "metadata": { "name": "my-rlm-task" },
  "input_fields": [
    { "name": "question", "type": "text" },
    { "name": "documents", "type": "context" }
  ],
  "tasks": [...]
}
Fields larger than 4M characters (~1M tokens) are automatically treated as context fields.
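
As a sketch, the auto-detection rule described above reduces to a length threshold. The 4M-character figure is from this page; the exact server-side logic may differ.

CONTEXT_THRESHOLD_CHARS = 4_000_000  # ~1M tokens at ~4 characters per token

def infer_field_type(value: str) -> str:
    # Fields above the threshold are treated as context fields (searchable),
    # everything else as ordinary text fields (interpolated into prompts).
    return "context" if len(value) > CONTEXT_THRESHOLD_CHARS else "text"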

Inference

Inference works the same as regular graphs:
curl -X POST $HOST/api/graphgen/graph/completions \
  -H "Authorization: Bearer $SYNTH_API_KEY" \
  -d '{
    "job_id": "graphgen_XXXX",
    "input": {
      "question": "What were operating expenses?",
      "documents": "<4MB of new documents>"
    }
  }'
The optimized graph handles materialization and searching automatically.
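
The same inference call in Python (assuming requests); the job_id placeholder and document file name are illustrative:

import os
import requests

resp = requests.post(
    f"{os.environ['HOST']}/api/graphgen/graph/completions",
    headers={"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"},
    json={
        "job_id": "graphgen_XXXX",
        "input": {
            "question": "What were operating expenses?",
            "documents": open("new_reports.txt").read(),  # large context, sent as-is
        },
    },
)
resp.raise_for_status()
print(resp.json())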

RLM as a Pattern

RLM is available as both a graph_type and as a pattern. This distinction matters:
  • graph_type: "rlm" - The graph’s primary purpose is RLM-style search
  • patterns.required: ["rlm"] - Apply the RLM pattern to ANY graph type
Use patterns when you want RLM-style search in a verifier or specialized policy:

RLM Verifier

Train a verifier that uses tool-based search to analyze large traces:
curl -X POST $HOST/api/graphgen/jobs \
  -H "Authorization: Bearer $SYNTH_API_KEY" \
  -d '{
    "dataset": {...},
    "graph_type": "verifier",
    "patterns": {
      "required": ["rlm"]
    },
    "policy_models": ["gpt-4o-mini"],
    "rollout_budget": 100
  }'
This gives you a verifier that:
  • Auto-gets RLM tools (materialize_context, local_grep, etc.)
  • Uses tool-based search instead of stuffing traces into prompts
  • Outputs a score (0.0-1.0) like any verifier
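
As a usage sketch, you might gate on the verifier's score at inference time. This page does not show the completions response shape or the verifier's input field names, so the "trace" field and the "output" -> "score" path below are assumptions; check the API reference for the actual schema.

import os
import requests

resp = requests.post(
    f"{os.environ['HOST']}/api/graphgen/graph/completions",
    headers={"Authorization": f"Bearer {os.environ['SYNTH_API_KEY']}"},
    json={"job_id": "graphgen_XXXX", "input": {"trace": open("trace.txt").read()}},
)
# Response path is an assumption - verifiers output a score in [0.0, 1.0].
score = float(resp.json()["output"]["score"])
print("pass" if score >= 0.8 else "fail")  # threshold is application-specific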

Pattern Options

{
  "patterns": {
    "required": ["rlm"],           // MUST use RLM pattern
    "optional": ["map_reduce"],    // May also use map-reduce
    "prefer": []                   // Preferences
  }
}
Available patterns:
  • rlm - Tool-based search for massive context
  • map_reduce - Parallel processing for lists (common for verifiers)
  • single_shot - Single LLM call
  • chain_of_thought - Multi-step reasoning
  • digest_combine - Two-stage: digest then combine

Next steps

  • Judging: Learn how scoring works in product/workflows/judging
  • API Reference: See full API at sdk/graphs/inference