# The Core Insight: LLMs as Information Processors
Modern LLMs are not simply text generators; they are information processing systems.
Once you recognize this, you inherit decades of applicable tools:

- Signal processing (FFT): roughly 60 years old
- Survival analysis (hazard): from the 1950s
- Information theory (entropy): Shannon, 1948
The techniques aren't new. Applying them to LLM evaluation is.
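As a small illustration of the information-theoretic lens, Shannon entropy can be computed directly from the logprobs an inference API returns for a next-token distribution. This is a hedged sketch: `token_entropy` is an illustrative helper, not part of any particular API.

```python
import math

def token_entropy(logprobs):
    """Shannon entropy (bits) of a next-token distribution.

    `logprobs` is a list of natural-log probabilities, as returned by
    most inference APIs. Illustrative helper, not a specific library call.
    """
    probs = [math.exp(lp) for lp in logprobs]
    total = sum(probs)  # renormalize in case only a top-k slice was returned
    return -sum((p / total) * math.log2(p / total) for p in probs if p > 0)

# A uniform distribution over 4 tokens carries exactly 2 bits of uncertainty;
# a fully peaked distribution carries none.
print(token_entropy([math.log(0.25)] * 4))  # → 2.0
print(token_entropy([math.log(1.0)]))       # → 0.0 (sometimes printed as -0.0)
```

Tracking this quantity across a generation is one way to make "information processing" concrete rather than metaphorical.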
## LLMs as Information Transformation Pipelines
ReasonScape evaluates complete LLM systems, not just model weights. The information flow involves multiple transformation stages:
```mermaid
graph LR
    A[Task Definitions] --> C{Test Generator}
    B[Difficulty Parameters] --> C
    C -->|Test:text| D{Template}
    C -->|Target:text| K
    D -->|Test:prompt| E[[LLM Tokenizer]]
    E -->|Input:tokens| F((LLM Inference))
    F -->|Output:logprobs| G{Sampling}
    G -->|Output:tokens| H[[LLM Tokenizer]]
    H -->|Output:text| I{Parse}
    I -->|Thought:text| J[Analysis]
    I -->|Answer:text| K{Compare}
    K -->|Success:bool| J
    style A fill:#e1f5fe
    style B fill:#e1f5fe
    style J fill:#f3e5f5
    style K fill:#fff3e0
    style D fill:#fff3e0
    style I fill:#fff3e0
    style G fill:#fff3e0
    style C fill:#e8f5e8
```
### Information Transformation Pipeline
- Task Generator → Creates parametric test instances (text)
- Template Engine → Applies model-specific chat formatting (text → text)
- Tokenizer → Converts to token sequences (text → tokens)
- LLM Inference → Processes through learned parameters (tokens → logprobs)
- Sampler → Collapses distributions to concrete outputs (logprobs → tokens)
- Detokenizer → Converts back to text (tokens → text)
- Parser → Extracts reasoning and answers (text → structured data)
- Evaluator → Compares with targets (structured data → statistics)
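The stages above can be sketched as a chain of toy functions. Everything here is an illustrative placeholder, not the ReasonScape API; the sampler stage is folded into the toy `infer` for brevity, since the canned "model" already emits concrete tokens.

```python
# Toy stand-ins for each pipeline stage (illustrative names only).

def generate_test(params):                      # Task Generator
    a, b = params["a"], params["b"]
    return f"What is {a} + {b}?", str(a + b)    # (test text, target)

def apply_template(text):                       # Template Engine (chat format)
    return f"<|user|>{text}<|assistant|>"

def tokenize(text):                             # Tokenizer (whitespace toy)
    return text.split()

def infer(tokens):                              # LLM Inference + Sampler, collapsed:
    return ["<think>", "add", "them", "</think>", "5"]  # canned output tokens

def detokenize(tokens):                         # Detokenizer
    return " ".join(tokens)

def parse(text):                                # Parser: split thought from answer
    thought, _, answer = text.partition("</think>")
    return thought.replace("<think>", "").strip(), answer.strip()

def evaluate(answer, target):                   # Evaluator
    return answer == target

test, target = generate_test({"a": 2, "b": 3})
out = detokenize(infer(tokenize(apply_template(test))))
thought, answer = parse(out)
print(evaluate(answer, target))  # → True
```

The point of the sketch is structural: every arrow in the diagram is a typed transformation, and a failure anywhere in the chain surfaces as a wrong `Success:bool` at the end.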
### LLM Sub-System Components
ReasonScape measures the entire system, which includes:
- Chat Templates: Model-specific prompt formatting (affects input structure)
- Tokenizers: Text↔token conversion (creates measurable frequency signatures)
- Inference Engines: Model parameters, architecture, quantization
- Sampling Strategies: Temperature, top-p, top-k, min-p (controls output distribution collapse)
Implication: Performance differences may arise from any of these components, not just model quality. Forensic analysis (FFT, compression, surface, hazard) helps isolate failure sources.
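To make the "output distribution collapse" of the sampling stage concrete, here is a minimal nucleus (top-p) sampling sketch. The function name and the `logprobs` dict shape are assumptions for illustration.

```python
import math
import random

def top_p_sample(logprobs, p=0.9, rng=random):
    """Nucleus (top-p) sampling sketch: keep the smallest set of
    highest-probability tokens whose cumulative mass reaches p,
    then sample from that truncated, renormalized set.

    `logprobs` maps token -> natural-log probability (illustrative shape).
    """
    items = sorted(logprobs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for tok, lp in items:
        prob = math.exp(lp)
        kept.append((tok, prob))
        cum += prob
        if cum >= p:
            break
    total = sum(w for _, w in kept)
    r = rng.random() * total
    for tok, w in kept:
        r -= w
        if r <= 0:
            return tok
    return kept[-1][0]

dist = {"yes": math.log(0.7), "no": math.log(0.2), "maybe": math.log(0.1)}
# With p=0.5 only "yes" survives truncation, so sampling is deterministic.
print(top_p_sample(dist, p=0.5))  # → yes
```

Because this stage discards mass before sampling, two systems serving identical model weights can produce measurably different output distributions purely through their sampler settings.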
## Why This Matters
Viewing LLMs as information processors enables:
- Spectral Analysis (FFT): Understand how tokenization affects problem representation
- Compression Analysis: Measure information quality in reasoning traces
- Hazard Analysis: Track when and how reasoning degrades over time
- Surface Analysis: Map capability boundaries in parameter space
These tools reveal patterns invisible to text-only evaluation:
- Tokenizer efficiency differences
- Working memory limits
- Reasoning/Meta-reasoning loop detection
- Catastrophic failure boundaries
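One of these patterns, reasoning-loop detection, can be glimpsed with a simple compression sketch: repetitive, low-information traces compress far better than varied ones. This uses `zlib` as a generic stand-in for compression analysis; the metric and thresholds are illustrative, not ReasonScape's actual implementation.

```python
import zlib

def compression_ratio(text):
    """Compressed size / raw size under zlib.

    A crude proxy for the information density of a reasoning trace:
    looping, repetitive text yields a ratio near zero. Illustrative only.
    """
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)

looping = "Let me reconsider. " * 50          # a trace stuck in a loop
varied = "Step one: factor 84 into 2*2*3*7, then check each divisor."

print(compression_ratio(looping) < compression_ratio(varied))  # → True
```

Text-only inspection of a long trace hides this kind of degeneration; an information-theoretic measurement surfaces it immediately.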
Evaluation becomes information flow analysis. Failures become signal processing problems. Performance becomes measurable in information-theoretic terms.
## See Also
- challenges.md - The practical problems this insight helps solve
- architecture.md - How this insight informs the five-stage methodology
- technical-details.md - Technical implementation details