The Core Insight: LLMs as Information Processors

Modern LLMs are not simply text generators—they're information processing systems.

Once you recognize this, you inherit decades of applicable tools:

  • Signal processing (FFT): over 60 years old
  • Survival analysis (hazard functions): from the 1950s
  • Information theory (entropy): Shannon, 1948

The techniques aren't new. Applying them to LLM evaluation is.
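For instance, Shannon entropy applies directly to the next-token probability distributions an LLM emits. A minimal sketch (the example distributions here are invented for illustration):

```python
import math

def shannon_entropy(probs):
    """Shannon (1948) entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A peaked next-token distribution carries little information;
# a flat one carries the maximum for its size.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
print(shannon_entropy(confident))  # low: the model is nearly certain
print(shannon_entropy(uncertain))  # 2.0 bits: maximal for 4 outcomes
```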

LLMs as Information Transformation Pipelines

ReasonScape evaluates complete LLM systems, not just model weights. The information flow involves multiple transformation stages:

graph LR
    A[Task Definitions] --> C{Test Generator}
    B[Difficulty Parameters] --> C
    C -->|Test:text| D{Template}
    C -->|Target:text| K
    D -->|Test:prompt| E[[LLM Tokenizer]]
    E -->|Input:tokens| F((LLM Inference))
    F -->|Output:logprobs| G{Sampling}
    G -->|Output:tokens| H[[LLM Tokenizer]]
    H -->|Output:text| I{Parse}
    I -->|Thought:text| J[Analysis]
    I -->|Answer:text| K{Compare}
    K -->|Success:bool| J

    style A fill:#e1f5fe
    style B fill:#e1f5fe
    style J fill:#f3e5f5
    style K fill:#fff3e0
    style D fill:#fff3e0
    style I fill:#fff3e0
    style G fill:#fff3e0
    style C fill:#e8f5e8

Information Transformation Pipeline

  1. Task Generator → Creates parametric test instances (text)
  2. Template Engine → Applies model-specific chat formatting (text → text)
  3. Tokenizer → Converts to token sequences (text → tokens)
  4. LLM Inference → Processes through learned parameters (tokens → logprobs)
  5. Sampler → Collapses distributions to concrete outputs (logprobs → tokens)
  6. Detokenizer → Converts back to text (tokens → text)
  7. Parser → Extracts reasoning and answers (text → structured data)
  8. Evaluator → Compares with targets (structured data → statistics)
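The eight stages above can be sketched as one function over pluggable stage callables. This is a hypothetical illustration of the data flow only; the stage implementations are stand-ins, not ReasonScape's actual components:

```python
# Hypothetical sketch of the eight-stage pipeline; each argument is a
# caller-supplied callable standing in for the real component.
def run_pipeline(task_params, generate, template, tokenize, infer,
                 sample, detokenize, parse, compare):
    test_text, target = generate(task_params)   # 1. Task Generator
    prompt = template(test_text)                # 2. Template Engine
    input_tokens = tokenize(prompt)             # 3. Tokenizer
    logprobs = infer(input_tokens)              # 4. LLM Inference
    output_tokens = sample(logprobs)            # 5. Sampler
    output_text = detokenize(output_tokens)     # 6. Detokenizer
    thought, answer = parse(output_text)        # 7. Parser
    success = compare(answer, target)           # 8. Evaluator
    return {"thought": thought, "success": success}
```

Any stage can be swapped independently, which is exactly why a performance difference can originate at any of them.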

LLM Sub-System Components

ReasonScape measures the entire system, which includes:

  • Chat Templates: Model-specific prompt formatting (affects input structure)
  • Tokenizers: Text↔token conversion (creates measurable frequency signatures)
  • Inference Engines: Model parameters, architecture, quantization
  • Sampling Strategies: Temperature, top-p, top-k, min-p (controls output distribution collapse)
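To make the "output distribution collapse" concrete, here is a minimal temperature plus top-p (nucleus) sampler. This is a sketch only: real inference engines operate on full-vocabulary logits, and the log-probabilities below are invented:

```python
import math
import random

def sample_token(logprobs, temperature=1.0, top_p=1.0, rng=random):
    """Collapse a next-token log-probability list to one token index."""
    # Temperature rescales logits: <1 sharpens, >1 flattens the distribution.
    scaled = [lp / temperature for lp in logprobs]
    m = max(scaled)
    probs = [math.exp(lp - m) for lp in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Top-p (nucleus): keep the smallest high-probability prefix whose mass >= top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize over the kept tokens and draw one.
    r = rng.random() * sum(probs[i] for i in kept)
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

With a small top_p or low temperature the sampler degenerates toward greedy argmax, which is why sampling settings alone can shift benchmark results.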

Implication: Performance differences may arise from any of these components, not just model quality. Forensic analysis (FFT, compression, surface, hazard) helps isolate failure sources.

Why This Matters

Viewing LLMs as information processors enables:

  1. Spectral Analysis (FFT): Understand how tokenization affects problem representation
  2. Compression Analysis: Measure information quality in reasoning traces
  3. Hazard Analysis: Track when and how reasoning degrades over time
  4. Surface Analysis: Map capability boundaries in parameter space
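As one example, compression analysis can be approximated with a general-purpose compressor: redundant, looping reasoning traces compress far better than information-dense ones. A sketch with invented traces (not ReasonScape's actual metric):

```python
import zlib

def compression_ratio(text: str) -> float:
    """Compressed size / raw size; lower means more redundancy."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, level=9)) / len(raw)

# A looping trace compresses far better than varied reasoning:
loop = "Wait, let me reconsider. " * 40
varied = "First factor 91 as 7*13, then check 13 is prime, so gcd(91,35)=7."
print(compression_ratio(loop))    # very low: almost pure repetition
print(compression_ratio(varied))  # near 1.0: little redundancy to remove
```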

These tools reveal patterns invisible to text-only evaluation:

  • Tokenizer efficiency differences
  • Working memory limits
  • Reasoning/Meta-reasoning loop detection
  • Catastrophic failure boundaries

Evaluation becomes information flow analysis. Failures become signal processing problems. Performance becomes measurable in information-theoretic terms.


See Also