# The Core Insight: LLMs as Information Processors
Modern LLMs are not simply text generators; they are information processing systems.
Once you recognize this, you inherit decades of applicable tools:

- Signal processing (FFT): roughly 60 years old
- Survival analysis (hazard): from the 1950s
- Information theory (entropy): Shannon, 1948
The techniques aren't new. Applying them to LLM evaluation is.
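As a small illustration of the information-theoretic lens, Shannon entropy can be computed directly from the logprobs an inference API returns for a next-token distribution. This is a hedged sketch: `token_entropy` is an illustrative helper, not part of any particular API.

```python
import math

def token_entropy(logprobs):
    """Shannon entropy (bits) of a next-token distribution.

    `logprobs` is a list of natural-log probabilities, as returned by
    most inference APIs. Illustrative helper, not a specific library call.
    """
    probs = [math.exp(lp) for lp in logprobs]
    total = sum(probs)  # renormalize in case only a top-k slice was returned
    return -sum((p / total) * math.log2(p / total) for p in probs if p > 0)

# A uniform distribution over 4 tokens carries exactly 2 bits of uncertainty;
# a fully peaked distribution carries none.
print(token_entropy([math.log(0.25)] * 4))  # → 2.0
print(token_entropy([math.log(1.0)]))       # → 0.0 (sometimes printed as -0.0)
```

Tracking this quantity across a generation is one way to make "information processing" concrete rather than metaphorical.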
## LLMs as Information Transformation Pipelines
ReasonScape evaluates complete LLM systems, not just model weights. The information flow involves multiple transformation stages:
```mermaid
graph LR
    A[Task Definitions] --> C{Test Generator}
    B[Difficulty Parameters] --> C
    C -->|Test:text| D{Template}
    C -->|Target:text| K
    D -->|Test:prompt| E[[LLM Tokenizer]]
    E -->|Input:tokens| F((LLM Inference))
    F -->|Output:logprobs| G{Sampling}
    G -->|Output:tokens| H[[LLM Tokenizer]]
    H -->|Output:text| I{Parse}
    I -->|Thought:text| J[Analysis]
    I -->|Answer:text| K{Compare}
    K -->|Success:bool| J
    style A fill:#e1f5fe
    style B fill:#e1f5fe
    style J fill:#f3e5f5
    style K fill:#fff3e0
    style D fill:#fff3e0
    style I fill:#fff3e0
    style G fill:#fff3e0
    style C fill:#e8f5e8
```
### Information Transformation Pipeline
- Task Generator → Creates parametric test instances (text)
- Template Engine → Applies model-specific chat formatting (text → text)
- Tokenizer → Converts to token sequences (text → tokens)
- LLM Inference → Processes through learned parameters (tokens → logprobs)
- Sampler → Collapses distributions to concrete outputs (logprobs → tokens)
- Detokenizer → Converts back to text (tokens → text)
- Parser → Extracts reasoning and answers (text → structured data)
- Evaluator → Compares with targets (structured data → statistics)
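The stages above can be sketched as a chain of toy functions. Everything here is an illustrative placeholder, not the ReasonScape API; the sampler stage is folded into the toy `infer` for brevity, since the canned "model" already emits concrete tokens.

```python
# Toy stand-ins for each pipeline stage (illustrative names only).

def generate_test(params):                      # Task Generator
    a, b = params["a"], params["b"]
    return f"What is {a} + {b}?", str(a + b)    # (test text, target)

def apply_template(text):                       # Template Engine (chat format)
    return f"<|user|>{text}<|assistant|>"

def tokenize(text):                             # Tokenizer (whitespace toy)
    return text.split()

def infer(tokens):                              # LLM Inference + Sampler, collapsed:
    return ["<think>", "add", "them", "</think>", "5"]  # canned output tokens

def detokenize(tokens):                         # Detokenizer
    return " ".join(tokens)

def parse(text):                                # Parser: split thought from answer
    thought, _, answer = text.partition("</think>")
    return thought.replace("<think>", "").strip(), answer.strip()

def evaluate(answer, target):                   # Evaluator
    return answer == target

test, target = generate_test({"a": 2, "b": 3})
out = detokenize(infer(tokenize(apply_template(test))))
thought, answer = parse(out)
print(evaluate(answer, target))  # → True
```

The point of the sketch is structural: every arrow in the diagram is a typed transformation, and a failure anywhere in the chain surfaces as a wrong `Success:bool` at the end.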
### LLM Sub-System Components
ReasonScape measures the entire system, which includes:
- Chat Templates: Model-specific prompt formatting (affects input structure)
- Tokenizers: Text↔token conversion (creates measurable frequency signatures)
- Inference Engines: Model parameters, architecture, quantization
- Sampling Strategies: Temperature, top-p, top-k, min-p (controls output distribution collapse)
Implication: Performance differences may arise from any of these components, not just model quality. Forensic analysis (FFT, compression, surface, hazard) helps isolate failure sources.
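To make the "output distribution collapse" of the sampling stage concrete, here is a minimal nucleus (top-p) sampling sketch. The function name and the `logprobs` dict shape are assumptions for illustration.

```python
import math
import random

def top_p_sample(logprobs, p=0.9, rng=random):
    """Nucleus (top-p) sampling sketch: keep the smallest set of
    highest-probability tokens whose cumulative mass reaches p,
    then sample from that truncated, renormalized set.

    `logprobs` maps token -> natural-log probability (illustrative shape).
    """
    items = sorted(logprobs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for tok, lp in items:
        prob = math.exp(lp)
        kept.append((tok, prob))
        cum += prob
        if cum >= p:
            break
    total = sum(w for _, w in kept)
    r = rng.random() * total
    for tok, w in kept:
        r -= w
        if r <= 0:
            return tok
    return kept[-1][0]

dist = {"yes": math.log(0.7), "no": math.log(0.2), "maybe": math.log(0.1)}
# With p=0.5 only "yes" survives truncation, so sampling is deterministic.
print(top_p_sample(dist, p=0.5))  # → yes
```

Because this stage discards mass before sampling, two systems serving identical model weights can produce measurably different output distributions purely through their sampler settings.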
## Why This Matters
Viewing LLMs as information processors enables:
- Spectral Analysis (FFT): Understand how tokenization affects problem representation
- Compression Analysis: Measure information quality in reasoning traces
- Hazard Analysis: Track when and how reasoning degrades over time
- Surface Analysis: Map capability boundaries in parameter space
These tools reveal patterns invisible to text-only evaluation:
- Tokenizer efficiency differences
- Working memory limits
- Reasoning/Meta-reasoning loop detection
- Catastrophic failure boundaries
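One of these patterns, reasoning-loop detection, can be glimpsed with a simple compression sketch: repetitive, low-information traces compress far better than varied ones. This uses `zlib` as a generic stand-in for compression analysis; the metric and thresholds are illustrative, not ReasonScape's actual implementation.

```python
import zlib

def compression_ratio(text):
    """Compressed size / raw size under zlib.

    A crude proxy for the information density of a reasoning trace:
    looping, repetitive text yields a ratio near zero. Illustrative only.
    """
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, 9)) / len(raw)

looping = "Let me reconsider. " * 50          # a trace stuck in a loop
varied = "Step one: factor 84 into 2*2*3*7, then check each divisor."

print(compression_ratio(looping) < compression_ratio(varied))  # → True
```

Text-only inspection of a long trace hides this kind of degeneration; an information-theoretic measurement surfaces it immediately.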
Evaluation becomes information flow analysis. Failures become signal processing problems. Performance becomes measurable in information-theoretic terms.
## See Also
- challenges.md - The practical problems this insight helps solve
- architecture.md - How this insight informs the five-stage methodology
- technical-details.md - Technical implementation details