The Core Insight: LLMs as Information Processors¶
ReasonScape is built on the idea that the evaluation regime and the LLM form a system and that it is this combined system we observe. These systems are far more then merely text-in and text-out, applying this insight allows us to map capabilities and understand failure mode.
graph LR
A[Task Definitions] --> C{Test Generator}
B[Difficulty Parameters] --> C
C -->|Test:text| D{Template}
C -->|Target:text| K
D -->|Test:prompt| E[[LLM Tokenizer]]
E -->|Input:tokens| F((LLM Inference))
F -->|Output:logprobs| G{Sampling}
G -->|Output:tokens| H[[LLM Tokenizer]]
H -->|Output:text| I{Parse}
I -->|Thought:text| J[Analysis]
I -->|Answer:text| K{Compare}
K -->|Success:bool| J
style A fill:#e1f5fe
style B fill:#e1f5fe
style J fill:#f3e5f5
style K fill:#fff3e0
style D fill:#fff3e0
style I fill:#fff3e0
style G fill:#fff3e0
style C fill:#e8f5e8
- Task Generator → Creates parametric test instances (text)
- Template Engine → Applies model-specific chat formatting (text → text)
- Tokenizer → Converts to token sequences (text → tokens)
- LLM Inference → Processes through learned parameters (tokens → logprobs)
- Sampler → Collapses distributions to concrete outputs (logprobs → tokens)
- Detokenizer → Converts back to text (tokens → text)
- Parser → Extracts reasoning and answers (text → structured data)
- Evaluator → Compares with targets (structured data → statistics)
We decompose this system with a flexible implementation that provides full control over the entire pipeline from input generation to final evaluation, and offers forensic analysis tools to help understand and isolate failure sources:
- Spectral Analysis (FFT): Understand how tokenization affects problem representation
- Compression Analysis: Measure information quality in reasoning traces
- Hazard Analysis: Track when and how reasoning degrades over time
- Surface Analysis: Map capability boundaries in parameter space
- Capacity Analysis: Map context-limit sensitivity and attention decay breakdown.
See Also¶
- challenges.md - The practical problems this insight helps solve
- architecture.md - How this insight informs the five-stage methodology
- technical-details.md - Technical implementation details