ReasonScape

ReasonScape is a research platform for investigating how reasoning-tuned language models process information.

[Image: ReasonScape 4-image collage]

ReasonScape reveals cognitive architecture patterns invisible to traditional benchmarks: 3D reasoning landscapes (left), token-frequency spectral analysis (bottom right), and interactive exploration tools (top and middle right) enable systematic comparison of information processing capabilities across models and tasks.

🌐 Homepage: https://reasonscape.com/

🛠️ GitHub: the-crypt-keeper/reasonscape

Keywords: Large language models, AI evaluation, cognitive architectures, spectral analysis, statistical methodology, parametric testing, difficulty manifolds, information processing

Live Tools & Data

📊 Visualization Tools:

📁 Raw Data:

ReasonScape V3

The ReasonScape documentation is organized into six chapters:

| Chapter | What It Is | Where to Learn More |
|---------|------------|---------------------|
| 1. Challenges | Practical problems encountered in prior LLM evaluation systems | challenges.md |
| 2. Insight | LLMs are not simply text generators; they are information processors | insight.md |
| 3. Methodology | Systematic solutions that emerge from applying the Insight to the Challenges | architecture.md |
| 4. Implementation | The Python codebase that makes it real | implementation.md |
| 5. Reference | The r12 evaluation, the current reference | r12.md |
| 6. Research Datasets | Worked examples showing the methodology in action | datasets.md |

Where to Start

Understand the Vision

Challenges - The problem statement (chapter 1)

  • Fundamental challenges in current LLM evaluation
  • Practical barriers encountered in prior systems

Insight - The paradigm shift (chapter 2)

  • LLMs as information processors
  • System architecture and transformation pipeline

Architecture - The methodology (chapter 3)

  • Applies the Insight to the Challenges
  • Five-stage data processing pipeline
  • Discovery-investigation research loop

Use the Datasets

r12 - The current reference evaluation (chapter 5)

  • 12 reasoning tasks, 16k context
  • ReasonScore v2 with bootstrap confidence intervals
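The percentile-bootstrap technique behind confidence intervals like these can be sketched in a few lines. The function below is a generic illustration of the statistical idea, assuming per-item pass/fail scores; the name `bootstrap_ci` and all parameters are invented for this example and are not ReasonScape's actual ReasonScore v2 code.

```python
import random
import statistics

def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a mean score.

    Resamples the score list with replacement many times and reads the
    CI off the sorted distribution of resampled means.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(scores, k=len(scores))
        means.append(statistics.fmean(resample))
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # hypothetical per-item pass/fail
lo, hi = bootstrap_ci(scores)
print(f"mean={statistics.fmean(scores):.2f}  95% CI=({lo:.2f}, {hi:.2f})")
```

Bootstrapping makes no normality assumption, which suits the bounded 0/1 accuracy data typical of evaluation runs.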

Datasets - Research collections (chapter 6)

  • Worked examples of complete investigations
  • Templates for your own research

Workflow Guide - The Three P's

  • Position: Ranking models ("Which is better?")
  • Profile: Characterizing and diagnosing capabilities ("What can it do? Why does this fail?")
  • Probe: Examining raw traces ("What does failure look like?")

Dive Deeper

Implementation - The Python codebase (chapter 4)

  • Stage-by-stage implementation guide
  • Deep-dive design documents (manifold, reasonscore)
  • Tool references and workflows

Technical Details - Low-level algorithms

  • Parametric test generation
  • Statistical methodology
  • FFT analysis and compression
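As a flavor of the spectral analysis named above, the sketch below computes a power spectrum of a 1-D numeric sequence via a naive DFT (an FFT would be used in practice for speed). The signal here is a made-up per-position trace with a period-8 oscillation; none of this is ReasonScape's actual FFT-analysis or compression code.

```python
import cmath
import math

def power_spectrum(signal):
    """Naive O(n^2) DFT power spectrum of a real-valued signal.

    The mean (DC component) is removed first so bin 0 does not
    dominate the periodic structure we want to see.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(centered))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum

# Hypothetical trace: constant level plus a period-8 oscillation, 64 samples
trace = [1.0 + 0.5 * math.cos(2 * math.pi * i / 8) for i in range(64)]
spec = power_spectrum(trace)
peak = max(range(1, len(spec)), key=lambda k: spec[k])
print(peak)  # dominant bin: 64 samples / period 8 = bin 8
```

A sharp peak like this is what distinguishes periodic structure in a sequence from broadband noise, which spreads energy across all bins.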

Tools - Complete command reference

  • All flags and formats for every CLI tool and webapp
  • Filter syntax, output formats, probe loop classifications

Config - Experiment configuration

  • Precision levels and sampling strategies
  • Tasks (list, grid, manifold modes)
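To illustrate the spirit of a grid-mode parametric sweep, here is a toy generator that emits test items at every point of a 2-D difficulty grid. The task (multi-term addition), field names, and parameters are all invented for this example and do not reflect ReasonScape's actual task API or config schema.

```python
import itertools
import random

def make_arithmetic_item(n_terms, digits, rng):
    """Generate one addition problem at a given difficulty point."""
    hi = 10 ** digits - 1
    terms = [rng.randint(0, hi) for _ in range(n_terms)]
    prompt = " + ".join(str(t) for t in terms) + " = ?"
    return {"prompt": prompt, "answer": sum(terms),
            "params": {"n_terms": n_terms, "digits": digits}}

def grid_sweep(n_terms_axis, digits_axis, per_cell, seed=0):
    """Sweep a 2-D difficulty grid, yielding per_cell items per point."""
    rng = random.Random(seed)
    for n_terms, digits in itertools.product(n_terms_axis, digits_axis):
        for _ in range(per_cell):
            yield make_arithmetic_item(n_terms, digits, rng)

items = list(grid_sweep([2, 4, 8], [1, 3, 5], per_cell=10))
print(len(items))  # 3 x 3 grid points x 10 items each = 90
```

Holding the generator fixed while sweeping its parameters is what lets accuracy be plotted as a surface over difficulty rather than a single scalar.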

Templates & Samplers - Execution configuration

  • Prompting strategies (templates)
  • Generation parameters (samplers)

PointsDB - Data structure API

Tasks - Abstract task API and task overview


Citation

If you use ReasonScape in your research, please cite:

@software{reasonscape2025,
  title={ReasonScape: Information Processing Evaluation for Large Language Models},
  author={Mikhail Ravkine},
  year={2025},
  url={https://github.com/the-crypt-keeper/reasonscape}
}

License

MIT

Acknowledgments

ReasonScape builds upon insights from BIG-Bench Hard, lm-evaluation-harness, and the broader AI evaluation community.