ReasonScape

ReasonScape is a research platform for investigating how reasoning-tuned language models process information.

[Image: ReasonScape 4-image collage]

ReasonScape reveals cognitive architecture patterns invisible to traditional benchmarks: 3D reasoning landscapes (left), token-frequency spectral analysis (bottom right), and interactive exploration tools (top and middle right) enable systematic comparison of information processing capabilities across models and tasks.

🌐 Homepage: https://reasonscape.com/

🛠️ GitHub: the-crypt-keeper/reasonscape

Keywords: Large language models, AI evaluation, cognitive architectures, spectral analysis, statistical methodology, parametric testing, difficulty manifolds, information processing

Live Tools & Data

📊 Visualization Tools:

📁 Raw Data:

ReasonScape V3

The ReasonScape documentation is organized into six chapters:

| Chapter | What It Is | Where to Learn More |
|---------|------------|---------------------|
| 1. Challenges | Practical problems encountered in prior LLM evaluation systems | challenges.md |
| 2. Insight | LLMs are not simply text generators; they are information processors | insight.md |
| 3. Methodology | Systematic solutions that emerge from applying the Insight to the Challenges | architecture.md |
| 4. Implementation | The Python codebase that makes it real | implementation.md |
| 5. Reference | The r12 evaluation, the current reference | r12.md |
| 6. Research Datasets | Worked examples showing the methodology in action | datasets.md |

Where to Start

Understand the Vision

Challenges - The problem statement (chapter 1)

  • Fundamental challenges in current LLM evaluation
  • Practical barriers encountered in prior systems

Insight - The paradigm shift (chapter 2)

  • LLMs as information processors
  • System architecture and transformation pipeline

Architecture - The methodology (chapter 3)

  • Applies the Insight to the Challenges
  • Five-stage data processing pipeline
  • Discovery-investigation research loop

Use the Datasets

r12 - The current reference evaluation (chapter 5)

  • 12 reasoning tasks, 16k context
  • ReasonScore v2 with bootstrap confidence intervals
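The percentile-bootstrap technique behind confidence intervals like these can be sketched in a few lines. The function below is a generic illustration of the statistical idea, assuming per-item pass/fail scores; the name `bootstrap_ci` and all parameters are invented for this example and are not ReasonScape's actual ReasonScore v2 code.

```python
import random
import statistics

def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a mean score.

    Resamples the score list with replacement many times and reads the
    CI off the sorted distribution of resampled means.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(scores, k=len(scores))
        means.append(statistics.fmean(resample))
    means.sort()
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # hypothetical per-item pass/fail
lo, hi = bootstrap_ci(scores)
print(f"mean={statistics.fmean(scores):.2f}  95% CI=({lo:.2f}, {hi:.2f})")
```

Bootstrapping makes no normality assumption, which suits the bounded 0/1 accuracy data typical of evaluation runs.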

Datasets - Research collections (chapter 6)

  • Worked examples of complete investigations
  • Templates for your own research

Workflow Guide - The Three P's

  • Position: Ranking models ("Which is better?")
  • Profile: Characterizing and diagnosing capabilities ("What can it do? Why does this fail?")
  • Probe: Examining raw traces ("What does failure look like?")

Dive Deeper

Implementation - The Python codebase (chapter 4)

  • Stage-by-stage implementation guide
  • Deep-dive design documents (manifold, reasonscore)
  • Tool references and workflows

Technical Details - Low-level algorithms

  • Parametric test generation
  • Statistical methodology
  • FFT analysis and compression
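As a flavor of the spectral analysis named above, the sketch below computes a power spectrum of a 1-D numeric sequence via a naive DFT (an FFT would be used in practice for speed). The signal here is a made-up per-position trace with a period-8 oscillation; none of this is ReasonScape's actual FFT-analysis or compression code.

```python
import cmath
import math

def power_spectrum(signal):
    """Naive O(n^2) DFT power spectrum of a real-valued signal.

    The mean (DC component) is removed first so bin 0 does not
    dominate the periodic structure we want to see.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(centered))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum

# Hypothetical trace: constant level plus a period-8 oscillation, 64 samples
trace = [1.0 + 0.5 * math.cos(2 * math.pi * i / 8) for i in range(64)]
spec = power_spectrum(trace)
peak = max(range(1, len(spec)), key=lambda k: spec[k])
print(peak)  # dominant bin: 64 samples / period 8 = bin 8
```

A sharp peak like this is what distinguishes periodic structure in a sequence from broadband noise, which spreads energy across all bins.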

Tools - Complete command reference

  • All flags and formats for every CLI tool and webapp
  • Filter syntax, output formats, probe loop classifications

Config - Experiment configuration

  • Precision levels and sampling strategies
  • Tasks (list, grid, manifold modes)
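To illustrate the spirit of a grid-mode parametric sweep, here is a toy generator that emits test items at every point of a 2-D difficulty grid. The task (multi-term addition), field names, and parameters are all invented for this example and do not reflect ReasonScape's actual task API or config schema.

```python
import itertools
import random

def make_arithmetic_item(n_terms, digits, rng):
    """Generate one addition problem at a given difficulty point."""
    hi = 10 ** digits - 1
    terms = [rng.randint(0, hi) for _ in range(n_terms)]
    prompt = " + ".join(str(t) for t in terms) + " = ?"
    return {"prompt": prompt, "answer": sum(terms),
            "params": {"n_terms": n_terms, "digits": digits}}

def grid_sweep(n_terms_axis, digits_axis, per_cell, seed=0):
    """Sweep a 2-D difficulty grid, yielding per_cell items per point."""
    rng = random.Random(seed)
    for n_terms, digits in itertools.product(n_terms_axis, digits_axis):
        for _ in range(per_cell):
            yield make_arithmetic_item(n_terms, digits, rng)

items = list(grid_sweep([2, 4, 8], [1, 3, 5], per_cell=10))
print(len(items))  # 3 x 3 grid points x 10 items each = 90
```

Holding the generator fixed while sweeping its parameters is what lets accuracy be plotted as a surface over difficulty rather than a single scalar.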

Templates & Samplers - Execution configuration

  • Prompting strategies (templates)
  • Generation parameters (samplers)

PointsDB - Data structure API

Tasks - Abstract task API and task overview


Citation

If you use ReasonScape in your research, please cite:

@software{reasonscape2025,
  title={ReasonScape: Information Processing Evaluation for Large Language Models},
  author={Mikhail Ravkine},
  year={2025},
  url={https://github.com/the-crypt-keeper/reasonscape}
}

License

MIT

Acknowledgments

ReasonScape builds upon insights from BIG-Bench Hard, lm-evaluation-harness, and the broader AI evaluation community.