ReasonScape Evaluation Suites
ReasonScape provides a comprehensive methodology for systematically evaluating the reasoning capabilities of large language models across multiple cognitive domains.
→ M12X Experiment Documentation (Current - Recommended)
Evaluation Recommendation
All new research should use M12X for comprehensive 12-domain reasoning evaluation with flexible difficulty and precision controls.
- Comprehensive Coverage: 12 reasoning domains, versus 6 in the prior M6 suite
- Flexible Configuration: Three independent parameters (--degree, --density, --precision) for complete control
- Advanced Methodology: Improved statistical rigor and manifold sampling
- Efficient Scaling: Runs range from 2-3 hours (rapid) to 20+ hours (research-grade)
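The three independent parameters listed above can be pictured as three separate command-line options. The following is a minimal sketch, not ReasonScape's actual runner: only the flag names `--degree`, `--density`, and `--precision` come from this page, while the parser, the `describe` helper, and the treatment of values as opaque strings are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the three flag names are taken from the page.
    parser = argparse.ArgumentParser(
        description="Illustrative M12X-style run configuration (not the real CLI)"
    )
    # Accepted values are not documented here, so each option is kept as an
    # opaque string with no default.
    for flag in ("--degree", "--density", "--precision"):
        parser.add_argument(flag, default=None, help="value format assumed")
    return parser


def describe(argv):
    """Parse argv and return the three settings as a plain dict."""
    args = build_parser().parse_args(argv)
    return {
        "degree": args.degree,
        "density": args.density,
        "precision": args.precision,
    }
```

Because each option is parsed on its own, any one of them can be set without touching the other two, which is the sense in which the parameters are independent. Consult the M12X experiment documentation for the values each flag actually accepts.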
Current Status Summary
| Suite | Tasks | Token Dataset | Status | Recommendation |
|---|---|---|---|---|
| M12X | 12 domains | 3.5B tokens | Active Development | Use for all new research |
Live Results Access
M12X Results (Current)
- M12X Leaderboard: reasonscape.com/m12x/leaderboard
- M12X Explorer: reasonscape.com/m12x/explorer (PC required)
- M12X Dataset: reasonscape.com/data/m12x