ReasonScape Evaluation Suites

ReasonScape provides a comprehensive evaluation methodology for systematically assessing the reasoning capabilities of large language models across multiple cognitive domains.

M12X Experiment Documentation (Current - Recommended)

Evaluation Recommendation

All new research should use M12X for comprehensive 12-domain reasoning evaluation with flexible difficulty and precision controls.

  • Comprehensive Coverage: 12 reasoning domains, up from 6 in the prior M6 suite
  • Flexible Configuration: Three independent parameters (--degree, --density, --precision) for complete control
  • Advanced Methodology: Improved statistical rigor and manifold sampling
  • Efficient Scaling: 2-3 hours (rapid) to 20+ hours (research-grade)
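
As a rough illustration of how the three independent parameters might be carried together, the sketch below groups them in one small Python object and renders them as command-line flags. Only the flag names (--degree, --density, --precision) come from the list above. The class name `M12XOptions`, the choice to treat each value as an opaque string, and the placeholder values are all assumptions made for this example; the accepted values and the command that consumes the flags are not specified here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class M12XOptions:
    """Hypothetical holder for the three independent M12X controls.

    Each value is kept as an opaque string (an assumption), because the
    accepted values are not specified in this overview.
    """
    degree: str
    density: str
    precision: str

    def to_argv(self) -> List[str]:
        # Emit each flag followed by its value. The parameters are
        # independent, so changing one never affects the other two.
        return [
            "--degree", self.degree,
            "--density", self.density,
            "--precision", self.precision,
        ]


# Placeholder values for illustration only, not documented settings.
options = M12XOptions(degree="<degree>", density="<density>", precision="<precision>")
print(" ".join(options.to_argv()))
```

Keeping the three settings in a single immutable object makes it easy to record exactly which configuration produced a given set of results.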

Current Status Summary

Suite   Tasks        Token Dataset   Status               Recommendation
M12X    12 domains   3.5B tokens     Active Development   Use for all new research

Live Results Access

M12X Results (Current)