# M12X: Comprehensive 12-Domain Reasoning Evaluation
The M12X evaluation suite is ReasonScape's most comprehensive assessment of large language model reasoning, covering 12 cognitive domains with progressive difficulty scaling and flexible control over resource utilization.
## Overview
M12X (Multi-domain 12-task eXtended) is ReasonScape's flagship evaluation methodology, designed to provide thorough assessment across diverse reasoning capabilities while maintaining statistical rigor and computational efficiency.
### Key Features
- 12 Cognitive Domains: Comprehensive coverage of reasoning capabilities
- Progressive Difficulty: Three difficulty degrees (easy, medium, hard)
- Flexible Precision: Selectable precision levels (low, medium, high) to control compute spend independently of difficulty
- Statistical Rigor: Confidence intervals and excess accuracy correction (see the sketch below)
- Hierarchical Sampling: Perfect subset scaling for efficient evaluation
See Methodology for more details.
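As a rough illustration of the statistical machinery named above, the sketch below computes a Wilson score interval and a chance-corrected ("excess") accuracy in Python. This is not ReasonScape's exact implementation: the interval type, the correction formula, and the example numbers are assumptions made for illustration only; the Methodology page is the definitive reference.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion (illustrative choice)."""
    if trials == 0:
        return (0.0, 1.0)
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2))
    return (max(0.0, center - half), min(1.0, center + half))

def excess_accuracy(accuracy: float, chance_rate: float) -> float:
    """Accuracy in excess of random guessing, rescaled to [0, 1] (assumed form)."""
    return max(0.0, (accuracy - chance_rate) / (1.0 - chance_rate))

# Hypothetical numbers: 412 correct out of 544 completions on a 4-way multiple-choice task.
low, high = wilson_interval(412, 544)
print(f"accuracy 95% CI: [{low:.3f}, {high:.3f}]")
print(f"excess accuracy:  {excess_accuracy(412 / 544, 0.25):.3f}")
```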
## Cognitive Domains
M12X evaluates twelve distinct reasoning domains that together provide comprehensive coverage of:
- Mathematical and logical reasoning
- Complex instruction following
- Spatial and temporal processing
- Pattern recognition and prediction
- Structural parsing and syntax
- Planning and algorithmic thinking
See Tasks for additional details.
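For orientation, the twelve task families correspond to the per-task columns of the Resource Usage table below. The listing here simply mirrors those column headers; the exact identifiers used in configs/m12x.yaml may be spelled differently.

```python
# The twelve M12X task families, mirroring the per-task columns in the Resource Usage table.
# The identifiers in configs/m12x.yaml may differ; this list is for orientation only.
M12X_TASKS = [
    "arithmetic", "boolean", "brackets", "cars",
    "dates", "letters", "movies", "objects",
    "sequence", "shapes", "shuffle", "sort",
]
assert len(M12X_TASKS) == 12
```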
## Resource Usage
Each row below reports one model at one difficulty degree: total tokens used, average tokens per completion, the total number of tests run, and the test count for each of the twelve tasks.
| Model | Total Tokens | Avg Tokens/Completion | Total Tests | Arithmetic Tests | Boolean Tests | Brackets Tests | Cars Tests | Dates Tests | Letters Tests | Movies Tests | Objects Tests | Sequence Tests | Shapes Tests | Shuffle Tests | Sort Tests |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-OSS-120B (MX4) (easy) | 8,074,603 | 762 | 11,451 | 1,088 | 574 | 831 | 2,560 | 543 | 544 | 448 | 1,216 | 544 | 1,439 | 1,056 | 608 |
| GPT-OSS-120B (MX4) (medium) | 20,362,466 | 1053 | 19,175 | 1,792 | 2,323 | 1,625 | 2,912 | 830 | 736 | 1,408 | 1,632 | 864 | 1,597 | 2,304 | 1,152 |
| GPT-OSS-120B (MX4) (hard) | 28,578,417 | 1324 | 21,613 | 2,015 | 2,131 | 1,925 | 2,816 | 1,181 | 864 | 1,917 | 2,240 | 1,024 | 1,629 | 2,528 | 1,343 |
| Qwen3-Next-80B-A3B Instruct (AWQ) (easy) | 15,880,500 | 1232 | 12,098 | 1,630 | 607 | 916 | 2,589 | 511 | 703 | 384 | 1,184 | 858 | 1,184 | 608 | 924 |
| Qwen3-Next-80B-A3B Instruct (AWQ) (medium) | 38,713,942 | 1612 | 21,696 | 3,054 | 2,846 | 2,244 | 2,918 | 798 | 893 | 1,408 | 1,759 | 1,076 | 1,216 | 2,207 | 1,277 |
| Qwen3-Next-80B-A3B Instruct (AWQ) (hard) | 52,309,582 | 1937 | 24,870 | 3,521 | 2,736 | 2,657 | 2,948 | 1,275 | 1,081 | 1,888 | 2,451 | 1,143 | 1,248 | 2,586 | 1,336 |
| Qwen3-32B (AWQ) (easy) | 23,563,982 | 1678 | 12,707 | 1,481 | 652 | 1,485 | 2,094 | 511 | 957 | 512 | 1,536 | 572 | 1,312 | 768 | 827 |
| Qwen3-32B (AWQ) (medium) | 57,049,799 | 2173 | 22,494 | 2,729 | 2,892 | 2,262 | 2,509 | 828 | 1,420 | 1,536 | 2,335 | 814 | 1,344 | 2,619 | 1,206 |
| Qwen3-32B (AWQ) (hard) | 70,129,352 | 2544 | 23,973 | 2,828 | 2,677 | 1,753 | 2,821 | 1,242 | 1,501 | 2,048 | 2,608 | 820 | 1,344 | 3,328 | 1,003 |
| Seed-OSS 36B (AWQ) (easy) | 25,350,999 | 2510 | 9,150 | 1,138 | 653 | 601 | 1,523 | 508 | 589 | 416 | 1,119 | 233 | 1,324 | 598 | 448 |
| Seed-OSS 36B (AWQ) (medium) | 51,840,440 | 3305 | 13,774 | 2,107 | 2,030 | 700 | 1,814 | 793 | 463 | 1,277 | 1,331 | 134 | 1,321 | 1,084 | 720 |
| Seed-OSS 36B (AWQ) (hard) | 61,035,328 | 3805 | 14,017 | 2,275 | 1,841 | 320 | 1,801 | 1,235 | 228 | 1,721 | 1,714 | 113 | 1,303 | 710 | 756 |
| GPT-OSS-20B (MX4) (easy) | 16,131,603 | 1168 | 13,647 | 1,256 | 633 | 978 | 2,601 | 509 | 1,023 | 444 | 1,919 | 574 | 1,630 | 1,152 | 928 |
| GPT-OSS-20B (MX4) (medium) | 38,321,588 | 1593 | 21,597 | 2,303 | 2,257 | 1,454 | 2,757 | 796 | 1,436 | 1,462 | 2,518 | 937 | 1,661 | 2,586 | 1,430 |
| GPT-OSS-20B (MX4) (hard) | 51,976,561 | 2069 | 23,217 | 2,455 | 2,089 | 1,327 | 2,705 | 1,147 | 1,502 | 1,932 | 2,597 | 1,067 | 1,787 | 3,360 | 1,249 |
| Qwen3-14B (AWQ) (easy) | 27,150,343 | 1869 | 12,993 | 1,683 | 981 | 1,079 | 2,232 | 540 | 860 | 480 | 1,696 | 531 | 1,216 | 800 | 895 |
| Qwen3-14B (AWQ) (medium) | 60,443,268 | 2362 | 22,226 | 3,356 | 2,963 | 1,488 | 2,605 | 827 | 1,082 | 1,472 | 2,589 | 793 | 1,248 | 2,526 | 1,277 |
| Qwen3-14B (AWQ) (hard) | 72,235,743 | 2758 | 23,448 | 3,747 | 3,002 | 827 | 2,902 | 1,273 | 1,409 | 2,016 | 2,537 | 855 | 1,280 | 2,566 | 1,034 |
| Ring Flash 2.0 (AWQ) (easy) | 36,859,607 | 3183 | 9,917 | 1,212 | 438 | 514 | 2,124 | 473 | 748 | 416 | 1,275 | 309 | 980 | 766 | 662 |
| Ring Flash 2.0 (AWQ) (medium) | 71,545,143 | 3836 | 15,516 | 1,789 | 1,845 | 570 | 2,279 | 726 | 938 | 1,375 | 1,542 | 401 | 951 | 2,174 | 926 |
| Ring Flash 2.0 (AWQ) (hard) | 77,024,066 | 4229 | 14,924 | 1,470 | 1,699 | 365 | 2,180 | 1,105 | 584 | 1,854 | 1,741 | 394 | 839 | 1,929 | 764 |
| Qwen3-30B-A3B Original (AWQ) (easy) | 32,880,717 | 2283 | 13,303 | 1,448 | 894 | 1,214 | 2,416 | 479 | 1,216 | 512 | 1,600 | 558 | 1,216 | 861 | 889 |
| Qwen3-30B-A3B Original (AWQ) (medium) | 71,858,396 | 2870 | 22,115 | 2,650 | 3,155 | 1,805 | 2,809 | 766 | 1,482 | 1,536 | 2,358 | 608 | 1,311 | 2,480 | 1,155 |
| Qwen3-30B-A3B Original (AWQ) (hard) | 81,544,242 | 3251 | 22,219 | 2,508 | 3,013 | 1,306 | 2,755 | 1,148 | 1,136 | 2,048 | 2,615 | 598 | 1,375 | 2,838 | 879 |
| Phi-4 Reasoning (FP16) (easy) | 32,481,996 | 2057 | 12,834 | 1,923 | 566 | 1,439 | 2,094 | 491 | 1,468 | 438 | 1,299 | 461 | 1,097 | 790 | 768 |
| Phi-4 Reasoning (FP16) (medium) | 60,383,587 | 2600 | 19,619 | 3,505 | 2,086 | 1,405 | 1,887 | 800 | 1,492 | 1,414 | 1,725 | 671 | 1,118 | 2,321 | 1,195 |
| Phi-4 Reasoning (FP16) (hard) | 70,094,680 | 3062 | 19,748 | 3,580 | 2,030 | 549 | 1,571 | 1,201 | 1,354 | 1,890 | 2,327 | 715 | 966 | 2,477 | 1,088 |
| Qwen3-30B-A3B DeepSeek v3.1 Distill (FP8) (easy) | 36,762,408 | 3102 | 10,274 | 1,127 | 551 | 953 | 1,806 | 445 | 556 | 480 | 1,312 | 561 | 955 | 795 | 733 |
| Qwen3-30B-A3B DeepSeek v3.1 Distill (FP8) (medium) | 68,031,485 | 3803 | 15,498 | 2,074 | 2,468 | 576 | 1,657 | 699 | 458 | 1,376 | 1,457 | 573 | 955 | 1,967 | 1,238 |
| Qwen3-30B-A3B DeepSeek v3.1 Distill (FP8) (hard) | 71,725,328 | 4208 | 14,888 | 2,265 | 2,178 | 206 | 1,330 | 1,078 | 202 | 1,855 | 1,454 | 554 | 859 | 1,739 | 1,168 |
| Apriel-1.5-15b-Thinker (FP16) (easy) | 25,069,379 | 2100 | 9,969 | 685 | 233 | 644 | 2,160 | 569 | 813 | 444 | 1,168 | 533 | 1,092 | 817 | 811 |
| Apriel-1.5-15b-Thinker (FP16) (medium) | 50,003,151 | 2821 | 13,049 | 630 | 1,023 | 584 | 2,293 | 823 | 720 | 1,327 | 1,228 | 371 | 1,142 | 1,934 | 974 |
| Apriel-1.5-15b-Thinker (FP16) (hard) | 55,723,294 | 3374 | 11,467 | 338 | 992 | 273 | 2,177 | 1,232 | 359 | 1,756 | 970 | 228 | 1,140 | 1,382 | 620 |
| QwQ 32B (AWQ) (easy) | 36,511,724 | 2886 | 11,449 | 1,125 | 362 | 881 | 2,321 | 477 | 820 | 603 | 1,656 | 470 | 1,075 | 895 | 764 |
| QwQ 32B (AWQ) (medium) | 73,382,414 | 3672 | 17,020 | 1,546 | 1,390 | 643 | 2,128 | 732 | 995 | 1,651 | 2,346 | 505 | 1,072 | 2,926 | 1,086 |
| QwQ 32B (AWQ) (hard) | 77,741,526 | 4107 | 16,034 | 1,211 | 1,361 | 322 | 2,108 | 1,047 | 604 | 2,224 | 2,544 | 654 | 956 | 2,157 | 846 |
| Apriel-Nemotron-1.5-15b-Thinker (FP16) (easy) | 25,678,966 | 1494 | 14,915 | 2,410 | 871 | 1,980 | 2,238 | 542 | 949 | 448 | 1,753 | 764 | 1,233 | 959 | 768 |
| Apriel-Nemotron-1.5-15b-Thinker (FP16) (medium) | 54,062,782 | 1950 | 24,046 | 4,166 | 2,338 | 2,578 | 2,715 | 829 | 1,401 | 1,440 | 2,571 | 891 | 1,295 | 2,897 | 925 |
| Apriel-Nemotron-1.5-15b-Thinker (FP16) (hard) | 64,123,746 | 2255 | 24,958 | 4,455 | 2,443 | 1,926 | 2,932 | 1,274 | 1,475 | 1,919 | 2,488 | 956 | 1,295 | 3,082 | 713 |
| aquif-3.5 8B (FP16) (easy) | 24,717,514 | 1395 | 15,476 | 2,569 | 854 | 1,544 | 2,612 | 575 | 1,191 | 512 | 1,888 | 670 | 1,365 | 896 | 800 |
| aquif-3.5 8B (FP16) (medium) | 50,795,435 | 1775 | 25,363 | 4,480 | 2,972 | 2,133 | 2,912 | 894 | 1,480 | 1,408 | 2,613 | 923 | 1,427 | 3,226 | 895 |
| aquif-3.5 8B (FP16) (hard) | 59,288,272 | 2100 | 25,260 | 4,553 | 2,845 | 1,343 | 2,980 | 1,371 | 1,378 | 1,920 | 2,345 | 957 | 1,432 | 3,404 | 732 |
| Qwen3-Next-80B-A3B Thinking (AWQ) (easy) | 31,840,100 | 3453 | 7,954 | 918 | 318 | 476 | 1,604 | 413 | 541 | 480 | 987 | 152 | 949 | 701 | 415 |
| Qwen3-Next-80B-A3B Thinking (AWQ) (medium) | 59,204,594 | 4197 | 11,779 | 1,360 | 1,337 | 485 | 1,443 | 699 | 681 | 1,466 | 1,135 | 69 | 1,004 | 1,472 | 628 |
| Qwen3-Next-80B-A3B Thinking (AWQ) (hard) | 66,296,992 | 4698 | 11,285 | 1,099 | 1,245 | 274 | 1,442 | 1,046 | 513 | 1,941 | 964 | 48 | 887 | 1,139 | 687 |
| Magistral Small 2509 (FP8) (easy) | 29,523,753 | 1654 | 15,213 | 2,943 | 1,012 | 1,327 | 2,337 | 603 | 1,303 | 512 | 1,280 | 798 | 1,405 | 768 | 925 |
| Magistral Small 2509 (FP8) (medium) | 56,711,835 | 2066 | 24,662 | 4,609 | 3,185 | 1,343 | 2,811 | 952 | 1,515 | 1,504 | 2,206 | 1,084 | 1,437 | 2,747 | 1,269 |
| Magistral Small 2509 (FP8) (hard) | 63,443,463 | 2343 | 25,503 | 4,305 | 2,930 | 771 | 2,902 | 1,331 | 1,498 | 2,016 | 2,582 | 1,087 | 1,375 | 3,659 | 1,047 |
| Llama-Nemotron-Super 49B v1.5 (INT8) (easy) | 31,021,098 | 2606 | 10,043 | 1,553 | 356 | 639 | 1,609 | 511 | 860 | 543 | 1,213 | 276 | 1,029 | 765 | 689 |
| Llama-Nemotron-Super 49B v1.5 (INT8) (medium) | 61,560,487 | 3205 | 15,991 | 2,061 | 1,435 | 341 | 1,937 | 764 | 1,329 | 1,375 | 1,866 | 492 | 1,118 | 2,172 | 1,101 |
| Llama-Nemotron-Super 49B v1.5 (INT8) (hard) | 71,234,100 | 3640 | 16,576 | 1,546 | 1,338 | 122 | 2,176 | 1,143 | 1,452 | 1,854 | 2,180 | 550 | 1,264 | 2,026 | 925 |
| GLM-4.5 Air (AWQ) (easy) | 36,747,440 | 2636 | 11,721 | 1,791 | 699 | 943 | 2,110 | 478 | 604 | 448 | 1,632 | 685 | 1,001 | 730 | 600 |
| GLM-4.5 Air (AWQ) (medium) | 68,265,410 | 3153 | 18,440 | 2,589 | 2,695 | 875 | 2,369 | 733 | 382 | 1,375 | 1,433 | 1,046 | 982 | 2,920 | 1,041 |
| GLM-4.5 Air (AWQ) (hard) | 78,140,372 | 3548 | 19,174 | 2,188 | 2,502 | 477 | 2,674 | 1,082 | 216 | 1,854 | 2,013 | 1,106 | 941 | 2,977 | 1,144 |
| Hunyuan A13B-Instruct (GPTQ) (easy) | 24,537,772 | 1431 | 16,067 | 2,190 | 880 | 1,547 | 2,239 | 571 | 1,363 | 512 | 2,304 | 827 | 1,594 | 1,184 | 856 |
| Hunyuan A13B-Instruct (GPTQ) (medium) | 53,254,183 | 1884 | 26,138 | 4,027 | 2,918 | 2,282 | 2,748 | 857 | 1,400 | 1,632 | 2,816 | 1,057 | 1,528 | 3,838 | 1,035 |
| Hunyuan A13B-Instruct (GPTQ) (hard) | 62,584,085 | 2249 | 24,903 | 4,040 | 2,517 | 1,627 | 2,895 | 1,364 | 871 | 2,208 | 2,102 | 1,016 | 1,370 | 4,169 | 724 |
| Qwen3-8B Original (FP16) (easy) | 35,111,857 | 2590 | 12,258 | 1,677 | 756 | 474 | 2,235 | 603 | 909 | 512 | 1,950 | 498 | 1,050 | 831 | 763 |
| Qwen3-8B Original (FP16) (medium) | 76,998,713 | 3260 | 20,779 | 3,014 | 2,954 | 419 | 2,584 | 890 | 1,417 | 1,568 | 2,557 | 573 | 1,050 | 2,543 | 1,210 |
| Qwen3-8B Original (FP16) (hard) | 95,954,003 | 3621 | 23,081 | 3,449 | 3,462 | 284 | 2,538 | 1,366 | 979 | 2,432 | 2,268 | 540 | 1,148 | 3,603 | 1,012 |
| Qwen3-30B-A3B Instruct-2507 (AWQ) (easy) | 20,178,612 | 1353 | 12,385 | 1,593 | 673 | 1,091 | 1,920 | 605 | 1,182 | 544 | 1,664 | 729 | 881 | 831 | 672 |
| Qwen3-30B-A3B Instruct-2507 (AWQ) (medium) | 40,127,418 | 1754 | 19,707 | 3,050 | 2,274 | 762 | 2,103 | 987 | 1,498 | 1,504 | 2,399 | 942 | 879 | 2,639 | 670 |
| Qwen3-30B-A3B Instruct-2507 (AWQ) (hard) | 50,186,835 | 2116 | 21,435 | 3,734 | 2,337 | 338 | 2,094 | 1,462 | 1,498 | 2,016 | 2,841 | 970 | 815 | 2,765 | 565 |
| Qwen3-4B Thinking-2507 (FP16) (easy) | 49,757,627 | 4384 | 10,065 | 1,840 | 433 | 427 | 1,695 | 500 | 626 | 511 | 1,557 | 412 | 828 | 671 | 565 |
| Qwen3-4B Thinking-2507 (FP16) (medium) | 82,503,525 | 5073 | 14,051 | 2,681 | 1,827 | 263 | 1,567 | 751 | 372 | 1,405 | 1,656 | 163 | 823 | 1,740 | 803 |
| Qwen3-4B Thinking-2507 (FP16) (hard) | 83,141,988 | 5415 | 12,500 | 2,091 | 1,514 | 80 | 1,430 | 1,116 | 174 | 1,913 | 1,527 | 144 | 719 | 1,187 | 605 |
| Qwen3-4B Original (FP16) (easy) | 39,463,143 | 2472 | 14,124 | 2,181 | 1,068 | 778 | 2,523 | 634 | 1,202 | 544 | 1,599 | 550 | 1,213 | 1,010 | 822 |
| Qwen3-4B Original (FP16) (medium) | 78,796,516 | 3151 | 22,477 | 3,555 | 3,031 | 686 | 2,808 | 947 | 1,475 | 1,536 | 2,411 | 396 | 1,458 | 3,142 | 1,032 |
| Qwen3-4B Original (FP16) (hard) | 89,893,549 | 3569 | 22,641 | 3,324 | 2,995 | 396 | 2,841 | 1,451 | 1,117 | 2,080 | 2,312 | 414 | 1,395 | 3,532 | 784 |
| Qwen3-4B Instruct-2507 (FP16) (easy) | 25,086,642 | 1456 | 15,716 | 2,797 | 1,037 | 853 | 2,157 | 633 | 1,213 | 512 | 1,888 | 895 | 1,624 | 1,119 | 988 |
| Qwen3-4B Instruct-2507 (FP16) (medium) | 49,710,158 | 1897 | 24,892 | 4,658 | 3,248 | 627 | 2,380 | 1,013 | 1,530 | 1,503 | 2,559 | 1,117 | 1,682 | 3,407 | 1,168 |
| Qwen3-4B Instruct-2507 (FP16) (hard) | 58,408,997 | 2331 | 25,285 | 4,592 | 3,085 | 197 | 2,329 | 1,521 | 1,353 | 2,015 | 2,783 | 1,149 | 1,660 | 3,636 | 965 |
| Qwen3-30B-A3B Thinking-2507 (AWQ) (easy) | 43,260,618 | 3568 | 10,331 | 1,267 | 436 | 683 | 1,508 | 476 | 919 | 415 | 1,437 | 719 | 971 | 744 | 756 |
| Qwen3-30B-A3B Thinking-2507 (AWQ) (medium) | 74,154,340 | 4301 | 14,124 | 2,048 | 1,510 | 184 | 1,457 | 729 | 704 | 1,370 | 1,585 | 390 | 1,032 | 2,001 | 1,114 |
| Qwen3-30B-A3B Thinking-2507 (AWQ) (hard) | 76,194,307 | 4667 | 13,035 | 1,887 | 1,396 | 61 | 1,257 | 1,077 | 230 | 1,912 | 1,297 | 254 | 985 | 1,756 | 923 |
| Nemotron Nano 9B v2 (FP16) (easy) | 26,431,618 | 1504 | 15,925 | 2,858 | 1,060 | 1,023 | 2,413 | 604 | 1,244 | 480 | 2,015 | 535 | 1,472 | 1,342 | 879 |
| Nemotron Nano 9B v2 (FP16) (medium) | 54,470,408 | 1978 | 25,088 | 4,325 | 3,089 | 985 | 2,846 | 986 | 1,491 | 1,408 | 2,486 | 784 | 1,787 | 3,967 | 934 |
| Nemotron Nano 9B v2 (FP16) (hard) | 61,480,100 | 2339 | 24,116 | 3,991 | 2,779 | 423 | 3,006 | 1,462 | 1,176 | 1,920 | 1,930 | 847 | 1,723 | 4,254 | 605 |
| Hermes-4 14B (FP8) (easy) | 31,012,323 | 1870 | 13,529 | 2,525 | 582 | 504 | 1,659 | 533 | 1,196 | 575 | 1,887 | 830 | 1,428 | 1,036 | 774 |
| Hermes-4 14B (FP8) (medium) | 61,401,273 | 2304 | 21,928 | 4,293 | 1,985 | 351 | 2,005 | 911 | 1,437 | 1,657 | 2,743 | 1,037 | 1,413 | 3,331 | 765 |
| Hermes-4 14B (FP8) (hard) | 68,721,936 | 2612 | 21,213 | 4,287 | 1,802 | 240 | 1,890 | 1,384 | 1,184 | 2,262 | 2,456 | 1,040 | 1,283 | 2,879 | 506 |
| R1-0528-Qwen3-8B (FP16) (easy) | 49,778,494 | 3087 | 13,892 | 2,024 | 642 | 1,003 | 2,032 | 555 | 942 | 480 | 2,164 | 512 | 1,686 | 991 | 861 |
| R1-0528-Qwen3-8B (FP16) (medium) | 83,511,661 | 3615 | 19,983 | 2,790 | 2,300 | 630 | 2,211 | 929 | 264 | 1,504 | 2,302 | 731 | 1,968 | 3,538 | 816 |
| R1-0528-Qwen3-8B (FP16) (hard) | 83,385,639 | 3887 | 18,647 | 2,065 | 2,069 | 201 | 2,251 | 1,358 | 221 | 2,047 | 1,429 | 816 | 1,758 | 3,803 | 629 |
| Hermes-4 70B (AWQ) (easy) | 30,019,066 | 1784 | 10,623 | 2,788 | 544 | 114 | 953 | 526 | 1,135 | 471 | 1,203 | 367 | 1,023 | 1,052 | 447 |
| Hermes-4 70B (AWQ) (medium) | 56,247,853 | 2273 | 16,716 | 3,811 | 1,775 | 82 | 1,174 | 834 | 1,165 | 1,448 | 1,658 | 501 | 998 | 2,712 | 558 |
| Hermes-4 70B (AWQ) (hard) | 63,248,929 | 2641 | 16,660 | 3,169 | 1,723 | 36 | 1,188 | 1,171 | 811 | 1,949 | 1,786 | 624 | 973 | 2,640 | 590 |
| Ring Mini 2.0 (FP16) (easy) | 44,576,866 | 3616 | 10,421 | 1,005 | 645 | 556 | 1,761 | 580 | 839 | 544 | 1,568 | 309 | 849 | 1,245 | 520 |
| Ring Mini 2.0 (FP16) (medium) | 85,774,200 | 4359 | 16,565 | 883 | 2,076 | 488 | 2,369 | 1,169 | 195 | 1,663 | 1,939 | 307 | 866 | 3,899 | 711 |
| Ring Mini 2.0 (FP16) (hard) | 83,837,586 | 4669 | 14,318 | 469 | 1,800 | 201 | 2,434 | 1,321 | 89 | 2,239 | 1,119 | 323 | 665 | 3,157 | 501 |
| aquif-3.5 A4B (FP16) (easy) | 38,124,382 | 3009 | 10,701 | 1,168 | 293 | 271 | 1,952 | 536 | 1,279 | 576 | 1,875 | 137 | 1,068 | 987 | 559 |
| aquif-3.5 A4B (FP16) (medium) | 68,910,321 | 3783 | 13,960 | 1,211 | 943 | 103 | 1,560 | 852 | 699 | 1,597 | 2,399 | 59 | 1,055 | 2,953 | 529 |
| aquif-3.5 A4B (FP16) (hard) | 68,980,479 | 4161 | 11,774 | 771 | 909 | 29 | 1,113 | 1,349 | 231 | 2,141 | 1,933 | 45 | 1,005 | 1,846 | 402 |
| Gemma3-27B-It (FP16) (easy) | 5,951,843 | 355 | 15,232 | 2,616 | 1,150 | 1,112 | 2,553 | 602 | 704 | 416 | 2,176 | 1,023 | 1,024 | 1,280 | 576 |
| Gemma3-27B-It (FP16) (medium) | 12,422,760 | 447 | 22,572 | 3,394 | 2,311 | 1,341 | 2,872 | 981 | 508 | 1,472 | 2,784 | 895 | 1,184 | 4,224 | 606 |
| Gemma3-27B-It (FP16) (hard) | 14,841,446 | 514 | 22,619 | 2,716 | 2,112 | 966 | 2,909 | 1,393 | 497 | 1,952 | 2,688 | 831 | 1,120 | 4,830 | 605 |
| Llama-3.3-70B (FP8) (easy) | 5,523,294 | 370 | 13,680 | 2,560 | 1,088 | 598 | 2,464 | 605 | 798 | 416 | 1,536 | 1,024 | 959 | 960 | 672 |
| Llama-3.3-70B (FP8) (medium) | 13,435,170 | 486 | 21,961 | 3,326 | 3,296 | 597 | 2,944 | 1,019 | 509 | 1,440 | 2,368 | 896 | 1,023 | 3,839 | 704 |
| Llama-3.3-70B (FP8) (hard) | 16,966,749 | 571 | 21,991 | 2,878 | 3,168 | 376 | 2,880 | 1,432 | 415 | 1,920 | 2,624 | 832 | 1,087 | 3,739 | 640 |
| Phi-4 (FP16) (easy) | 6,417,477 | 391 | 15,279 | 2,654 | 1,371 | 761 | 2,688 | 637 | 832 | 448 | 2,176 | 992 | 1,184 | 800 | 736 |
| Phi-4 (FP16) (medium) | 13,220,775 | 502 | 22,693 | 3,646 | 3,546 | 694 | 2,944 | 1,083 | 511 | 1,472 | 2,592 | 1,056 | 1,246 | 3,200 | 703 |
| Phi-4 (FP16) (hard) | 15,621,380 | 597 | 22,273 | 2,907 | 3,296 | 597 | 2,944 | 1,560 | 416 | 1,952 | 1,824 | 992 | 1,310 | 3,904 | 571 |
| Hunyuan 7B-Instruct (FP16) (easy) | 29,344,139 | 1907 | 13,729 | 414 | 1,000 | 561 | 2,187 | 858 | 1,465 | 472 | 2,496 | 857 | 1,608 | 864 | 947 |
| Hunyuan 7B-Instruct (FP16) (medium) | 54,216,234 | 2663 | 19,652 | 321 | 3,127 | 199 | 2,625 | 1,303 | 1,131 | 1,464 | 2,588 | 1,074 | 1,666 | 3,251 | 903 |
| Hunyuan 7B-Instruct (FP16) (hard) | 61,320,090 | 2998 | 19,147 | 185 | 2,908 | 96 | 2,657 | 1,778 | 778 | 1,995 | 1,905 | 1,103 | 1,463 | 3,742 | 537 |
| Gemma3-12B-It (FP16) (easy) | 6,299,479 | 357 | 15,121 | 2,329 | 1,292 | 639 | 2,609 | 478 | 768 | 512 | 2,656 | 894 | 1,152 | 1,184 | 608 |
| Gemma3-12B-It (FP16) (medium) | 13,264,347 | 463 | 21,569 | 2,593 | 3,091 | 863 | 2,778 | 892 | 508 | 1,600 | 2,912 | 540 | 1,184 | 3,936 | 672 |
| Gemma3-12B-It (FP16) (hard) | 15,543,276 | 552 | 21,629 | 2,079 | 2,781 | 863 | 2,808 | 1,369 | 406 | 2,112 | 2,336 | 414 | 1,120 | 4,701 | 640 |
| granite-4.0-h small (FP16) (easy) | 6,522,847 | 304 | 15,751 | 2,229 | 967 | 724 | 2,688 | 796 | 701 | 576 | 2,463 | 959 | 1,024 | 1,952 | 672 |
| granite-4.0-h small (FP16) (medium) | 12,541,567 | 377 | 21,364 | 2,675 | 1,758 | 799 | 2,912 | 1,274 | 536 | 1,760 | 2,298 | 767 | 1,246 | 4,669 | 670 |
| granite-4.0-h small (FP16) (hard) | 14,055,779 | 432 | 18,541 | 1,761 | 1,681 | 716 | 2,976 | 1,783 | 334 | 2,301 | 1,331 | 736 | 1,310 | 3,016 | 596 |
| Hunyuan 4B-Instruct (FP16) (easy) | 28,908,510 | 1852 | 14,138 | 953 | 370 | 1,402 | 2,643 | 694 | 1,329 | 480 | 2,431 | 153 | 1,781 | 1,183 | 719 |
| Hunyuan 4B-Instruct (FP16) (medium) | 51,396,935 | 2502 | 18,281 | 626 | 738 | 1,523 | 2,764 | 1,107 | 913 | 1,503 | 2,525 | 95 | 1,889 | 3,995 | 603 |
| Hunyuan 4B-Instruct (FP16) (hard) | 56,421,969 | 2961 | 16,623 | 326 | 761 | 592 | 2,672 | 1,611 | 560 | 2,013 | 1,629 | 86 | 1,764 | 4,172 | 437 |
| R1-Distill-Llama-8B (FP16) (easy) | 34,274,431 | 1693 | 17,190 | 1,686 | 1,382 | 1,006 | 2,592 | 952 | 1,362 | 574 | 2,176 | 873 | 2,065 | 1,742 | 780 |
| R1-Distill-Llama-8B (FP16) (medium) | 59,928,466 | 2223 | 22,121 | 1,946 | 3,078 | 537 | 2,743 | 1,429 | 773 | 1,728 | 1,568 | 734 | 2,223 | 4,672 | 690 |
| R1-Distill-Llama-8B (FP16) (hard) | 59,798,754 | 2477 | 20,347 | 1,642 | 2,610 | 321 | 2,560 | 1,904 | 467 | 2,336 | 1,107 | 717 | 2,040 | 4,133 | 510 |
| Qwen3-1.7B (AWQ) (easy) | 46,900,595 | 2516 | 16,747 | 2,270 | 1,205 | 551 | 2,553 | 889 | 1,095 | 576 | 2,303 | 1,095 | 1,561 | 1,917 | 732 |
| Qwen3-1.7B (AWQ) (medium) | 74,092,131 | 3205 | 20,765 | 2,993 | 1,824 | 276 | 2,980 | 1,392 | 407 | 1,728 | 1,662 | 1,019 | 1,687 | 4,240 | 557 |
| Qwen3-1.7B (AWQ) (hard) | 76,506,761 | 3621 | 19,411 | 1,955 | 1,734 | 139 | 2,841 | 1,868 | 261 | 2,304 | 1,009 | 1,006 | 1,560 | 4,343 | 391 |
| ERNIE-4.5-21B-A3B Thinking (AWQ) (easy) | 51,617,179 | 3465 | 12,160 | 1,295 | 881 | 372 | 2,161 | 597 | 873 | 572 | 2,089 | 380 | 707 | 1,567 | 666 |
| ERNIE-4.5-21B-A3B Thinking (AWQ) (medium) | 80,978,436 | 4119 | 15,210 | 1,370 | 2,779 | 293 | 2,052 | 968 | 186 | 1,712 | 2,089 | 176 | 599 | 2,461 | 525 |
| ERNIE-4.5-21B-A3B Thinking (AWQ) (hard) | 76,649,834 | 4451 | 12,946 | 831 | 2,567 | 173 | 1,944 | 1,435 | 170 | 2,310 | 1,116 | 122 | 584 | 1,263 | 431 |
| Phi-4 Mini Reasoning (FP16) (easy) | 54,232,057 | 3078 | 14,642 | 2,385 | 1,096 | 88 | 2,524 | 717 | 1,275 | 625 | 2,472 | 297 | 1,143 | 1,351 | 669 |
| Phi-4 Mini Reasoning (FP16) (medium) | 80,513,695 | 3669 | 16,630 | 3,237 | 2,083 | 32 | 1,778 | 1,092 | 786 | 1,498 | 1,887 | 265 | 1,107 | 2,417 | 448 |
| Phi-4 Mini Reasoning (FP16) (hard) | 72,414,552 | 4035 | 12,947 | 2,346 | 1,914 | 19 | 1,062 | 1,564 | 440 | 2,015 | 1,004 | 286 | 1,052 | 965 | 280 |
| SmolLM3 3B (FP16) (easy) | 32,781,845 | 1633 | 14,538 | 2,104 | 1,069 | 253 | 1,272 | 968 | 1,282 | 565 | 1,907 | 831 | 1,662 | 1,914 | 711 |
| SmolLM3 3B (FP16) (medium) | 55,699,395 | 2086 | 18,762 | 2,540 | 2,244 | 196 | 1,674 | 1,449 | 772 | 1,772 | 1,284 | 928 | 1,787 | 3,601 | 515 |
| SmolLM3 3B (FP16) (hard) | 53,593,811 | 2388 | 16,147 | 1,916 | 2,062 | 90 | 1,678 | 1,907 | 548 | 2,378 | 867 | 893 | 1,699 | 1,805 | 304 |
| Gemma3-4B-It (FP16) (easy) | 6,895,460 | 348 | 15,059 | 2,027 | 877 | 667 | 2,639 | 795 | 672 | 608 | 2,619 | 447 | 1,184 | 2,015 | 509 |
| Gemma3-4B-It (FP16) (medium) | 12,829,122 | 472 | 20,453 | 2,067 | 1,968 | 844 | 2,996 | 1,294 | 384 | 1,824 | 2,431 | 287 | 1,280 | 4,640 | 438 |
| Gemma3-4B-It (FP16) (hard) | 13,098,828 | 547 | 18,632 | 1,520 | 1,889 | 839 | 2,871 | 1,803 | 448 | 2,432 | 1,568 | 319 | 1,344 | 3,200 | 399 |
| granite-3.3 8B Instruct (FP16) (easy) | 9,376,264 | 593 | 16,089 | 1,632 | 1,532 | 787 | 2,752 | 923 | 671 | 544 | 2,368 | 561 | 1,663 | 2,016 | 640 |
| granite-3.3 8B Instruct (FP16) (medium) | 15,560,912 | 719 | 20,396 | 1,501 | 3,376 | 767 | 2,848 | 1,428 | 445 | 1,728 | 1,632 | 308 | 1,662 | 4,126 | 575 |
| granite-3.3 8B Instruct (FP16) (hard) | 16,371,771 | 771 | 18,390 | 1,240 | 3,083 | 517 | 2,848 | 1,874 | 377 | 2,368 | 1,054 | 305 | 1,342 | 2,936 | 446 |
| Llama-3.1-Nemotron-Nano-4B-v1.1 (FP16) (easy) | 41,113,288 | 3125 | 9,604 | 1,902 | 199 | 165 | 757 | 954 | 1,038 | 542 | 2,210 | 155 | 686 | 653 | 343 |
| Llama-3.1-Nemotron-Nano-4B-v1.1 (FP16) (medium) | 62,633,572 | 3994 | 9,798 | 1,902 | 755 | 46 | 729 | 1,448 | 291 | 1,480 | 1,483 | 101 | 678 | 669 | 216 |
| Llama-3.1-Nemotron-Nano-4B-v1.1 (FP16) (hard) | 64,379,784 | 4491 | 8,493 | 1,088 | 776 | 13 | 596 | 1,918 | 144 | 2,011 | 765 | 68 | 700 | 267 | 147 |
| Llama-3.1-Nemotron-Nano-8B (FP16) (easy) | 52,725,257 | 1206 | 22,010 | 4,321 | 582 | 997 | 2,558 | 1,102 | 1,214 | 999 | 3,559 | 151 | 3,656 | 2,259 | 612 |
| Llama-3.1-Nemotron-Nano-8B (FP16) (medium) | 53,888,942 | 1563 | 18,322 | 3,565 | 2,042 | 699 | 2,121 | 814 | 448 | 1,414 | 1,937 | 106 | 2,125 | 2,802 | 249 |
| Llama-3.1-Nemotron-Nano-8B (FP16) (hard) | 51,786,393 | 1725 | 15,900 | 2,735 | 1,698 | 595 | 2,036 | 1,189 | 384 | 1,889 | 1,126 | 93 | 2,076 | 1,865 | 214 |
| AI21 Jamba Reasoning 3B (FP16) (easy) | 49,040,340 | 3090 | 11,600 | 1,299 | 608 | 451 | 2,158 | 811 | 500 | 410 | 1,714 | 540 | 1,229 | 1,509 | 371 |
| AI21 Jamba Reasoning 3B (FP16) (medium) | 76,612,547 | 3877 | 17,259 | 1,700 | 2,838 | 517 | 2,826 | 1,250 | 314 | 1,409 | 1,465 | 469 | 1,251 | 2,850 | 370 |
| AI21 Jamba Reasoning 3B (FP16) (hard) | 76,016,642 | 4237 | 16,735 | 1,381 | 2,876 | 395 | 2,943 | 1,754 | 286 | 2,035 | 993 | 488 | 1,288 | 1,943 | 353 |
| granite-4.0-h tiny (FP16) (easy) | 7,369,188 | 207 | 13,679 | 1,601 | 715 | 391 | 2,432 | 985 | 480 | 640 | 1,949 | 503 | 1,408 | 1,952 | 623 |
| granite-4.0-h tiny (FP16) (medium) | 15,630,823 | 266 | 14,540 | 1,184 | 1,443 | 195 | 2,783 | 1,492 | 383 | 1,568 | 1,055 | 309 | 1,566 | 2,079 | 483 |
| granite-4.0-h tiny (FP16) (hard) | 19,347,688 | 298 | 13,820 | 810 | 1,417 | 63 | 3,008 | 2,000 | 384 | 2,047 | 647 | 309 | 1,374 | 1,427 | 334 |
| granite-4.0-h micro (FP16) (easy) | 4,560,449 | 256 | 14,168 | 1,703 | 1,476 | 607 | 2,912 | 986 | 639 | 640 | 765 | 672 | 1,438 | 1,888 | 442 |
| granite-4.0-h micro (FP16) (medium) | 7,731,848 | 336 | 17,259 | 1,540 | 3,159 | 895 | 2,880 | 1,431 | 470 | 1,824 | 760 | 320 | 1,278 | 2,304 | 398 |
| granite-4.0-h micro (FP16) (hard) | 8,598,854 | 383 | 16,392 | 1,255 | 2,802 | 895 | 2,816 | 1,877 | 374 | 2,432 | 746 | 288 | 1,022 | 1,533 | 352 |
| Llama-3.1-8B (FP16) (easy) | 13,071,823 | 366 | 13,986 | 1,365 | 591 | 200 | 2,910 | 956 | 857 | 534 | 2,622 | 353 | 1,113 | 1,950 | 535 |
| Llama-3.1-8B (FP16) (medium) | 21,715,813 | 512 | 15,059 | 1,326 | 1,266 | 156 | 2,811 | 1,204 | 462 | 1,772 | 1,977 | 190 | 1,195 | 2,099 | 601 |
| Llama-3.1-8B (FP16) (hard) | 24,165,191 | 583 | 13,839 | 1,059 | 1,219 | 85 | 2,750 | 1,681 | 365 | 2,376 | 1,178 | 179 | 1,105 | 1,412 | 430 |
| Llama-3.2-3B (FP16) (easy) | 8,164,023 | 334 | 12,588 | 1,161 | 803 | 317 | 2,823 | 972 | 373 | 633 | 2,432 | 188 | 920 | 1,625 | 341 |
| Llama-3.2-3B (FP16) (medium) | 14,517,297 | 382 | 14,312 | 1,269 | 1,003 | 308 | 2,829 | 1,391 | 365 | 1,784 | 1,951 | 187 | 982 | 1,892 | 351 |
| Llama-3.2-3B (FP16) (hard) | 16,644,490 | 429 | 13,613 | 1,034 | 961 | 273 | 2,769 | 1,797 | 373 | 2,379 | 1,116 | 186 | 950 | 1,469 | 306 |
| Phi-4 Mini Flash Reasoning (FP16) (easy) | 31,116,228 | 1963 | 5,679 | 698 | 45 | 87 | 620 | 832 | 445 | 94 | 1,387 | 113 | 1,065 | 157 | 136 |
### Overall Totals
- Unique Models: 53
- Total Tokens (All Models): 7,109,826,885
- Total Tests (All Models): 2,646,483
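The overall totals can be re-derived from the table above. The sketch below is a minimal example under stated assumptions: it parses a markdown copy of the Resource Usage table (the file path is hypothetical) and sums the Total Tokens and Total Tests columns while counting unique model names.

```python
import re

def table_totals(path: str = "m12x_resource_usage.md") -> tuple[int, int, int]:
    """Recompute (unique models, total tokens, total tests) from the markdown table."""
    total_tokens = total_tests = 0
    models = set()
    with open(path) as fh:
        for line in fh:
            cells = [c.strip() for c in line.strip().strip("|").split("|")]
            # Data rows end their first cell with a difficulty suffix like "(easy)".
            if len(cells) < 4 or not re.search(r"\((easy|medium|hard)\)$", cells[0]):
                continue
            models.add(re.sub(r"\s*\((easy|medium|hard)\)$", "", cells[0]))
            total_tokens += int(cells[1].replace(",", ""))
            total_tests += int(cells[3].replace(",", ""))
    return len(models), total_tokens, total_tests

print(table_totals())  # expected to match the Overall Totals figures above
```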
## Evaluation Workflows
### Rapid Assessment (2-3 hours)
```bash
python runner.py --config configs/m12x.yaml --degree 0 --density normal --precision low
```
### Standard Evaluation (8-12 hours)
```bash
python runner.py --config configs/m12x.yaml --degree 1 --density normal --precision medium
```
### Research-Grade Analysis (20+ hours)
```bash
python runner.py --config configs/m12x.yaml --degree 2 --density normal --precision high
```
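To run all three degrees back to back, a small driver script works. The sketch below reuses the flags shown above; pairing degree 0/1/2 with precision low/medium/high simply mirrors the three workflows and is a convention, not a requirement.

```python
import subprocess

# Assumed pairing of difficulty degree to precision level, matching the workflows above.
PRECISION_BY_DEGREE = {0: "low", 1: "medium", 2: "high"}

for degree, precision in PRECISION_BY_DEGREE.items():
    subprocess.run(
        [
            "python", "runner.py",
            "--config", "configs/m12x.yaml",
            "--degree", str(degree),
            "--density", "normal",
            "--precision", precision,
        ],
        check=True,  # stop the sweep if any degree fails
    )
```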
## Citation
When using M12X in research, please cite:
```bibtex
@software{reasonscape_m12x2025,
  title={M12X: Comprehensive 12-Domain Reasoning Evaluation},
  author={Mikhail Ravkine},
  year={2025},
  url={https://github.com/the-crypt-keeper/reasonscape},
  note={Part of ReasonScape evaluation methodology}
}
```
## See Also
- Configuration Guide: Detailed parameter reference
- Task Documentation: Individual domain specifications
- Statistical Methodology: Technical implementation details