Changelog
4/7/2026
Ministral 3 3B (2.25 avg, Usable) ties GPT-5 Nano at 1/4 the price and becomes the new Budget leader and editor. Ministral 3 8B run incomplete (3/4 tests completed; OpenRouter routing issues).
Tested: ministral-3b-2512, ministral-8b-2512, gpt-5-nano
4/6/2026
Mistral Small 4 scores Strong 2.50 (2/2/3/3). Grok 4.1 Fast, re-tested with the 4-test suite, dropped from Strong 3.00 to Usable 2.25: it failed the 800-char constraint on T4 (949 chars), while Mistral Small 4 passed (779 chars). Mistral Small 4 is the new Value bracket leader. No retirements.
Tested: Mistral Small 4, Grok 4.1 Fast
4/4/2026
Ministral 3 14B scored Usable 2.00 avg at $0.20/MTok output, with a consistent 2 out of 3 on each of the 4 tests. GPT-5 Nano remains Budget leader (2.25 avg). The GPT-5 Nano comparison run was skipped due to OpenRouter timeouts; the existing 04-03 scores were used instead. No role changes, no retirements.
Tested: Ministral 3 14B
4/3/2026
Event-driven run. 1 new model benchmarked (Mistral Small 3.2), 1 comparison (GPT-5 Nano). GPT-5 Nano avg improved from 2.00 to 2.25 with the constrained_rewriting test added. No role changes. DeepSeek V4 deferred until it is available on OpenRouter.
Tested: Mistral Small 3.2, GPT-5 Nano