Changelog

4/7/2026

Ministral 3 3B (Usable, 2.25 avg) ties GPT-5 Nano at 1/4 the price and becomes the new Budget leader and editor. Ministral 3 8B run incomplete (3/4 tests completed, OpenRouter routing issues).

Tested: ministral-3b-2512, ministral-8b-2512, gpt-5-nano

4/6/2026

Mistral Small 4 scored Strong 2.50 (2/2/3/3). Grok 4.1 Fast re-tested with the 4-test suite: dropped from Strong 3.00 to Usable 2.25 after failing the 800-char constraint on T4 (949 chars); Mistral Small 4 passed (779 chars). Mistral Small 4 is the new Value bracket leader. No retirements.

Tested: Mistral Small 4, Grok 4.1 Fast
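
Note on the 4/6 arithmetic: a minimal sketch of how the 2.50 average and the T4 length check could be reproduced. It assumes the average is a plain arithmetic mean of the four per-test scores and that the 800-char limit is inclusive; neither is stated in this changelog, and the function names are hypothetical.

```python
from statistics import mean

CHAR_LIMIT = 800  # T4 limit as stated in the 4/6 entry


def average_score(scores):
    """Arithmetic mean of per-test scores (assumed scoring method)."""
    return round(mean(scores), 2)


def within_limit(char_count, limit=CHAR_LIMIT):
    """True if the output fits the limit (inclusive bound is an assumption)."""
    return char_count <= limit


# Mistral Small 4 per-test scores from the 4/6 entry
print(f"{average_score([2, 2, 3, 3]):.2f}")

# T4 output lengths from the 4/6 entry
print(within_limit(779))  # Mistral Small 4
print(within_limit(949))  # Grok 4.1 Fast
```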

4/4/2026

Ministral 3 14B scored Usable 2.00 avg at $0.20/MTok output, a consistent 2 out of 3 on each of the 4 tests. GPT-5 Nano remains Budget leader (2.25 avg). GPT-5 Nano comparison run skipped due to OpenRouter timeouts; existing 4/3 scores used instead. No role changes, no retirements.

Tested: Ministral 3 14B

4/3/2026

Event-driven run. 1 new model benchmarked (Mistral Small 3.2), 1 comparison (GPT-5 Nano). GPT-5 Nano avg improved from 2.00 to 2.25 with constrained_rewriting test added. No role changes. DeepSeek V4 deferred until available on OpenRouter.

Tested: Mistral Small 3.2, GPT-5 Nano