Updated May 2026 · 13 models evaluated

Best AI for math

Arithmetic, proofs, optimization, and symbolic reasoning.


Math benchmarks are the cleanest signal we have. Answers are verifiable, contamination is detectable, and reasoning-RL fine-tuning produces outsized gains — which is why thinking-style models (o-series, R1, Gemini thinking variants) dominate.

What matters: AIME, MATH Level 5, and GPQA-Diamond. If your application does anything quantitative — finance, logistics, scientific computing — pay the premium for a reasoning model. The gap between a frontier reasoning model and a frontier chat model on hard math problems can be 20+ percentage points.

Our math rank weights the math benchmark (2.5×), reasoning (1.5×), and structured output (0.5×). The structured output weight matters for applications that pipe results into downstream systems — a model that gets the right answer but formats it wrong is still broken.
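As a rough illustration of how those weights could combine, the sketch below computes a normalised weighted average. The weights (2.5, 1.5, 0.5) come from the text above; dividing by the sum of the weights, the function name `math_rank_score`, and the example sub-scores are assumptions made for illustration, not the site's published formula.

```python
# Weights as stated in the text. Normalising by their sum is an assumption.
WEIGHTS = {"math": 2.5, "reasoning": 1.5, "structured_output": 0.5}


def math_rank_score(scores: dict[str, float]) -> float:
    """Return a weighted average of per-category scores (0-100 scale)."""
    total_weight = sum(WEIGHTS.values())
    weighted_sum = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return weighted_sum / total_weight


# Hypothetical sub-scores, not taken from the rankings.
example = {"math": 90.0, "reasoning": 80.0, "structured_output": 100.0}
print(f"{math_rank_score(example):.2f}")
```

Under this reading the math benchmark carries 2.5 of 4.5 total weight units, so a perfect structured-output score moves the composite far less than a comparable change in the math score.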

Full rankings

All 13 models, scored for math

weighted composite · higher is better
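| #  | Model                  | Provider | Task score | $/MTok in | $/MTok out | Context |
|----|------------------------|----------|------------|-----------|------------|---------|
| 01 | GPT-5                  | OpenAI   | 98.1%      | $1.25     | $10.00     | 400K    |
| 02 | GPT-5 Mini             | OpenAI   | 97.8%      | $0.250    | $2.00      | 400K    |
| 03 | o4 Mini                | OpenAI   | 97.8%      | $1.10     | $4.40      | 200K    |
| 04 | o3                     | OpenAI   | 97.8%      | $2.00     | $8.00      | 200K    |
| 05 | R1 0528                | DeepSeek | 96.6%      | $0.500    | $2.15      | 164K    |
| 06 | GPT-5 Nano             | OpenAI   | 95.2%      | $0.050    | $0.400     | 400K    |
| 07 | R1                     | DeepSeek | 93.1%      | $0.700    | $2.50      | 164K    |
| 08 | GPT-4.1 Mini           | OpenAI   | 87.3%      | $0.400    | $1.60      | 1.0M    |
| 09 | GPT-4.1                | OpenAI   | 83.0%      | $2.00     | $8.00      | 1.0M    |
| 10 | GPT-4.1 Nano           | OpenAI   | 70.0%      | $0.100    | $0.400     | 1.0M    |
| 11 | GPT-4o                 | OpenAI   | 53.3%      | $2.50     | $10.00     | 128K    |
| 12 | GPT-4o-mini            | OpenAI   | 52.6%      | $0.150    | $0.600     | 128K    |
| 13 | Llama 3.3 70B Instruct | Meta     | 41.6%      | $0.100    | $0.320     | 131K    |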

Pricing — top 5 for math

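| Model      | Provider | Price       | Task score |
|------------|----------|-------------|------------|
| GPT-5      | OpenAI   | $7.81/MTok  | 98.1%      |
| GPT-5 Mini | OpenAI   | $1.56/MTok  | 97.8%      |
| o4 Mini    | OpenAI   | $3.58/MTok  | 97.8%      |
| o3         | OpenAI   | $6.50/MTok  | 97.8%      |
| R1 0528    | DeepSeek | $1.74/MTok  | 96.6%      |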
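The single $/MTok figures above are not explained on the page. They appear to match a blend of 25% input price and 75% output price from the full rankings, for example 0.25 × $1.25 + 0.75 × $10.00 = $7.8125 for GPT-5. The sketch below checks that reading against all five listed prices. The 25/75 split is inferred from the numbers and is an assumption, and `blended_price` is a hypothetical helper name.

```python
# Assumed split, inferred from the listed figures; the page does not state it.
INPUT_SHARE = 0.25
OUTPUT_SHARE = 0.75


def blended_price(input_price: float, output_price: float) -> float:
    """Blend input and output prices (both in $ per million tokens)."""
    return INPUT_SHARE * input_price + OUTPUT_SHARE * output_price


# (input $/MTok, output $/MTok, listed blended $/MTok), copied from the page.
TOP_FIVE = {
    "GPT-5": (1.25, 10.00, 7.81),
    "GPT-5 Mini": (0.25, 2.00, 1.56),
    "o4 Mini": (1.10, 4.40, 3.58),
    "o3": (2.00, 8.00, 6.50),
    "R1 0528": (0.50, 2.15, 1.74),
}

for model, (inp, out, listed) in TOP_FIVE.items():
    computed = blended_price(inp, out)
    print(f"{model}: computed ${computed:.4f}, listed ${listed:.2f}")
```

If your workload is input-heavy (long prompts, short answers), the effective cost will sit closer to the input price than these blended figures suggest, so it is worth recomputing with your own ratio.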
modelpicker.ai · powered by live benchmark data
