Devstral Medium vs GPT-5 Nano

GPT-5 Nano is the practical pick for most developers and teams: it wins 8 of 12 benchmarks in our tests and costs far less per token. Devstral Medium wins only classification in our suite; it may still appeal if you prioritize its marketed code-generation focus despite the higher price.

Devstral Medium (Mistral)

Overall: 3.17/5 (Usable)

Benchmark Scores

Faithfulness: 4/5
Long Context: 4/5
Multilingual: 4/5
Tool Calling: 3/5
Classification: 4/5
Agentic Planning: 4/5
Structured Output: 4/5
Safety Calibration: 1/5
Strategic Analysis: 2/5
Persona Consistency: 3/5
Constrained Rewriting: 3/5
Creative Problem Solving: 2/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.400/MTok
Output: $2.00/MTok
Context Window: 131K

modelpicker.net

GPT-5 Nano (OpenAI)

Overall: 4.00/5 (Strong)

Benchmark Scores

Faithfulness: 4/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 4/5
Classification: 3/5
Agentic Planning: 4/5
Structured Output: 5/5
Safety Calibration: 4/5
Strategic Analysis: 4/5
Persona Consistency: 4/5
Constrained Rewriting: 3/5
Creative Problem Solving: 3/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: 95.2%
AIME 2025: 81.1%

Pricing

Input: $0.050/MTok
Output: $0.400/MTok
Context Window: 400K

Benchmark Analysis

Across our 12-test suite, GPT-5 Nano wins 8 metrics, Devstral Medium wins 1, and 3 are ties. Metric by metric (Devstral vs GPT-5 Nano):

- Structured output (JSON/schema), 4 vs 5: GPT-5 Nano wins and is tied for 1st on this task (with 24 other models), so Nano is more reliable for strict schema compliance.
- Strategic analysis (tradeoff math), 2 vs 4: GPT-5 Nano wins (rank 27/54), handling nuanced numeric tradeoffs better in our tests.
- Creative problem solving, 2 vs 3: GPT-5 Nano wins, indicating better generation of non-obvious, actionable ideas.
- Tool calling, 3 vs 4: GPT-5 Nano wins (rank 18/54), selecting functions and arguments more accurately in our tool-calling scenarios.
- Long context (30K+), 4 vs 5: GPT-5 Nano wins and is tied for 1st on long-context retrieval, so it performs better on long-document tasks.
- Safety calibration, 1 vs 4: GPT-5 Nano wins decisively (rank 6/55), refusing harmful requests far more reliably in our tests.
- Persona consistency, 3 vs 4: GPT-5 Nano wins, maintaining character better across our prompts.
- Multilingual, 4 vs 5: GPT-5 Nano wins and is tied for 1st, producing higher-quality non-English output in our evaluation.
- Classification, 4 vs 3: Devstral Medium wins, tied for 1st with 29 other models, making it a strong router/categorizer in our suite.
- Constrained rewriting (3 vs 3), Faithfulness (4 vs 4), and Agentic planning (4 vs 4): ties; neither model dominates on these.

External benchmarks (supplementary): GPT-5 Nano scores 95.2% on MATH Level 5 and 81.1% on AIME 2025 (Epoch AI), supporting its strength on advanced math tasks. In short: Nano is stronger for structured outputs, long context, tool workflows, multilingual output, and safety; Devstral's clear win is classification accuracy in our tests.

| Benchmark | Devstral Medium | GPT-5 Nano |
|---|---|---|
| Faithfulness | 4/5 | 4/5 |
| Long Context | 4/5 | 5/5 |
| Multilingual | 4/5 | 5/5 |
| Tool Calling | 3/5 | 4/5 |
| Classification | 4/5 | 3/5 |
| Agentic Planning | 4/5 | 4/5 |
| Structured Output | 4/5 | 5/5 |
| Safety Calibration | 1/5 | 4/5 |
| Strategic Analysis | 2/5 | 4/5 |
| Persona Consistency | 3/5 | 4/5 |
| Constrained Rewriting | 3/5 | 3/5 |
| Creative Problem Solving | 2/5 | 3/5 |
| Summary | 1 win | 8 wins |

Pricing Analysis

Raw per-MTok prices: Devstral Medium charges $0.40 input / $2.00 output; GPT-5 Nano charges $0.05 input / $0.40 output. At realistic volumes, assuming a 50/50 split of input/output tokens:

- 1M total tokens (500K input + 500K output): Devstral = $1.20; GPT-5 Nano = $0.225.
- 10M total tokens: Devstral = $12.00; GPT-5 Nano = $2.25.
- 100M total tokens: Devstral = $120.00; GPT-5 Nano = $22.50.

That is an 8x gap on input, a 5x gap on output, and roughly a 5x gap at a 50/50 blend. Teams doing high-volume inference (10M+ tokens/month) will see roughly five-fold savings with GPT-5 Nano and should care about the gap; small-scale prototypes may tolerate Devstral's premium if they value its marketed strengths, but expect much higher monthly bills with Devstral Medium.
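The blended-cost arithmetic can be sketched as a small helper. The prices come from the cards above; the 50/50 input/output split and the function name are illustrative assumptions, not part of either vendor's API:

```python
def blended_cost(total_tokens, in_price_per_mtok, out_price_per_mtok, input_share=0.5):
    """Cost in USD for a token volume, given $/MTok prices and an input share (assumed 50/50)."""
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens * (1 - input_share)
    return (input_tokens * in_price_per_mtok + output_tokens * out_price_per_mtok) / 1_000_000

# $/MTok prices from this comparison: (input, output)
DEVSTRAL_MEDIUM = (0.40, 2.00)
GPT5_NANO = (0.05, 0.40)

for volume in (1_000_000, 10_000_000, 100_000_000):
    d = blended_cost(volume, *DEVSTRAL_MEDIUM)
    n = blended_cost(volume, *GPT5_NANO)
    print(f"{volume:>11,} tokens: Devstral ${d:,.2f} vs Nano ${n:,.2f}")
```

Adjust `input_share` for your workload: retrieval-heavy pipelines skew toward input tokens, where the price gap is 8x rather than 5x.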

Real-World Cost Comparison

| Task | Devstral Medium | GPT-5 Nano |
|---|---|---|
| Chat response | $0.0011 | <$0.001 |
| Blog post | $0.0042 | <$0.001 |
| Document batch | $0.108 | $0.021 |
| Pipeline run | $1.08 | $0.210 |

Bottom Line

Choose Devstral Medium if:

- Your primary need is top-tier classification/routing (Devstral scores 4/5 and is tied for 1st on classification in our tests), and you can absorb much higher costs ($0.40 in / $2.00 out per MTok).

Choose GPT-5 Nano if:

- You need reliable structured outputs, long-context understanding, tool calling, multilingual performance, or stronger safety (GPT-5 Nano wins those categories in our 12-test suite), and you want dramatically lower token costs ($0.05 in / $0.40 out per MTok).

For high-volume production (10M+ tokens/month), GPT-5 Nano is the cost-effective winner in most real tasks we measured.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions