Devstral Medium vs GPT-5 Nano
GPT-5 Nano is the practical pick for most developers and teams: it wins 8 of our 12 benchmarks and costs far less per token. Devstral Medium wins only classification in our suite; it may still appeal if you value its marketed code-generation positioning despite the much higher price.
Pricing at a glance:

- Devstral Medium (Mistral): $0.40/MTok input, $2.00/MTok output
- GPT-5 Nano (OpenAI): $0.05/MTok input, $0.40/MTok output
Benchmark Analysis
Across our 12-test suite, GPT-5 Nano wins 8 metrics, Devstral Medium wins 1, and 3 are ties. Metric by metric (scores are Devstral vs GPT-5 Nano):

- Structured output (JSON/schema): 4 vs 5. GPT-5 Nano wins and is tied for 1st on this task (with 24 other models), so Nano is more reliable for strict schema compliance.
- Strategic analysis (tradeoff math): 2 vs 4. GPT-5 Nano wins (rank 27 of 54), handling nuanced numeric tradeoffs better in our tests.
- Creative problem solving: 2 vs 3. GPT-5 Nano wins, indicating better generation of non-obvious, actionable ideas.
- Tool calling: 3 vs 4. GPT-5 Nano wins (rank 18 of 54), selecting functions and arguments more accurately in our tool-calling scenarios.
- Long context (30K+ tokens): 4 vs 5. GPT-5 Nano wins and is tied for 1st on long-context retrieval, so it performs better on long-document tasks.
- Safety calibration: 1 vs 4. GPT-5 Nano wins decisively (rank 6 of 55), refusing harmful requests more reliably in our tests.
- Persona consistency: 3 vs 4. GPT-5 Nano wins, maintaining character better across our prompts.
- Multilingual: 4 vs 5. GPT-5 Nano wins and is tied for 1st, producing higher-quality non-English outputs in our evaluation.
- Classification: 4 vs 3. Devstral Medium wins and is tied for 1st (alongside 29 other models), making it a strong router/categorizer in our suite.
- Constrained rewriting, faithfulness, agentic planning: ties (3, 4, and 4 respectively for both models); neither model dominates here.

External benchmarks (supplementary): GPT-5 Nano scores 95.2% on MATH Level 5 and 81.1% on AIME 2025 (Epoch AI), which supports its strength on advanced math tasks. In short: Nano is stronger for structured outputs, long context, tool workflows, multilingual output, and safety; Devstral's one clear win is classification accuracy in our tests.
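The win/tie tally can be checked directly from the per-metric scores above. A minimal sketch (scores are the ones quoted; the dictionary layout is just for illustration):

```python
# Tally the per-metric judge scores quoted above
# (1-5 scale; A = Devstral Medium, B = GPT-5 Nano).
SCORES = {
    "structured_output":     (4, 5),
    "strategic_analysis":    (2, 4),
    "creative_problem":      (2, 3),
    "tool_calling":          (3, 4),
    "long_context":          (4, 5),
    "safety_calibration":    (1, 4),
    "persona_consistency":   (3, 4),
    "multilingual":          (4, 5),
    "classification":        (4, 3),
    "constrained_rewriting": (3, 3),
    "faithfulness":          (4, 4),
    "agentic_planning":      (4, 4),
}

wins_devstral = sum(a > b for a, b in SCORES.values())
wins_nano     = sum(b > a for a, b in SCORES.values())
ties          = sum(a == b for a, b in SCORES.values())
print(f"Devstral: {wins_devstral}, Nano: {wins_nano}, ties: {ties}")
# -> Devstral: 1, Nano: 8, ties: 3
```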
Pricing Analysis
Raw per-MTok prices: Devstral Medium charges $0.40 input / $2.00 output; GPT-5 Nano charges $0.05 input / $0.40 output. At realistic volumes, assuming a 50/50 split of input and output tokens:

- 1M total tokens (500K input + 500K output): Devstral = $1.20; GPT-5 Nano = $0.23.
- 10M total tokens: Devstral = $12.00; GPT-5 Nano = $2.25.
- 100M total tokens: Devstral = $120.00; GPT-5 Nano = $22.50.

Scaled to a billion tokens, that's $400 per billion input tokens and $2,000 per billion output tokens for Devstral, versus $50 and $400 for GPT-5 Nano. Teams doing high-volume inference (10M+ tokens/month) will see roughly five-fold savings with GPT-5 Nano at this mix (up to eight-fold on input-heavy workloads) and should care about the gap; small-scale prototypes may tolerate Devstral's premium if they value its marketed strengths, but expect a much higher monthly bill with Devstral Medium.
Real-World Cost Comparison
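To make the comparison concrete, here is a minimal sketch of the cost arithmetic above, assuming the listed per-MTok prices and a 50/50 input/output token split (the `run_cost` helper is ours for illustration, not a vendor SDK call):

```python
# $ per million tokens: (input, output), from the listed prices.
PRICES = {
    "Devstral Medium": (0.40, 2.00),
    "GPT-5 Nano":      (0.05, 0.40),
}

def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one workload at the listed per-MTok prices."""
    p_in, p_out = PRICES[model]
    return input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out

for total in (1_000_000, 10_000_000, 100_000_000):
    half = total // 2  # 50/50 input/output split
    devstral = run_cost("Devstral Medium", half, half)
    nano = run_cost("GPT-5 Nano", half, half)
    print(f"{total:>11,} tokens: ${devstral:,.2f} vs ${nano:,.2f} "
          f"({devstral / nano:.1f}x)")
# 1M: $1.20 vs ~$0.23; 10M: $12.00 vs $2.25; 100M: $120.00 vs $22.50
```

At this mix the gap is a constant ~5.3x; input-heavy workloads push it toward the 8x input-price ratio.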
Bottom Line
Choose Devstral Medium if:

- Your primary need is top-tier classification/routing (Devstral scores 4/5 and is tied for 1st on classification in our tests), and you can absorb much higher costs ($0.40 in / $2.00 out per MTok).

Choose GPT-5 Nano if:

- You need reliable structured outputs, long-context understanding, tool calling, multilingual performance, or stronger safety (GPT-5 Nano wins those categories in our 12-test suite), and you want dramatically lower token costs ($0.05 in / $0.40 out per MTok).

For high-volume production (10M+ tokens/month), GPT-5 Nano is the cost-effective winner on most real tasks we measured.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.
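For a sense of what the scoring step looks like, here is a hypothetical sketch of a 1-5 judge call; the rubric wording and the injected `call_llm(prompt) -> str` client are illustrative assumptions, not our actual harness:

```python
import re

# Hypothetical rubric prompt for a 1-5 LLM judge (illustrative only).
RUBRIC = """You are grading a model response for the task: {task}.
Score it 1-5 (5 = fully correct and complete). Reply with only the number.

Response to grade:
{response}"""

def judge_score(call_llm, task: str, response: str) -> int:
    """Ask a judge model for a 1-5 score and parse the first digit."""
    reply = call_llm(RUBRIC.format(task=task, response=response))
    match = re.search(r"[1-5]", reply)
    if match is None:
        raise ValueError(f"unparseable judge reply: {reply!r}")
    return int(match.group())
```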