Devstral Medium vs GPT-5 Mini

GPT-5 Mini is the practical pick for most users: it wins 9 of our 12 benchmarks and leads on structured output, long context, faithfulness, and safety. Devstral Medium wins none of our benchmarks but may still make sense for provider preference or specific parameter support; note that Devstral charges $0.40 per million input tokens vs GPT-5 Mini's $0.25.

Devstral Medium (Mistral)

Overall: 3.17/5 (Usable)
Benchmark scores: see the head-to-head table under Benchmark Analysis below.

External benchmarks: SWE-bench Verified N/A · MATH Level 5 N/A · AIME 2025 N/A

Pricing: $0.40/MTok input · $2.00/MTok output
Context window: 131K

GPT-5 Mini (OpenAI)

Overall: 4.33/5 (Strong)
Benchmark scores: see the head-to-head table under Benchmark Analysis below.

External benchmarks: SWE-bench Verified 64.7% · MATH Level 5 97.8% · AIME 2025 86.7%

Pricing: $0.25/MTok input · $2.00/MTok output
Context window: 400K

Benchmark Analysis

Summary of head-to-head scores from our 12-test suite (Devstral Medium = A, GPT-5 Mini = B). Wins/ties: B wins 9 tests, A wins 0, and 3 are tied.

Wins for GPT-5 Mini (B vs A scores):

  • structured_output 5 vs 4 (B tied for 1st of 54 models)
  • strategic_analysis 5 vs 2 (B tied for 1st of 54)
  • constrained_rewriting 4 vs 3 (B rank 6 of 53)
  • creative_problem_solving 4 vs 2 (B rank 9 of 54)
  • faithfulness 5 vs 4 (B tied for 1st of 55)
  • long_context 5 vs 4 (B tied for 1st of 55; important for 30K+ retrieval)
  • safety_calibration 3 vs 1 (B rank 10 of 55)
  • persona_consistency 5 vs 3 (B tied for 1st of 53)
  • multilingual 5 vs 4 (B tied for 1st of 55)

Ties:

  • tool_calling 3 vs 3 (both rank 47 of 54)
  • classification 4 vs 4 (both tied for 1st with many models)
  • agentic_planning 4 vs 4 (both mid-top: rank 16 of 54)

What this means in practice:

  • Structured output (JSON schema compliance): GPT-5 Mini's 5/5 and tie for top rank indicate stronger adherence to strict formats; expect fewer schema fixes and less post-processing when you need exact JSON/CSV outputs (see the first sketch after this list).
  • Long-context and retrieval: GPT-5 Mini scores 5/5 and is tied for 1st, with a 400,000-token context window listed; this supports tasks that require 30K+ token retrieval or very large documents. Devstral Medium lists a 131,072-token context window and scored 4/5, so it is competent but behind GPT-5 Mini on our long-context tests (see the second sketch after this list).
  • Strategic analysis and faithfulness: GPT-5 Mini's 5/5 on strategic_analysis and faithfulness (tied for 1st) means it handles nuanced tradeoffs and sticks to source material better in our probes; Devstral scored 2/5 on strategic_analysis and 4/5 on faithfulness, so expect weaker numeric tradeoff reasoning yet decent fidelity to sources.
  • Safety and persona: GPT-5 Mini outperforms on safety_calibration (3 vs 1) and persona_consistency (5 vs 3), so it's more likely to follow refusal/safety guidance and maintain character in our tests.
  • Coding and tool workflows: tool_calling ties at 3/5 for both models and both rank 47 of 54, so neither has a clear advantage on function selection or sequencing in our suite.
  • External benchmarks (Epoch AI): GPT-5 Mini scores 64.7% on SWE-bench Verified (rank 8 of 12), 97.8% on MATH Level 5 (shared rank 2 of 14), and 86.7% on AIME 2025 (rank 9 of 23). These third-party results support GPT-5 Mini's strong math performance. Devstral Medium has no external benchmark scores listed.
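To make the structured-output point concrete, here is a minimal sketch of the kind of strict-schema check such a test implies: parse the model's reply as JSON and validate it against a schema. The invoice schema and sample reply are hypothetical, and the check relies on the third-party jsonschema package.

```python
# Hedged sketch: a strict JSON-schema gate of the sort a structured_output
# test implies. Schema and sample reply are made up for illustration.
import json

from jsonschema import ValidationError, validate  # pip install jsonschema

INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_id": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string", "enum": ["USD", "EUR"]},
    },
    "required": ["invoice_id", "total", "currency"],
    "additionalProperties": False,
}

def check_reply(reply_text: str) -> bool:
    """True only if the reply is valid JSON that satisfies the schema."""
    try:
        validate(instance=json.loads(reply_text), schema=INVOICE_SCHEMA)
        return True
    except (json.JSONDecodeError, ValidationError) as err:
        print(f"schema violation: {err}")
        return False

# A 5/5 structured-output model should pass without post-processing:
print(check_reply('{"invoice_id": "INV-7", "total": 129.5, "currency": "USD"}'))
```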
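For the long-context point, a rough sketch of routing by context window under the listed limits (131K vs 400K). The model keys are hypothetical identifiers, and the 4-characters-per-token estimate is a crude stand-in for a real tokenizer.

```python
# Hedged sketch: pick a model whose context window fits the document.
CONTEXT_WINDOWS = {"devstral-medium": 131_072, "gpt-5-mini": 400_000}

def fits(model: str, document: str, reserve_for_output: int = 4_096) -> bool:
    """True if the document plus an output budget fits the model's window."""
    estimated_tokens = len(document) // 4  # ~4 chars/token heuristic
    return estimated_tokens + reserve_for_output <= CONTEXT_WINDOWS[model]

big_doc = "x" * 1_000_000  # ~250K estimated tokens
print(fits("devstral-medium", big_doc))  # False: exceeds 131K
print(fits("gpt-5-mini", big_doc))       # True: within 400K
```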
Benchmark                  Devstral Medium   GPT-5 Mini
Faithfulness               4/5               5/5
Long Context               4/5               5/5
Multilingual               4/5               5/5
Tool Calling               3/5               3/5
Classification             4/5               4/5
Agentic Planning           4/5               4/5
Structured Output          4/5               5/5
Safety Calibration         1/5               3/5
Strategic Analysis         2/5               5/5
Persona Consistency        3/5               5/5
Constrained Rewriting      3/5               4/5
Creative Problem Solving   2/5               4/5
Summary                    0 wins            9 wins

Pricing Analysis

Assumption: a representative workload splits tokens 50/50 between input and output. Cost per million total tokens:

  • Devstral Medium (input $0.40/MTok, output $2.00/MTok): 0.5M input tokens * $0.40 = $0.20; 0.5M output tokens * $2.00 = $1.00; total = $1.20 per 1M tokens.
  • GPT-5 Mini (input $0.25/MTok, output $2.00/MTok): 0.5M input tokens * $0.25 = $0.125; 0.5M output tokens * $2.00 = $1.00; total = $1.125 per 1M tokens.

Scale examples (same 50/50 split):

  • 1M tokens/month: Devstral $1.20 vs GPT-5 Mini $1.13 (GPT-5 Mini saves about $0.08).
  • 10M tokens/month: Devstral $12.00 vs GPT-5 Mini $11.25 (saves $0.75).
  • 100M tokens/month: Devstral $120.00 vs GPT-5 Mini $112.50 (saves $7.50).
  • 1B tokens/month: Devstral $1,200 vs GPT-5 Mini $1,125 (saves $75).

Who should care: high-volume apps, batch processing, or analytics teams; the roughly 6% saving compounds at scale. For small-scale or latency-driven experiments the per-month delta is trivial, but at production volumes in the hundreds of millions to billions of tokens the difference becomes material.
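The blended-cost arithmetic above packages neatly into a small helper; this sketch simply reproduces the math under the stated 50/50 split and the per-MTok prices from the cards.

```python
# Reproduces the blended-cost math above. Prices are dollars per million
# tokens (MTok); the 50/50 input/output split is the stated assumption.
def blended_cost(total_tokens: float, input_per_mtok: float,
                 output_per_mtok: float, input_share: float = 0.5) -> float:
    """Dollar cost of a workload at the given per-MTok prices."""
    input_cost = total_tokens * input_share / 1e6 * input_per_mtok
    output_cost = total_tokens * (1 - input_share) / 1e6 * output_per_mtok
    return input_cost + output_cost

for monthly in (10e6, 100e6, 1e9):
    devstral = blended_cost(monthly, 0.40, 2.00)
    gpt5_mini = blended_cost(monthly, 0.25, 2.00)
    print(f"{monthly / 1e6:,.0f}M tokens/month: "
          f"Devstral ${devstral:,.2f} vs GPT-5 Mini ${gpt5_mini:,.2f}")
```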

Real-World Cost Comparison

Task             Devstral Medium   GPT-5 Mini
Chat response    $0.0011           $0.0010
Blog post        $0.0042           $0.0041
Document batch   $0.108            $0.105
Pipeline run     $1.08             $1.05

Bottom Line

Choose GPT-5 Mini if: you need best-in-suite structured output, long context (400K tokens), stronger safety calibration, multilingual parity, or top-ranked strategic analysis and math (Epoch AI MATH Level 5: 97.8%). Its lower input price ($0.25/MTok vs $0.40/MTok) also reduces costs at scale. Choose Devstral Medium if: you prefer the Mistral provider, require specific parameters that Devstral lists as supported (e.g., frequency_penalty, temperature, top_p; see the sketch below), or are experimenting at small scale where the roughly 6% cost difference is negligible. Note: in our tests Devstral Medium did not win any benchmark against GPT-5 Mini.
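If parameter support is the deciding factor, here is a minimal sketch of passing those sampling parameters through an OpenAI-style chat completions call. The endpoint URL and model id are assumptions for illustration; check the provider's documentation before relying on them.

```python
# Hedged sketch: exercising the sampling parameters mentioned above
# (temperature, top_p, frequency_penalty) against an OpenAI-style API.
import os

import requests

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",  # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "devstral-medium-latest",  # assumed model id
        "messages": [{"role": "user", "content": "Summarize this diff: ..."}],
        "temperature": 0.2,        # lower = more deterministic sampling
        "top_p": 0.9,              # nucleus sampling cutoff
        "frequency_penalty": 0.3,  # discourage verbatim repetition
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```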

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.
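For a flavor of what 1-5 judge scoring involves mechanically, here is a toy score-extraction helper. The reply format and regex are our assumptions, not the published methodology.

```python
# Illustrative only: pull a standalone 1-5 score out of free-text judge output.
import re

def parse_judge_score(judge_reply: str) -> int | None:
    """Return the first standalone integer 1-5 in the judge's reply, else None."""
    match = re.search(r"\b([1-5])\b", judge_reply)
    return int(match.group(1)) if match else None

print(parse_judge_score("Score: 4. Follows the schema but misses one field."))  # 4
```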

Frequently Asked Questions