Devstral Small 1.1 vs GPT-5 Mini

GPT-5 Mini is the stronger general-purpose AI, winning 10 of 12 benchmarks in our testing — including decisive advantages on strategic analysis (5 vs 2), agentic planning (4 vs 2), and persona consistency (5 vs 2). Devstral Small 1.1 wins only one benchmark outright — tool calling (4 vs 3) — making it a narrow specialist rather than a broad competitor. The tradeoff is significant: GPT-5 Mini's output costs $2.00/MTok versus Devstral Small 1.1's $0.30/MTok, so the decision is whether GPT-5 Mini's across-the-board quality advantage justifies paying roughly 6.7x more on output.

Mistral

Devstral Small 1.1

Overall: 3.08/5 (Usable)

Benchmark Scores

Faithfulness: 4/5
Long Context: 4/5
Multilingual: 4/5
Tool Calling: 4/5
Classification: 4/5
Agentic Planning: 2/5
Structured Output: 4/5
Safety Calibration: 2/5
Strategic Analysis: 2/5
Persona Consistency: 2/5
Constrained Rewriting: 3/5
Creative Problem Solving: 2/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.100/MTok
Output: $0.300/MTok
Context Window: 131K

modelpicker.net

OpenAI

GPT-5 Mini

Overall: 4.33/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 3/5
Classification: 4/5
Agentic Planning: 4/5
Structured Output: 5/5
Safety Calibration: 3/5
Strategic Analysis: 5/5
Persona Consistency: 5/5
Constrained Rewriting: 4/5
Creative Problem Solving: 4/5

External Benchmarks

SWE-bench Verified: 64.7%
MATH Level 5: 97.8%
AIME 2025: 86.7%

Pricing

Input: $0.250/MTok
Output: $2.00/MTok
Context Window: 400K


Benchmark Analysis

GPT-5 Mini wins 10 of 12 benchmarks in our testing; Devstral Small 1.1 wins 1; they tie on 1.

Where GPT-5 Mini leads:

  • Strategic analysis: GPT-5 Mini scores 5/5 (tied for 1st among 54 models) vs Devstral Small 1.1's 2/5 (rank 44 of 54). This is the largest gap in the dataset — a model scoring 2 on nuanced tradeoff reasoning will struggle with business analysis, competitive assessments, and any task requiring weighing real-world constraints.
  • Agentic planning: GPT-5 Mini scores 4/5 (rank 16 of 54) vs Devstral Small 1.1's 2/5 (rank 53 of 54 — near the bottom of the field). For goal decomposition and multi-step task execution, Devstral Small 1.1 is a poor fit.
  • Persona consistency: GPT-5 Mini scores 5/5 (tied for 1st among 53 models) vs Devstral Small 1.1's 2/5 (rank 51 of 53). Critical for chatbot or character-based applications.
  • Creative problem solving: GPT-5 Mini scores 4/5 (rank 9 of 54) vs Devstral Small 1.1's 2/5 (rank 47 of 54). Devstral Small 1.1 is below the 25th percentile on this dimension.
  • Faithfulness: GPT-5 Mini scores 5/5 (tied for 1st among 55 models) vs Devstral Small 1.1's 4/5 (rank 34 of 55). Both are solid, but GPT-5 Mini reaches the top tier.
  • Long context: GPT-5 Mini scores 5/5 (tied for 1st among 55 models) vs Devstral Small 1.1's 4/5 (rank 38 of 55). GPT-5 Mini also supports a 400K context window vs Devstral Small 1.1's 131K — a practical advantage for large document workflows.
  • Multilingual: GPT-5 Mini scores 5/5 (tied for 1st among 55 models) vs Devstral Small 1.1's 4/5 (rank 36 of 55).
  • Structured output: GPT-5 Mini scores 5/5 (tied for 1st among 54 models) vs Devstral Small 1.1's 4/5 (rank 26 of 54). Both are capable, but GPT-5 Mini edges ahead on JSON schema compliance.
  • Constrained rewriting: GPT-5 Mini scores 4/5 (rank 6 of 53) vs Devstral Small 1.1's 3/5 (rank 31 of 53).
  • Safety calibration: GPT-5 Mini scores 3/5 (rank 10 of 55) vs Devstral Small 1.1's 2/5 (rank 12 of 55). Both sit above the 75th percentile on this dimension given the dataset's low median (p50 = 2), but GPT-5 Mini is the stronger performer here.

Where Devstral Small 1.1 leads:

  • Tool calling: Devstral Small 1.1 scores 4/5 (rank 18 of 54) vs GPT-5 Mini's 3/5 (rank 47 of 54). This is Devstral Small 1.1's clearest advantage — function selection, argument accuracy, and sequencing. GPT-5 Mini ranks near the bottom third of models tested on this dimension.

Tie:

  • Classification: Both score 4/5, tied for 1st with 29 other models among 53 tested. No meaningful difference here.

External benchmarks (Epoch AI):

GPT-5 Mini has external benchmark data that Devstral Small 1.1 lacks in this payload. On SWE-bench Verified, GPT-5 Mini scores 64.7%, below the dataset median of 70.8% (rank 8 of the 12 models with scores in our dataset), placing it in the lower half of tracked models on this test. On MATH Level 5, GPT-5 Mini scores 97.8% (rank 2 of 14, above the dataset median of 94.15%). On AIME 2025, GPT-5 Mini scores 86.7% (rank 9 of 23, near the dataset median of 83.9%). No comparable external benchmark data exists for Devstral Small 1.1 in this payload.

Benchmark | Devstral Small 1.1 | GPT-5 Mini
Faithfulness | 4/5 | 5/5
Long Context | 4/5 | 5/5
Multilingual | 4/5 | 5/5
Tool Calling | 4/5 | 3/5
Classification | 4/5 | 4/5
Agentic Planning | 2/5 | 4/5
Structured Output | 4/5 | 5/5
Safety Calibration | 2/5 | 3/5
Strategic Analysis | 2/5 | 5/5
Persona Consistency | 2/5 | 5/5
Constrained Rewriting | 3/5 | 4/5
Creative Problem Solving | 2/5 | 4/5
Summary | 1 win | 10 wins

Pricing Analysis

Devstral Small 1.1 costs $0.10/MTok input and $0.30/MTok output. GPT-5 Mini costs $0.25/MTok input and $2.00/MTok output — 2.5x more on input and 6.7x more on output.

At 1M output tokens/month: Devstral Small 1.1 costs $0.30 vs GPT-5 Mini's $2.00 — a $1.70 difference that barely registers.

At 10M output tokens/month: $3.00 vs $20.00 — a $17 gap. Still manageable for most teams.

At 100M output tokens/month: $300 vs $2,000 — a $1,700/month difference that becomes a real budget line item. At this scale, if your workload skews toward tasks where Devstral Small 1.1 performs competitively (tool calling, classification, structured output), the cost argument for Devstral Small 1.1 becomes compelling.

GPT-5 Mini also uses reasoning tokens (flagged in the payload), which can inflate token counts on complex requests — worth accounting for in high-volume cost projections. Developers building latency-sensitive or cost-constrained pipelines at scale should weigh whether GPT-5 Mini's quality advantages on strategic analysis, creative problem solving, and long context are worth the output cost premium for their specific workload.
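The volume math above can be sketched as a simple projection. This is a back-of-envelope sketch, not a billing calculator: the 2:1 input-to-output ratio and the 1.5x reasoning-token multiplier for GPT-5 Mini are illustrative assumptions, not measured figures; substitute your own observed ratios.

```python
# Back-of-envelope monthly cost projection for the two models.
# Prices are $/MTok from the pricing section; the input volume and the
# reasoning-token multiplier are ASSUMPTIONS for illustration only.

def monthly_cost(input_mtok, output_mtok, input_price, output_price,
                 reasoning_multiplier=1.0):
    """Return monthly cost in dollars.

    Volumes are in millions of tokens (MTok), prices in $/MTok.
    reasoning_multiplier scales billed output tokens to account for
    hidden reasoning tokens on models that emit them.
    """
    return (input_mtok * input_price
            + output_mtok * reasoning_multiplier * output_price)

# 100M output tokens/month, assuming 2x as many input tokens (assumption).
devstral = monthly_cost(200, 100, 0.10, 0.30)
gpt5_mini = monthly_cost(200, 100, 0.25, 2.00, reasoning_multiplier=1.5)

print(f"Devstral Small 1.1: ${devstral:,.2f}/month")   # $50.00
print(f"GPT-5 Mini:         ${gpt5_mini:,.2f}/month")  # $350.00
```

With input tokens and a reasoning-token overhead included, the effective gap at scale can exceed the headline 6.7x output ratio, which is why the multiplier belongs in any high-volume projection.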

Real-World Cost Comparison

Task | Devstral Small 1.1 | GPT-5 Mini
Chat response | <$0.001 | $0.0010
Blog post | <$0.001 | $0.0041
Document batch | $0.017 | $0.105
Pipeline run | $0.170 | $1.05
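Per-task figures like these follow directly from token counts and per-MTok prices. The token counts below are illustrative assumptions, not the exact workloads behind the table, so the results will not match its rows precisely.

```python
# Illustrative per-task cost estimate: (tokens x $/MTok) / 1,000,000.
# Token counts are ASSUMPTIONS chosen for illustration; the table's
# exact workload definitions are not published here.

PRICES = {  # $/MTok, from the pricing section
    "Devstral Small 1.1": {"input": 0.10, "output": 0.30},
    "GPT-5 Mini":         {"input": 0.25, "output": 2.00},
}

def task_cost(model, input_tokens, output_tokens):
    """Return the cost in dollars of a single request."""
    p = PRICES[model]
    return (input_tokens * p["input"]
            + output_tokens * p["output"]) / 1_000_000

# A chat response: ~500 input tokens, ~400 output tokens (assumption).
for model in PRICES:
    print(f"{model}: ${task_cost(model, 500, 400):.4f}")
```

Scaling the same arithmetic up by batch size reproduces the document-batch and pipeline-run pattern: output-heavy tasks widen the gap, because that is where the 6.7x price ratio applies.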

Bottom Line

Choose Devstral Small 1.1 if: Your primary workload is tool-calling-heavy pipelines — agentic code execution, API orchestration, or function-routing tasks where its score of 4/5 (vs GPT-5 Mini's 3/5, rank 47 of 54) gives it a real edge. It's also the right call if you're running high output volumes (100M+ tokens/month) where GPT-5 Mini's $2.00/MTok output cost becomes a significant expense, and your tasks fall within the narrow set where Devstral Small 1.1 competes (classification, structured output, faithfulness). Devstral Small 1.1 is described as purpose-built for software engineering agents — if that matches your use case, the cost savings are meaningful.

Choose GPT-5 Mini if: You need a capable general-purpose AI across a wide range of tasks. It wins 10 of 12 benchmarks in our testing, with especially large advantages on strategic analysis (5 vs 2), agentic planning (4 vs 2), creative problem solving (4 vs 2), and persona consistency (5 vs 2). Its 400K context window is roughly three times Devstral Small 1.1's 131K, a practical edge for large-document work. It also supports image and file inputs, where Devstral Small 1.1 is text-only. At lower to mid volumes, the $1.70/MTok output premium is easy to justify for the quality differential across most task types.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions