Devstral Small 1.1 vs GPT-4.1 Mini

GPT-4.1 Mini is the stronger general-purpose model, winning 7 of 12 benchmarks in our testing against Devstral Small 1.1's 1 win and 4 ties — with meaningful advantages in agentic planning, strategic analysis, persona consistency, and long-context retrieval. Devstral Small 1.1 edges ahead only on classification, where it ties for 1st among 53 models. At $0.10/$0.30 per million tokens versus GPT-4.1 Mini's $0.40/$1.60, Devstral Small 1.1 is roughly 5x cheaper on output — making it a credible option when classification or structured output is the primary workload and budget is the primary constraint.

Devstral Small 1.1 (Mistral)

Overall: 3.08/5 (Usable)
Benchmark scores: Faithfulness 4/5 · Long Context 4/5 · Multilingual 4/5 · Tool Calling 4/5 · Classification 4/5 · Agentic Planning 2/5 · Structured Output 4/5 · Safety Calibration 2/5 · Strategic Analysis 2/5 · Persona Consistency 2/5 · Constrained Rewriting 3/5 · Creative Problem Solving 2/5
External benchmarks: SWE-bench Verified N/A · MATH Level 5 N/A · AIME 2025 N/A
Pricing: $0.10/MTok input, $0.30/MTok output
Context window: 131K tokens

GPT-4.1 Mini (OpenAI)

Overall: 3.92/5 (Strong)
Benchmark scores: Faithfulness 4/5 · Long Context 5/5 · Multilingual 5/5 · Tool Calling 4/5 · Classification 3/5 · Agentic Planning 4/5 · Structured Output 4/5 · Safety Calibration 2/5 · Strategic Analysis 4/5 · Persona Consistency 5/5 · Constrained Rewriting 4/5 · Creative Problem Solving 3/5
External benchmarks: SWE-bench Verified N/A · MATH Level 5 87.3% · AIME 2025 44.7%
Pricing: $0.40/MTok input, $1.60/MTok output
Context window: 1,048K tokens

Benchmark Analysis

Across our 12-test benchmark suite, GPT-4.1 Mini wins 7 tests outright, Devstral Small 1.1 wins 1, and the two tie on 4.

Where GPT-4.1 Mini wins:

  • Long context (5 vs 4): GPT-4.1 Mini ties for 1st among 55 models; Devstral Small 1.1 ranks 38th. For workloads requiring accurate retrieval at 30K+ tokens, this is a significant gap.
  • Persona consistency (5 vs 2): GPT-4.1 Mini ties for 1st among 53 models; Devstral Small 1.1 ranks 51st — near the bottom of tested models. This matters for chatbots, roleplay, and any system prompt that must hold under adversarial input.
  • Agentic planning (4 vs 2): GPT-4.1 Mini ranks 16th of 54; Devstral Small 1.1 ranks 53rd — second to last. Goal decomposition and failure recovery are critical for autonomous agent workflows, and this gap is stark.
  • Strategic analysis (4 vs 2): GPT-4.1 Mini ranks 27th of 54; Devstral Small 1.1 ranks 44th. Complex tradeoff reasoning favors GPT-4.1 Mini substantially.
  • Constrained rewriting (4 vs 3): GPT-4.1 Mini ranks 6th of 53; Devstral Small 1.1 ranks 31st. Compression within hard limits is notably better on GPT-4.1 Mini.
  • Creative problem solving (3 vs 2): GPT-4.1 Mini ranks 30th of 54; Devstral Small 1.1 ranks 47th. Neither model excels here, but GPT-4.1 Mini is clearly ahead.
  • Multilingual (5 vs 4): GPT-4.1 Mini ties for 1st among 55 models; Devstral Small 1.1 ranks 36th. Non-English use cases strongly favor GPT-4.1 Mini.

Where Devstral Small 1.1 wins:

  • Classification (4 vs 3): Devstral Small 1.1 ties for 1st among 53 models; GPT-4.1 Mini ranks 31st. This is Devstral's clearest differentiator — it categorizes and routes inputs as well as any model in our suite.

Ties (both score the same):

  • Structured output (4/4): Both rank 26th of 54, tied with 26 other models. JSON schema compliance is equivalent; see the sketch after this list for what this test exercises.
  • Tool calling (4/4): Both rank 18th of 54, tied with 28 other models. Function selection and argument accuracy are on par.
  • Faithfulness (4/4): Both rank 34th of 55. Neither model has an edge on staying grounded to source material.
  • Safety calibration (2/2): Both rank 12th of 55, tied with 19 other models. The field median is also 2, so both models are merely average here rather than ahead of the pack.
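
For a concrete sense of what the structured output test exercises, here is a minimal sketch of a JSON-schema-constrained request using the OpenAI Python SDK. The ticket schema and field names are hypothetical examples, not part of our test suite; a Devstral deployment would go through Mistral's analogous structured-output support.

    # Hypothetical sketch of a schema-constrained request, the kind of call
    # the structured output benchmark exercises. The schema and field names
    # are illustrative assumptions, not our actual test prompts.
    from openai import OpenAI

    client = OpenAI()
    schema = {
        "name": "support_ticket",
        "schema": {
            "type": "object",
            "properties": {
                "category": {"type": "string"},
                "priority": {"type": "integer", "minimum": 1, "maximum": 5},
            },
            "required": ["category", "priority"],
            "additionalProperties": False,
        },
    }
    resp = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": "Classify: checkout page returns a 500 error."}],
        response_format={"type": "json_schema", "json_schema": schema},
    )
    print(resp.choices[0].message.content)  # e.g. {"category": "bug", "priority": 4}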

Third-party benchmark context: GPT-4.1 Mini scores 87.3% on MATH Level 5 (9th of the 14 models with reported scores) and 44.7% on AIME 2025 (18th of 23), according to Epoch AI data. Devstral Small 1.1 has no external benchmark scores in our data. GPT-4.1 Mini's description notes a 45.1% score on hard coding tasks; Devstral Small 1.1 is described as purpose-built for software engineering agents, fine-tuned from Mistral Small 3.1 in collaboration with All Hands AI, but our suite does not include a direct SWE-bench score for it.

Benchmark | Devstral Small 1.1 | GPT-4.1 Mini
Faithfulness | 4/5 | 4/5
Long Context | 4/5 | 5/5
Multilingual | 4/5 | 5/5
Tool Calling | 4/5 | 4/5
Classification | 4/5 | 3/5
Agentic Planning | 2/5 | 4/5
Structured Output | 4/5 | 4/5
Safety Calibration | 2/5 | 2/5
Strategic Analysis | 2/5 | 4/5
Persona Consistency | 2/5 | 5/5
Constrained Rewriting | 3/5 | 4/5
Creative Problem Solving | 2/5 | 3/5
Summary | 1 win | 7 wins

Pricing Analysis

Devstral Small 1.1 costs $0.10/M input and $0.30/M output tokens. GPT-4.1 Mini costs $0.40/M input and $1.60/M output: 4x more expensive on input and more than 5x more expensive on output. In practice, output cost dominates most workloads. At 1M output tokens/month, GPT-4.1 Mini costs $1.60 vs $0.30 for Devstral Small 1.1, a $1.30 difference that's negligible for most teams. Scale to 10M output tokens and it's $16 vs $3, a $13 gap that is still modest. At 100M output tokens/month, the scale of a production API serving thousands of users, GPT-4.1 Mini runs $160 vs $30, a $130/month difference that starts to matter in budget planning. For high-volume inference pipelines where GPT-4.1 Mini's broader capabilities aren't needed, Devstral Small 1.1 offers real savings. For most individual developers or small teams, the cost gap won't be the deciding factor.
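
The arithmetic is easy to verify. Below is a minimal sketch using only the published per-million-token rates quoted above; it considers output tokens alone, matching the volumes in this paragraph.

    # Reproduces the monthly cost figures above from the quoted rates.
    PRICES = {  # USD per million tokens: (input, output)
        "Devstral Small 1.1": (0.10, 0.30),
        "GPT-4.1 Mini": (0.40, 1.60),
    }

    def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        """Monthly spend in USD for the given token volumes."""
        rate_in, rate_out = PRICES[model]
        return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

    for volume in (1_000_000, 10_000_000, 100_000_000):
        gpt = monthly_cost("GPT-4.1 Mini", 0, volume)
        dev = monthly_cost("Devstral Small 1.1", 0, volume)
        print(f"{volume:>12,} output tokens/month: ${gpt:,.2f} vs ${dev:,.2f}")
    #    1,000,000 output tokens/month: $1.60 vs $0.30
    #   10,000,000 output tokens/month: $16.00 vs $3.00
    #  100,000,000 output tokens/month: $160.00 vs $30.00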

Real-World Cost Comparison

Task | Devstral Small 1.1 | GPT-4.1 Mini
Chat response | <$0.001 | <$0.001
Blog post | <$0.001 | $0.0034
Document batch | $0.017 | $0.088
Pipeline run | $0.170 | $0.880
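
These per-task figures are consistent with simple token budgets. The counts in the sketch below are our own assumptions, chosen because they reproduce the published numbers; modelpicker.net does not document the underlying token counts.

    # Back-of-the-envelope check of the cost table above. The per-task token
    # counts are assumptions that happen to reproduce the published figures;
    # they are not documented values.
    RATES = {  # USD per million tokens: (input, output)
        "Devstral Small 1.1": (0.10, 0.30),
        "GPT-4.1 Mini": (0.40, 1.60),
    }
    TASKS = {  # assumed (input_tokens, output_tokens) per task
        "Blog post": (500, 2_000),
        "Document batch": (20_000, 50_000),
        "Pipeline run": (200_000, 500_000),
    }

    for task, (tok_in, tok_out) in TASKS.items():
        cells = []
        for model, (rate_in, rate_out) in RATES.items():
            cost = (tok_in * rate_in + tok_out * rate_out) / 1_000_000
            cells.append(f"{model} ${cost:.4f}")
        print(f"{task}: " + " | ".join(cells))
    # Blog post: Devstral Small 1.1 $0.0007 | GPT-4.1 Mini $0.0034
    # Document batch: Devstral Small 1.1 $0.0170 | GPT-4.1 Mini $0.0880
    # Pipeline run: Devstral Small 1.1 $0.1700 | GPT-4.1 Mini $0.8800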

Bottom Line

Choose Devstral Small 1.1 if: Your primary workload is classification or routing — it ties for 1st among 53 models on that benchmark in our testing, and does so at $0.30/M output tokens. It's also a reasonable fit for structured output and tool calling pipelines where cost matters and you can live with weaker performance on reasoning, planning, and long-context tasks. If you're running high-volume classification inference and every dollar counts, it's the clear pick for that specific job.

Choose GPT-4.1 Mini if: You need a capable general-purpose model. It wins 7 of 12 benchmarks in our testing, with strong leads on agentic planning (4 vs 2, ranking 16th vs 53rd of 54), persona consistency (5 vs 2, ranking 1st vs 51st of 53), long-context retrieval (5 vs 4, ranking 1st of 55), and multilingual output. It also supports image and file inputs, which Devstral Small 1.1 does not in our data. For developer agents, customer-facing chatbots, multilingual apps, or anything requiring sustained reasoning over long documents, GPT-4.1 Mini's benchmark profile is considerably stronger. The 5x output cost premium is justified unless your workload maps narrowly to Devstral's strengths.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.
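
For illustration only, an LLM-judge scoring call generally looks like the sketch below. The judge model, rubric wording, and score parsing are hypothetical stand-ins, not our actual harness.

    # Hypothetical LLM-as-judge sketch. The judge model, rubric, and parsing
    # are illustrative assumptions, not the real evaluation pipeline.
    import re
    from openai import OpenAI

    client = OpenAI()

    def judge_score(task: str, response: str) -> int:
        """Ask a judge model for a 1-5 score and parse out the integer."""
        rubric = (
            "You are grading a model response. Score it from 1 (unusable) to "
            "5 (excellent) for correctness and instruction-following. "
            "Reply with the integer only.\n\n"
            f"Task: {task}\n\nResponse: {response}"
        )
        reply = client.chat.completions.create(
            model="gpt-4.1",  # assumed judge model
            messages=[{"role": "user", "content": rubric}],
        )
        match = re.search(r"[1-5]", reply.choices[0].message.content)
        return int(match.group()) if match else 1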

Frequently Asked Questions