Devstral Small 1.1 vs o4 Mini
o4 Mini is the stronger general-purpose model, outscoring Devstral Small 1.1 on 9 of 12 benchmarks in our testing — including tool calling (5 vs 4), agentic planning (4 vs 2), and strategic analysis (5 vs 2). Devstral Small 1.1 wins only on safety calibration (2 vs 1) and ties on classification and constrained rewriting. The tradeoff is real: o4 Mini costs $4.40/M output tokens versus $0.30/M for Devstral Small 1.1 — nearly 15x more expensive — so Devstral Small 1.1 earns its place for high-volume, cost-sensitive coding pipelines where its SWE-focused design can compensate for lower general capability scores.
Devstral Small 1.1 (Mistral): $0.10/MTok input, $0.30/MTok output
o4 Mini (OpenAI): $1.10/MTok input, $4.40/MTok output
Benchmark Analysis
Our 12-test internal benchmark suite shows o4 Mini ahead on 9 tests, tied on 2, and behind on 1.
Where o4 Mini wins decisively:
- Strategic analysis: 5 vs 2. o4 Mini ties for 1st among 54 models; Devstral Small 1.1 ranks 44th. This gap is substantial — Devstral Small 1.1's score of 2 sits below the 25th percentile (p25 = 3) for this test, meaning it underperforms the majority of models we track on nuanced tradeoff reasoning.
- Agentic planning: 4 vs 2. o4 Mini ranks 16th of 54; Devstral Small 1.1 ranks 53rd of 54 — near the bottom of the entire field. For goal decomposition and failure recovery in autonomous workflows, this is a major liability.
- Creative problem solving: 4 vs 2. o4 Mini ranks 9th of 54; Devstral Small 1.1 ranks 47th. Again, Devstral's score of 2 falls below the p25 threshold for this benchmark.
- Persona consistency: 5 vs 2. o4 Mini ties for 1st among 53 models; Devstral Small 1.1 ranks 51st. Not relevant to coding tasks, but critical for chat products and system-prompt-based applications.
- Tool calling: 5 vs 4. Both are above the median (p50 = 4), but o4 Mini ties for 1st among 54 models while Devstral Small 1.1 ranks 18th. In practice, o4 Mini's higher score means more reliable function selection and argument accuracy in tool-heavy pipelines.
- Faithfulness: 5 vs 4. o4 Mini ties for 1st among 55 models; Devstral Small 1.1 ranks 34th. For RAG and document Q&A, this matters.
- Long context: 5 vs 4. o4 Mini ties for 1st among 55 models; Devstral Small 1.1 ranks 38th. o4 Mini also carries a larger context window (200K vs 128K tokens).
- Structured output: 5 vs 4. o4 Mini ties for 1st among 54 models; Devstral Small 1.1 ranks 26th.
- Multilingual: 5 vs 4. o4 Mini ties for 1st among 55 models; Devstral Small 1.1 ranks 36th.
Where the models tie:
- Classification: both score 4, tied for 1st among 53 models (30 models share this score). No meaningful difference here.
- Constrained rewriting: both score 3, tied at rank 31 of 53. Neither excels at compression within hard character limits.
Where Devstral Small 1.1 wins:
- Safety calibration: 2 vs 1. Devstral Small 1.1 ranks 12th of 55 (tied with 19 others); o4 Mini ranks 32nd. Note that a score of 2 still sits at the median for this benchmark (p50 = 2), so this is less a Devstral strength and more an o4 Mini weakness. Teams building products where over-refusal or under-refusal is a compliance risk should factor this in.
External benchmarks (Epoch AI): o4 Mini scores 97.8% on MATH Level 5 (rank 2 of 14 models with this data, tied with 2 others) and 81.7% on AIME 2025 (rank 13 of 23, sole holder of that exact score). These place o4 Mini among the top math-capable models by third-party measure. Devstral Small 1.1 has no external benchmark scores in our dataset, so no direct comparison is possible on those dimensions.
Pricing Analysis
Devstral Small 1.1 costs $0.10/M input tokens and $0.30/M output tokens. o4 Mini costs $1.10/M input and $4.40/M output — 11x more on input and 14.7x more on output.
At 1M output tokens/month: Devstral Small 1.1 costs $0.30 vs o4 Mini's $4.40 — a $4.10 difference that's negligible for most teams.
At 10M output tokens/month: $3 vs $44 — a $41 monthly gap that starts mattering for production workloads.
At 100M output tokens/month: $30 vs $440 — a $410/month difference that makes model selection a meaningful budget decision at scale.
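The tier math above follows directly from the listed output rates. A minimal sketch (output tokens only, ignoring input costs and reasoning overhead):

```python
# Listed output rates in USD per million tokens (MTok).
RATES = {"Devstral Small 1.1": 0.30, "o4 Mini": 4.40}

def monthly_cost(output_tokens: int, rate_per_mtok: float) -> float:
    """Cost in USD for a given monthly output-token volume."""
    return output_tokens / 1_000_000 * rate_per_mtok

for volume in (1_000_000, 10_000_000, 100_000_000):
    devstral = monthly_cost(volume, RATES["Devstral Small 1.1"])
    o4_mini = monthly_cost(volume, RATES["o4 Mini"])
    print(f"{volume:>11,} tokens: ${devstral:,.2f} vs ${o4_mini:,.2f}")
```

Input-token costs shift the totals slightly but not the ratio, since the input gap (11x) is close to the output gap (14.7x).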
Developers running high-throughput agentic pipelines, code generation at scale, or multi-turn chat products should treat this cost gap as a first-order concern. o4 Mini also has quirks worth noting for API users: it consumes reasoning tokens (billed as output), enforces a minimum max_completion_tokens of 1,000, and works best with max_completion_tokens set well above that floor — which can push actual costs higher than the listed rate suggests when reasoning overhead is significant.
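Reasoning overhead can be folded into a back-of-envelope effective rate. The 0.6 overhead ratio below is an illustrative assumption, not a measured value:

```python
def effective_output_rate(listed_rate: float, reasoning_overhead: float) -> float:
    """Effective $/MTok of *visible* output when reasoning tokens
    (billed as output) add `reasoning_overhead` extra billed tokens
    per visible token. E.g. overhead=0.6 means 0.6 reasoning tokens
    are billed for every token the user actually receives."""
    return listed_rate * (1 + reasoning_overhead)

# Hypothetical: if o4 Mini emits 0.6 reasoning tokens per visible
# output token, its listed $4.40/MTok behaves like ~$7.04/MTok.
print(effective_output_rate(4.40, 0.6))
```

Actual overhead varies widely with task difficulty and reasoning-effort settings, so teams should measure it on their own traffic before budgeting.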
Bottom Line
Choose Devstral Small 1.1 if: you're running high-volume code generation or SWE agent pipelines where cost is a primary constraint — at $0.30/M output tokens it's 14.7x cheaper than o4 Mini, and its description indicates it was purpose-built for software engineering agents. It also edges out o4 Mini on safety calibration, which matters if refusal behavior is a product requirement. It accepts text input only, so it fits text-only coding workflows cleanly.
Choose o4 Mini if: you need a general-purpose reasoning model that performs reliably across agentic planning, strategic analysis, tool calling, long context, and math. Its 81.7% AIME 2025 score (Epoch AI) and 97.8% MATH Level 5 score confirm strong quantitative reasoning beyond our internal tests. It also supports image and file inputs — useful for multimodal tasks Devstral Small 1.1 cannot handle at all. Budget for the higher token cost and set max_completion_tokens high to avoid hitting its minimum threshold quirk.
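For the max_completion_tokens quirk, a request sketch helps. Parameter names follow OpenAI's Chat Completions interface; the model id, prompt, and token limit below are illustrative assumptions:

```python
# Hypothetical request payload for o4 Mini. Because reasoning tokens
# count against max_completion_tokens, leave generous headroom well
# above the 1,000-token minimum or responses may come back truncated.
payload = {
    "model": "o4-mini",
    "messages": [{"role": "user", "content": "Summarize this diff."}],
    "max_completion_tokens": 8000,  # headroom for reasoning + answer
}

assert payload["max_completion_tokens"] >= 1000  # stay above the floor
```

Devstral Small 1.1 needs no such adjustment, since it does not spend hidden reasoning tokens against the completion budget.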
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.