Devstral Small 1.1 vs Mistral Small 4

Mistral Small 4 is the stronger general-purpose model, winning 6 of 12 benchmarks in our testing — including agentic planning (4 vs 2), strategic analysis (4 vs 2), and creative problem solving (4 vs 2) — while Devstral Small 1.1 wins only on classification. Devstral Small 1.1 does have one meaningful edge: it costs half as much on output tokens ($0.30/M vs $0.60/M), making it worth considering for high-volume classification pipelines where that single strength matters most. For everything else — reasoning, planning, persona work, multilingual tasks — Mistral Small 4 is the clear choice.

Model Overview

|  | Devstral Small 1.1 | Mistral Small 4 |
|---|---|---|
| Provider | Mistral | Mistral |
| Overall | 3.08/5 (Usable) | 3.83/5 (Strong) |
| SWE-bench Verified | N/A | N/A |
| MATH Level 5 | N/A | N/A |
| AIME 2025 | N/A | N/A |
| Input price | $0.100/MTok | $0.150/MTok |
| Output price | $0.300/MTok | $0.600/MTok |
| Context window | 131K tokens | 262K tokens |

Per-benchmark scores for both models are listed in the comparison table under Benchmark Analysis below.

Benchmark Analysis

Mistral Small 4 wins 6 of 12 benchmarks in our testing; Devstral Small 1.1 wins 1; they tie on 5.

Where Mistral Small 4 wins:

  • Structured output: 5 vs 4 — Mistral Small 4 ties for 1st among 54 models tested; Devstral ranks 26th. For JSON schema compliance and format adherence, the gap matters for production API integrations (see the sketch after this list).
  • Persona consistency: 5 vs 2 — Mistral Small 4 ties for 1st among 53 models; Devstral ranks 51st of 53, nearly last. Devstral is a poor choice for any chatbot or character-maintaining application.
  • Multilingual: 5 vs 4 — Mistral Small 4 ties for 1st among 55 models; Devstral ranks 36th. The field median here is high (p50 = 5), so Devstral's 4 actually trails the median, and the gap is real for non-English tasks.
  • Strategic analysis: 4 vs 2 — Mistral Small 4 ranks 27th of 54; Devstral ranks 44th. Nuanced tradeoff reasoning with real data is substantially weaker in Devstral.
  • Creative problem solving: 4 vs 2 — Mistral Small 4 ranks 9th of 54; Devstral ranks 47th. Nearly bottom-of-field for Devstral on generating non-obvious, feasible ideas.
  • Agentic planning: 4 vs 2 — Mistral Small 4 ranks 16th of 54; Devstral ties for last (53rd of 54, shared with one other model). This is a critical weakness: Devstral should not be used for goal decomposition or autonomous agent workflows.
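
To make the structured-output point concrete, here is a minimal sketch of schema-constrained generation. It assumes the official mistralai Python SDK (v1+); the model ID, schema, and prompt are illustrative, not taken from our test suite.

```python
import json
from mistralai import Mistral  # assumes the official mistralai SDK, v1+

client = Mistral(api_key="YOUR_API_KEY")

# response_format asks the API to constrain the reply to a JSON object.
resp = client.chat.complete(
    model="mistral-small-latest",  # illustrative model ID; check provider docs
    messages=[{
        "role": "user",
        "content": 'Extract {"product": str, "priority": int} from: '
                   "'Ticket: login page down, urgent'. Reply with JSON only.",
    }],
    response_format={"type": "json_object"},
)

# A model that scores 5/5 on structured output should rarely fail this parse.
ticket = json.loads(resp.choices[0].message.content)
print(ticket)
```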

Where Devstral Small 1.1 wins:

  • Classification: 4 vs 2 — Devstral ties for 1st among 53 models; Mistral Small 4 ranks 51st of 53, nearly last. This is a complete reversal — for routing and categorization tasks, Devstral has a decisive advantage (see the routing sketch below).
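
A classification routing pipeline needs only one short label per input, which is why Devstral's edge compounds at volume. A minimal sketch, again assuming the mistralai SDK, with a hypothetical label set and an illustrative model ID:

```python
from mistralai import Mistral

client = Mistral(api_key="YOUR_API_KEY")
LABELS = ["billing", "bug_report", "feature_request", "other"]  # hypothetical taxonomy

def route(ticket_text: str) -> str:
    """Classify one support ticket into LABELS with a single short completion."""
    resp = client.chat.complete(
        model="devstral-small-latest",  # illustrative model ID; check provider docs
        messages=[{
            "role": "user",
            "content": f"Classify this ticket as one of {LABELS}. "
                       f"Reply with the label only.\n\n{ticket_text}",
        }],
        max_tokens=8,     # labels are short, so per-call output cost stays tiny
        temperature=0.0,  # deterministic labels make routing reproducible
    )
    label = resp.choices[0].message.content.strip().lower()
    return label if label in LABELS else "other"  # fall back on unexpected output

print(route("I was charged twice this month"))  # expected: billing
```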

Ties (both models identical):

  • Tool calling: both score 4, both rank 18th of 54 — solid but not elite for function calling and argument accuracy.
  • Faithfulness: both score 4, both rank 34th of 55 — adequate source adherence.
  • Long context: both score 4, both rank 38th of 55 — competent at 30K+ token retrieval but not top-tier.
  • Constrained rewriting: both score 3, both rank 31st of 53 — below the field median (p50 = 4).
  • Safety calibration: both score 2, both rank 12th of 55 (tied with 20 models) — identical, and level with the field median score (p50 = 2).

Note: Neither model has external benchmark scores (SWE-bench Verified, AIME 2025, MATH Level 5) in this dataset, so internal scores are the primary evidence for this comparison. Devstral Small 1.1's description indicates it was fine-tuned for software engineering agents, but no external coding benchmark data is available to quantify that claim here.

| Benchmark | Devstral Small 1.1 | Mistral Small 4 |
|---|---|---|
| Faithfulness | 4/5 | 4/5 |
| Long Context | 4/5 | 4/5 |
| Multilingual | 4/5 | 5/5 |
| Tool Calling | 4/5 | 4/5 |
| Classification | 4/5 | 2/5 |
| Agentic Planning | 2/5 | 4/5 |
| Structured Output | 4/5 | 5/5 |
| Safety Calibration | 2/5 | 2/5 |
| Strategic Analysis | 2/5 | 4/5 |
| Persona Consistency | 2/5 | 5/5 |
| Constrained Rewriting | 3/5 | 3/5 |
| Creative Problem Solving | 2/5 | 4/5 |
| Summary | 1 win | 6 wins |

Pricing Analysis

Devstral Small 1.1 costs $0.10/M input and $0.30/M output. Mistral Small 4 costs $0.15/M input and $0.60/M output — 50% more on input and double on output. At 1M output tokens/month, that's $0.30 vs $0.60 per month: a negligible $3.60/year difference. At 10M output tokens/month, the gap is still only $36/year ($36 vs $72), and at 100M output tokens/month it reaches $360/year ($360 vs $720). Only in the billions of output tokens per month does the saving climb into the thousands of dollars annually ($3,600/year at 1B tokens/month, as the sketch below shows). The cost gap therefore becomes meaningful only at serious scale, and it only justifies choosing Devstral Small 1.1 if your use case maps to its strengths (primarily classification). Developers running classification or routing pipelines at hundreds of millions of output tokens per month have a real reason to care about the price difference. For general workloads where Mistral Small 4 performs substantially better, paying double on output is likely worthwhile.
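
The arithmetic is easy to sanity-check. A short script using only the list prices above (output tokens; input scales the same way with a smaller gap):

```python
# Output prices in $ per million tokens, from the pricing section above.
PRICE_PER_M = {"Devstral Small 1.1": 0.30, "Mistral Small 4": 0.60}

def annual_output_cost(tokens_per_month: float, price_per_m: float) -> float:
    """Annual spend on output tokens at a steady monthly volume."""
    return tokens_per_month / 1_000_000 * price_per_m * 12

for volume in (1e6, 10e6, 100e6, 1e9):
    dev = annual_output_cost(volume, PRICE_PER_M["Devstral Small 1.1"])
    mis = annual_output_cost(volume, PRICE_PER_M["Mistral Small 4"])
    print(f"{volume / 1e6:>6.0f}M tok/month: ${dev:>8,.2f} vs ${mis:>8,.2f} "
          f"-> gap ${mis - dev:,.2f}/year")

# Output:
#      1M tok/month: $    3.60 vs $    7.20 -> gap $3.60/year
#     10M tok/month: $   36.00 vs $   72.00 -> gap $36.00/year
#    100M tok/month: $  360.00 vs $  720.00 -> gap $360.00/year
#   1000M tok/month: $3,600.00 vs $7,200.00 -> gap $3,600.00/year
```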

Real-World Cost Comparison

| Task | Devstral Small 1.1 | Mistral Small 4 |
|---|---|---|
| Chat response | <$0.001 | <$0.001 |
| Blog post | <$0.001 | $0.0013 |
| Document batch | $0.017 | $0.033 |
| Pipeline run | $0.170 | $0.330 |

Bottom Line

Choose Devstral Small 1.1 if your primary workload is classification, routing, or categorization — it scores 4 vs Mistral Small 4's 2 on our classification benchmark and ties for 1st among 53 models tested. At very high volumes (hundreds of millions of output tokens per month), the $0.30/M vs $0.60/M output price difference also starts to add up. Do not use it for agentic workflows (tied for last at 53rd of 54), persona-driven applications (51st of 53), strategic analysis (44th of 54), or creative tasks (47th of 54).

Choose Mistral Small 4 if you need a capable general-purpose model — it wins 6 of 12 benchmarks and handles agentic planning, structured output, persona consistency, multilingual tasks, strategic analysis, and creative problem solving substantially better. It also supports image input (text+image->text modality) and a 262,144-token context window versus Devstral's 131,072 tokens, giving it more flexibility for multimodal and long-document workloads. The 2× output cost premium is justified for most general use cases.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions