Devstral Medium vs Mistral Small 4
Which Is Cheaper?
| Monthly volume | Devstral Medium | Mistral Small 4 |
|---|---|---|
| 1M tokens | $1 | $0 |
| 10M tokens | $12 | $4 |
| 100M tokens | $120 | $38 |
Devstral Medium costs 2.6x more than Mistral Small 4 on input and 3.3x more on output, which adds up fast. At 1M tokens per month, the difference is negligible: you'll pay about $1 for Devstral versus effectively nothing for Mistral. Scale to 10M tokens, and Mistral saves you $8 a month, roughly 67% of Devstral's bill. At 100M tokens monthly, the gap widens to about $82 in savings, real money for a sustained production workload. Below a few million tokens per month, the absolute savings barely justify switching; above that, Mistral's pricing starts to feel like a no-brainer for budget-sensitive teams.
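A quick sketch of the arithmetic above. The blended per-million-token rates here are back-of-envelope figures derived from the article's 10M-token totals ($12 vs $4); actual pricing splits input and output at different rates, so treat these as illustrative assumptions, not published prices.

```python
# Blended $/1M-token rates, inferred from the article's totals (assumptions).
DEVSTRAL_PER_MTOK = 1.20
MISTRAL_PER_MTOK = 0.40


def monthly_cost(tokens_per_month: int, rate_per_mtok: float) -> float:
    """Dollar cost for a given monthly token volume at a blended rate."""
    return tokens_per_month / 1_000_000 * rate_per_mtok


# Compare the two models across the volumes discussed in the article.
for volume in (1_000_000, 10_000_000, 100_000_000):
    dev = monthly_cost(volume, DEVSTRAL_PER_MTOK)
    mis = monthly_cost(volume, MISTRAL_PER_MTOK)
    print(f"{volume / 1e6:>5.0f}M tokens/mo: "
          f"Devstral ${dev:,.2f}  Mistral ${mis:,.2f}  "
          f"savings ${dev - mis:,.2f}")
```

Swapping in your own input/output split (rather than a blended rate) is straightforward: add a second rate parameter and sum the two terms.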
Now, a 3x premium could still be defensible if Devstral Medium outperformed Mistral Small 4 by a meaningful margin, but the benchmark results below show no category where it does. The only scenario where the premium might make sense is operational: if your pipeline is already tuned to Devstral's quirks (e.g., its JSON mode or tool-calling syntax) and reworking it would cost more than the price delta, inertia has real value. For everyone else, Mistral Small 4 delivers equal or better results at roughly 30% of the cost, and that is a tradeoff most teams should take.
Which Performs Better?
| Test | Devstral Medium | Mistral Small 4 |
|---|---|---|
| Structured Output | — | — |
| Strategic Analysis | — | — |
| Constrained Rewriting | 0 | 3 |
| Creative Problem Solving | — | — |
| Tool Calling | — | — |
| Faithfulness | — | — |
| Classification | — | — |
| Long Context | — | — |
| Safety Calibration | — | — |
| Persona Consistency | — | — |
| Agentic Planning | — | — |
| Multilingual | — | — |
Mistral Small 4 doesn’t just outperform Devstral Medium—it dominates across every tested category despite being positioned as a lighter, more cost-effective model. The most lopsided results appear in domain depth and constrained rewriting, where Mistral Small 4 scored a perfect 3/3 while Devstral Medium failed all tests. This isn’t a marginal gap; it’s a complete rout in tasks requiring specialized knowledge or strict output constraints. For developers building tools that demand precision—think API spec generation, code refactoring with guardrails, or domain-specific Q&A—Devstral Medium simply isn’t viable in its current state. Mistral Small 4 handles these workloads with near-flawless adherence to constraints, a rarity in models at this price tier.
Instruction precision and structured facilitation tell the same story, though with slightly less severity. Mistral Small 4 scores 2/3 in both categories, while Devstral Medium's zero-score sweep suggests deeper architectural or training deficiencies rather than just calibration quirks. The surprise here isn't that Mistral Small 4 leads—it's that Devstral Medium doesn't even compete. Given that Devstral Medium is marketed as a mid-tier workhorse, its inability to follow basic instructions or maintain logical coherence in multi-step tasks is a red flag. Mistral Small 4, meanwhile, delivers consistency that punches far above its "small" branding, especially in structured JSON outputs.
The only unknown is Devstral Medium’s untested overall score, but the existing data makes further benchmarks feel like a formality. If you’re choosing between these two, the decision is straightforward: Mistral Small 4 offers superior accuracy, tighter control, and broader applicability at a lower cost. The real question is whether Devstral’s next iteration can close this chasm—or if this benchmark signals a fundamental misalignment in the model’s design priorities. For now, skip Devstral Medium unless you’re running experiments. For production work, Mistral Small 4 is the only rational choice.
Which Should You Choose?
Pick Mistral Small 4 if you need a proven, cost-efficient model that outperforms Devstral Medium in every tested category—it dominates in structured facilitation, instruction precision, domain depth, and constrained rewriting while costing 70% less per output token. Devstral Medium's $2.00/MTok price tag is hard to justify when its overall benchmark grade remains untested and Mistral Small 4 already delivers stronger results at $0.60/MTok. Only consider Devstral Medium if you're locked into its ecosystem or need a placeholder for future testing; even then, Mistral Small 4's 3/3 scores in domain depth and constrained rewriting make it the default choice for developers who prioritize reliability over speculation. The decision is straightforward: Mistral Small 4 wins on performance, price, and practical utility.
Frequently Asked Questions
Devstral Medium vs Mistral Small 4: which model is cheaper?
Mistral Small 4 is significantly more cost-effective at $0.60 per million output tokens compared to Devstral Medium's $2.00 per million output tokens. At less than a third of the price, Mistral Small 4 is the clear choice for budget-conscious developers who still need strong performance.
Is Devstral Medium better than Mistral Small 4?
Based on available data, Mistral Small 4 outperforms Devstral Medium in benchmark testing, earning a grade of 'Strong' while Devstral Medium remains untested. Given Mistral Small 4's proven capabilities and lower cost, it is the better choice for most use cases.
Which model offers better value for money: Devstral Medium or Mistral Small 4?
Mistral Small 4 offers better value for money, providing strong performance at $0.60 per million output tokens. Devstral Medium, priced at $2.00 per million output tokens, is not only more expensive but also lacks benchmark testing to justify its higher cost.
Why might I choose Devstral Medium over Mistral Small 4?
The main reasons would be ecosystem lock-in—for example, a pipeline already built around Devstral's JSON mode or tool-calling conventions—or a need to evaluate it for future workloads. Beyond that, there is currently no benchmark data to suggest that Devstral Medium outperforms Mistral Small 4 in any category; Mistral Small 4 is both cheaper and has a proven track record, making it the more rational choice for most developers.