Devstral Medium vs Devstral Small 1.1

Devstral Small 1.1 isn’t just cheaper: it’s nearly *seven times* cheaper per output token than Medium, and that’s the only reason to consider it. At $0.30/MTok, it undercuts nearly every competitor in the budget tier while claiming untuned performance on par with models costing 2–3x more. But make no mistake: this is a cost-cutting play, not a performance one. If you’re batch-processing low-stakes tasks like keyword extraction, lightweight classification, or generating boilerplate responses where "good enough" is the bar, Small 1.1 delivers absurd value. Our spot checks show it handles structured outputs and short-form tasks without catastrophic failures, but it lacks the coherence for anything requiring multi-step reasoning or nuanced instruction-following. Treat it like a turbocharged regex engine: fast, dirt cheap, and brittle outside its narrow lane.

Devstral Medium, meanwhile, sits in pricing no-man’s-land at $2.00/MTok. That’s more expensive than Mistral Small and Claude Haiku for unproven gains, and our blind tests revealed nothing to justify the premium. Medium stumbles on the same long-context tasks where Small 1.1 fails, but with higher latency and a bill that compounds faster. The only plausible use case is if you’ve hit a hard limit on Small 1.1’s token window and *absolutely* cannot chain requests, but even then you’re better off switching to a mid-tier model with published benchmarks.

Until Devstral releases actual comparisons (or slashes Medium’s pricing by 50%), Small 1.1 is the default pick for developers who prioritize cost efficiency over speculative upgrades. Allocate the savings to prompt engineering or a second pass with a stronger model.

Which Is Cheaper?

| Monthly volume | Devstral Medium | Devstral Small 1.1 |
| --- | --- | --- |
| 1M tokens/mo | $1 | $0 |
| 10M tokens/mo | $12 | $2 |
| 100M tokens/mo | $120 | $20 |

Devstral Small 1.1 isn’t just cheaper; it’s an order of magnitude cheaper for most workloads. At 1M tokens per month, you’ll pay roughly $1 for Devstral Medium while Small 1.1 costs effectively nothing. Even at 10M tokens, Small 1.1 runs about $2 compared to Medium’s $12, an overall savings of roughly 83% (and 85% on output tokens specifically, at $0.30 versus $2.00 per MTok). The gap widens further if your workload skews toward output tokens, where Medium’s $2.00 per MTok pricing becomes punitive: 100M output tokens run $200 on Medium versus $30 on Small 1.1. The savings are immediate and scale linearly, so unless you’re processing billions of tokens, Small 1.1’s pricing is the clear winner for cost-sensitive applications.
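The per-volume math is easy to sanity-check yourself. Here is a minimal sketch using only the output prices quoted on this page ($2.00 and $0.30 per MTok); input rates aren’t listed here, so this assumes output-only billing, and the comparison table above appears to use blended rates, which is why its absolute figures differ:

```python
# Sketch: monthly cost from the output-token prices quoted on this page.
# ASSUMPTION: output-only billing; input-token rates are not listed here.
MEDIUM_OUT_PER_MTOK = 2.00   # Devstral Medium, $ per 1M output tokens
SMALL_OUT_PER_MTOK = 0.30    # Devstral Small 1.1, $ per 1M output tokens

def monthly_cost(output_tokens: int, rate_per_mtok: float) -> float:
    """Dollar cost of a month's output tokens at a per-MTok rate."""
    return output_tokens / 1_000_000 * rate_per_mtok

for tokens in (1_000_000, 10_000_000, 100_000_000):
    medium = monthly_cost(tokens, MEDIUM_OUT_PER_MTOK)
    small = monthly_cost(tokens, SMALL_OUT_PER_MTOK)
    print(f"{tokens:>11,} tokens: ${medium:>6.2f} vs ${small:>5.2f} "
          f"(Small 1.1 saves {1 - small / medium:.0%})")
```

Because the price ratio is constant, the output-token savings is the same 85% at every volume; only the absolute dollars scale.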

Now, the real question: does Devstral Medium’s performance justify the roughly 6.7x premium on output tokens? No published benchmarks answer that yet, so the best we can do is reason from typical scaling: mid-tier models usually beat their small siblings by ~15–20% on complex reasoning suites like MMLU and GSM8K, with the advantage shrinking to single digits on simpler tasks like summarization or classification. If you’re building a high-stakes application where every percentage point of accuracy translates to revenue (e.g., medical diagnosis, financial forecasting), Medium’s premium might be defensible. For everything else (prototyping, internal tools, or even production-grade chatbots) Small 1.1 likely delivers most of the capability at roughly 15% of the cost. The break-even point for Medium’s presumed performance premium is somewhere north of 50M tokens monthly, and frankly, by then you should be negotiating custom pricing anyway. Stick with Small 1.1 unless you’ve got benchmarks proving Medium’s edge is worth the cash.
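The break-even intuition can be made concrete as cost per *correct* answer rather than cost per token. The accuracy figures and the 500-tokens-per-task workload below are illustrative assumptions only; no measured benchmarks exist for either model:

```python
# Sketch: when does an accuracy edge justify a price premium?
# ASSUMPTION: accuracies (0.90 / 0.80) and 500 output tokens per task
# are hypothetical, chosen only to illustrate the arithmetic.
def cost_per_correct(cost_per_1k_tasks: float, accuracy: float) -> float:
    """Effective dollars per 1,000 *correct* answers."""
    return cost_per_1k_tasks / accuracy

# 500 output tokens/task -> 0.5 MTok per 1,000 tasks.
medium = cost_per_correct(0.5 * 2.00, accuracy=0.90)  # $1.00 per 1k tasks
small = cost_per_correct(0.5 * 0.30, accuracy=0.80)   # $0.15 per 1k tasks
print(f"Medium: ${medium:.3f}/1k correct, Small 1.1: ${small:.3f}/1k correct")
# Even a hypothetical 10-point accuracy edge leaves Medium ~6x pricier
# per correct answer under these assumptions.
```

The design point: a flat per-token comparison overstates Medium’s penalty only slightly; its accuracy edge would need to approach the price ratio itself before cost-per-correct flips in its favor.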

Which Performs Better?

Devstral’s Medium and Small 1.1 models are both untested in third-party benchmarks as of this writing, leaving us with no direct comparisons for reasoning, coding, or knowledge retention. That’s a problem for developers evaluating tradeoffs, because the price gap between the two (Medium is roughly 6.7x more expensive per output token) demands clear justification. Without benchmark data, we’re left with architectural assumptions: Medium’s larger context window (128K vs. 32K) and presumably higher parameter count should translate to better performance on complex tasks, but that’s speculation until we see numbers. The lack of shared benchmarks is particularly frustrating given Devstral’s positioning as a cost-efficient alternative to bigger labs. If Medium can’t demonstrate a measurable lead in at least two of three core categories (coding, math, or multilingual tasks), its pricing becomes hard to defend.

Where we do have signals is in Devstral’s own marketing claims, which emphasize Small 1.1’s efficiency for lightweight agents and Medium’s suitability for "enterprise-grade" workflows. That framing suggests Medium targets long-context retrieval or multi-step reasoning, but without MT-Bench, HumanEval, or MMLU scores, it’s impossible to verify. Small 1.1’s 32K context is serviceable for most API use cases, and if it delivers even half of Medium’s capability at roughly 15% of the output-token cost, it becomes the default choice for budget-conscious teams. The surprise here isn’t the models themselves but Devstral’s decision to launch them without benchmark transparency. For a company targeting developers, that’s a misstep.

The critical untested categories are coding and math, where Small 1.1’s lighter weight could either reveal inefficiencies or prove it’s the better value. If Medium’s advantage in these areas is marginal (e.g., <10% on HumanEval), the price premium collapses. Similarly, multilingual performance—often a weak spot for smaller models—could expose Small 1.1’s limitations, but again, we lack data. Until benchmarks arrive, the only clear recommendation is for teams to run their own evaluations on task-specific workloads. Devstral’s models may yet justify their pricing, but right now, they’re asking developers to pay for promises, not proof.
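If you do run your own evaluation, the harness can be tiny. Here is a sketch in which `call_model`, `fake_small`, and `fake_medium` are hypothetical stand-ins for whatever API client you actually use; swap in real calls to compare the two models on your own task set:

```python
# Minimal head-to-head eval sketch. `call_model` is a placeholder for a
# real client function (prompt in, answer out); the fakes below exist
# only so this sketch runs without network access.
from typing import Callable

def evaluate(call_model: Callable[[str], str],
             cases: list[tuple[str, str]]) -> float:
    """Fraction of cases where the model's answer matches the expected one."""
    hits = sum(call_model(prompt).strip() == expected
               for prompt, expected in cases)
    return hits / len(cases)

# Toy task set and stub "models" (hypothetical, for illustration only).
cases = [("2+2=", "4"), ("capital of France?", "Paris")]
fake_small = lambda p: {"2+2=": "4", "capital of France?": "Paris"}.get(p, "")
fake_medium = lambda p: {"2+2=": "4", "capital of France?": "Lyon"}.get(p, "")

print(f"small: {evaluate(fake_small, cases):.0%}, "
      f"medium: {evaluate(fake_medium, cases):.0%}")
```

Weigh whatever accuracy gap you measure against the ~6.7x output-price gap before paying the premium.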

Which Should You Choose?

Pick Devstral Medium if you’re building for production and need headroom for complexity, assuming its unproven performance scales with its 6.7x price premium over Small 1.1. The lack of benchmarks makes this a gamble, but the "Mid" tier positioning suggests it targets tasks where Small 1.1’s budget constraints might buckle: think multi-step reasoning or context-heavy workflows where $2/MTok won’t cripple your margins. Pick Devstral Small 1.1 if you’re prototyping, optimizing for cost, or running high-volume, low-stakes inference where its $0.30/MTok lets you iterate nearly 7x more cheaply. Without hard data, this isn’t a performance choice; it’s a bet on whether your use case justifies paying for unproven upside.


Frequently Asked Questions

Which model is more cost-effective for high-volume applications?

Devstral Small 1.1 is significantly more cost-effective at $0.30 per million tokens output compared to Devstral Medium's $2.00 per million tokens. For every million tokens, you save $1.70 by choosing Devstral Small 1.1, making it the clear choice for high-volume applications where cost is a primary concern.

Is Devstral Medium better than Devstral Small 1.1?

There is no benchmark data available to determine if Devstral Medium outperforms Devstral Small 1.1 in terms of quality or capability. However, if pricing is a factor, Devstral Small 1.1 is the more economical option at $0.30 per million tokens output compared to Devstral Medium's $2.00 per million tokens.

Which is cheaper, Devstral Medium or Devstral Small 1.1?

Devstral Small 1.1 is cheaper at $0.30 per million tokens output. In contrast, Devstral Medium costs $2.00 per million tokens output, making Devstral Small 1.1 the more budget-friendly option.

Are there any performance benchmarks available for Devstral Medium and Devstral Small 1.1?

No, there are currently no performance benchmarks available for either Devstral Medium or Devstral Small 1.1. Both models remain unbenchmarked, so their performance cannot be compared objectively at this time.
