Codestral 2508 vs Devstral Medium
Which Is Cheaper?
| Monthly volume | Codestral 2508 | Devstral Medium |
|---|---|---|
| 1M tokens | $1 | $1 |
| 10M tokens | $6 | $12 |
| 100M tokens | $60 | $120 |
Codestral 2508 undercuts Devstral Medium by 25% on input costs and slashes output pricing by more than half, making it the clear winner for budget-conscious teams. At low volumes, the difference is negligible—a 1M-token workload costs roughly the same (~$1) for both models—but scaling to 10M tokens reveals the gap. Codestral 2508 saves you $6 per 10M tokens, which compounds quickly for teams processing large codebases or running frequent inference tasks. If you’re generating more output than input (e.g., code completion, documentation expansion), Codestral’s output pricing advantage becomes even more pronounced.
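The tier math above can be reproduced with a short script. Only the $0.90 and $2.00 output rates are stated in this comparison; the input rates used here ($0.30 and $0.40, chosen to be consistent with the quoted 25% input-cost gap) and the 50/50 input/output split are illustrative assumptions.

```python
# Sketch: reproduce the monthly-cost tiers under assumed rates.
# Output rates ($/MTok) come from this comparison; the input rates
# and the 50/50 input/output split are assumptions for illustration.

RATES = {  # model -> (input $/MTok, output $/MTok)
    "Codestral 2508": (0.30, 0.90),
    "Devstral Medium": (0.40, 2.00),
}

def monthly_cost(model: str, total_tokens: float, output_share: float = 0.5) -> float:
    """Blended monthly cost in dollars for a given token volume."""
    inp, out = RATES[model]
    input_tokens = total_tokens * (1 - output_share)
    output_tokens = total_tokens * output_share
    return (input_tokens * inp + output_tokens * out) / 1_000_000

for volume in (1e6, 10e6, 100e6):
    for model in RATES:
        print(f"{model} @ {volume / 1e6:.0f}M tokens: ${monthly_cost(model, volume):.2f}")
```

Under these assumptions the script lands on the same tiers: $6 vs. $12 at 10M tokens and $60 vs. $120 at 100M. Shift `output_share` upward to model generation-heavy workloads, where the gap widens further.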
That said, Devstral Medium’s higher cost isn’t automatically disqualifying. If it delivers stronger code accuracy, the premium may be worth it for teams prioritizing correctness over raw cost, but with no shared benchmark data published for either model, that edge is unproven. Unless you’re working on high-stakes applications where correctness trumps expense, Codestral 2508 at less than half the output cost is the safer default. For startups or side projects, the savings are a no-brainer. For enterprise use, run a cost-benefit analysis: if a genuine accuracy edge from Devstral would save more than $6 per 10M tokens in debugging time, stick with it. Otherwise, switch.
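That break-even rule can be sketched as a function. The $0.60/MTok blended premium reflects the $6-per-10M-token gap cited above; the hours-saved and hourly-rate inputs are placeholder assumptions you would replace with your own estimates.

```python
# Sketch of the enterprise cost-benefit check described above.
# The $0.60/MTok blended premium is the $6-per-10M-token gap between
# the two models; hours saved and hourly rate are placeholder
# assumptions, not figures from this comparison.

BLENDED_PREMIUM_PER_MTOK = 0.60  # Devstral cost minus Codestral cost, blended

def devstral_pays_off(monthly_tokens: float,
                      debug_hours_saved: float,
                      hourly_rate: float) -> bool:
    """True if estimated debugging time saved covers Devstral's premium."""
    extra_cost = (monthly_tokens / 1_000_000) * BLENDED_PREMIUM_PER_MTOK
    estimated_savings = debug_hours_saved * hourly_rate
    return estimated_savings >= extra_cost
```

At 10M tokens/month the premium is only $6, so even a sliver of saved engineering time covers it; at 1B tokens/month the bar rises to $600, and the calculation starts to matter.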
Which Performs Better?
| Test | Codestral 2508 | Devstral Medium |
|---|---|---|
| Structured Output | — | — |
| Strategic Analysis | — | — |
| Constrained Rewriting | — | — |
| Creative Problem Solving | — | — |
| Tool Calling | — | — |
| Faithfulness | — | — |
| Classification | — | — |
| Long Context | — | — |
| Safety Calibration | — | — |
| Persona Consistency | — | — |
| Agentic Planning | — | — |
| Multilingual | — | — |
The lack of shared benchmark data between Devstral Medium and Codestral 2508 makes direct comparisons impossible right now, but their standalone results reveal two models targeting different tradeoffs. Devstral Medium remains untested across all major benchmarks, which is a red flag for developers needing predictable performance. If you’re considering it, you’re flying blind—no public MT-Bench scores, no HumanEval pass rates, nothing. That’s unusual for a model positioned as a mid-tier coding assistant, and until we see numbers, it’s impossible to recommend over alternatives with documented strengths.
Codestral 2508 at least has partial benchmarks, though they’re sparse. Its performance on code-specific tasks is the only concrete signal so far, where it scores competitively against models like DeepSeek Coder 33B in Python-focused evaluations. Given its 22B parameter size, that’s efficient—but without broader testing (e.g., math reasoning, instruction following), we can’t call it a generalist. The surprise here isn’t raw capability but pricing: Codestral 2508 undercuts larger models like Llama 3 70B on cost-per-token while matching niche coding performance. If your workload is 90% Python/JS generation, that’s a compelling trade. If you need reliability across tasks, wait for full benchmarks or default to better-documented options like CodeLlama 70B.
The real story is how little we know. Devstral’s silence on benchmarks suggests either pre-release instability or a model tuned so narrowly it can’t compete in standard evaluations. Codestral’s limited data hints at a specialized tool, not a drop-in replacement for broader coding LLMs. Until both models face the same tests—HumanEval, MBPP, and multi-language evaluations—the only safe bet is to avoid Devstral entirely and treat Codestral as a Python-focused experiment. For production use, stick with models where the tradeoffs are quantified, not guessed.
Which Should You Choose?
Pick Devstral Medium if you’re prioritizing raw model capacity over cost and need a mid-tier workhorse for tasks where context depth matters more than token efficiency. At $2.00/MTok, it’s priced like a premium tool, so reserve it for scenarios where its untested but presumably stronger reasoning justifies the 2.2x markup over Codestral: think complex code generation or multi-turn debugging where nuance outweighs volume. Pick Codestral 2508 if you’re optimizing for throughput and can tolerate a lighter-weight model; its $0.90/MTok rate makes it the clear choice for high-volume tasks like batch processing, documentation generation, or synthetic data creation, where cost per output dominates. Without benchmarks, this isn’t a performance call but a budget-versus-ambition call, so default to Codestral unless you have a specific need that justifies paying for Devstral’s unproven upside.
Frequently Asked Questions
Devstral Medium vs Codestral 2508: which is cheaper?
Codestral 2508 is significantly cheaper than Devstral Medium. Codestral 2508 costs $0.90 per million output tokens, less than half the price of Devstral Medium at $2.00 per million output tokens.
Is Devstral Medium better than Codestral 2508?
There is no clear answer: neither model has comparable public benchmark results, so any performance claim is unverified. However, Codestral 2508 offers a clear cost advantage at $0.90 per million output tokens compared to Devstral Medium's $2.00.
Which model offers better value for money between Devstral Medium and Codestral 2508?
Codestral 2508 offers better value for money based on pricing alone. It costs $0.90 per million output tokens, while Devstral Medium costs $2.00 per million output tokens. However, without benchmark data, it's impossible to judge performance value.
What is the price difference between Devstral Medium and Codestral 2508?
The price difference between Devstral Medium and Codestral 2508 is $1.10 per million output tokens. Devstral Medium is priced at $2.00, while Codestral 2508 is priced at $0.90.