Question 1

Is Devstral 2 2512 better than Devstral Medium?

Accepted Answer

Yes, in our testing. Devstral 2 2512 wins 8 of 12 benchmarks (structured_output 5 vs 4, constrained_rewriting 5 vs 3, long_context 5 vs 4, tool_calling 4 vs 3, creative_problem_solving 4 vs 2, strategic_analysis 4 vs 2, persona_consistency 4 vs 3, multilingual 5 vs 4). Devstral Medium wins one, classification (4 vs 3). The remaining three categories are tied (faithfulness, safety_calibration, agentic_planning).
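The win/loss/tie count can be re-derived from the scores quoted in this answer. The sketch below is only an illustration: the variable and function names are made up here, and the three tied categories are listed without scores because the answer does not report them.

```python
# Scores as (Devstral 2 2512, Devstral Medium), copied from the answer above.
scores = {
    "structured_output": (5, 4),
    "constrained_rewriting": (5, 3),
    "long_context": (5, 4),
    "tool_calling": (4, 3),
    "creative_problem_solving": (4, 2),
    "strategic_analysis": (4, 2),
    "persona_consistency": (4, 3),
    "multilingual": (5, 4),
    "classification": (3, 4),
}

# Tied categories are named in the answer, but their scores are not reported.
tied = ["faithfulness", "safety_calibration", "agentic_planning"]


def tally(scores, tied):
    """Return (wins for the first model, wins for the second model, ties)."""
    wins_first = sum(1 for a, b in scores.values() if a > b)
    wins_second = sum(1 for a, b in scores.values() if b > a)
    ties = len(tied) + sum(1 for a, b in scores.values() if a == b)
    return wins_first, wins_second, ties


wins_2512, wins_medium, ties = tally(scores, tied)
total = wins_2512 + wins_medium + ties
print(f"Devstral 2 2512: {wins_2512}, Devstral Medium: {wins_medium}, "
      f"tied: {ties}, total: {total}")
```

The three counts should add up to the 12 benchmarks mentioned in the answer.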

Question 2

Which model is cheaper to run?

Accepted Answer

They have identical pricing in the payload: input_cost_per_mtok = $0.40 and output_cost_per_mtok = $2.00. There is no cost advantage to choosing one over the other.

Question 3

Which model should I use for coding and tool-calling?

Accepted Answer

Use Devstral 2 2512. In our tests it scored 4 on tool_calling vs Devstral Medium's 3, and it also scored higher on constrained_rewriting (5 vs 3) and long_context (5 vs 4), both of which help with multi-step code generation and tool orchestration.

Question 4

Which model is better for classification and routing?

Accepted Answer

Devstral Medium is better for classification: it scored 4 vs Devstral 2 2512's 3, which puts it in a tie for 1st place in our benchmarks ("tied for 1st with 29 other models out of 53 tested").

Question 5

What will costs look like at scale?

Accepted Answer

Per the pricing fields: $0.40 per 1M input tokens and $2.00 per 1M output tokens. Assuming a 50/50 split of input/output, the blended cost is $1.20 per 1M combined tokens, so approximate totals are $12/month for 10M tokens, $120/month for 100M, and $1,200/month for 1B. Output tokens drive most spend (about five-sixths of the total under this split).
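To check these figures or try a different input/output mix, the arithmetic can be sketched as follows. It assumes the per-million-token prices quoted in the pricing answer (input_cost_per_mtok = $0.40, output_cost_per_mtok = $2.00); the function name and the input_share parameter are hypothetical, introduced only for this example.

```python
# Prices in USD per 1M tokens, taken from the pricing fields quoted earlier.
INPUT_COST_PER_MTOK = 0.40
OUTPUT_COST_PER_MTOK = 2.00


def monthly_cost(total_tokens, input_share=0.5):
    """Estimate monthly spend in USD for a combined token volume.

    input_share is the fraction of tokens that are input (0.5 = 50/50 split).
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    return (input_tokens / 1_000_000 * INPUT_COST_PER_MTOK
            + output_tokens / 1_000_000 * OUTPUT_COST_PER_MTOK)


for volume in (1_000_000, 10_000_000, 100_000_000, 1_000_000_000):
    print(f"{volume:>13,} tokens/month: ${monthly_cost(volume):,.2f}")
```

Because both models share the same prices, the estimate applies to either one. Workloads that are mostly input (long prompts, short completions) will come in well below the 50/50 figure, so it is worth passing a measured input_share rather than the default.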

Question 6

Does either model have a longer context window?

Accepted Answer

Yes. Devstral 2 2512 has a 262,144‑token window; Devstral Medium has a 131,072‑token window. That aligns with Devstral 2 2512 scoring 5 vs 4 on long_context in our tests.
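As a rough illustration of how the window sizes affect model choice, the sketch below checks which model can hold a prompt plus a reserved output budget. The dictionary keys are plain labels chosen for this example (not confirmed API model identifiers), and the function name is hypothetical; token counts would come from whatever tokenizer you use.

```python
# Context window sizes in tokens, as stated in the answer above.
# Keys are illustrative labels, not confirmed API model IDs.
CONTEXT_WINDOWS = {
    "devstral-2-2512": 262_144,
    "devstral-medium": 131_072,
}


def models_that_fit(prompt_tokens, reserved_output_tokens=0):
    """Return the labels of models whose window holds prompt plus output."""
    needed = prompt_tokens + reserved_output_tokens
    return [name for name, window in CONTEXT_WINDOWS.items()
            if needed <= window]


# A 200,000-token prompt with 4,096 tokens reserved for the reply.
print(models_that_fit(200_000, 4_096))
```

Since the two models cost the same per token, a prompt that only fits the larger window is a straightforward reason to pick Devstral 2 2512.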
