Devstral Medium vs Devstral Small 1.1

In our testing, Devstral Small 1.1 is the practical default: it ties Devstral Medium on 8 of 12 benchmarks and wins tool calling (4 vs 3) and safety calibration (2 vs 1) while costing far less. Devstral Medium wins agentic planning (4 vs 2) and persona consistency (3 vs 2), so choose it when agentic reasoning or stronger persona maintenance justifies the price premium.

mistral

Devstral Medium

Overall
3.17/5 (Usable)

Benchmark Scores

Faithfulness: 4/5
Long Context: 4/5
Multilingual: 4/5
Tool Calling: 3/5
Classification: 4/5
Agentic Planning: 4/5
Structured Output: 4/5
Safety Calibration: 1/5
Strategic Analysis: 2/5
Persona Consistency: 3/5
Constrained Rewriting: 3/5
Creative Problem Solving: 2/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.400/MTok
Output: $2.00/MTok
Context Window: 131K tokens

modelpicker.net

mistral

Devstral Small 1.1

Overall
3.08/5 (Usable)

Benchmark Scores

Faithfulness: 4/5
Long Context: 4/5
Multilingual: 4/5
Tool Calling: 4/5
Classification: 4/5
Agentic Planning: 2/5
Structured Output: 4/5
Safety Calibration: 2/5
Strategic Analysis: 2/5
Persona Consistency: 2/5
Constrained Rewriting: 3/5
Creative Problem Solving: 2/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.100/MTok
Output: $0.300/MTok
Context Window: 131K tokens

Benchmark Analysis

Summary of our 12-test comparison (scores are our internal 1–5 scores; ranks are each model's position among all models we have evaluated on that benchmark):

  • Devstral Medium wins (2): persona consistency 3 vs 2 (Medium rank 45/53, Small rank 51/53) — in our testing Medium keeps character and resists injection better; agentic planning 4 vs 2 (Medium rank 16/54, Small rank 53/54) — Medium is notably better at goal decomposition and recovery in agentic flows.
  • Devstral Small 1.1 wins (2): tool calling 4 vs 3 (Small rank 18/54, Medium rank 47/54) — Small selects functions and arguments more accurately in our tests; safety calibration 2 vs 1 (Small rank 12/55, Medium rank 32/55) — Small refused harmful requests more reliably while permitting legitimate ones.
  • Ties (8): structured output 4/4 (both rank 26/54); strategic analysis 2/2 (both rank 44/54); constrained rewriting 3/3 (both rank 31/53); creative problem solving 2/2 (both rank 47/54); faithfulness 4/4 (both rank 34/55); classification 4/4 (tied for 1st with 29 others); long context 4/4 (both rank 38/55); multilingual 4/4 (both rank 36/55). On these tied tasks, expect similar real-world behaviour: JSON schema adherence, long-context retrieval, classification accuracy, and multilingual output are comparable between the two models in our tests.

In short, Devstral Medium's advantages are concentrated in agentic planning and persona maintenance, while Devstral Small 1.1's advantages are in tool selection and safety. On most tasks they match, so cost becomes the primary differentiator for high-volume use.

Benchmark                  Devstral Medium   Devstral Small 1.1
Faithfulness               4/5               4/5
Long Context               4/5               4/5
Multilingual               4/5               4/5
Tool Calling               3/5               4/5
Classification             4/5               4/5
Agentic Planning           4/5               2/5
Structured Output          4/5               4/5
Safety Calibration         1/5               2/5
Strategic Analysis         2/5               2/5
Persona Consistency        3/5               2/5
Constrained Rewriting      3/5               3/5
Creative Problem Solving   2/5               2/5
Summary                    2 wins            2 wins
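
The win/tie tally above can be reproduced from the raw scores. A minimal sketch (score values taken from the table; the data structure and function names are our own):

```python
# Internal 1-5 benchmark scores as (Devstral Medium, Devstral Small 1.1) pairs.
SCORES = {
    "faithfulness": (4, 4),
    "long_context": (4, 4),
    "multilingual": (4, 4),
    "tool_calling": (3, 4),
    "classification": (4, 4),
    "agentic_planning": (4, 2),
    "structured_output": (4, 4),
    "safety_calibration": (1, 2),
    "strategic_analysis": (2, 2),
    "persona_consistency": (3, 2),
    "constrained_rewriting": (3, 3),
    "creative_problem_solving": (2, 2),
}

def tally(scores):
    """Count benchmarks won by each model, plus ties."""
    medium_wins = sum(1 for m, s in scores.values() if m > s)
    small_wins = sum(1 for m, s in scores.values() if m < s)
    ties = sum(1 for m, s in scores.values() if m == s)
    return medium_wins, small_wins, ties

print(tally(SCORES))  # -> (2, 2, 8)
```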

Pricing Analysis

Pricing is quoted per MTok (one million tokens). Assuming a 50/50 input/output split, blended costs are: Devstral Medium (input $0.40, output $2.00 per MTok): 1M tokens ≈ $1.20; 10M ≈ $12; 100M ≈ $120. Devstral Small 1.1 (input $0.10, output $0.30 per MTok): 1M tokens ≈ $0.20; 10M ≈ $2; 100M ≈ $20. The output price ratio is 6.67× ($2.00 / $0.30), and the blended ratio at a 50/50 split is 6×. Who should care: high-volume API customers, SaaS products, and startups will see meaningful savings with Devstral Small 1.1 at scale; teams that run many agentic workflows or need stronger persona consistency may accept Devstral Medium's higher cost for its wins.
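
The blended figures above follow from simple arithmetic. A sketch (rates from the pricing cards; the 50/50 split and function name are our assumptions):

```python
def blended_cost(total_tokens, input_per_mtok, output_per_mtok, input_share=0.5):
    """Blended API cost in dollars for a token volume split between input and output."""
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens * (1 - input_share)
    return (input_tokens * input_per_mtok + output_tokens * output_per_mtok) / 1_000_000

# Devstral Medium: $0.40 input / $2.00 output per MTok
print(blended_cost(1_000_000, 0.40, 2.00))  # -> 1.2
# Devstral Small 1.1: $0.10 input / $0.30 output per MTok
print(blended_cost(1_000_000, 0.10, 0.30))  # -> 0.2
```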

Real-World Cost Comparison

Task             Devstral Medium   Devstral Small 1.1
Chat response    $0.0011           <$0.001
Blog post        $0.0042           <$0.001
Document batch   $0.108            $0.017
Pipeline run     $1.08             $0.170
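
Per-task costs like these come from multiplying token counts by the per-MTok rates. A sketch with illustrative token counts (our assumption, not the exact workloads behind the table):

```python
PRICES = {  # $ per million tokens, from the pricing cards above
    "Devstral Medium": (0.40, 2.00),
    "Devstral Small 1.1": (0.10, 0.30),
}

def task_cost(model, input_tokens, output_tokens):
    """Dollar cost of one task at the model's (input, output) per-MTok rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Hypothetical chat response: ~100 tokens of prompt, ~500 tokens of reply.
print(f"{task_cost('Devstral Medium', 100, 500):.4f}")     # -> 0.0010
print(f"{task_cost('Devstral Small 1.1', 100, 500):.4f}")  # -> 0.0002
```

Output tokens dominate at these rates, which is why the gap between the two models widens on generation-heavy tasks.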

Bottom Line

Choose Devstral Medium if: you need stronger agentic planning (4 vs 2) or better persona consistency (3 vs 2) for agent frameworks, autonomous chains, or situations where goal decomposition and persona fidelity are mission-critical, and you can absorb the higher cost. Choose Devstral Small 1.1 if: you want similar performance on classification, long-context, structured-output, and multilingual tasks at a fraction of the cost (1M tokens at a 50/50 split: ≈ $0.20 vs $1.20), or if tool calling and safety calibration (scores of 4 and 2, vs Medium's 3 and 1) are your priorities.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions