Codestral 2508 vs Devstral Small 1.1
In our testing, Codestral 2508 is the better pick for developer workflows and code-heavy agent use: it wins 6 of 12 benchmarks (tool calling, faithfulness, structured output, long context, agentic planning, persona consistency). Devstral Small 1.1 is a strong cost-saving alternative that wins classification and safety_calibration and is priced at one-third the input and output cost of Codestral.
Codestral 2508 (mistral)
Pricing: $0.300/MTok input, $0.900/MTok output

Devstral Small 1.1 (mistral)
Pricing: $0.100/MTok input, $0.300/MTok output

Source: modelpicker.net
Benchmark Analysis
We ran both models across our 12-test suite and report wins, losses, and ties below; all claims come from our testing.

Codestral wins: structured_output (5 vs 4), tool_calling (5 vs 4), faithfulness (5 vs 4), long_context (5 vs 4), agentic_planning (4 vs 2), persona_consistency (3 vs 2). For context: structured_output measures JSON/schema compliance (Codestral tied for 1st with 24 others out of 54); tool_calling tests function selection and sequencing (tied for 1st with 16 others out of 54); faithfulness checks hallucination risk (tied for 1st with 32 others out of 55). In practice, Codestral is the more reliable choice when you need precise structured outputs, long-context retrieval at 30K+ tokens, accurate function arguments, and conservative adherence to source material, all of which matter for fill-in-the-middle (FIM) completion, test generation, and multi-step code agents.

Devstral wins: classification (4 vs 3) and safety_calibration (2 vs 1). Classification is Devstral's strength (tied for 1st with 29 others out of 53), which matters for routing, intent detection, and label-heavy automation; safety_calibration (Devstral ranks 12 of 55) means it refused or permitted content more appropriately in our tests.

Ties (no clear winner): strategic_analysis (2/2), constrained_rewriting (3/3), creative_problem_solving (2/2), multilingual (4/4).

Practical takeaway: pick Codestral when you need higher-fidelity code and tool workflows and long contexts; pick Devstral when classification accuracy and slightly better safety behavior matter and budget is a constraint.
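To make "JSON/schema compliance" concrete, here is a minimal sketch of the kind of check a structured-output test implies: the model's raw reply must parse as JSON and contain the expected typed fields. The field names and sample replies are illustrative, not taken from our benchmark harness.

```python
import json

# Illustrative required fields for a hypothetical code-review reply schema.
REQUIRED = {"file": str, "line": int, "severity": str}

def is_schema_compliant(reply: str) -> bool:
    """Return True if `reply` is valid JSON with all required typed fields."""
    try:
        obj = json.loads(reply)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    # Every required key must be present with the expected type.
    return all(isinstance(obj.get(key), typ) for key, typ in REQUIRED.items())

# A compliant reply passes; prose-wrapped or incomplete JSON fails.
print(is_schema_compliant('{"file": "app.py", "line": 42, "severity": "warn"}'))
print(is_schema_compliant('Sure! Here is the JSON: {"file": "app.py"}'))
```

Models that score lower on this benchmark tend to fail in the second way, wrapping otherwise-valid JSON in conversational text or dropping required fields.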
Pricing Analysis
Per million tokens, Codestral 2508 costs $0.30 input / $0.90 output; Devstral Small 1.1 costs $0.10 input / $0.30 output, a 3x price ratio on both sides. Assuming a 50/50 split of input and output tokens, 1M total tokens costs $0.60 on Codestral vs $0.20 on Devstral. At 10M tokens/month that is $6.00 vs $2.00; at 100M tokens/month, $60.00 vs $20.00. The gap scales linearly, so teams running high-volume agents, CI test generation, or large-context pipelines should budget an extra $40/month per 100M tokens if they choose Codestral; small teams, prototypes, and cost-sensitive production routes get the same throughput at one-third the cost with Devstral.
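The blended-cost arithmetic above can be sketched as a small calculator; the prices come from the comparison, while the 50/50 input/output split is an assumption you should replace with your workload's actual ratio.

```python
# Prices in $/MTok, as listed in the comparison above.
PRICES = {
    "Codestral 2508": {"input": 0.30, "output": 0.90},
    "Devstral Small 1.1": {"input": 0.10, "output": 0.30},
}

def cost(model: str, total_tokens: int, input_share: float = 0.5) -> float:
    """Dollar cost for `total_tokens` tokens at the given input/output split."""
    p = PRICES[model]
    millions = total_tokens / 1_000_000
    return millions * (input_share * p["input"] + (1 - input_share) * p["output"])

for model in PRICES:
    print(f"{model}: ${cost(model, 100_000_000):.2f} per 100M tokens/month")
```

A workload that is mostly input (e.g. long-context retrieval with short answers) shifts the blend toward the cheaper input rate, so the absolute gap shrinks even though the 3x ratio holds.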
Bottom Line
Choose Codestral 2508 if you build code-generation agents, need reliable tool calling or strict JSON/structured outputs, or depend on long-context (30K+) retrieval: it scored 5 in tool_calling, structured_output, faithfulness, and long_context, and tied for 1st on several of those tests in our testing. Choose Devstral Small 1.1 if you must minimize runtime costs or prioritize classification and safety calibration (it scores 4 in classification and 2 in safety_calibration): it costs roughly one-third as much per token and is the better fit for high-volume, budget-sensitive classification pipelines.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.