Devstral Small 1.1 vs Mistral Small 3.1 24B

There is no single majority winner: Devstral Small 1.1 is the better pick for agentic tooling, classification, and safety-sensitive pipelines; Mistral Small 3.1 24B is stronger for very long-context retrieval and strategic reasoning. Devstral also delivers a large cost advantage, while Mistral offers multimodal input and top-ranked long-context behavior.


Devstral Small 1.1

Overall
3.08/5 (Usable)

Benchmark Scores

Faithfulness
4/5
Long Context
4/5
Multilingual
4/5
Tool Calling
4/5
Classification
4/5
Agentic Planning
2/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
2/5
Persona Consistency
2/5
Constrained Rewriting
3/5
Creative Problem Solving
2/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.100/MTok

Output

$0.300/MTok

Context Window: 131K

modelpicker.net


Mistral Small 3.1 24B

Overall
2.92/5 (Usable)

Benchmark Scores

Faithfulness
4/5
Long Context
5/5
Multilingual
4/5
Tool Calling
1/5
Classification
3/5
Agentic Planning
3/5
Structured Output
4/5
Safety Calibration
1/5
Strategic Analysis
3/5
Persona Consistency
2/5
Constrained Rewriting
3/5
Creative Problem Solving
2/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.350/MTok

Output

$0.560/MTok

Context Window: 128K


Benchmark Analysis

We ran both models through our 12-test suite and got a split outcome. Summary (Devstral vs Mistral, scored on our 1–5 scale):

  • Classification: 4 vs 3 — Devstral wins. Devstral is tied for 1st of 53 models on classification (tied with 29 others), meaning better routing and categorization in pipelines.
  • Tool calling: 4 vs 1 — Devstral wins. Devstral ranks 18 of 54; Mistral ranks 53 of 54 (a no_tool_calling quirk). This matters for function selection and argument accuracy in agentic workflows.
  • Safety calibration: 2 vs 1 — Devstral wins (rank 12 of 55 vs Mistral's rank 32). Devstral refuses more harmful requests and permits more legitimate ones in our tests.
  • Long context: 4 vs 5 — Mistral wins and is tied for 1st of 55 models (tied with 36 others). This indicates Mistral is stronger at retrieval and reasoning over 30K+ token documents.
  • Strategic analysis: 2 vs 3 — Mistral wins (Devstral rank 44 vs Mistral rank 36). Mistral produces comparatively more nuanced tradeoff reasoning in our tests.
  • Agentic planning: 2 vs 3 — Mistral wins (Devstral rank 53 of 54 vs Mistral rank 42). Mistral decomposes goals and recovery paths more effectively in our scenarios.
  • Ties (no clear winner): structured output 4/4 (both rank 26 of 54), constrained rewriting 3/3 (both rank 31), creative problem solving 2/2 (both rank 47), faithfulness 4/4 (both rank 34), persona consistency 2/2 (both rank 51), multilingual 4/4 (both rank 36). These ties show similar behavior on schema adherence, compression, hallucination resistance, persona, and non-English output.

Interpretation: Devstral is the pragmatic choice for AI agents that must call tools, classify inputs, and maintain conservative safety behavior at lower cost. Mistral is preferable for applications that need maximal long-context retrieval and stronger strategic/agentic planning, and it accepts multimodal inputs (text+image to text).
| Benchmark                | Devstral Small 1.1 | Mistral Small 3.1 24B |
|--------------------------|--------------------|-----------------------|
| Faithfulness             | 4/5                | 4/5                   |
| Long Context             | 4/5                | 5/5                   |
| Multilingual             | 4/5                | 4/5                   |
| Tool Calling             | 4/5                | 1/5                   |
| Classification           | 4/5                | 3/5                   |
| Agentic Planning         | 2/5                | 3/5                   |
| Structured Output        | 4/5                | 4/5                   |
| Safety Calibration       | 2/5                | 1/5                   |
| Strategic Analysis       | 2/5                | 3/5                   |
| Persona Consistency      | 2/5                | 2/5                   |
| Constrained Rewriting    | 3/5                | 3/5                   |
| Creative Problem Solving | 2/5                | 2/5                   |
| Summary                  | 3 wins             | 3 wins                |
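The split result in the table can be reproduced by tallying the per-benchmark scores. A minimal sketch, with the values transcribed from the table above:

```python
# Per-benchmark scores (Devstral, Mistral) on the 1-5 scale, from the table.
scores = {
    "Faithfulness": (4, 4), "Long Context": (4, 5), "Multilingual": (4, 4),
    "Tool Calling": (4, 1), "Classification": (4, 3), "Agentic Planning": (2, 3),
    "Structured Output": (4, 4), "Safety Calibration": (2, 1),
    "Strategic Analysis": (2, 3), "Persona Consistency": (2, 2),
    "Constrained Rewriting": (3, 3), "Creative Problem Solving": (2, 2),
}

# Count which model scores strictly higher per benchmark.
devstral_wins = sum(d > m for d, m in scores.values())
mistral_wins = sum(m > d for d, m in scores.values())
ties = sum(d == m for d, m in scores.values())

print(devstral_wins, mistral_wins, ties)  # 3 3 6
```

Three wins each, with half of the twelve benchmarks tied.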

Pricing Analysis

Costs below assume a 50/50 split of input vs output tokens. Devstral Small 1.1: input $0.10/MTok, output $0.30/MTok. At 1M total tokens (0.5M input + 0.5M output), cost = $0.10 × 0.5 + $0.30 × 0.5 = $0.20. At 10M tokens = $2.00; at 100M = $20.00. Mistral Small 3.1 24B: input $0.35/MTok, output $0.56/MTok. At 1M tokens (50/50) = $0.35 × 0.5 + $0.56 × 0.5 = $0.455. At 10M = $4.55; at 100M = $45.50. Savings: Devstral saves about $0.26 per 1M tokens (50/50), $2.55 per 10M, and $25.50 per 100M. High-volume API customers and cost-sensitive production deployments should care most; for small-scale experimentation the quality differences may outweigh cost.
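Under the same 50/50 assumption, the blended-cost math above can be sketched as a small helper. Prices are hard-coded from the model cards; `blended_cost` is an illustrative name, not an API:

```python
# Prices in dollars per million tokens, taken from the cards above.
PRICES = {
    "Devstral Small 1.1": {"input": 0.10, "output": 0.30},
    "Mistral Small 3.1 24B": {"input": 0.35, "output": 0.56},
}

def blended_cost(model: str, total_tokens: float, input_share: float = 0.5) -> float:
    """Dollar cost for total_tokens, split input_share vs (1 - input_share)."""
    p = PRICES[model]
    millions = total_tokens / 1_000_000
    return millions * (input_share * p["input"] + (1 - input_share) * p["output"])

for volume in (1e6, 10e6, 100e6):
    d = blended_cost("Devstral Small 1.1", volume)
    m = blended_cost("Mistral Small 3.1 24B", volume)
    print(f"{volume / 1e6:>5.0f}M tokens: Devstral ${d:,.2f}, Mistral ${m:,.2f}, savings ${m - d:,.2f}")
```

Changing `input_share` shows how the gap widens for output-heavy workloads, since the output-price ratio (0.30 vs 0.56) is smaller than the input-price ratio (0.10 vs 0.35).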

Real-World Cost Comparison

| Task           | Devstral Small 1.1 | Mistral Small 3.1 24B |
|----------------|--------------------|-----------------------|
| Chat response  | <$0.001            | <$0.001               |
| Blog post      | <$0.001            | $0.0013               |
| Document batch | $0.017             | $0.035                |
| Pipeline run   | $0.170             | $0.350                |

Bottom Line

Choose Devstral Small 1.1 if you need reliable tool calling and function selection, prioritize classification accuracy and stricter safety calibration, or run high-volume API workloads and want lower costs (roughly $0.26 saved per 1M tokens at a 50/50 split). Choose Mistral Small 3.1 24B if you must reason over very long contexts (tied for 1st on long context), want its stronger strategic analysis and agentic planning in our tests, or need multimodal (text+image) input support despite higher per-token costs.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions