Devstral Small 1.1 vs Grok 3
Grok 3 wins the majority of our benchmarks—especially long-context, faithfulness, agentic planning, and persona consistency—making it the better choice for high-quality, enterprise workflows. Devstral Small 1.1 matches Grok 3 on tool calling and classification but is dramatically cheaper, so choose it when cost matters more than top-tier planning, multilingual coverage, or faithfulness.
Mistral
Devstral Small 1.1
Benchmark Scores
External Benchmarks
Pricing
Input
$0.100/MTok
Output
$0.300/MTok
modelpicker.net
xAI
Grok 3
Benchmark Scores
External Benchmarks
Pricing
Input
$3.00/MTok
Output
$15.00/MTok
Benchmark Analysis
In our 12-test suite, Grok 3 wins 8 categories, Devstral Small 1.1 wins none, and 4 are ties. Specifics (our scores):
- Structured output: Grok 3 = 5 vs Devstral = 4. Grok is tied for 1st on structured output (tied with 24 others out of 54) while Devstral sits at rank 26/54. This means Grok is more reliable for strict JSON/schema compliance in production integrations.
- Strategic analysis: Grok 3 = 5 vs Devstral = 2. Grok ranks tied for 1st (1/54), Devstral ranks 44/54 — real-world implication: Grok handles nuanced tradeoffs and numeric reasoning for decision support far better in our tests.
- Creative problem solving: Grok 3 = 3 vs Devstral = 2. Grok outperforms Devstral on non-obvious, feasible idea generation (rank 30 vs Devstral rank 47).
- Faithfulness: Grok 3 = 5 vs Devstral = 4. Grok is tied for 1st on faithfulness (1/55) while Devstral ranks 34/55 — Grok is less likely to hallucinate on source-constrained tasks in our testing.
- Long context: Grok 3 = 5 vs Devstral = 4. Grok ties for 1st on retrieval accuracy at 30K+ tokens (1/55); Devstral ranks 38/55 — choose Grok for large-document workflows.
- Persona consistency: Grok 3 = 5 vs Devstral = 2. Grok is tied for 1st (1/53); Devstral is near the bottom (rank 51/53) — important for bots that must maintain character and resist injection.
- Agentic planning: Grok 3 = 5 vs Devstral = 2. Grok ties for 1st (1/54); Devstral ranks 53/54 — Grok is clearly stronger for goal decomposition and multi-step recovery.
- Multilingual: Grok 3 = 5 vs Devstral = 4. Grok tied for 1st (1/55); Devstral ranks 36/55 — Grok offers better parity across languages in our tests.

Ties (no clear winner in our testing): constrained_rewriting (3/3), tool_calling (4/4), classification (4/4), safety_calibration (2/2). Notably, both models score 4 on tool calling and tie for 1st in classification, so for function selection and routing they are comparable. Context window is identical (131,072 tokens) per the payload.

Overall, Grok 3’s consistent top ranks across strategic analysis, faithfulness, long context, agentic planning, and persona consistency explain its majority wins; Devstral holds baseline competence on practical integration tasks at a fraction of the cost.
Pricing Analysis
Per the payload, Devstral Small 1.1 costs $0.10 per MTok input and $0.30 per MTok output; Grok 3 costs $3.00 per MTok input and $15.00 per MTok output. Assuming a 50/50 split of input vs output tokens, 1M combined tokens (500k input + 500k output = 0.5 MTok each side) costs: Devstral ≈ $0.20 (0.5 × $0.10 + 0.5 × $0.30) and Grok ≈ $9.00 (0.5 × $3.00 + 0.5 × $15.00). Scale-up: 10M combined tokens → Devstral ≈ $2; Grok ≈ $90. 100M combined tokens → Devstral ≈ $20; Grok ≈ $900. 1B combined tokens → Devstral ≈ $200; Grok ≈ $9,000. The cost gap grows linearly: teams optimizing unit economics at billions of tokens per month will notice differences in the thousands of dollars; startups, hobbyists, and high-volume data pipelines should care most about Devstral’s 30–50x lower per-side prices ($0.10 vs $3.00 input; $0.30 vs $15.00 output).
Real-World Cost Comparison
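The per-MTok arithmetic above can be sketched as a small helper. This is an illustrative snippet, not an official billing API: the model names are hypothetical labels, and only the four per-MTok rates come from the pricing shown above.

```python
# Sketch: estimating workload cost from per-MTok (per-million-token) prices.
# Only the four rates below come from the page; everything else is illustrative.

PRICES = {  # USD per million tokens, per the payload
    "devstral-small-1.1": {"input": 0.10, "output": 0.30},
    "grok-3": {"input": 3.00, "output": 15.00},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one workload, billed per million tokens."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# 1M combined tokens at a 50/50 input/output split:
print(cost_usd("devstral-small-1.1", 500_000, 500_000))  # → 0.2
print(cost_usd("grok-3", 500_000, 500_000))              # → 9.0
```

Multiplying either result by 10 or 100 reproduces the scale-up figures, since cost grows linearly with token volume.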
Bottom Line
Choose Devstral Small 1.1 if: you need a low-cost model for high-volume classification, routine tool-calling, or production agents where unit cost dominates (≈ $0.20 per 1M combined tokens under a 50/50 split). Choose Grok 3 if: you need top-tier long-context retrieval, strict structured outputs, strong faithfulness, multilingual parity, agentic planning, or persona consistency for enterprise apps and can absorb much higher runtime costs (≈ $9.00 per 1M combined tokens under the same split). If you’re unsure, start with Devstral for early-stage, cost-constrained development and switch to Grok 3 for mission-critical pipelines that require the higher-ranking capabilities shown in our tests.
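One way to encode this guidance in a routing layer is a simple decision rule. A minimal sketch, assuming hypothetical category labels and model identifiers (neither is a real API):

```python
# Illustrative router encoding the guidance above: prefer Grok 3 in the
# categories it won in our tests, unless unit cost dominates the decision.
# Category strings and model names are assumed labels, not a real API.

GROK_WINS = {
    "structured_output", "strategic_analysis", "creative_problem_solving",
    "faithfulness", "long_context", "persona_consistency",
    "agentic_planning", "multilingual",
}

def pick_model(category: str, cost_sensitive: bool) -> str:
    """Route to Grok 3 for its winning categories; otherwise use the cheaper model."""
    if category in GROK_WINS and not cost_sensitive:
        return "grok-3"
    return "devstral-small-1.1"

print(pick_model("agentic_planning", cost_sensitive=False))  # → grok-3
print(pick_model("classification", cost_sensitive=True))     # → devstral-small-1.1
```

Because the two models tied on tool calling and classification, those categories fall through to the cheaper option regardless of the cost flag.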
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.