Devstral Small 1.1 vs Grok 3 Mini

Grok 3 Mini is the practical winner for agents, assistants, and long-context workflows: it wins 8 of 12 benchmarks (tool calling, faithfulness, long context, persona). Devstral Small 1.1 is the cost-conscious choice: it ties on classification and structured output while costing materially less per MTok.

Devstral Small 1.1 (Mistral)

Overall: 3.08/5 (Usable)

Benchmark Scores
Faithfulness: 4/5
Long Context: 4/5
Multilingual: 4/5
Tool Calling: 4/5
Classification: 4/5
Agentic Planning: 2/5
Structured Output: 4/5
Safety Calibration: 2/5
Strategic Analysis: 2/5
Persona Consistency: 2/5
Constrained Rewriting: 3/5
Creative Problem Solving: 2/5

External Benchmarks
SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing
Input: $0.100/MTok
Output: $0.300/MTok

Context Window: 131K

modelpicker.net

Grok 3 Mini (xAI)

Overall: 3.92/5 (Strong)

Benchmark Scores
Faithfulness: 5/5
Long Context: 5/5
Multilingual: 4/5
Tool Calling: 5/5
Classification: 4/5
Agentic Planning: 3/5
Structured Output: 4/5
Safety Calibration: 2/5
Strategic Analysis: 3/5
Persona Consistency: 5/5
Constrained Rewriting: 4/5
Creative Problem Solving: 3/5

External Benchmarks
SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing
Input: $0.300/MTok
Output: $0.500/MTok

Context Window: 131K


Benchmark Analysis

Across our 12-test suite, Grok 3 Mini wins 8 categories, Devstral Small 1.1 wins none, and four tests tie. Detailed walk-through (scores shown as Devstral / Grok):

  • Tool calling: 4 vs 5 — Grok wins and is tied for 1st on our tool_calling ranking (alongside 16 other models), which matters for function selection, argument accuracy, and sequencing in agent pipelines.
  • Faithfulness: 4 vs 5 — Grok wins and ranks tied for 1st on faithfulness; expect fewer source hallucinations for tasks that must stick closely to input text.
  • Long context: 4 vs 5 — Grok wins and is tied for 1st on long_context; better retrieval and coherence when working with 30K+ token contexts.
  • Persona consistency: 2 vs 5 — Grok wins and is tied for 1st; better at maintaining character and resisting injection attacks for chat agents.
  • Agentic planning: 2 vs 3 — Grok wins (rank 42 of 54), which translates to better goal decomposition and failure recovery in planners.
  • Strategic analysis: 2 vs 3 — Grok wins; higher scores mean clearer tradeoff reasoning for numeric or multi-step decisions.
  • Creative problem solving: 2 vs 3 — Grok wins; stronger at producing specific, feasible ideas.
  • Constrained rewriting: 3 vs 4 — Grok wins (rank 6 of 53); better at tight format rewriting and compression.
  • Structured output: 4 vs 4 — tie; both handle JSON/schema compliance comparably (Devstral rank 26, Grok rank 26).
  • Classification: 4 vs 4 — tie; both are high-performing here (Devstral is tied for 1st with 29 others).
  • Safety calibration: 2 vs 2 — tie; similar refusal/permissive behavior in our tests.
  • Multilingual: 4 vs 4 — tie; both produce comparable non-English outputs.

Practical meaning: Grok is the stronger choice where correctness under tool use, source fidelity, and very long context matter. Devstral matches Grok on classification and structured-output tasks while costing far less, but it lags on persona, long-context, and faithfulness metrics (e.g., Devstral ranks 51 of 53 on persona_consistency).
Benchmark                   Devstral Small 1.1   Grok 3 Mini
Faithfulness                4/5                  5/5
Long Context                4/5                  5/5
Multilingual                4/5                  4/5
Tool Calling                4/5                  5/5
Classification              4/5                  4/5
Agentic Planning            2/5                  3/5
Structured Output           4/5                  4/5
Safety Calibration          2/5                  2/5
Strategic Analysis          2/5                  3/5
Persona Consistency         2/5                  5/5
Constrained Rewriting       3/5                  4/5
Creative Problem Solving    2/5                  3/5
Summary                     0 wins               8 wins
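As a sanity check on the tallies above, a short script can recount wins and ties from the score table (the dictionary below simply transcribes the table; the counting logic is an illustrative sketch, not part of the published methodology):

```python
# Scores from the 12-benchmark comparison: (Devstral Small 1.1, Grok 3 Mini).
scores = {
    "Faithfulness": (4, 5),
    "Long Context": (4, 5),
    "Multilingual": (4, 4),
    "Tool Calling": (4, 5),
    "Classification": (4, 4),
    "Agentic Planning": (2, 3),
    "Structured Output": (4, 4),
    "Safety Calibration": (2, 2),
    "Strategic Analysis": (2, 3),
    "Persona Consistency": (2, 5),
    "Constrained Rewriting": (3, 4),
    "Creative Problem Solving": (2, 3),
}

# Count rows where each model strictly outscores the other.
devstral_wins = sum(d > g for d, g in scores.values())
grok_wins = sum(g > d for d, g in scores.values())
ties = sum(d == g for d, g in scores.values())

print(devstral_wins, grok_wins, ties)  # prints: 0 8 4
```

Running it reproduces the summary row: 0 wins for Devstral, 8 for Grok, 4 ties.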

Pricing Analysis

Per the listed pricing, Devstral Small 1.1 charges $0.10 input / $0.30 output per MTok; Grok 3 Mini charges $0.30 input / $0.50 output per MTok. Assuming a 50/50 split of input vs output tokens, blended costs are: 1M total tokens -> Devstral ≈ $0.20, Grok ≈ $0.40; 10M -> Devstral ≈ $2, Grok ≈ $4; 100M -> Devstral ≈ $20, Grok ≈ $40. The Grok bill is roughly double Devstral's under this usage pattern. Teams with high-volume production workloads, embedded assistants, or tight margins should prefer Devstral for cost savings. Teams that need the wins Grok provides (tool calling, long context, faithfulness, persona) should budget for roughly 2x the token cost.
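The blended-cost arithmetic above can be sketched as a small helper. The per-MTok rates come from the pricing cards; the `monthly_cost` function name and the 50/50 input/output split default are assumptions for illustration:

```python
# Published per-MTok rates (USD per million tokens).
PRICES = {
    "Devstral Small 1.1": {"input": 0.10, "output": 0.30},
    "Grok 3 Mini": {"input": 0.30, "output": 0.50},
}

def monthly_cost(model: str, total_tokens: int, input_share: float = 0.5) -> float:
    """Blended cost for a month's tokens, given a fixed input/output split."""
    p = PRICES[model]
    mtok = total_tokens / 1_000_000
    return mtok * (input_share * p["input"] + (1 - input_share) * p["output"])

for volume in (1_000_000, 10_000_000, 100_000_000):
    d = monthly_cost("Devstral Small 1.1", volume)
    g = monthly_cost("Grok 3 Mini", volume)
    print(f"{volume:>11,} tokens: Devstral ${d:,.2f} vs Grok ${g:,.2f}")
```

Adjusting `input_share` shows the ratio holds across realistic splits: because Grok costs more on both sides, the bill stays roughly 2x Devstral's regardless of the mix.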

Real-World Cost Comparison

Task             Devstral Small 1.1   Grok 3 Mini
Chat response    <$0.001              <$0.001
Blog post        <$0.001              $0.0011
Document batch   $0.017               $0.031
Pipeline run     $0.170               $0.310

Bottom Line

Choose Devstral Small 1.1 if you need a lower-cost model for high-volume classification, schema/JSON outputs, or cost-sensitive production where ties on classification and structured output are sufficient (Devstral: $0.10 input / $0.30 output per MTok). Choose Grok 3 Mini if you need best-in-suite behavior for tool calling, faithfulness, long-context coherence, persona consistency, or stronger agentic planning, and can accept roughly 2x the token cost (Grok: $0.30 input / $0.50 output per MTok).

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions