Devstral Medium vs Grok Code Fast 1
Grok Code Fast 1 wins this matchup outright — it scores higher than Devstral Medium on 6 of 12 benchmarks in our testing, ties on the remaining 6, and wins on 0. It also costs less: $0.20 input / $1.50 output per MTok versus Devstral Medium's $0.40 / $2.00. Devstral Medium holds its own only in parity — it never pulls ahead — so Grok Code Fast 1 is the stronger choice for most coding and agentic workloads at a lower price.
| Model | Provider | Input | Output |
|---|---|---|---|
| Devstral Medium | Mistral | $0.40/MTok | $2.00/MTok |
| Grok Code Fast 1 | xAI | $0.20/MTok | $1.50/MTok |
Benchmark Analysis
Across our 12-test benchmark suite, Grok Code Fast 1 wins 6 tests outright and ties the remaining 6. Devstral Medium wins none.
Where Grok Code Fast 1 wins clearly:
- Agentic planning (5 vs 4): Grok Code Fast 1 scores 5/5, tied for 1st among 54 models in our testing. Devstral Medium scores 4/5, ranked 16th of 54. For multi-step task execution and autonomous coding agents, this gap matters — 5 represents the top tier while 4 is solid mid-pack.
- Tool calling (4 vs 3): Grok Code Fast 1 scores 4/5 (rank 18 of 54), Devstral Medium scores 3/5 (rank 47 of 54). A score of 3 places Devstral Medium near the bottom of the field on function selection and argument accuracy — a meaningful gap for API-integrated or tool-augmented workflows.
- Persona consistency (4 vs 3): Grok Code Fast 1 ranks 38 of 53; Devstral Medium ranks 45 of 53. Both are below median, but Devstral Medium's 3/5 is notably weaker for chatbot or roleplay applications.
- Strategic analysis (3 vs 2): Both are below the median (p50 = 4), but Devstral Medium's 2/5 puts it at rank 44 of 54 — near the bottom. Grok Code Fast 1 scores 3/5 at rank 36. Neither excels at nuanced tradeoff reasoning, but Devstral Medium struggles more.
- Creative problem solving (3 vs 2): Same pattern — Grok Code Fast 1 scores 3/5 (rank 30 of 54), Devstral Medium scores 2/5 (rank 47 of 54). Generating non-obvious, feasible ideas is a weak point for Devstral Medium.
- Safety calibration (2 vs 1): Grok Code Fast 1 scores 2/5 (rank 12 of 55), Devstral Medium scores 1/5 (rank 32 of 55). Neither is strong here — the p75 is only 2, meaning most models score low — but Devstral Medium's 1/5 is the floor of our scale.
Where they tie:
- Structured output (4 vs 4): Both score 4/5, tied at rank 26 of 54. Solid JSON schema compliance from both.
- Faithfulness (4 vs 4): Both score 4/5 at rank 34 of 55. Neither hallucinates frequently in our tests.
- Classification (4 vs 4): Both tied for 1st among 53 models — the most crowded top tier in our suite. Strong routing accuracy from both.
- Long context (4 vs 4): Both score 4/5 at rank 38 of 55. Adequate retrieval at 30K+ tokens, though not top-tier.
- Constrained rewriting (3 vs 3): Both rank 31 of 53. Mid-pack compression performance.
- Multilingual (4 vs 4): Both rank 36 of 55. Consistent non-English quality from both.
The pattern is clear: where Devstral Medium diverges from Grok Code Fast 1, it diverges downward. Its weakest results — tool calling at rank 47, creative problem solving at rank 47, safety calibration at rank 32 with a 1/5 score — are liabilities for production deployments. Grok Code Fast 1's standout is agentic planning at 5/5, tied for 1st, which aligns directly with its described strength as a coding agent model. Note that neither model has been tested on our suite's external benchmarks (SWE-bench Verified, AIME 2025, MATH Level 5) as of this report.
Pricing Analysis
Grok Code Fast 1 is cheaper on both dimensions: $0.20/MTok input and $1.50/MTok output versus Devstral Medium's $0.40/MTok input and $2.00/MTok output. That is half the input price and 25% less on output. At 1M output tokens/month you pay $1.50 vs $2.00, a $0.50 gap; at 10M tokens it's $15 vs $20; and at 100M tokens the gap widens to $150 vs $200 per month on output alone. Input pricing compounds the savings: 100M input tokens cost $20 with Grok Code Fast 1 versus $40 with Devstral Medium. For high-volume agentic pipelines or code-generation tools running millions of tokens monthly, Grok Code Fast 1 is the clear cost winner. The only reason to pay Devstral Medium's premium would be a specific workflow where its parity scores are a hard requirement, and the data shows no such edge exists.
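The arithmetic above can be reproduced with a small sketch. This is a minimal cost calculator using only the per-MTok prices quoted in this comparison; the function name and the 100M-token monthly volume are illustrative assumptions, not part of either provider's billing API.

```python
def monthly_cost(input_mtok: float, output_mtok: float,
                 input_price: float, output_price: float) -> float:
    """Monthly cost in dollars, given token volumes in millions of tokens
    (MTok) and per-MTok prices."""
    return input_mtok * input_price + output_mtok * output_price

# Per-MTok prices from this comparison, at 100M input + 100M output tokens/month
grok = monthly_cost(100, 100, 0.20, 1.50)      # $20 input + $150 output = $170
devstral = monthly_cost(100, 100, 0.40, 2.00)  # $40 input + $200 output = $240

print(f"Grok Code Fast 1: ${grok:.2f}")
print(f"Devstral Medium:  ${devstral:.2f}")
print(f"Monthly savings:  ${devstral - grok:.2f}")
```

At this volume the gap is $70/month ($20 on input plus $50 on output); the same function scales linearly for any other monthly volume.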
Bottom Line
Choose Grok Code Fast 1 if: you're building agentic coding workflows, need reliable tool calling (4/5 at rank 18 vs 3/5 at rank 47), or are running high-volume pipelines where the lower cost ($0.20/$1.50 vs $0.40/$2.00 per MTok) compounds into real savings. Its 5/5 agentic planning score, tied for 1st among 54 models in our testing, makes it the better pick for autonomous agents that decompose goals and recover from failures. At 100M tokens/month on each side, you save $20 on input and $50 on output vs Devstral Medium, $70 in total.
Choose Devstral Medium if: you have a specific integration requirement tied to the Mistral ecosystem, need the supported parameters it offers (frequency_penalty, presence_penalty, seed), or your workload is dominated by tasks where both models tie — classification, faithfulness, structured output, or long context. Be aware you're paying more for no benchmark advantage. Devstral Medium's 1/5 safety calibration score is also worth flagging if your deployment has content moderation requirements.
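For readers weighing the parameter-support point, here is a hedged sketch of how the three parameters named above (frequency_penalty, presence_penalty, seed) would appear in a request body for an OpenAI-compatible chat completions endpoint. The model identifier and parameter values are illustrative assumptions only; consult Mistral's API documentation for the actual identifiers and supported ranges.

```python
import json

# Hypothetical request body for an OpenAI-compatible chat completions
# endpoint; "devstral-medium" and the values below are illustrative only.
payload = {
    "model": "devstral-medium",
    "messages": [{"role": "user", "content": "Refactor this function."}],
    "frequency_penalty": 0.2,  # discourage verbatim repetition
    "presence_penalty": 0.1,   # nudge the model toward new tokens/topics
    "seed": 42,                # best-effort reproducibility across runs
}
print(json.dumps(payload, indent=2))
```

If your pipeline depends on these knobs, verify they are accepted (not silently ignored) by the endpoint you deploy against before committing to either model.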
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.