Devstral Medium vs Gemma 4 31B
In our testing, Gemma 4 31B is the clear pick for most teams: it wins 10 of 12 benchmarks and is materially cheaper per token. Devstral Medium ties on classification and long-context but does not win any benchmark in our suite; pick Devstral only if you must use Mistral's offering despite the higher cost.
Devstral Medium (Mistral)
Pricing: $0.40/MTok input, $2.00/MTok output

Gemma 4 31B
Pricing: $0.13/MTok input, $0.38/MTok output
Benchmark Analysis
Summary from our 12-test suite: Gemma 4 31B wins 10 tests, Devstral Medium wins none, and two are ties (classification, long_context). Scores below are listed Devstral → Gemma, with what each result implies:
1) tool_calling (3 vs 5): Gemma wins and is tied for 1st (with 16 others) of 54 models on tool calling; it is more reliable at selecting functions, sequencing calls, and filling arguments in real tool-driven workflows.
2) faithfulness (4 vs 5): Gemma wins and is tied for 1st of 55; expect fewer hallucinations and closer adherence to source material.
3) structured_output (4 vs 5): Gemma wins and is tied for 1st of 54 on JSON/schema compliance; it matches strict format requirements more consistently.
4) strategic_analysis (2 vs 5): Gemma wins and is tied for 1st of 54; it handles nuanced tradeoff reasoning and numeric tradeoffs far better in our scenarios.
5) constrained_rewriting (3 vs 4): Gemma wins and ranks 6th of 53; it compresses and rewrites to tight limits more reliably.
6) creative_problem_solving (2 vs 4): Gemma wins and ranks 9th of 54; it produced more specific, feasible ideas in our creative prompts.
7) agentic_planning (4 vs 5): Gemma wins and is tied for 1st of 54; it decomposes goals and recovers from failures better.
8) persona_consistency (3 vs 5): Gemma wins and is tied for 1st of 53; it kept character and resisted prompt injection better in our tests.
9) multilingual (4 vs 5): Gemma wins and is tied for 1st of 55; it produced higher-quality non-English output in our samples.
10) safety_calibration (1 vs 2): Gemma wins and ranks 12th of 55; it made more appropriate allow/refuse choices in our safety prompts.
11) classification (4 vs 4): tie; both models are tied for 1st (with 29 others) of 53, so both handle routing and categorization well in our suite.
12) long_context (4 vs 4): tie; both rank 38th of 55, indicating similar retrieval accuracy at 30K+ tokens in our experiments.
Practical takeaway: Gemma outperforms Devstral across tool-driven, reasoning, format-sensitive, multilingual, and safety-sensitive tasks in our testing. External benchmark scores are not available for either model.
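As a quick check on the headline tally, the short Python sketch below (an illustrative verification only, using the Devstral → Gemma score pairs listed above) reproduces the 10-0-2 win/tie count:

    # Devstral → Gemma scores from the benchmark analysis above (1–5 scale).
    scores = {
        "tool_calling": (3, 5), "faithfulness": (4, 5), "structured_output": (4, 5),
        "strategic_analysis": (2, 5), "constrained_rewriting": (3, 4),
        "creative_problem_solving": (2, 4), "agentic_planning": (4, 5),
        "persona_consistency": (3, 5), "multilingual": (4, 5),
        "safety_calibration": (1, 2), "classification": (4, 4), "long_context": (4, 4),
    }

    gemma_wins = sum(g > d for d, g in scores.values())      # Gemma scored higher
    devstral_wins = sum(d > g for d, g in scores.values())   # Devstral scored higher
    ties = sum(d == g for d, g in scores.values())           # equal scores
    print(gemma_wins, devstral_wins, ties)  # -> 10 0 2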
Pricing Analysis
Raw unit prices: Devstral Medium charges $0.40 per million input tokens (MTok) and $2.00 per million output tokens; Gemma 4 31B charges $0.13/MTok input and $0.38/MTok output. The output-price ratio (Devstral ÷ Gemma) is about 5.3. Using a simple 50/50 input/output token split as an illustrative example, monthly costs are:
• 1M tokens: Devstral ≈ $1.20 vs Gemma ≈ $0.26
• 10M tokens: Devstral ≈ $12.00 vs Gemma ≈ $2.55
• 100M tokens: Devstral ≈ $120.00 vs Gemma ≈ $25.50
The absolute amounts are small at these volumes, but the gap scales linearly with usage, so high-volume apps see a roughly 4–5x cost difference compound month after month; cost-sensitive products, consumer apps, and startups should prefer Gemma for price-performance. The Devstral price is only defensible if you have non-cost constraints (vendor requirements, existing contracts) or specific integration needs not captured in these benchmarks.
Real-World Cost Comparison
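The illustrative monthly figures above can be reproduced with a short Python sketch. The 50/50 input/output split and the monthly_cost helper are assumptions for illustration, not measured usage:

    # Per-million-token (MTok) prices in USD, from the pricing section above.
    PRICES = {
        "Devstral Medium": {"input": 0.40, "output": 2.00},
        "Gemma 4 31B": {"input": 0.13, "output": 0.38},
    }

    def monthly_cost(model: str, total_tokens: int, input_share: float = 0.5) -> float:
        """Blended USD cost for total_tokens at the given input/output split (assumed 50/50)."""
        p = PRICES[model]
        mtok = total_tokens / 1_000_000
        return mtok * (input_share * p["input"] + (1 - input_share) * p["output"])

    for volume in (1_000_000, 10_000_000, 100_000_000):
        for model in PRICES:
            print(f"{model}: {volume:,} tokens/month -> ${monthly_cost(model, volume):,.2f}")

Adjusting input_share lets you model workloads that are input-heavy (e.g., long-context retrieval) or output-heavy (e.g., long-form generation).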
Bottom Line
Choose Gemma 4 31B if you need the best value and higher accuracy on tooling, strategic reasoning, structured outputs, multilingual tasks, and safety calibration: it wins 10 of 12 tests in our suite and costs far less per token. Choose Devstral Medium only if you have a firm constraint to use Mistral's model or specific integration reasons; it ties on classification and long-context but does not win any benchmark in our testing and costs substantially more.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.