R1 0528 vs Devstral 2 2512

R1 0528 is the stronger pick for agentic, tool-driven, and safety-sensitive applications: it wins 6 of our 12 tests outright (with four ties), including tool calling (5 vs 4) and faithfulness (5 vs 4). Devstral 2 2512 is cheaper per token and outperforms R1 on strict structured-output and constrained-rewriting tasks (5 vs 4 in each), so choose it when schema compliance or tight length-limited compression is the priority.

DeepSeek

R1 0528

Overall
4.50/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
4/5
Strategic Analysis
4/5
Persona Consistency
5/5
Constrained Rewriting
4/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
96.6%
AIME 2025
66.4%

Pricing

Input

$0.500/MTok

Output

$2.15/MTok

Context Window: 164K

modelpicker.net

Mistral

Devstral 2 2512

Overall
4.00/5 (Strong)

Benchmark Scores

Faithfulness
4/5
Long Context
5/5
Multilingual
5/5
Tool Calling
4/5
Classification
3/5
Agentic Planning
4/5
Structured Output
5/5
Safety Calibration
1/5
Strategic Analysis
4/5
Persona Consistency
4/5
Constrained Rewriting
5/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.400/MTok

Output

$2.00/MTok

Context Window: 262K


Benchmark Analysis

Summary of head-to-head results from our 12-test suite: R1 0528 wins six tests: tool_calling (5 vs 4), faithfulness (5 vs 4), classification (4 vs 3), safety_calibration (4 vs 1), persona_consistency (5 vs 4), and agentic_planning (5 vs 4). Devstral 2 2512 wins two: structured_output (5 vs 4) and constrained_rewriting (5 vs 4). The remaining four tests are ties: strategic_analysis (4/4), creative_problem_solving (4/4), long_context (5/5), and multilingual (5/5).

Context from our rankings: R1 is tied for 1st in tool_calling, faithfulness, persona_consistency, agentic_planning, and long_context, while Devstral is tied for 1st in structured_output and constrained_rewriting.

Practical interpretation: R1's strengths mean fewer incorrect function choices, better adherence to source material, stronger classification/routing, and safer refusals, which is valuable for assistants, tool orchestration, and customer-facing agents. Devstral's wins indicate it is more reliable for strict JSON/schema outputs and aggressive compression within hard character limits. As supplementary external math benchmarks, R1 also scored 96.6% on MATH Level 5 and 66.4% on AIME 2025 (Epoch AI).

Operational caveat: R1 has a documented quirk in the payload: it may return empty responses on structured_output, constrained_rewriting, and agentic_planning, and its reasoning tokens consume the output budget. Plan for a high max_completion_tokens and test structured-output behavior before production use.
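The empty-response quirk above is straightforward to guard against with a thin retry-and-validate wrapper around your API calls. A minimal sketch in Python, where `call_model` is a hypothetical stand-in for your provider's client (stubbed here so the example is self-contained); the retry and validation logic is the point, not the client:

```python
import json

def call_model(prompt: str, max_completion_tokens: int) -> str:
    """Hypothetical stand-in for a real API client call. Replace with your
    provider's SDK; pass a generous max_completion_tokens so reasoning
    tokens do not starve the final answer."""
    # Stubbed so the sketch runs on its own.
    return '{"intent": "refund", "confidence": 0.92}'

def structured_call(prompt: str, required_keys: set,
                    retries: int = 3, max_completion_tokens: int = 8192) -> dict:
    """Retry on empty responses and validate that the reply is JSON
    containing the keys we need before handing it downstream."""
    last_error = None
    for _ in range(retries):
        raw = call_model(prompt, max_completion_tokens)
        if not raw or not raw.strip():
            last_error = "empty response"  # the documented R1 quirk
            continue
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"invalid JSON: {exc}"
            continue
        missing = required_keys - parsed.keys()
        if missing:
            last_error = f"missing keys: {missing}"
            continue
        return parsed
    raise RuntimeError(f"structured call failed after {retries} tries: {last_error}")

result = structured_call("Classify this support ticket.", {"intent", "confidence"})
print(result["intent"])
```

The same wrapper works unchanged for either model; with Devstral the empty-response branch should simply never fire.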

Benchmark                | R1 0528 | Devstral 2 2512
Faithfulness             | 5/5     | 4/5
Long Context             | 5/5     | 5/5
Multilingual             | 5/5     | 5/5
Tool Calling             | 5/5     | 4/5
Classification           | 4/5     | 3/5
Agentic Planning         | 5/5     | 4/5
Structured Output        | 4/5     | 5/5
Safety Calibration       | 4/5     | 1/5
Strategic Analysis       | 4/5     | 4/5
Persona Consistency      | 5/5     | 4/5
Constrained Rewriting    | 4/5     | 5/5
Creative Problem Solving | 4/5     | 4/5
Summary                  | 6 wins  | 2 wins
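The summary row follows directly from the per-benchmark scores; a small sketch that reproduces the win/tie tally from the table above:

```python
# (R1 0528 score, Devstral 2 2512 score) per benchmark, from the table above.
scores = {
    "Faithfulness": (5, 4), "Long Context": (5, 5), "Multilingual": (5, 5),
    "Tool Calling": (5, 4), "Classification": (4, 3), "Agentic Planning": (5, 4),
    "Structured Output": (4, 5), "Safety Calibration": (4, 1),
    "Strategic Analysis": (4, 4), "Persona Consistency": (5, 4),
    "Constrained Rewriting": (4, 5), "Creative Problem Solving": (4, 4),
}

r1_wins = sum(a > b for a, b in scores.values())
devstral_wins = sum(b > a for a, b in scores.values())
ties = sum(a == b for a, b in scores.values())
print(r1_wins, devstral_wins, ties)  # → 6 2 4
```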

Pricing Analysis

Costs are close but meaningful at scale. Per-million-token prices: R1 0528 input $0.50 / output $2.15; Devstral 2 2512 input $0.40 / output $2.00. Assuming a 50/50 input/output mix, monthly costs are: 1M tokens, R1 $1.33 vs Devstral $1.20 (difference $0.13); 10M, R1 $13.25 vs Devstral $12.00 (difference $1.25); 100M, R1 $132.50 vs Devstral $120.00 (difference $12.50). High-volume API customers and cost-sensitive production pipelines should prefer Devstral 2 2512 for the small but cumulative savings; teams that need the extra performance on tool calling, safety, and faithfulness can accept R1's roughly 10% blended price premium ($13.25 / $12.00 ≈ 1.10; the output-price ratio alone is $2.15 / $2.00 = 1.075).

Real-World Cost Comparison

Task           | R1 0528 | Devstral 2 2512
Chat response  | $0.0012 | $0.0011
Blog post      | $0.0046 | $0.0042
Document batch | $0.117  | $0.108
Pipeline run   | $1.18   | $1.08

Bottom Line

Choose R1 0528 if you need best-in-class tool calling, faithfulness, safety calibration, persona consistency, and agentic planning (it wins 6 of 12 tests and scores 5/5 on tool_calling, faithfulness, persona_consistency, and agentic_planning). Choose Devstral 2 2512 if you need cheaper per-token pricing and top-tier structured-output or constrained-rewriting (Devstral scores 5/5 on structured_output and constrained_rewriting and is $0.10 cheaper input / $0.15 cheaper output per M tokens). If you run millions of tokens per month and strict JSON/schema adherence or length-limited compression is the main requirement, pick Devstral; if your product relies on safe, accurate tool orchestration and faithfulness, accept R1’s modest price premium.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions