Claude Haiku 4.5 vs Devstral 2 2512 for Constrained Rewriting

Winner: Devstral 2 2512. In our testing, Devstral scores 5/5 on Constrained Rewriting versus Claude Haiku 4.5's 3/5, a clear two-point margin. Devstral ties for 1st on this task and also posts the higher Structured Output score (5 vs 4), which directly supports reliable compression into hard character limits. Claude Haiku 4.5 is stronger on Faithfulness (5 vs 4) and Tool Calling (5 vs 4), so it preserves source content better, but it lags Devstral at tight-format compression.

Anthropic

Claude Haiku 4.5

Overall
4.33/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 5/5
Classification: 4/5
Agentic Planning: 5/5
Structured Output: 4/5
Safety Calibration: 2/5
Strategic Analysis: 5/5
Persona Consistency: 5/5
Constrained Rewriting: 3/5
Creative Problem Solving: 4/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $1.00/MTok
Output: $5.00/MTok

Context Window: 200K tokens


Mistral

Devstral 2 2512

Overall
4.00/5 (Strong)

Benchmark Scores

Faithfulness: 4/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 4/5
Classification: 3/5
Agentic Planning: 4/5
Structured Output: 5/5
Safety Calibration: 1/5
Strategic Analysis: 4/5
Persona Consistency: 4/5
Constrained Rewriting: 5/5
Creative Problem Solving: 4/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.40/MTok
Output: $2.00/MTok

Context Window: 262K tokens


Task Analysis

What Constrained Rewriting demands: precise compression within hard character limits while preserving meaning and required structure. Key capabilities: strict structured-output compliance (format and schema adherence), faithfulness to the source content, predictable token use for exact length control, and long-context handling to avoid losing context when compressing long inputs. No external benchmark covers this task, so the winner rests on our internal task scores: Devstral 2 2512 scores 5 on Constrained Rewriting and 5 on Structured Output; Claude Haiku 4.5 scores 3 and 4, respectively. Supporting signals: Devstral's Faithfulness is 4 (vs Claude's 5), and both models score 5 on Long Context. In short, Devstral's higher Structured Output score and top task rank explain its edge at strict compression, while Claude's higher Faithfulness makes it the pick when preserving nuance matters more than squeezing into a limit.
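To make the length-control requirement concrete, here is a minimal sketch of a validate-and-retry harness that enforces a hard character cap. It is illustrative only: `call_model` is a hypothetical stand-in for whichever provider SDK you use, and the prompts, attempt budget, and truncation fallback are our assumptions, not part of the test suite.

```python
from typing import Callable

def rewrite_within_limit(
    call_model: Callable[[str], str],  # hypothetical: wraps your provider SDK
    source: str,
    max_chars: int = 160,
    max_attempts: int = 3,
) -> str:
    """Compress `source` to at most `max_chars` characters, re-prompting on overshoot."""
    prompt = (
        f"Rewrite the following in at most {max_chars} characters, "
        f"preserving the key facts:\n\n{source}"
    )
    candidate = ""
    for _ in range(max_attempts):
        candidate = call_model(prompt).strip()
        if len(candidate) <= max_chars:
            return candidate
        # Feed the overshoot back so the model knows how much more to cut.
        prompt = (
            f"Your draft was {len(candidate)} characters; the hard limit is "
            f"{max_chars}. Shorten it further:\n\n{candidate}"
        )
    # Fallback: truncate at a word boundary so the cap is never violated.
    return candidate[:max_chars].rsplit(" ", 1)[0]
```

The design point: the limit is enforced by the harness, never trusted to the model. A higher Constrained Rewriting score mostly translates into fewer retry round-trips, which matters for latency and cost at volume.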

Practical Examples

1. Mobile push notifications (<=160 characters): Devstral 2 2512 is the better choice. Its 5/5 Constrained Rewriting and 5/5 Structured Output scores give more reliable, repeatable compression into an exact character cap; a harness-side retry loop like the sketch above keeps the cap hard regardless of model choice.
2. SMS summaries of legal clauses where every phrase must be preserved: Claude Haiku 4.5 is preferable when fidelity matters. It scores 5 on Faithfulness vs Devstral's 4, so it better preserves precise wording, even though it struggles more to hit tight limits.
3. CSV field compression for UI display: Devstral's Structured Output score of 5 vs Claude's 4 means it will more reliably produce exact-length, schema-compliant cells.
4. Persona-aware microcopy (brand voice plus a character limit): Claude's Persona Consistency of 5 helps retain voice, but expect more re-prompting to reach a hard limit, since its Constrained Rewriting score is 3 versus Devstral's 5.
5. Cost-sensitive bulk rewriting: Devstral is also cheaper ($0.40/MTok input and $2.00/MTok output vs Claude's $1.00 and $5.00), so for high-volume constrained rewriting it is both more accurate and more economical; a worked cost sketch follows this list.
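On example 5, the price gap compounds at volume. Here is a back-of-envelope sketch using the per-MTok prices from the cards above; the per-job token counts are assumptions chosen for illustration.

```python
# Per-MTok prices from the pricing cards above: (input $/MTok, output $/MTok).
PRICES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Devstral 2 2512": (0.40, 2.00),
}

def batch_cost(model: str, jobs: int, in_tokens: int = 400, out_tokens: int = 60) -> float:
    """USD cost for `jobs` rewrites at the assumed per-job token counts."""
    in_price, out_price = PRICES[model]
    return jobs * (in_tokens * in_price + out_tokens * out_price) / 1_000_000

for name in PRICES:
    print(f"{name}: ${batch_cost(name, jobs=100_000):.2f} per 100k rewrites")
# Claude Haiku 4.5: $70.00 per 100k rewrites
# Devstral 2 2512: $28.00 per 100k rewrites
```

Retries shift the math further: a model that needs two attempts to hit the cap roughly doubles its output spend, so the Constrained Rewriting score and the price column compound in the same direction here.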

Bottom Line

For Constrained Rewriting, choose Devstral 2 2512 if you need reliable compression into hard character limits, exact schema/format adherence, and lower cost per token; it scores 5 vs Claude's 3 in our testing. Choose Claude Haiku 4.5 if preserving exact source wording and persona fidelity matters more than hitting a strict character cap; it scores 5 on both Faithfulness and Persona Consistency but only 3 on Constrained Rewriting.
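To ground the 'exact schema/format adherence' point, here is a hedged sketch of what structured output buys you in a pipeline like example 3: the character cap is expressed as a JSON Schema constraint, so a local validator (or a provider's schema-enforcing endpoint, where available) rejects oversized fields deterministically instead of your UI discovering them. The schema, field name, and cap are illustrative assumptions.

```python
from jsonschema import validate, ValidationError  # pip install jsonschema

# Illustrative contract: one compressed cell for a CSV/UI display,
# with the hard length cap baked in as a schema constraint.
CELL_SCHEMA = {
    "type": "object",
    "properties": {
        "display_text": {"type": "string", "maxLength": 40},
    },
    "required": ["display_text"],
    "additionalProperties": False,
}

try:
    validate({"display_text": "Plan renews automatically on Jan 1"}, CELL_SCHEMA)
    print("within limit")
except ValidationError as err:
    print(f"rejected: {err.message}")
```

This complements rather than replaces the retry loop sketched earlier: the schema catches violations deterministically, and a model's Structured Output score predicts how often that catch has to fire.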

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions