Claude Haiku 4.5 vs Devstral 2 2512 for Constrained Rewriting
Winner: Devstral 2 2512. In our testing Devstral scores 5/5 on Constrained Rewriting vs Claude Haiku 4.5's 3/5 — a clear 2-point margin. Devstral ranks tied for 1st on this task and also posts a top structured_output score (5 vs 4), which directly supports reliable compression into hard character limits. Claude Haiku 4.5 is stronger on faithfulness (5 vs 4) and tool_calling (5 vs 4), so it preserves source content better, but it underperforms at tight-format compression compared with Devstral.
Anthropic
Claude Haiku 4.5
Pricing
Input
$1.00/MTok
Output
$5.00/MTok
modelpicker.net
Mistral
Devstral 2 2512
Pricing
Input
$0.40/MTok
Output
$2.00/MTok
Task Analysis
What Constrained Rewriting demands: precise compression within hard character limits while preserving meaning and required structure. Key capabilities: strict structured_output compliance (format/schema), faithfulness to source content, predictable token use for exact length control, and long_context to maintain context when compressing long inputs. There is no external benchmark for this task in the payload, so we base the winner on our internal task scores: Devstral 2 2512 scores 5 on constrained_rewriting and 5 on structured_output; Claude Haiku 4.5 scores 3 on constrained_rewriting and 4 on structured_output. Supporting internal signals: Devstral’s faithfulness is 4 (vs Claude’s 5), and both models have strong long_context (5). In short: Devstral’s higher structured_output and top task rank explain its superiority at strict compression; Claude’s higher faithfulness explains when preservation of nuance matters more than squeezing into a limit.
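In practice, a hard character limit is usually enforced outside the model with a validate-and-retry loop: generate, measure, and retry (or fail loudly) if the cap is exceeded. The Python sketch below is illustrative only; `rewrite` is a hypothetical stand-in for a model call, and here it simply truncates at a word boundary so the example stays self-contained and runnable.

```python
def rewrite(text: str, limit: int, attempt: int) -> str:
    # Hypothetical stand-in for a model call. A real implementation would
    # prompt the model (tightening the instruction on each attempt); this
    # sketch just keeps whole words until the cap would be exceeded.
    out = ""
    for word in text.split():
        candidate = (out + " " + word).strip()
        if len(candidate) > limit:
            break
        out = candidate
    return out


def constrained_rewrite(text: str, limit: int, max_attempts: int = 3) -> str:
    """Retry until the rewrite fits the hard cap; fail loudly otherwise."""
    for attempt in range(max_attempts):
        candidate = rewrite(text, limit, attempt)
        if len(candidate) <= limit:
            return candidate
    raise ValueError(f"could not compress under {limit} characters")
```

The key design point is that the length check lives in application code, not in the prompt: even a model with a 5/5 constrained_rewriting score should be wrapped in a verifier when the limit is a hard product constraint.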
Practical Examples
1) Mobile push notifications (<=160 chars): Devstral 2 2512 is the better choice; its 5/5 constrained_rewriting and 5/5 structured_output give more reliable, repeatable compression into the exact character cap.
2) SMS summaries of legal clauses where every phrase must be preserved: Claude Haiku 4.5 is preferable when faithfulness matters; it scores 5 on faithfulness vs Devstral's 4, so it better preserves precise wording even if it struggles more to hit tight limits.
3) CSV field compression for UI display: Devstral's structured_output score of 5 vs Claude's 4 means it will more reliably produce exact-length, schema-compliant cells.
4) Persona-aware microcopy (brand voice plus a character limit): Claude's persona_consistency of 5 helps retain voice, but expect more prompt iterations to reach a hard limit, since its constrained_rewriting score is 3 versus Devstral's 5.
5) Cost-sensitive bulk rewriting: Devstral is also cheaper on output ($2.00/MTok vs Claude's $5.00/MTok), so for high-volume constrained rewriting it is both more accurate and more economical.
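For the CSV use case in example 3, the cap can be verified mechanically after generation. Here is a minimal Python sketch of such a post-hoc check; `check_cells` and the length cap are illustrative names, not part of either model's API.

```python
import csv
import io


def check_cells(csv_text: str, max_len: int) -> list[tuple[int, int, int]]:
    """Return (row, col, length) for every cell that exceeds max_len."""
    violations = []
    for r, row in enumerate(csv.reader(io.StringIO(csv_text))):
        for c, cell in enumerate(row):
            if len(cell) > max_len:
                violations.append((r, c, len(cell)))
    return violations
```

A pipeline would run this over the model's rewritten CSV and route any flagged cells back for another compression pass, which is exactly where a higher structured_output score reduces retry volume.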
Bottom Line
For Constrained Rewriting, choose Devstral 2 2512 if you need reliable compression into hard character limits, exact schema/format adherence, and lower output cost — Devstral scores 5 vs Claude’s 3 in our testing. Choose Claude Haiku 4.5 if preserving exact source wording and persona fidelity matters more than hitting a strict character cap — Claude scores 5 on faithfulness and persona_consistency but only 3 on constrained_rewriting.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.