Claude Haiku 4.5 vs Codestral 2508 for Constrained Rewriting
Winner: Codestral 2508. In our testing both models score 3/5 on Constrained Rewriting and share the same task rank (31 of 53), but Codestral 2508 pulls ahead on the tiebreakers: it scores 5/5 on structured_output versus Claude Haiku 4.5's 4/5 while matching it on faithfulness (5/5) and long-context ability (5/5). That stronger structured_output performance, plus much lower pricing ($0.30 vs $1.00 input, $0.90 vs $5.00 output per MTok), makes Codestral the better practical choice when strict format or exact-length constraints and cost matter. Claude Haiku 4.5 remains preferable if maintaining brand voice or persona under tight compression is the primary objective (see analysis below).
Pricing
- Claude Haiku 4.5 (Anthropic): $1.00/MTok input, $5.00/MTok output
- Codestral 2508 (Mistral): $0.30/MTok input, $0.90/MTok output
Task Analysis
What Constrained Rewriting demands: the task evaluates compression within hard character limits while preserving meaning and adhering to exact formats. The key capabilities are structured_output (format and schema compliance), faithfulness (sticking to the source material), precise token/length control, and reliable long_context handling when the source is long.

In our testing, both Claude Haiku 4.5 and Codestral 2508 score 3/5 on the constrained_rewriting test and share the same rank (31 of 53). To break the tie, we examine supporting benchmarks from our suite: Codestral 2508 scores 5/5 on structured_output while Claude Haiku 4.5 scores 4/5; both score 5/5 on faithfulness and long_context. That pattern indicates Codestral is stronger at exact format compliance (critical when an output must be precisely N characters or match a JSON/CSV schema), while Claude Haiku 4.5 is stronger on persona consistency and broader reasoning (persona_consistency 5/5 vs 3/5; creative_problem_solving 4/5 vs 2/5). Because there is no external benchmark for this task, these internal scores are the primary evidence for our verdict.
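Since exact format compliance is the deciding factor, it helps to make the acceptance criteria concrete. Below is a minimal, hypothetical check in Python (the function name and keys are ours for illustration, not part of any benchmark) that flags a rewrite which breaks a hard character budget or a required JSON shape:

```python
import json

# Hypothetical acceptance check for a constrained-rewrite output:
# the rewrite must fit a hard character budget and, when a schema
# is required, parse as JSON with the expected keys.
def check_rewrite(output: str, max_chars: int, required_keys=None) -> list[str]:
    problems = []
    if len(output) > max_chars:
        problems.append(f"too long: {len(output)} > {max_chars} chars")
    if required_keys is not None:
        try:
            data = json.loads(output)
        except json.JSONDecodeError as e:
            problems.append(f"not valid JSON: {e}")
        else:
            missing = [k for k in required_keys if k not in data]
            if missing:
                problems.append(f"missing keys: {missing}")
    return problems

# Example: a UI string compressed into a 60-character bucket.
print(check_rewrite('{"label": "Save changes"}', 60, ["label"]))  # []
```

A model that scores higher on structured_output simply fails checks like these less often, which is what drives the verdict above.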
Practical Examples
Where Codestral 2508 shines (based on score gaps and costs):
- High-volume UI copy compression where outputs must match a schema or exact character buckets: structured_output 5/5 vs 4/5 (Claude), and output pricing of $0.90 vs $5.00 per MTok, make Codestral both more reliable and far cheaper at scale.
- API-driven workflows that require deterministic JSON or CSV snippets trimmed to exact lengths: Codestral's structured_output 5/5 reduces format rework (see the retry sketch after this list).

Where Claude Haiku 4.5 shines (based on supporting scores):
- Brand-voice compression tasks that require preserving persona and tone while cutting length: persona_consistency 5/5 (Claude) vs 3/5 (Codestral) means Claude Haiku 4.5 is more likely to keep the intended voice during aggressive shortening.
- Situations that need flexible, creative rewrites within constraints (e.g., a 140-character tweet that still reflects a marketing angle): creative_problem_solving 4/5 (Claude) vs 2/5 (Codestral).

Both models are equally reliable on faithfulness and long context (5/5 each in our testing), so either preserves source facts when the passage is long. Choose based on whether strict format compliance and cost (Codestral 2508) or persona retention and broader reasoning (Claude Haiku 4.5) matter more.
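When exact length matters in production, a common pattern with either model is to validate the output and re-prompt on violations. The sketch below is model-agnostic and hypothetical: `generate` stands in for whichever API you call (Claude Haiku 4.5 or Codestral 2508) and is not a real SDK signature.

```python
from typing import Callable

# A model-agnostic retry loop for hard character limits. On each
# failure, the violation is fed back so the next attempt can tighten.
def rewrite_within_limit(
    generate: Callable[[str], str],  # placeholder for a model API call
    source: str,
    max_chars: int,
    max_attempts: int = 3,
) -> str | None:
    prompt = f"Rewrite in at most {max_chars} characters:\n{source}"
    for _ in range(max_attempts):
        candidate = generate(prompt).strip()
        if len(candidate) <= max_chars:
            return candidate
        prompt = (
            f"Your last answer was {len(candidate)} characters; "
            f"the hard limit is {max_chars}. Rewrite shorter:\n{candidate}"
        )
    return None  # caller decides how to handle persistent overruns
```

A model with stronger structured_output needs fewer loop iterations, which compounds Codestral's per-token price advantage at scale.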
Bottom Line
For Constrained Rewriting, choose Codestral 2508 if you need exact format compliance and cost efficiency (structured_output 5/5; $0.30 input and $0.90 output per MTok). Choose Claude Haiku 4.5 if preserving brand voice or persona under tight character limits is the priority (persona_consistency 5/5; better creative compression behavior). Both scored 3/5 on the constrained_rewriting test in our testing and share the same task rank, so pick by which tradeoff matters more to your workflow.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.
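For context on what "scored 1–5 by an LLM judge" can look like in practice, here is an illustrative sketch; `ask_judge` and the rubric wording are our assumptions for this page, not the actual test harness.

```python
import re

# Illustrative only: one way a 1-5 judge rubric could be wired up.
JUDGE_PROMPT = (
    "Score the candidate rewrite from 1 (fails the constraints) to 5 "
    "(meets length, format, and faithfulness exactly).\n"
    "Source: {source}\nConstraints: {constraints}\nCandidate: {candidate}\n"
    "Reply with a single integer."
)

def judge_score(ask_judge, source: str, constraints: str, candidate: str) -> int:
    reply = ask_judge(JUDGE_PROMPT.format(
        source=source, constraints=constraints, candidate=candidate))
    match = re.search(r"[1-5]", reply)  # take the first 1-5 digit in the reply
    if match is None:
        raise ValueError(f"judge returned no 1-5 score: {reply!r}")
    return int(match.group())
```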