Claude Haiku 4.5 vs Codestral 2508 for Writing

Winner: Claude Haiku 4.5. In our testing for Writing (blog posts, marketing copy, content creation), Claude Haiku 4.5 posts a task score of 3.5 vs Codestral 2508's 2.5, a clear 1.0-point advantage. The gap is driven by Haiku's higher creative_problem_solving (4 vs 2) and persona_consistency (5 vs 3) scores in our suite. Codestral 2508 does outperform on structured_output (5 vs 4) and is substantially cheaper ($0.90/MTok output vs Haiku's $5.00/MTok), making it the better choice when strict format adherence and cost per generation are the priorities. All benchmark claims are based on our testing.

Claude Haiku 4.5 (Anthropic)

Overall: 4.33/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 5/5
Classification: 4/5
Agentic Planning: 5/5
Structured Output: 4/5
Safety Calibration: 2/5
Strategic Analysis: 5/5
Persona Consistency: 5/5
Constrained Rewriting: 3/5
Creative Problem Solving: 4/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $1.00/MTok
Output: $5.00/MTok
Context Window: 200K tokens

Codestral 2508 (Mistral)

Overall: 3.50/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 4/5
Tool Calling: 5/5
Classification: 3/5
Agentic Planning: 4/5
Structured Output: 5/5
Safety Calibration: 1/5
Strategic Analysis: 2/5
Persona Consistency: 3/5
Constrained Rewriting: 3/5
Creative Problem Solving: 2/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.30/MTok
Output: $0.90/MTok
Context Window: 256K tokens

Task Analysis

What Writing demands: high-quality writing requires creative idea generation, a consistent brand/persona voice, the ability to compress or rewrite within limits, faithfulness to source facts, and handling of long contexts (research, briefs). On this task our top-level signal is the internal task score: Claude Haiku 4.5 = 3.5 vs Codestral 2508 = 2.5.

The primary contributors in our tests were creative_problem_solving and constrained_rewriting: Haiku leads on creative_problem_solving (4 vs 2), while constrained_rewriting is tied (3 vs 3). As a supporting proxy, Haiku's persona_consistency of 5 vs Codestral's 3 explains its better voice control and brand consistency in our samples. Both models tie on long_context (5/5) and faithfulness (5/5), so long-form drafts and fidelity to source material are comparable. Codestral's structured_output advantage (5 vs 4) makes it stronger for exact-format outputs (JSON, SEO metadata, templates). No external benchmark in our set covers Writing directly, so this verdict rests on our internal scores and task results.
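
To make the persona_consistency point concrete, here is a minimal sketch of pinning a brand voice in a system prompt with the Anthropic Python SDK. It is an illustration, not our test harness: the model ID, the brand, and the style rules are all assumptions you should adapt.

```python
# Minimal brand-voice sketch using the Anthropic Python SDK.
# The model ID, brand name, and style rules below are illustrative
# assumptions -- check Anthropic's docs for current identifiers.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

BRAND_VOICE = (
    "You are the voice of Acme Co (a hypothetical brand): warm, direct, "
    "second person, no exclamation marks, one idea per sentence."
)

message = client.messages.create(
    model="claude-haiku-4-5",  # assumed model ID
    max_tokens=1024,
    system=BRAND_VOICE,        # persona pinned once, reused across every post
    messages=[
        {"role": "user", "content": "Draft a 150-word blog intro on onboarding."}
    ],
)
print(message.content[0].text)
```

Reusing one system prompt across every request is the usual way to hold a voice steady over many drafts, which is the behavior the persona_consistency score rewards.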

Practical Examples

Where Claude Haiku 4.5 shines (based on our scores):

  • Marketing campaign ideation: Haiku's creative_problem_solving score (4 vs 2) translates into more non-obvious, specific campaign angles and hooks.
  • Brand-voice blog series: persona_consistency of 5 vs 3 yields a more consistent tone across posts and less voice drift.
  • Strategy-led content: strategic_analysis of 5 vs 2 supports nuanced tradeoff framing for product positioning and CTAs.

Where Codestral 2508 shines (based on our scores and pricing):

  • High-volume content pipelines that require strict schemas: structured_output of 5 vs 4 produces cleaner JSON/metadata and better template compliance (see the JSON-mode sketch after this list).
  • Cost-sensitive bulk generation: output cost is $0.90/MTok for Codestral vs $5.00/MTok for Haiku, so Codestral cuts per-piece cost for large batches.
  • Long-form drafting and source fidelity: both tie on long_context (5) and faithfulness (5), so either model handles long briefs and staying true to source material equally well in our tests.
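
Since strict schemas are Codestral's strongest writing use case, here is a minimal sketch of schema-constrained metadata generation with the Mistral Python SDK (v1). The model identifier and the metadata fields are assumptions, and JSON-mode availability should be confirmed for Codestral in Mistral's docs.

```python
# Schema-constrained SEO-metadata sketch using the Mistral Python SDK (v1).
# The model name and the field list are illustrative assumptions.
import json
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

resp = client.chat.complete(
    model="codestral-2508",  # assumed model identifier
    messages=[
        {
            "role": "user",
            "content": (
                "Return SEO metadata for a post about onboarding as JSON with "
                "keys: title (<=60 chars), description (<=155 chars), slug."
            ),
        }
    ],
    response_format={"type": "json_object"},  # request strict JSON output
)

metadata = json.loads(resp.choices[0].message.content)
print(metadata["title"], metadata["slug"])
```

In a production pipeline you would validate `metadata` against the schema and retry on failure; the structured_output score is essentially a proxy for how rarely that retry fires.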

Bottom Line

For Writing, choose Claude Haiku 4.5 if you need stronger creative ideation, a consistent brand/persona voice, and higher-level planning (it scores 3.5 vs 2.5 on the Writing task in our tests). Choose Codestral 2508 if you need lower-cost, high-throughput content that must strictly follow schemas/templates (structured_output of 5 vs Haiku's 4, and a lower output cost: $0.90/MTok vs $5.00/MTok).
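
To size that cost difference, here is a back-of-the-envelope calculation at the listed output rates. The 1,200-token average post length is a hypothetical assumption, and input-token cost is ignored.

```python
# Rough per-post output cost at the listed rates.
# ASSUMPTION: an average post is ~1,200 output tokens; input cost ignored.
HAIKU_OUT = 5.00      # $/MTok, Claude Haiku 4.5 output
CODESTRAL_OUT = 0.90  # $/MTok, Codestral 2508 output
TOKENS_PER_POST = 1_200

def cost_per_post(rate_per_mtok: float, tokens: int = TOKENS_PER_POST) -> float:
    """Dollar cost of one post's output tokens at a $/MTok rate."""
    return rate_per_mtok * tokens / 1_000_000

print(f"Haiku:     ${cost_per_post(HAIKU_OUT):.4f}/post")      # $0.0060
print(f"Codestral: ${cost_per_post(CODESTRAL_OUT):.4f}/post")  # $0.0011
```

At these rates both models are cheap in absolute terms; the roughly 5.5x ratio only starts to matter at high volume.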

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.
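
For a sense of what a judged test looks like, the sketch below shows the general shape of a 1–5 rubric call. Our actual rubrics and judge model are not published here, so every string in it is hypothetical.

```python
# Hypothetical shape of a 1-5 LLM-judge call; the rubric text and the
# judge model ID are illustrative only, not our published methodology.
import anthropic

JUDGE_RUBRIC = (
    "Score the candidate text from 1 to 5 for persona consistency. "
    "5 = voice is identical across samples; 1 = voice drifts or "
    "contradicts the brief. Reply with a single integer."
)

client = anthropic.Anthropic()
verdict = client.messages.create(
    model="claude-haiku-4-5",  # assumed judge model, for illustration
    max_tokens=8,
    system=JUDGE_RUBRIC,
    messages=[{"role": "user", "content": "<candidate text goes here>"}],
)
print(verdict.content[0].text)  # e.g. "4"
```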

Frequently Asked Questions