Question 1

How much better is Claude Haiku 4.5 for Writing in your tests?

Accepted Answer

Claude Haiku 4.5 scores 3.5 vs Devstral Medium's 2.5 on our Writing task — a 1.0-point advantage driven mainly by creative_problem_solving (4 vs 2), persona_consistency (5 vs 3), and long_context (5 vs 4).

Question 2

What are the exact cost differences I should expect?

Accepted Answer

Per the payload, Claude Haiku 4.5 costs $1.00 per million input tokens and $5.00 per million output tokens; Devstral Medium costs $0.40 and $2.00, respectively. That makes Haiku 2.5x as expensive as Devstral on both input and output (priceRatio: 2.5).
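To turn the per-million-token prices into a per-job figure, a minimal sketch follows. The prices are the ones quoted in this answer; the function name `job_cost` and the 2,000-in / 1,000-out workload are hypothetical and only there for illustration.

```python
# Prices in USD per million tokens, as quoted in the answer above.
PRICES = {
    "Claude Haiku 4.5": {"input": 1.00, "output": 5.00},
    "Devstral Medium": {"input": 0.40, "output": 2.00},
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one job from its input and output token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 2,000 input tokens and 1,000 output tokens per draft.
for name in PRICES:
    print(f"{name}: ${job_cost(name, 2_000, 1_000):.4f} per draft")
```

Because the 2.5x ratio holds for both input and output prices, the cost ratio between the two models stays at 2.5x regardless of how a job splits between input and output tokens.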

Question 3

Does either model handle long documents and multi-section drafts well?

Accepted Answer

Yes — in our testing Claude Haiku 4.5 scored 5 for long_context vs Devstral Medium's 4, so Haiku is better at maintaining accuracy and coherence across large inputs (Haiku has a 200,000-token context window vs Devstral's 131,072).
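As a rough illustration of what the two context windows mean in practice, the sketch below checks whether a prompt of a known token count fits each model. The window sizes come from the answer above; the helper name `fits`, the 4,000-token output reserve, and the assumption that prompt and output share one window are illustrative choices, and the token count itself would need to come from each provider's own tokenizer.

```python
# Context window sizes in tokens, as quoted in the answer above.
CONTEXT_WINDOW = {
    "Claude Haiku 4.5": 200_000,
    "Devstral Medium": 131_072,
}


def fits(model: str, prompt_tokens: int, reserved_output_tokens: int = 4_000) -> bool:
    """Return True if the prompt plus a reserved output budget fits the window.

    Assumes prompt and output tokens count against the same window.
    """
    return prompt_tokens + reserved_output_tokens <= CONTEXT_WINDOW[model]


# Hypothetical 150,000-token manuscript.
for name in CONTEXT_WINDOW:
    print(name, "fits" if fits(name, 150_000) else "needs chunking")
```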

Question 4

Which model is better at constrained rewriting for character-limited copy?

Accepted Answer

The models tie on constrained_rewriting in our tests (both score 3), so expect similar results on strict-length edits. Choose based on persona consistency (where Haiku leads 5 vs 3) or cost.

Question 5

Can I expect fewer hallucinations with the winner?

Accepted Answer

In our testing Claude Haiku 4.5 had higher faithfulness (5 vs 4), indicating it sticks closer to source material and briefs than Devstral Medium on the Writing tasks we ran.

Question 6

Which model supports richer parameter controls relevant to content generation?

Accepted Answer

Both models support structured_outputs, response formatting, stop tokens, temperature and tool parameters. Claude Haiku 4.5 additionally lists include_reasoning and reasoning flags in the payload; Devstral Medium lists frequency_penalty, presence_penalty and seed. Use Haiku for reasoning-aware prompts and Devstral for penalty-based generation control.
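One way to act on these differences is to filter a set of requested generation settings down to what each model lists. This is only a sketch: the names `include_reasoning`, `reasoning`, `frequency_penalty`, `presence_penalty`, `seed`, `structured_outputs` and `temperature` come from the answer above, while `response_format`, `stop` and `tools` are assumed spellings for the shared controls, and `split_params` is a hypothetical helper rather than part of any provider SDK.

```python
# Shared controls; "response_format", "stop" and "tools" are assumed spellings.
SHARED = {"structured_outputs", "response_format", "stop", "temperature", "tools"}

# Model-specific additions, as listed in the answer above.
SUPPORTED = {
    "Claude Haiku 4.5": SHARED | {"include_reasoning", "reasoning"},
    "Devstral Medium": SHARED | {"frequency_penalty", "presence_penalty", "seed"},
}


def split_params(model: str, requested: dict) -> tuple[dict, list[str]]:
    """Split requested settings into those the model lists and those it does not."""
    allowed = SUPPORTED[model]
    kept = {key: value for key, value in requested.items() if key in allowed}
    dropped = sorted(key for key in requested if key not in allowed)
    return kept, dropped


# Hypothetical settings for a copywriting job.
requested = {"temperature": 0.7, "seed": 42, "presence_penalty": 0.3}
for name in SUPPORTED:
    kept, dropped = split_params(name, requested)
    print(name, "keeps", kept, "drops", dropped)
```

Reporting the dropped keys, rather than silently discarding them, makes it easier to notice when a prompt pipeline tuned for one model relies on a control the other does not list.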

Claude Haiku 4.5 vs Devstral Medium for Writing
