Claude Haiku 4.5 vs Devstral Medium for Creative Writing

Winner: Claude Haiku 4.5. In our testing, Claude Haiku 4.5 scores 4.00 on the Creative Writing task versus Devstral Medium's 2.667, a gap of 1.333. Haiku 4.5 outperforms Devstral Medium on creative problem solving (4 vs 2), persona consistency (5 vs 3), and long context (5 vs 4), the core dimensions for fiction, voice, and extended narratives. Devstral Medium is cheaper ($0.40 vs $1.00 input and $2.00 vs $5.00 output per MTok) and capable for short edits or structured formats, but it lost decisively on our Creative Writing tests.

Benchmark Scores

| Benchmark                | Claude Haiku 4.5 (Anthropic) | Devstral Medium (Mistral) |
|--------------------------|------------------------------|---------------------------|
| Overall                  | 4.33/5 (Strong)              | 3.17/5 (Usable)           |
| Faithfulness             | 5/5                          | 4/5                       |
| Long Context             | 5/5                          | 4/5                       |
| Multilingual             | 5/5                          | 4/5                       |
| Tool Calling             | 5/5                          | 3/5                       |
| Classification           | 4/5                          | 4/5                       |
| Agentic Planning         | 5/5                          | 4/5                       |
| Structured Output        | 4/5                          | 4/5                       |
| Safety Calibration       | 2/5                          | 1/5                       |
| Strategic Analysis       | 5/5                          | 2/5                       |
| Persona Consistency      | 5/5                          | 3/5                       |
| Constrained Rewriting    | 3/5                          | 3/5                       |
| Creative Problem Solving | 4/5                          | 2/5                       |

External Benchmarks

SWE-bench Verified, MATH Level 5, and AIME 2025: N/A for both models.

Pricing and Context

| Model            | Input       | Output      | Context Window |
|------------------|-------------|-------------|----------------|
| Claude Haiku 4.5 | $1.00/MTok  | $5.00/MTok  | 200K           |
| Devstral Medium  | $0.40/MTok  | $2.00/MTok  | 131K           |

Task Analysis

Creative Writing demands sustained persona consistency, robust long-context handling, non-obvious idea generation (creative problem solving), and tidy constrained rewriting when needed. Our Creative Writing score averages three tests: creative problem solving, persona consistency, and constrained rewriting. Claude Haiku 4.5 leads on creative problem solving (4 vs 2) and persona consistency (5 vs 3), and the two models tie on constrained rewriting (3 vs 3). These internal scores map directly onto the task: persona consistency (keeping character voice and resisting injection), long context (retrieval across 30K+ tokens), and creative problem solving (feasible, original plot and scene ideas) are the decisive capabilities. Cost and parameter support matter operationally: Claude Haiku 4.5 offers a 200K-token context window and additional supported parameters (e.g., include_reasoning and structured outputs) that help with long-form narrative control; Devstral Medium's 131K window is smaller, and it lags on creativity and persona in our benchmarks.
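To make the operational point concrete, here is a minimal sketch of a long-form, persona-pinned drafting call using the Anthropic Python SDK. The model ID, prompts, and file name are illustrative assumptions, not values from our benchmark harness:

```python
# Minimal sketch: long-form drafting with a fixed persona via the Anthropic
# Python SDK. Model ID, prompts, and file name are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A system prompt pins the narrator's voice; the 200K context window lets you
# pass the full outline plus prior chapters in a single request.
persona = (
    "You are the narrator of a serialized noir novel. First person, past "
    "tense, dry humor. Never break character."
)

response = client.messages.create(
    model="claude-haiku-4-5",  # assumed model ID; check the current docs
    max_tokens=4096,
    system=persona,
    messages=[
        {
            "role": "user",
            "content": "Draft chapter 12 from this outline:\n"
            + open("outline.txt").read(),
        },
    ],
)
print(response.content[0].text)
```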

Practical Examples

Where Claude Haiku 4.5 shines (based on score gaps):

  • Serial novel drafting: keeps voice across long chapters (long context 5 vs 4) and maintains character consistency (persona consistency 5 vs 3).
  • Plot brainstorming and non-obvious twists: generates feasible, specific ideas (creative problem solving 4 vs 2).
  • Complex rewrites, such as turning a 30K-token outline into scene-by-scene beats: the large context window and structured-output support carry the job.

Where Devstral Medium is appropriate (given its strengths and cost):

  • Short-form fiction or micro-stories where budget matters: lower input/output costs ($0.40/$2.00 vs $1.00/$5.00 per MTok) reduce spend.
  • Structured templates and format adherence: both models score 4 on structured output, so Devstral handles JSON and format constraints as well as Haiku; see the validation sketch after this list.
  • Fast prototyping of many short variants: solid classification and format behavior (classification 4), though weaker at sustained voice and deep creativity.
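Because the two models tie on structured output, the deciding factor for template work is often the harness around the model rather than the model itself. Here is a minimal sketch of that harness, with an assumed beat schema (the field names and types are illustrative); it parses and shape-checks the JSON either model returns before a draft is accepted:

```python
# Sketch: validate scene-beat JSON returned by either model before use.
# The schema (field names and types) is an illustrative assumption.
import json

REQUIRED_BEAT_FIELDS = {"scene": int, "goal": str, "conflict": str, "outcome": str}

def parse_beats(raw: str) -> list[dict]:
    """Parse and shape-check a JSON array of scene beats; raise on any drift."""
    beats = json.loads(raw)
    if not isinstance(beats, list):
        raise ValueError("expected a JSON array of beats")
    for beat in beats:
        for field, typ in REQUIRED_BEAT_FIELDS.items():
            if not isinstance(beat.get(field), typ):
                raise ValueError(f"beat missing or mistyped field: {field}")
    return beats

# A well-formed response passes; anything malformed fails loudly, which is
# the point of pairing a format-capable model with a strict check.
sample = (
    '[{"scene": 1, "goal": "introduce the heist", '
    '"conflict": "the safecracker backs out", "outcome": "new recruit"}]'
)
print(parse_beats(sample))
```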

Bottom Line

For Creative Writing, choose Claude Haiku 4.5 if you need sustained voice, long-form drafts, and stronger idea generation (it scored 4.00 vs Devstral Medium's 2.667 on our task). Choose Devstral Medium if budget per token is the priority and you work mainly on short-form pieces, structured templates, or many low-cost iterations (input $0.40 vs $1.00 and output $2.00 vs $5.00 per MTok).
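The budget claim is easy to sanity-check with the listed prices. A quick back-of-the-envelope cost sketch; the token counts for the example job are illustrative assumptions:

```python
# Back-of-the-envelope cost comparison using the listed per-MTok prices.
# Token counts below are illustrative assumptions for one drafting job.
PRICES = {  # (input $/MTok, output $/MTok) from the pricing table above
    "Claude Haiku 4.5": (1.00, 5.00),
    "Devstral Medium": (0.40, 2.00),
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars for one request at the listed per-million-token rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# Assumed job: a 30K-token outline in, an 8K-token chapter out.
for model in PRICES:
    print(f"{model}: ${job_cost(model, 30_000, 8_000):.3f}")
# Claude Haiku 4.5: $0.070 / Devstral Medium: $0.028
```

At these rates Devstral Medium is roughly 2.5x cheaper per job; whether that offsets the 4.00 vs 2.667 quality gap depends on how many drafts you expect to discard.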

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.
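For illustration, a minimal sketch of what a 1-5 judge prompt can look like; the rubric wording here is an assumption, not our production prompt:

```python
# Sketch of the 1-5 LLM-judge scoring step. The rubric text is an
# illustrative assumption, not the production prompt.
JUDGE_PROMPT = """You are grading a model's answer to a benchmark task.
Score it on a 1-5 scale:
  5 = fully correct and well executed
  3 = partially correct or uneven
  1 = wrong or off-task
Respond with only the integer score.

Task: {task}
Model answer: {answer}"""

def build_judge_prompt(task: str, answer: str) -> str:
    """Fill the rubric template for one (task, answer) pair."""
    return JUDGE_PROMPT.format(task=task, answer=answer)
```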

Frequently Asked Questions