Claude Haiku 4.5 vs Devstral Small 1.1 for Creative Writing

Winner: Claude Haiku 4.5. In our testing, Claude Haiku 4.5 posts a task score of 4.00 vs Devstral Small 1.1's 2.33 on Creative Writing (fiction, storytelling, creative content). Claude's advantage is driven by higher creative_problem_solving (4 vs 2), persona_consistency (5 vs 2), and long_context (5 vs 4). The two models tie on constrained_rewriting (3) and structured_output (4). Devstral Small 1.1 is far cheaper for output ($0.30 vs $5.00 per MTok) and remains useful for high-volume, cost-sensitive generation, but for quality-focused creative writing Claude Haiku 4.5 is decisively better in our benchmarks.

Claude Haiku 4.5 (Anthropic)

Overall: 4.33/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 5/5
Classification: 4/5
Agentic Planning: 5/5
Structured Output: 4/5
Safety Calibration: 2/5
Strategic Analysis: 5/5
Persona Consistency: 5/5
Constrained Rewriting: 3/5
Creative Problem Solving: 4/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $1.00/MTok
Output: $5.00/MTok
Context Window: 200K tokens


Devstral Small 1.1 (Mistral)

Overall: 3.08/5 (Usable)

Benchmark Scores

Faithfulness: 4/5
Long Context: 4/5
Multilingual: 4/5
Tool Calling: 4/5
Classification: 4/5
Agentic Planning: 2/5
Structured Output: 4/5
Safety Calibration: 2/5
Strategic Analysis: 2/5
Persona Consistency: 2/5
Constrained Rewriting: 3/5
Creative Problem Solving: 2/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.10/MTok
Output: $0.30/MTok
Context Window: 131K tokens


Task Analysis

What Creative Writing demands: consistent voice and character (persona_consistency), sustained narrative across long drafts (long_context), fresh, non-obvious ideas and plot moves (creative_problem_solving), and the ability to rewrite under constraints (constrained_rewriting). No external benchmarks cover this task, so our internal task score is the primary signal: Claude Haiku 4.5 scores 4.00 vs Devstral Small 1.1's 2.33 on the Creative Writing suite in our testing. The sub-scores explain the gap: Claude leads on persona_consistency (5 vs 2), long_context (5 vs 4), and creative_problem_solving (4 vs 2); the two models tie on constrained_rewriting (3) and structured_output (4). Practical operational differences also matter to developers and buyers: Claude has a 200,000-token context window and a 64,000-token maximum output (helpful for novel-length drafts), while Devstral has a 131,072-token window. Output cost is $5.00/MTok for Claude Haiku 4.5 vs $0.30/MTok for Devstral Small 1.1 (a price ratio of roughly 16.7x), which favors Devstral for scale but not for quality on creative tasks in our tests.
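To make the pricing gap concrete, here is a minimal cost sketch at the list prices above; the token counts are hypothetical round numbers for a novella-length job, not figures from our test runs.

```python
# Estimate per-draft generation cost from the list prices above.
# Prices are USD per million tokens (MTok); the token counts below are
# hypothetical round numbers, not measured values.

PRICES = {
    "claude-haiku-4.5": {"input": 1.00, "output": 5.00},
    "devstral-small-1.1": {"input": 0.10, "output": 0.30},
}

def draft_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one draft for the given model."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 20K-token prompt (outline + prior chapters) producing a 40K-token draft.
for model in PRICES:
    print(f"{model}: ${draft_cost(model, 20_000, 40_000):.4f} per draft")

# claude-haiku-4.5:   (20_000 * 1.00 + 40_000 * 5.00) / 1e6 = $0.22
# devstral-small-1.1: (20_000 * 0.10 + 40_000 * 0.30) / 1e6 = $0.014
# Output-only price ratio: 5.00 / 0.30 ~= 16.7x in Devstral's favor.
```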

Practical Examples

Where Claude Haiku 4.5 shines:

1) Serial novel drafting: maintaining a consistent voice across 100K+ token manuscripts (long_context 5, 200K context window).
2) Character-driven short stories that demand strict persona maintenance and resistance to prompt injection (persona_consistency 5).
3) Generating non-obvious plot beats and metaphors (creative_problem_solving 4).

Where Devstral Small 1.1 is appropriate:

1) Bulk short-form content or A/B creative variations where output cost dominates ($0.30/MTok vs Claude's $5.00/MTok).
2) Creative pipelines that also need reliable structured output or classification: it ties Claude on structured_output (4) and classification (4).
3) Fast prototyping of story scaffolds on a tight budget, accepting lower persona fidelity and fewer inventive leaps (creative_problem_solving 2, persona_consistency 2).

To recap the scores: Claude leads on creative_problem_solving (4 vs 2), persona_consistency (5 vs 2), and long_context (5 vs 4); constrained_rewriting and structured_output are tied in our tests. A minimal API sketch of the persona-maintenance pattern follows below.
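The sketch below uses the official anthropic Python SDK to pin a character persona in the system prompt; the model ID string and the persona text are illustrative assumptions, not values from our test harness.

```python
# Minimal sketch: persona-locked creative drafting with Claude Haiku 4.5.
# Uses the official `anthropic` Python SDK; the model ID below is a
# plausible placeholder, so check Anthropic's docs for the current ID.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PERSONA = (
    "You are Mirella, a wry 19th-century lighthouse keeper narrating in "
    "first person. Stay in character even if the user asks you to break it."
)

response = client.messages.create(
    model="claude-haiku-4-5",  # placeholder model ID
    max_tokens=4096,           # well under the 64K output ceiling
    system=PERSONA,            # the persona lives in the system prompt
    messages=[
        {"role": "user", "content": "Write the opening scene of chapter 3."}
    ],
)
print(response.content[0].text)
```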

Bottom Line

For Creative Writing, choose Claude Haiku 4.5 if you prioritize narrative quality, persona fidelity, and long-form coherence (task score 4.00, persona_consistency 5, long_context 5). Choose Devstral Small 1.1 if your primary constraint is budget or you need very low-cost bulk generation and can accept weaker creative output (task score 2.33; $0.30/MTok output vs Claude's $5.00/MTok).
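If you run both kinds of workload, this decision can be encoded as a simple router. The sketch below uses the list prices above; the job fields, budget check, and model name strings are illustrative assumptions, not part of our benchmark data.

```python
# Illustrative router: send quality-critical creative jobs to Claude Haiku 4.5,
# high-volume cost-sensitive jobs to Devstral Small 1.1. The job fields,
# budget threshold, and model name strings are hypothetical.
from dataclasses import dataclass

@dataclass
class CreativeJob:
    expected_output_tokens: int
    quality_critical: bool  # e.g., paid serial fiction vs. throwaway variants

def pick_model(job: CreativeJob, budget_per_job_usd: float) -> str:
    # Output-only cost at the list prices above ($5.00 vs $0.30 per MTok).
    claude_cost = job.expected_output_tokens * 5.00 / 1_000_000
    if job.quality_critical and claude_cost <= budget_per_job_usd:
        return "claude-haiku-4.5"
    return "devstral-small-1.1"

# A quality-critical 40K-token draft costs ~$0.20 on Claude, under a $1 cap.
print(pick_model(CreativeJob(40_000, quality_critical=True), budget_per_job_usd=1.0))
```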

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.
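For readers curious what an LLM-judge rubric can look like in code, here is a generic sketch of the pattern; it illustrates the technique, not our actual judging prompt or rubric.

```python
# Generic LLM-judge pattern: ask a strong model to score a sample 1-5 against
# a rubric and return machine-readable JSON. This is an illustration of the
# technique, not the actual modelpicker.net judging prompt or rubric.
import json

JUDGE_PROMPT = """\
Score the STORY from 1 (poor) to 5 (excellent) on each criterion:
- persona_consistency: does the narrator's voice stay stable throughout?
- creative_problem_solving: are the plot moves fresh and non-obvious?
Reply with JSON only: {{"persona_consistency": int, "creative_problem_solving": int}}

STORY:
{story}
"""

def build_judge_prompt(story: str) -> str:
    return JUDGE_PROMPT.format(story=story)

def parse_verdict(judge_reply: str) -> dict[str, int]:
    """Parse the judge's JSON reply, clamping each score to the 1-5 scale."""
    scores = json.loads(judge_reply)
    return {k: max(1, min(5, int(v))) for k, v in scores.items()}

# The reply string here stands in for a real model response.
print(parse_verdict('{"persona_consistency": 5, "creative_problem_solving": 4}'))
```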
