Claude Haiku 4.5 vs Devstral Small 1.1 for Writing

Claude Haiku 4.5 is the winner for Writing in our testing. It scores 3.5 vs Devstral Small 1.1's 2.5 on the Writing task in our suite, driven by a 4 vs 2 advantage on creative_problem_solving and backed by stronger supporting abilities (persona_consistency 5 vs 2, long_context 5 vs 4, faithfulness 5 vs 4). Devstral Small 1.1 costs less (input $0.10 / output $0.30 per MTok, vs $1.00 / $5.00 for Haiku) and can suit high-volume templated copy, but it trails on the creative and persona-driven work that high-quality blog and marketing content requires.

Anthropic

Claude Haiku 4.5

Overall
4.33/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$1.00/MTok

Output

$5.00/MTok

Context Window: 200K

modelpicker.net

Mistral

Devstral Small 1.1

Overall
3.08/5 (Usable)

Benchmark Scores

Faithfulness
4/5
Long Context
4/5
Multilingual
4/5
Tool Calling
4/5
Classification
4/5
Agentic Planning
2/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
2/5
Persona Consistency
2/5
Constrained Rewriting
3/5
Creative Problem Solving
2/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.10/MTok

Output

$0.30/MTok

Context Window: 131K


Task Analysis

Writing (blog posts, marketing copy, content creation) demands creative idea generation, tight constrained rewrites (headlines, length limits), a consistent voice or persona across documents, long-context coherence for multi-section drafts, and reliable faithfulness to the brief.

Our Writing task uses two focused tests: creative_problem_solving and constrained_rewriting. In our testing, Claude Haiku 4.5 scores 4 on creative_problem_solving vs Devstral Small 1.1's 2, while both score 3 on constrained_rewriting. Haiku's higher persona_consistency (5 vs 2), long_context (5 vs 4), faithfulness (5 vs 4) and tool_calling (5 vs 4) help explain why it produces more coherent, on-brief, and personality-rich drafts.

Modality and context also differ: Haiku accepts text and image input (text+image->text) with a 200,000-token window, while Devstral Small 1.1 is text->text with a 131,072-token window. No third-party external benchmark for Writing is included in this comparison, so this verdict rests on our internal task scores and supporting subtest metrics.
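The published Writing task scores (3.5 vs 2.5) are consistent with a simple unweighted average of the two subtests. As a minimal sketch, assuming that aggregation (the actual weighting is not stated here), the numbers can be reproduced like this; `WRITING_SUBTESTS` and `writing_task_score` are hypothetical names used only for illustration.

```python
# Assumption: the Writing task score is the unweighted mean of its two
# subtests. This reproduces the published 3.5 vs 2.5 but is not a
# confirmed description of the scoring pipeline.
WRITING_SUBTESTS = ("creative_problem_solving", "constrained_rewriting")

scores = {
    "Claude Haiku 4.5": {"creative_problem_solving": 4, "constrained_rewriting": 3},
    "Devstral Small 1.1": {"creative_problem_solving": 2, "constrained_rewriting": 3},
}


def writing_task_score(model_scores: dict[str, int]) -> float:
    """Average the 1-5 judge scores of the Writing subtests."""
    return sum(model_scores[t] for t in WRITING_SUBTESTS) / len(WRITING_SUBTESTS)


for model, s in scores.items():
    print(f"{model}: {writing_task_score(s)}")
```

Under this assumption, the whole 1-point gap comes from creative_problem_solving, since the two models tie on constrained_rewriting.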

Practical Examples

High-effort brand storytelling: Choose Claude Haiku 4.5 (creative_problem_solving 4 vs 2). It is better at generating original campaign concepts and multi-section blog drafts, and it maintains persona across a long draft (persona_consistency 5 vs 2; long_context 5 vs 4).

Headline and character-limited rewrites: The models tie on constrained_rewriting (3 vs 3), so neither has an edge on strict length constraints, though Haiku's higher persona_consistency suggests it will better preserve voice.

Rapid, high-volume A/B marketing copy: Devstral Small 1.1 is practical for low-cost bulk variants (input $0.10 / output $0.30 per MTok) when the creative demand is moderate.

Fact-sensitive product descriptions: Haiku's higher faithfulness (5 vs 4) should reduce editing.

Structured outputs and formatting (JSON, article templates): The models are equal on structured_output (4 vs 4), so both handle schema-based generation similarly.
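To make the cost trade-off for bulk copy concrete, here is a rough estimate. The workload (1,000 variants, each with a 500-token prompt and a 300-token completion) is a hypothetical assumption, not a measured figure; the prices are the per-MTok list prices shown on the model cards. `batch_cost` is an illustrative helper, not part of any provider API.

```python
# Per-million-token (MTok) list prices in USD: (input, output).
PRICES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Devstral Small 1.1": (0.10, 0.30),
}


def batch_cost(input_price: float, output_price: float,
               n_requests: int, in_tokens: int, out_tokens: int) -> float:
    """Estimate USD cost of n_requests calls with fixed token counts."""
    total_in = n_requests * in_tokens
    total_out = n_requests * out_tokens
    return (total_in * input_price + total_out * output_price) / 1_000_000


# Hypothetical workload: 1,000 variants, 500 input + 300 output tokens each.
for model, (p_in, p_out) in PRICES.items():
    print(f"{model}: ${batch_cost(p_in, p_out, 1_000, 500, 300):.2f}")
```

With these assumed token counts, the batch comes to about $2.00 on Haiku and about $0.14 on Devstral Small 1.1. The gap widens as completions get longer, because the output price ratio (about 16.7x) is larger than the input price ratio (10x).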

Bottom Line

For Writing, choose Claude Haiku 4.5 if you need stronger creative idea generation, long-context drafting, and consistent persona (task score 3.5 vs 2.5). Choose Devstral Small 1.1 if you prioritize lower per-token cost (input $0.10 / output $0.30 per MTok) for high-volume, template-driven copy and can accept lower creative and persona fidelity.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.
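The overall ratings on the model cards (4.33 and 3.08) match a plain unweighted mean of the 12 benchmark scores, rounded to two decimals. The sketch below assumes that aggregation; the full methodology may weight tests differently. `SUITE_SCORES` and `overall_rating` are hypothetical names for illustration.

```python
from statistics import mean

# Scores in the order listed on the cards: faithfulness, long_context,
# multilingual, tool_calling, classification, agentic_planning,
# structured_output, safety_calibration, strategic_analysis,
# persona_consistency, constrained_rewriting, creative_problem_solving.
SUITE_SCORES = {
    "Claude Haiku 4.5": [5, 5, 5, 5, 4, 5, 4, 2, 5, 5, 3, 4],
    "Devstral Small 1.1": [4, 4, 4, 4, 4, 2, 4, 2, 2, 2, 3, 2],
}


def overall_rating(scores: list[int]) -> float:
    """Assumed aggregation: unweighted mean of the 12 suite scores."""
    if len(scores) != 12:
        raise ValueError("expected one score per benchmark in the 12-test suite")
    return round(mean(scores), 2)


for model, s in SUITE_SCORES.items():
    print(f"{model}: {overall_rating(s)}/5")
```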

Frequently Asked Questions