Claude Haiku 4.5 vs Devstral 2 2512 for Creative Writing
Devstral 2 2512 is the stronger choice for creative writing. In our testing across three task-relevant benchmarks (creative problem solving, persona consistency, and constrained rewriting), Devstral 2 2512 scores a composite 4.33 out of 5 versus Claude Haiku 4.5's 4.0. That difference produces a large ranking gap: Devstral 2 2512 ranks 5th of 52 models for this task, while Claude Haiku 4.5 ranks 28th. The decisive factor is constrained rewriting, where Devstral 2 2512 scores 5/5 (one of 5 models tied for 1st out of 53 tested) versus Haiku 4.5's 3/5. On persona consistency the order flips: Haiku 4.5 scores 5/5 versus Devstral 2 2512's 4/5. Both models tie on creative problem solving at 4/5. No external benchmark data is available for either model on this task, so our internal scores are the primary signal. The gap is real but not crushing: Haiku 4.5 is a legitimate creative writing tool, just outperformed on constrained rewriting, the one dimension where the two models differ by two points.
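The composite figures match a simple unweighted mean of the three dimension scores. The sketch below assumes that weighting (the article does not state it) and reproduces the 4.33 and 4.0 numbers from the per-dimension scores reported in this comparison; `SCORES` and `composite` are illustrative names, not part of any published tool.

```python
# Per-dimension scores (1-5) as reported in this comparison.
SCORES = {
    "Devstral 2 2512": {
        "creative_problem_solving": 4,
        "persona_consistency": 4,
        "constrained_rewriting": 5,
    },
    "Claude Haiku 4.5": {
        "creative_problem_solving": 4,
        "persona_consistency": 5,
        "constrained_rewriting": 3,
    },
}


def composite(scores: dict[str, int]) -> float:
    """Unweighted mean of the dimension scores (assumed weighting), 2 decimals."""
    return round(sum(scores.values()) / len(scores), 2)


if __name__ == "__main__":
    for model, dims in SCORES.items():
        print(f"{model}: {composite(dims)}")
```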
Claude Haiku 4.5 (Anthropic) pricing: $1.00/MTok input, $5.00/MTok output
modelpicker.net
Devstral 2 2512 (Mistral) pricing: $0.40/MTok input, $2.00/MTok output
Task Analysis
Creative writing demands three core capabilities from an LLM: the ability to generate novel, non-obvious ideas (creative problem solving); the discipline to maintain a consistent voice, character, or narrator across long outputs (persona consistency); and the craft to work within hard formal constraints — word counts, syllable patterns, structural rules — without sacrificing quality (constrained rewriting). Our 12-test suite evaluates all three directly, making this one of the more precisely measured task categories.
On creative problem solving, both models score 4/5, placing them in a large group tied for 9th of 54, so this dimension does not separate them. The split emerges on the other two dimensions. Constrained rewriting tests compression within hard character limits, a proxy for poetic form, flash fiction, logline writing, and any format with rigid length rules. Here Devstral 2 2512 pulls clearly ahead, 5/5 versus Haiku 4.5's 3/5; a 2-point gap on a 5-point scale is substantial. Persona consistency, which measures maintaining a character's voice and resisting prompt injection, goes the other way: Haiku 4.5 scores 5/5 (tied for 1st among 37 models) versus Devstral 2 2512's 4/5. For long-form fiction that needs a sustained narrator or character arc, that difference matters. Neither model has external benchmark data for this comparison, so our internal scores are the only evidence available.
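Because constrained rewriting is judged against hard length limits, it can be worth checking model output mechanically rather than trusting either model to count. The sketch below is a hypothetical post-generation check, not our benchmark harness; `check_char_limit`, `check_word_limit`, and `LimitCheck` are illustrative names, and counting words by splitting on whitespace is an assumption (it treats a hyphenated compound as one word).

```python
from dataclasses import dataclass


@dataclass
class LimitCheck:
    """Result of checking one piece of text against a hard length limit."""
    length: int
    limit: int

    @property
    def passes(self) -> bool:
        return self.length <= self.limit

    @property
    def overflow(self) -> int:
        # Units over the limit (0 when the text fits).
        return max(0, self.length - self.limit)


def check_char_limit(text: str, limit: int) -> LimitCheck:
    """Count characters after trimming leading/trailing whitespace."""
    return LimitCheck(len(text.strip()), limit)


def check_word_limit(text: str, limit: int) -> LimitCheck:
    """Count whitespace-separated words (assumption: hyphenated = one word)."""
    return LimitCheck(len(text.split()), limit)


if __name__ == "__main__":
    logline = "A retired hacker must break into the vault she designed."
    result = check_char_limit(logline, limit=50)
    print(result.length, result.passes, result.overflow)  # 56 False 6
```

A failed check reports how far over the limit the text ran, which could be fed back into a follow-up prompt asking the model to trim by that amount.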
Practical Examples
Where Devstral 2 2512 shines:
- Flash fiction and micro-fiction: Its 5/5 constrained rewriting score suggests it can hit hard length targets without padding or truncating the output. A request such as a 100-word horror story with a twist in the final sentence plays to that strength; Claude Haiku 4.5's 3/5 on the same benchmark suggests it struggles more with hard limits.
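- Formal poetry: Sonnets, haiku sequences, villanelles, and any other form where syllable count and rhyme scheme are non-negotiable. Devstral 2 2512's constrained rewriting advantage should carry over here, although the benchmark measures compression within character limits rather than meter or rhyme directly.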
- Loglines and pitch copy: Marketing-adjacent creative work that demands a specific character count and a specific emotional beat benefits from Devstral 2 2512's format discipline.
- Cost-sensitive, high-volume creative tasks: At $0.40 input / $2.00 output per million tokens versus Haiku 4.5's $1.00 / $5.00, Devstral 2 2512 is 2.5x cheaper on both input and output, which adds up if you're generating dozens of creative variations.
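The price ratio is easy to sanity-check. The sketch below uses the list prices quoted in this comparison and a hypothetical batch of 50 variations at 500 input and 300 output tokens each; those token counts are assumptions for illustration, and real costs will vary with prompt length and any provider discounts. `PRICES_PER_MTOK` and `job_cost` are illustrative names.

```python
# List prices in USD per million tokens, as quoted in this comparison.
PRICES_PER_MTOK = {
    "Claude Haiku 4.5": {"input": 1.00, "output": 5.00},
    "Devstral 2 2512": {"input": 0.40, "output": 2.00},
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost at list prices (ignores caching or batch discounts)."""
    p = PRICES_PER_MTOK[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical batch: 50 variations x (500 input + 300 output) tokens.
    n, tokens_in, tokens_out = 50, 500, 300
    for model in PRICES_PER_MTOK:
        cost = job_cost(model, n * tokens_in, n * tokens_out)
        print(f"{model}: ${cost:.4f}")
```

Because the input and output prices both differ by the same 2.5x factor, the ratio stays at 2.5x regardless of how a workload splits between input and output tokens.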
Where Claude Haiku 4.5 holds its own:
- Long-form fiction with consistent narrators: Haiku 4.5's 5/5 persona consistency score (versus Devstral 2 2512's 4/5) makes it more reliable for multi-chapter stories, serialized content, or any project where a character's voice must remain stable across thousands of words.
- Multilingual and image-prompted writing: Both score 5/5 on multilingual (tied for 1st among 35 models), so neither has an edge there, but Haiku 4.5 also accepts image input (text+image→text), making it usable for ekphrastic writing or prompts built around visual references.
- Creative work requiring strategic narrative planning: Haiku 4.5 scores 5/5 on strategic analysis versus Devstral 2 2512's 4/5, which may help when plotting complex story structures or weighing narrative tradeoffs.
- Safety-sensitive creative contexts: Haiku 4.5 scores 2/5 on safety calibration versus Devstral 2 2512's 1/5. Neither model does well here, but Haiku 4.5's one-point edge suggests it is somewhat less likely to refuse legitimate creative requests or to produce problematic outputs.
Bottom Line
For creative writing, choose Claude Haiku 4.5 if you need sustained character voice across long-form fiction, work with visual reference material (it supports image input), or operate in a context where safety calibration matters. Choose Devstral 2 2512 if constrained forms — flash fiction, poetry, character-limited copy — are your primary use case, or if you need to run high-volume creative generation at lower cost ($2.00 vs $5.00 per million output tokens). Devstral 2 2512's 5th-place ranking for this task (versus Haiku 4.5's 28th) reflects a real performance advantage, but the gap is concentrated in one dimension. Match the model to your specific creative format before deciding.
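The decision rules above can be sketched as a small routing function. This is a hypothetical illustration, assuming a task can be described by a few boolean flags; `CreativeTask` and `pick_model` are made-up names, and how the rules break ties (for example, a long-form piece that also has a hard length limit) is our assumption rather than something the scores settle.

```python
from dataclasses import dataclass


@dataclass
class CreativeTask:
    """Hypothetical flags describing a creative-writing job."""
    has_image_input: bool = False    # prompt includes visual reference material
    long_form_persona: bool = False  # sustained narrator or character voice
    safety_sensitive: bool = False   # safety calibration matters
    hard_length_limit: bool = False  # flash fiction, form poetry, loglines


def pick_model(task: CreativeTask) -> str:
    """Route a task to one of the two models using the scores in this comparison."""
    # Hard requirement: of these two models, only Haiku 4.5 accepts image input.
    if task.has_image_input:
        return "Claude Haiku 4.5"
    # Persona consistency (5/5 vs 4/5) and safety calibration (2/5 vs 1/5)
    # favor Haiku 4.5. Assumption: a hard length limit overrides both.
    if (task.long_form_persona or task.safety_sensitive) and not task.hard_length_limit:
        return "Claude Haiku 4.5"
    # Otherwise constrained rewriting (5/5 vs 3/5), the higher composite
    # score, and the lower price all point to Devstral 2 2512.
    return "Devstral 2 2512"


if __name__ == "__main__":
    print(pick_model(CreativeTask(long_form_persona=True)))
    print(pick_model(CreativeTask(hard_length_limit=True)))
```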
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge.