Claude Haiku 4.5 vs Devstral Medium for Creative Writing
Winner: Claude Haiku 4.5. In our testing, Claude Haiku 4.5 scores 4.00 on the Creative Writing task versus Devstral Medium's 2.667 (a gap of 1.333). Haiku 4.5 outperforms Devstral Medium on creative problem solving (4 vs 2), persona consistency (5 vs 3), and long context (5 vs 4), the core dimensions for fiction, voice, and extended narratives. Devstral Medium is cheaper ($0.40 vs $1.00 input, $2.00 vs $5.00 output per MTok) and capable for short edits or structured formats, but it lost decisively on our Creative Writing tests.
Claude Haiku 4.5 (Anthropic)
Pricing: $1.00/MTok input, $5.00/MTok output

Devstral Medium (Mistral)
Pricing: $0.40/MTok input, $2.00/MTok output

Source: modelpicker.net
Task Analysis
Creative Writing demands sustained persona consistency, robust long-context handling, non-obvious idea generation (creative problem solving), and tidy constrained rewriting when needed. Our task uses three tests: creative problem solving, persona consistency, and constrained rewriting. In our testing, Claude Haiku 4.5 leads on creative problem solving (4 vs 2) and persona consistency (5 vs 3), and the two models tie on constrained rewriting (3 vs 3). These internal scores map directly to the task: persona consistency (keeping character voice and resisting injection), long context (retrieval across 30K+ tokens), and creative problem solving (feasible, original plot and scene ideas) are the decisive capabilities. Cost and parameter support also matter operationally: Claude Haiku 4.5 offers a 200K-token context window and extra supported parameters (e.g., include_reasoning, structured outputs) that help with long-form narrative control; Devstral Medium has a 131K-token window but lags on creativity and persona in our benchmarks.
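The headline task scores are consistent with a simple unweighted average of the three test scores; a minimal sketch of that arithmetic (equal weighting is our assumption here, but it matches the reported 4.00 and 2.667 exactly):

```python
# Per-test scores from this comparison, each judged on a 1-5 scale:
# [creative problem solving, persona consistency, constrained rewriting]
haiku_scores = [4, 5, 3]
devstral_scores = [2, 3, 3]

def task_score(scores):
    """Unweighted mean of the per-test scores, rounded to three decimals."""
    return round(sum(scores) / len(scores), 3)

print(task_score(haiku_scores))     # 4.0
print(task_score(devstral_scores))  # 2.667
```

The 1.333-point gap in the summary is simply the difference between these two means.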
Practical Examples
Where Claude Haiku 4.5 shines (based on score gaps):
- Serial novel drafting: keeps voice across long chapters (long context 5 vs 4) and maintains character consistency (persona consistency 5 vs 3).
- Plot brainstorming and non-obvious twists: generates feasible, specific ideas (creative problem solving 4 vs 2).
- Complex rewrites of a 30K-token outline into scene-by-scene beats: large context window plus structured-output support.
Where Devstral Medium is appropriate (given its strengths and cost):
- Short-form fiction or micro-stories where budget matters: lower input/output costs ($0.40/$2.00 vs $1.00/$5.00 per MTok) reduce spend.
- Structured templates and format adherence: structured output scores tie at 4, so Devstral handles JSON and format constraints as well as Haiku 4.5 for constrained outputs.
- Fast prototyping of many short variants: acceptable classification and format behavior (classification 4), but weaker at sustained voice and deep creativity.
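To put the pricing gap in concrete terms, here is a small cost estimate at the listed per-MTok rates (the batch size and token counts are hypothetical, chosen only for illustration):

```python
# Listed prices in dollars per million tokens (MTok).
PRICES = {
    "claude-haiku-4.5": {"input": 1.00, "output": 5.00},
    "devstral-medium": {"input": 0.40, "output": 2.00},
}

def job_cost(model, input_tokens, output_tokens):
    """Dollar cost of a job at the listed rates, rounded to cents."""
    p = PRICES[model]
    cost = (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
    return round(cost, 2)

# Hypothetical batch: 1,000 short drafts at ~2K input and ~1K output tokens each.
print(job_cost("claude-haiku-4.5", 2_000_000, 1_000_000))  # 7.0
print(job_cost("devstral-medium", 2_000_000, 1_000_000))   # 2.8
```

At these rates Devstral Medium costs 60% less on this workload, which is the trade-off the bullets above describe: cheaper iteration, weaker sustained voice.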
Bottom Line
For Creative Writing, choose Claude Haiku 4.5 if you need sustained voice, long-form drafts, and stronger idea generation (it scored 4.00 vs Devstral Medium's 2.667 on our task). Choose Devstral Medium if budget per token is the priority and you work mainly on short-form pieces, structured templates, or many low-cost iterations ($0.40 vs $1.00 input and $2.00 vs $5.00 output per MTok).
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.