Claude Haiku 4.5 vs Codestral 2508 for Creative Writing

Winner: Claude Haiku 4.5. In our testing on the Creative Writing suite, Claude Haiku 4.5 scores 4.00 vs Codestral 2508's 2.67 (a 1.33-point advantage). Haiku 4.5 outperforms on persona_consistency (5 vs 3) and creative_problem_solving (4 vs 2), the core skills for character voice, plot originality, and narrative coherence. Codestral 2508 is stronger at structured_output (5 vs 4) and costs less per MTok, but those strengths do not offset Haiku's lead on the writing-specific dimensions we tested.

Claude Haiku 4.5 (Anthropic)

Overall: 4.33/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 5/5
Classification: 4/5
Agentic Planning: 5/5
Structured Output: 4/5
Safety Calibration: 2/5
Strategic Analysis: 5/5
Persona Consistency: 5/5
Constrained Rewriting: 3/5
Creative Problem Solving: 4/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $1.00/MTok
Output: $5.00/MTok

Context Window: 200K


Codestral 2508 (Mistral)

Overall: 3.50/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 4/5
Tool Calling: 5/5
Classification: 3/5
Agentic Planning: 4/5
Structured Output: 5/5
Safety Calibration: 1/5
Strategic Analysis: 2/5
Persona Consistency: 3/5
Constrained Rewriting: 3/5
Creative Problem Solving: 2/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.30/MTok
Output: $0.90/MTok

Context Window: 256K


Task Analysis

Creative Writing requires strong persona consistency, non-obvious plot and idea generation, and the ability to compress or rewrite within constraints. Our Creative Writing task uses three tests: creative_problem_solving, persona_consistency, and constrained_rewriting. There is no external benchmark for this comparison, so our internal task scores are the primary signal: Claude Haiku 4.5 scores 4.00 while Codestral 2508 scores 2.67, ranking 28th and 49th, respectively, out of 52 models for this task in our testing.

The component scores show why. Haiku leads on persona_consistency (5 vs 3) and creative_problem_solving (4 vs 2), and the two models tie on constrained_rewriting (3 each). Both score 5 on long_context, so multi-chapter continuity is feasible with either model; both also tie at 5 for tool_calling and faithfulness. Codestral's 5 on structured_output makes it the better fit for rigid JSON or beat-sheet outputs, but Haiku's higher creative and persona scores make it the better choice for compelling fiction and character-driven scenes.
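For readers who want to see the arithmetic, the task scores above are consistent with a plain unweighted mean of the three component tests. The sketch below is an illustration of that averaging assumption, not our actual scoring pipeline; see our methodology for the exact aggregation.

```python
# Illustration only: assumes the task score is the unweighted mean of the
# three Creative Writing component tests (reproduces the published figures).
scores = {
    "Claude Haiku 4.5": {"creative_problem_solving": 4, "persona_consistency": 5, "constrained_rewriting": 3},
    "Codestral 2508":   {"creative_problem_solving": 2, "persona_consistency": 3, "constrained_rewriting": 3},
}

for model, tests in scores.items():
    task_score = sum(tests.values()) / len(tests)
    print(f"{model}: {task_score:.2f}")
# Claude Haiku 4.5: 4.00
# Codestral 2508: 2.67
```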

Practical Examples

  1. Character-driven short stories or serialized novel arcs: choose Claude Haiku 4.5. It scores persona_consistency 5 vs Codestral's 3 and creative_problem_solving 4 vs 2, so it produces more consistent voices and less obvious plot moves across scenes.
  2. Rapid templated story generation (JSON beats, episode skeletons, high volume): choose Codestral 2508. Its structured_output score is 5 vs Haiku's 4, and its prices are lower ($0.30 input / $0.90 output per MTok vs Haiku's $1.00 / $5.00), making it cheaper for high-throughput, format-constrained tasks; see the cost sketch after this list.
  3. Long-form continuity (multi-chapter drafts, worldbuilding): both models score 5 on long_context in our testing, so either can handle long contexts; prefer Haiku when you need stronger character arcs.
  4. Constraint-heavy microfiction (tight character limits): both tie on constrained_rewriting (3), so expect similar performance; use Haiku when persona fidelity matters, or Codestral when you need structured JSON output at lower cost.
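To put the pricing difference in example 2 into concrete terms, here is a rough cost sketch. The workload numbers (call volume and token counts) are hypothetical; the per-MTok prices are the list prices shown above.

```python
# Hypothetical workload: 10,000 templated story generations, each with
# ~1,500 input tokens (prompt + beat sheet) and ~2,000 output tokens.
CALLS, IN_TOK, OUT_TOK = 10_000, 1_500, 2_000

# List prices in USD per million tokens (input, output), as shown above.
PRICES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Codestral 2508": (0.30, 0.90),
}

for model, (price_in, price_out) in PRICES.items():
    cost = CALLS * (IN_TOK * price_in + OUT_TOK * price_out) / 1_000_000
    print(f"{model}: ${cost:,.2f}")
# Claude Haiku 4.5: $115.00
# Codestral 2508: $22.50
```

At these assumed volumes Codestral comes in roughly 5x cheaper, which is why it wins the high-throughput, format-constrained scenario despite losing on the creative dimensions.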

Bottom Line

For Creative Writing, choose Claude Haiku 4.5 if you need stronger character voice, richer plot invention, and higher overall creative-writing quality (task score 4.00 vs 2.67). Choose Codestral 2508 if you prioritize rigid structured outputs (structured_output 5 vs 4) and much lower per-MTok costs ($0.30 input / $0.90 output vs Haiku's $1.00 / $5.00) for high-volume, template-driven story production.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.
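As a worked example of how the card-level numbers relate, the Overall figures above are consistent with an unweighted mean of the twelve benchmark scores. Treat the snippet below as a sanity check under that assumption, not as the scoring pipeline itself.

```python
# Benchmark scores (1-5) from the cards above, in the order listed.
haiku     = [5, 5, 5, 5, 4, 5, 4, 2, 5, 5, 3, 4]  # Claude Haiku 4.5
codestral = [5, 5, 4, 5, 3, 4, 5, 1, 2, 3, 3, 2]  # Codestral 2508

# Assumption: Overall = unweighted mean of the 12 benchmark scores.
print(f"Claude Haiku 4.5: {sum(haiku) / len(haiku):.2f}")         # 4.33
print(f"Codestral 2508:   {sum(codestral) / len(codestral):.2f}")  # 3.50
```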

Frequently Asked Questions