Claude Haiku 4.5 vs DeepSeek V3.1 for Creative Writing

Winner: DeepSeek V3.1. In our testing, DeepSeek V3.1 scores 4.333 on the Creative Writing task versus Claude Haiku 4.5's 4.0, a 0.333-point lead. The gap comes down to idea generation: DeepSeek scores 5/5 on creative_problem_solving versus Haiku's 4/5, and its 5/5 structured_output (versus Haiku's 4/5) adds tighter adherence to output formats for creative briefs. Claude Haiku 4.5 remains competitive: it scores higher on tool_calling (5 vs 3), strategic_analysis (5 vs 4), and safety_calibration (2 vs 1), and offers a far larger context window (200,000 vs 32,768 tokens), making it the better pick for extremely long-form, tool-integrated workflows. All scores cited are from our benchmarks.

Claude Haiku 4.5 (Anthropic)

Overall: 4.33/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 5/5
Classification: 4/5
Agentic Planning: 5/5
Structured Output: 4/5
Safety Calibration: 2/5
Strategic Analysis: 5/5
Persona Consistency: 5/5
Constrained Rewriting: 3/5
Creative Problem Solving: 4/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $1.00/MTok
Output: $5.00/MTok

Context Window: 200K tokens

modelpicker.net

DeepSeek V3.1 (DeepSeek)

Overall: 3.92/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 4/5
Tool Calling: 3/5
Classification: 3/5
Agentic Planning: 4/5
Structured Output: 5/5
Safety Calibration: 1/5
Strategic Analysis: 4/5
Persona Consistency: 5/5
Constrained Rewriting: 3/5
Creative Problem Solving: 5/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.150/MTok
Output: $0.750/MTok

Context Window: 33K tokens


Task Analysis

What Creative Writing demands: fiction and storytelling need strong idea generation (creative_problem_solving), stable characters and voice (persona_consistency), and the ability to compress or rewrite to tight constraints (constrained_rewriting). Our task suite measures those three signals. External benchmark data is not available for this task, so the primary evidence is the task score and component scores from our tests.

DeepSeek leads on the task (4.333 vs 4.0) because it scores 5/5 on creative_problem_solving to Haiku's 4/5; its 5/5 structured_output (vs Haiku's 4/5) is a further advantage when you need novel scene ideas delivered in strict formats for scripts, outlines, or serialized content. Claude Haiku's strengths (tool_calling 5/5, strategic_analysis 5/5, persona_consistency 5/5, and extreme long-context support) matter for research-driven stories, multi-part narratives with persistent state, and safety-sensitive prompts. Use component-level scores from our benchmarks to match model choice to the capability you need.
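The arithmetic behind the task scores can be reproduced with a short sketch. It assumes (consistent with the numbers above, though not stated explicitly) that the Creative Writing task score is an unweighted mean of the three component signals:

```python
# Sketch: reproducing the Creative Writing task score as an unweighted mean
# of the three signals the task suite measures. The averaging rule is an
# assumption inferred from the published numbers, not a documented formula.
CREATIVE_WRITING_SIGNALS = (
    "creative_problem_solving",
    "persona_consistency",
    "constrained_rewriting",
)

COMPONENT_SCORES = {
    "Claude Haiku 4.5": {
        "creative_problem_solving": 4,
        "persona_consistency": 5,
        "constrained_rewriting": 3,
    },
    "DeepSeek V3.1": {
        "creative_problem_solving": 5,
        "persona_consistency": 5,
        "constrained_rewriting": 3,
    },
}

def task_score(model: str) -> float:
    """Mean of the task's component scores, rounded to 3 decimals."""
    vals = [COMPONENT_SCORES[model][s] for s in CREATIVE_WRITING_SIGNALS]
    return round(sum(vals) / len(vals), 3)

print(task_score("DeepSeek V3.1"))     # 4.333
print(task_score("Claude Haiku 4.5"))  # 4.0
```

Under this assumption, the entire 0.333-point lead traces to the one-point creative_problem_solving gap, since the other two signals are tied.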

Practical Examples

Where DeepSeek V3.1 shines (based on our scores):

  • Generating fresh story beats for a speculative-fiction pitch: DeepSeek scored 5/5 on creative_problem_solving vs Haiku's 4/5, so it produces more non-obvious, feasible ideas for plots and twists.
  • Producing output that must follow a strict schema (script format, scene metadata): DeepSeek's structured_output is 5/5 vs Haiku's 4/5, reducing post-processing.
  • Budgeted creative APIs: DeepSeek's output cost is $0.75/MTok vs Claude Haiku's $5.00/MTok, so it's substantially cheaper per token for iterative drafts.

Where Claude Haiku 4.5 shines (based on our scores):

  • Long-form novels or massive worldbuilding that need extreme context: Haiku has a 200,000-token context window vs DeepSeek's 32,768. Both score 5/5 on long_context, but Haiku's window enables longer single-pass drafts.
  • Tool-driven research or multi-step pipelines: Haiku's tool_calling is 5/5 vs DeepSeek's 3/5, so it better selects and sequences functions (useful when calling fact-checkers, databases, or asset generators during storytelling).
  • Safety and controlled analysis: Haiku scores higher on safety_calibration (2 vs 1) and strategic_analysis (5 vs 4), which helps when prompts touch sensitive themes or require nuanced tradeoffs.

Concrete numerical anchors from our testing: task score 4.333 (DeepSeek) vs 4.0 (Haiku); creative_problem_solving 5 vs 4; structured_output 5 vs 4; tool_calling 3 vs 5; output cost $0.75/MTok (DeepSeek) vs $5.00/MTok (Haiku).
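The pricing gap compounds quickly in an iterative drafting loop. The sketch below uses the published per-MTok prices; the draft sizes and pass count are illustrative assumptions, not measurements:

```python
# Hypothetical cost of an iterative drafting workflow at the published
# per-million-token prices. Prompt/draft sizes and pass count are
# illustrative assumptions.
PRICES_USD_PER_MTOK = {  # (input, output)
    "Claude Haiku 4.5": (1.00, 5.00),
    "DeepSeek V3.1": (0.150, 0.750),
}

def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at the listed per-MTok rates."""
    inp, out = PRICES_USD_PER_MTOK[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Example: 20 revision passes, each sending a 3k-token prompt
# and receiving a 2k-token draft back.
haiku_total = 20 * run_cost("Claude Haiku 4.5", 3_000, 2_000)
deepseek_total = 20 * run_cost("DeepSeek V3.1", 3_000, 2_000)
print(f"${haiku_total:.3f} vs ${deepseek_total:.3f}")  # $0.260 vs $0.039
```

At these (assumed) workload sizes, DeepSeek comes out roughly 6-7x cheaper per drafting session, which is what makes it attractive for high-volume experimentation.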

Bottom Line

For Creative Writing, choose Claude Haiku 4.5 if you need extreme context (200,000 tokens), stronger tool calling, tighter safety calibration, or heavy multi-step research integrated into drafting. Choose DeepSeek V3.1 if you want sharper idea generation and format fidelity (task score 4.333 vs 4.0) and much lower output cost ($0.75 vs $5.00 per MTok) for iterative drafting and experimentation.
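The context-window difference can be sanity-checked before committing to single-pass drafting. This sketch uses the crude ~4-characters-per-token heuristic, which is an approximation and not either vendor's actual tokenizer:

```python
# Rough feasibility check: will a single-pass manuscript fit in each model's
# context window? The chars-per-token ratio is a rough heuristic, not a
# tokenizer; real counts vary by language and formatting.
CONTEXT_WINDOW_TOKENS = {
    "Claude Haiku 4.5": 200_000,
    "DeepSeek V3.1": 32_768,
}
CHARS_PER_TOKEN = 4  # common rough estimate for English prose

def fits(model: str, text_chars: int) -> bool:
    est_tokens = text_chars // CHARS_PER_TOKEN
    return est_tokens <= CONTEXT_WINDOW_TOKENS[model]

novel_chars = 500_000  # roughly a 90k-word manuscript (assumption)
print(fits("Claude Haiku 4.5", novel_chars))  # True  (~125k est. tokens)
print(fits("DeepSeek V3.1", novel_chars))     # False
```

Anything that fails this check on DeepSeek would need chunking or summarized rolling context, while Haiku could process it in one pass.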

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions