Claude Haiku 4.5 vs R1 0528 for Creative Writing

Winner: R1 0528. In our testing R1 0528 posts a higher Creative Writing task score (4.333 vs 4.000) and ranks 5th on this task, versus 28th for Claude Haiku 4.5. The decisive advantages are R1's constrained_rewriting (4 vs 3) and safety_calibration (4 vs 2). Claude Haiku 4.5 remains stronger on strategic_analysis (5 vs 4) and offers a larger context window (200,000 tokens) and explicit multimodal support (text+image→text), but these do not outweigh R1's edge on the core Creative Writing subtests in our suite.

Anthropic

Claude Haiku 4.5

Overall: 4.33/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 5/5
Classification: 4/5
Agentic Planning: 5/5
Structured Output: 4/5
Safety Calibration: 2/5
Strategic Analysis: 5/5
Persona Consistency: 5/5
Constrained Rewriting: 3/5
Creative Problem Solving: 4/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $1.00/MTok
Output: $5.00/MTok

Context Window: 200K

DeepSeek

R1 0528

Overall: 4.50/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 5/5
Classification: 4/5
Agentic Planning: 5/5
Structured Output: 4/5
Safety Calibration: 4/5
Strategic Analysis: 4/5
Persona Consistency: 5/5
Constrained Rewriting: 4/5
Creative Problem Solving: 4/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: 96.6%
AIME 2025: 66.4%

Pricing

Input: $0.50/MTok
Output: $2.15/MTok

Context Window: 164K

Task Analysis

Creative Writing demands creative_problem_solving (novel ideas and plot beats), persona_consistency (voice and character stability), constrained_rewriting (compression to hard limits), long_context (managing long arcs), tone control, and safety_calibration (avoiding harmful content while still permitting edgy fiction). No external benchmarks cover this task, so the verdict rests on our 12-test proxies and the subtests listed above. In our testing R1 0528 scores 4.333 on Creative Writing vs 4.000 for Claude Haiku 4.5. The subtests explain the gap: both models tie on creative_problem_solving (4) and persona_consistency (5), while R1 leads on constrained_rewriting (4 vs 3) and safety_calibration (4 vs 2). Claude Haiku brings strengths writers will value, including top strategic_analysis (5) and a larger context window (200,000 tokens, with 64K max output), which help with complex plot tradeoffs and very long drafts, but these did not shift the task score in our suite.
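
If you want to sanity-check the headline task scores, the arithmetic appears to be a plain unweighted mean. The sketch below reproduces both numbers under the assumption (ours, not stated by the site) that the Creative Writing score averages the three core subtests: creative_problem_solving, persona_consistency, and constrained_rewriting.

```python
# Hypothetical reconstruction of the task score as the unweighted mean of
# the three core Creative Writing subtests. This matches the published
# numbers exactly, but the weighting itself is an assumption on our part.
creative_subtests = {
    # [creative_problem_solving, persona_consistency, constrained_rewriting]
    "Claude Haiku 4.5": [4, 5, 3],
    "R1 0528": [4, 5, 4],
}

for model, scores in creative_subtests.items():
    print(f"{model}: {sum(scores) / len(scores):.3f}")

# Claude Haiku 4.5: 4.000
# R1 0528: 4.333
```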

Practical Examples

Where R1 0528 shines (based on our scores):

  • Short-story festivals with strict word or character limits: R1's constrained_rewriting 4 vs Claude's 3 yields tighter, higher-quality compressed drafts and edits.
  • Working near content boundaries (edgy themes that still need safe handling): R1's safety_calibration 4 vs Claude's 2 means fewer refusals and allowable phrasing that passed our moderation checks.
  • Cost-sensitive iterative drafting: R1 costs $0.50 input / $2.15 output per MTok vs $1.00 / $5.00 for Claude Haiku 4.5, so R1 is materially cheaper across many generations (see the cost sketch after this list).

Where Claude Haiku 4.5 shines (based on our scores and metadata):

  • Complex plot planning and tradeoffs: Claude Haiku's strategic_analysis 5 vs R1's 4 produced more nuanced answers in our planning probes.
  • Very long-form serial or multimodal projects: Claude Haiku's 200,000-token context window and 64,000-token max output (vs R1's 163,840-token context and unspecified max output), plus its text+image→text modality, are advantages for long arcs or image-driven fiction.
  • Tooling and function integrations: both models tie on tool_calling (5), so developer workflows that depend on tool selection behave similarly in our tests.
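
To put the pricing gap in concrete terms, here is a minimal sketch using the per-MTok rates listed on the cards; the workload volumes are hypothetical, chosen only to illustrate the arithmetic.

```python
# Published per-million-token (MTok) rates from the model cards above.
PRICING = {
    "Claude Haiku 4.5": {"input": 1.00, "output": 5.00},
    "R1 0528": {"input": 0.50, "output": 2.15},
}

def run_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Dollar cost of a workload, with volumes given in millions of tokens."""
    rates = PRICING[model]
    return input_mtok * rates["input"] + output_mtok * rates["output"]

# Hypothetical month of iterative drafting: 2M tokens in, 1M tokens out.
for model in PRICING:
    print(f"{model}: ${run_cost(model, 2.0, 1.0):.2f}")

# Claude Haiku 4.5: $7.00
# R1 0528: $3.15
```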

Bottom Line

For Creative Writing, choose Claude Haiku 4.5 if you need the larger context window, text+image→text input, or superior strategic plot analysis and can accept the higher output cost ($5.00 per MTok). Choose R1 0528 if you need tighter constrained rewrites, stronger safety calibration, a top-5 task rank (5th of 52 in our testing), and lower costs ($0.50 input / $2.15 output per MTok).

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.
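
For readers who want to reproduce the Overall figures shown on the cards, an unweighted mean of the 12 benchmark scores matches both models exactly; treat that weighting as our assumption, since the methodology page is the authoritative description.

```python
# Assumption: Overall = unweighted mean of the 12 benchmark scores (1-5 scale).
# This reproduces the card values but is not confirmed by the methodology page.
benchmark_scores = {
    "Claude Haiku 4.5": [5, 5, 5, 5, 4, 5, 4, 2, 5, 5, 3, 4],
    "R1 0528": [5, 5, 5, 5, 4, 5, 4, 4, 4, 5, 4, 4],
}

for model, scores in benchmark_scores.items():
    print(f"{model}: {sum(scores) / len(scores):.2f}/5")

# Claude Haiku 4.5: 4.33/5
# R1 0528: 4.50/5
```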

Frequently Asked Questions