Claude Haiku 4.5 vs R1 0528 for Writing
R1 0528 is the better choice for Writing in our tests. In our Writing evaluation (part of our 12-benchmark suite) it scores 4.0 vs Claude Haiku 4.5's 3.5 (rank 6 vs 29). R1 outperforms Haiku on constrained_rewriting (4 vs 3) and safety_calibration (4 vs 2), and its output cost is significantly lower ($2.15 vs $5.00/MTok). Claude Haiku 4.5 is stronger at strategic_analysis (5 vs 4) and offers a larger context window (200,000 tokens) plus text+image->text modality, so it can be preferable for image-driven long-form briefs or nuanced positioning tasks. Overall winner for general Writing: R1 0528.
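The headline Writing numbers can be reproduced from the per-test scores discussed under Task Analysis. A minimal sketch, assuming the Writing score is simply the mean of the two writing-focused tests (the averaging is our assumption for illustration, not a documented formula):

```python
# Assumption: the Writing score is the simple mean of the two writing-focused
# tests (creative_problem_solving, constrained_rewriting), each judged 1-5.
# Per-test scores below are the ones reported in this comparison.
writing_tests = {
    "R1 0528": {"creative_problem_solving": 4, "constrained_rewriting": 4},
    "Claude Haiku 4.5": {"creative_problem_solving": 4, "constrained_rewriting": 3},
}

for model, scores in writing_tests.items():
    avg = sum(scores.values()) / len(scores)
    print(f"{model}: {avg:.1f}")  # R1 0528: 4.0, Claude Haiku 4.5: 3.5
```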
Pricing
- Claude Haiku 4.5 (Anthropic): $1.00/MTok input, $5.00/MTok output
- R1 0528 (DeepSeek): $0.50/MTok input, $2.15/MTok output
Task Analysis
What Writing demands: blog posts, marketing copy, and content creation call for creativity, tight constrained rewriting (ads/headlines), persona consistency, long-context recall (briefs and style guides), faithfulness to source facts, safety calibration for tone and refusal behavior, and practical cost/throughput for production. No external benchmark covers this task, so we base the call on our internal suite (two writing tests: creative_problem_solving and constrained_rewriting) and complementary metrics.

In our tests, both models tie on creative_problem_solving (4), persona_consistency (5), long_context (5), and faithfulness (5), so both produce coherent, consistent long-form content. R1 0528 leads on constrained_rewriting (4 vs 3) and safety_calibration (4 vs 2), which directly improves ad-level compression and safer publishing. Claude Haiku 4.5 leads on strategic_analysis (5 vs 4), so it handles nuanced tradeoffs and positioning better.

Operational differences matter too. Haiku supports text+image->text, a 200,000-token context window, and a 64,000-token maximum output; R1 0528 is text->text with a 163,840-token context and has quirks: it needs a high max-completion-tokens setting and can return empty responses on structured_output and constrained_rewriting unless configured for long completions. Cost also matters: Haiku's output price is $5.00/MTok vs R1's $2.15/MTok.
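The long-completion quirk is straightforward to guard against at the API level. Below is a minimal sketch, assuming an OpenAI-compatible chat completions endpoint for R1 0528; the base URL, model id, and token budget are illustrative assumptions, not tested values.

```python
from openai import OpenAI

# Sketch: guard against R1 0528's long-completion quirk by giving the model a
# generous completion budget. base_url and model id below are assumptions;
# substitute your provider's actual values.
client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed id for R1 0528
    max_tokens=8192,            # tight limits can yield empty outputs on constrained rewriting
    messages=[
        {"role": "system", "content": "Rewrite the copy in at most 12 words; keep the brand name."},
        {"role": "user", "content": "Acme's new running shoe cushions every stride with recycled foam."},
    ],
)
print(response.choices[0].message.content)
```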
Practical Examples
Where R1 0528 shines (based on scores):
- Tight ad copy and subject-line generation: constrained_rewriting 4 vs 3 means R1 produces better compression and punchier variants in our tests; it is also cheaper ($2.15 vs $5.00/MTok output) for high-volume campaigns. Note: R1 may need a high max_completion_tokens setting to avoid empty outputs on these tasks (see the quirks and sketch under Task Analysis).
- Safety-sensitive brand guidance: safety_calibration 4 vs 2 — R1 is less likely to produce borderline or disallowed outputs in our testing.
- High-throughput content pipelines: equal creative_problem_solving (4) and persona_consistency (5) scores give R1 solid, consistent outputs at lower output cost.
Where Claude Haiku 4.5 shines (based on scores and config):
- Strategy-driven positioning and tradeoff writing: strategic_analysis 5 vs 4; Haiku is better at nuanced messaging and multi-factor positioning in our tests.
- Image-aware content (product posts, illustrated explainers): Haiku supports text+image->text, enabling workflows that start from visuals (see the sketch after this list).
- Very long briefs or single-shot long outputs: the larger context window (200,000 tokens) and 64,000-token maximum output make Haiku preferable for ultra-long single-document generation.
Practical note: both models tie on long_context, faithfulness, persona_consistency, and creative_problem_solving in our suite, so for standard blog posts and marketing articles either will produce high-quality drafts; pick R1 for tight constraints and safety, Haiku for image-driven or highly strategic pieces.
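For an image-driven workflow like the one above, here is a minimal sketch assuming the Anthropic Messages API; the model id and file name are placeholder assumptions, so check Anthropic's model list for the exact string.

```python
import base64
import anthropic

# Sketch: start a writing task from a product photo using Haiku's
# text+image->text modality. Model id below is an assumed placeholder.
client = anthropic.Anthropic(api_key="YOUR_API_KEY")

with open("product_photo.jpg", "rb") as f:
    image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-haiku-4-5",  # assumed model id for Claude Haiku 4.5
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/jpeg", "data": image_b64}},
            {"type": "text",
             "text": "Write a 150-word product post in a friendly brand voice based on this photo."},
        ],
    }],
)
print(message.content[0].text)
```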
Bottom Line
For Writing, choose Claude Haiku 4.5 if you need image-aware generation, extreme single-document length (200k context / 64k outputs), or stronger strategic analysis (5 vs 4). Choose R1 0528 if you prioritize constrained rewriting for ads/headlines, stronger safety calibration (4 vs 2), and lower output cost ($2.15 vs $5.00/MTok); R1 wins our Writing benchmark 4.0 vs 3.5.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.