Claude Haiku 4.5 vs R1 for Writing
Winner: R1. In our Writing tests R1 scores 4.5 versus Claude Haiku 4.5's 3.5. R1 earned a 5 on creative_problem_solving and a 4 on constrained_rewriting, giving it the clear edge for blog posts, marketing copy, and tight-format rewrites. Claude Haiku 4.5 is stronger on long_context (5 vs 4), tool_calling (5 vs 4), and classification (4 vs 2), but those strengths do not offset R1's superior creative and constrained-rewriting performance for this task. R1 also ranks 1st of 52 models for Writing in our tests; Claude Haiku 4.5 ranks 29th of 52.
Pricing
Claude Haiku 4.5 (Anthropic): $1.00/MTok input, $5.00/MTok output
R1 (DeepSeek): $0.70/MTok input, $2.50/MTok output
Task Analysis
What Writing demands: creative ideation, producing non-obvious but feasible copy, and precise rewriting inside hard limits. Our Writing task uses two targeted tests: creative_problem_solving, which measures idea generation and novel hooks (non-obvious, specific, feasible ideas), and constrained_rewriting, which measures compression to strict character limits. In our testing R1 scored 5 on creative_problem_solving and 4 on constrained_rewriting (task score 4.5), while Claude Haiku 4.5 scored 4 and 3 respectively (task score 3.5). R1 also holds the top Writing rank, 1 of 52. Claude's advantages (long_context 5 vs 4, tool_calling 5 vs 4, and higher safety_calibration and classification scores) help for long-form, tool-integrated workflows and safe routing, but they are secondary to the core creative-plus-compression skill set this Writing task tests.
Practical Examples
Where R1 shines (based on scores):
- High-volume marketing ideation: R1’s 5 on creative_problem_solving produces more novel campaign concepts and angle variations.
- Tight social or ad copy: R1’s 4 on constrained_rewriting better compresses messaging into strict character limits.
- Cost-sensitive scaled content: R1's output costs $2.50/MTok versus Claude Haiku 4.5's $5.00/MTok.
Where Claude Haiku 4.5 shines (based on scores):
- Long-form, research-heavy blog series: Claude’s long_context 5 helps maintain coherence across very long drafts.
- Tool-driven publishing flows: Claude’s tool_calling 5 suggests stronger behavior for function selection and argument accuracy when integrating external tooling.
- Brand voice and routing: Claude ties R1 on persona_consistency (both 5) and scores higher on classification (4 vs 2), useful if you need precise content routing or tag extraction alongside generation.
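The cost comparison above can be made concrete with a quick back-of-envelope calculation. The sketch below uses the published per-million-token (MTok) rates from the pricing section; the 2M-input/4M-output workload is a hypothetical volume chosen purely for illustration.

```python
# Rough per-model cost comparison for a batch writing workload.
# Rates are the published $/MTok prices; token volumes are
# hypothetical assumptions, not measured usage.

PRICES = {  # model: (input $/MTok, output $/MTok)
    "Claude Haiku 4.5": (1.00, 5.00),
    "R1": (0.70, 2.50),
}

def batch_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the model's listed rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# Example: 2M input tokens of briefs, 4M output tokens of drafts.
for model in PRICES:
    print(f"{model}: ${batch_cost(model, 2_000_000, 4_000_000):.2f}")
```

At that hypothetical volume the gap is roughly 2x ($22.00 vs $11.40), driven mostly by the output rate, which dominates in generation-heavy writing workloads.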
Bottom Line
For Writing, choose Claude Haiku 4.5 if you need superior long-context coherence, stronger tool integrations, or better classification/routing in publishing workflows. Choose R1 if your priority is creative ideation and tight-format rewriting: R1 wins in our Writing tests by 1.0 point (4.5 vs 3.5) and is cheaper to run ($2.50 vs $5.00 per MTok of output).
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.