Claude Haiku 4.5 vs R1 0528 for Constrained Rewriting

R1 0528 is the better choice for Constrained Rewriting in our testing. It scores 4 versus Claude Haiku 4.5's 3 on the constrained_rewriting benchmark and ranks 6th versus 31st among the models we tested. That edge reflects more reliable compression within hard character limits plus better safety calibration (4 vs 2). However, R1 0528 has an operational quirk: it can return empty responses on constrained_rewriting unless configured with a high max completion tokens setting (see the practical notes below). If you cannot or will not tune runtime parameters, Claude Haiku 4.5 is the safer out-of-the-box fallback despite its lower constrained_rewriting score.

anthropic

Claude Haiku 4.5

Overall
4.33/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$1.00/MTok

Output

$5.00/MTok

Context Window: 200K

modelpicker.net

deepseek

R1 0528

Overall
4.50/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
4/5
Strategic Analysis
4/5
Persona Consistency
5/5
Constrained Rewriting
4/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
96.6%
AIME 2025
66.4%

Pricing

Input

$0.50/MTok

Output

$2.15/MTok

Context Window: 164K


Task Analysis

Constrained Rewriting demands accurate compression into strict character limits while preserving meaning and required content. The key capabilities are faithfulness (staying true to the source), predictable structured output when schemas are required, the ability to compress without dropping mandatory elements, and stable behavior under tight token budgets. In our testing the primary signal is the constrained_rewriting benchmark score (no external benchmark applies to this task). Supporting evidence: both models have top-tier faithfulness (5) and long_context (5) scores, meaning both retain source fidelity and handle long inputs. Two differences explain R1 0528's win: a higher constrained_rewriting score (4 vs 3) and stronger safety_calibration (4 vs 2) in our tests. Operational factors matter too: Claude Haiku 4.5 offers a larger context window (200K tokens) and explicit multimodal support (text+image->text), while R1 0528 requires a high max completion tokens setting to avoid empty outputs on short tasks. Output cost and token behavior also affect real-world throughput and batching costs (see the practical examples).
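The output-cost point can be made concrete with a quick per-batch estimate from the listed prices. A minimal sketch, assuming a hypothetical workload (the batch size and tokens-per-string figures below are illustrative, not from our benchmark):

```python
# Sketch: estimate output cost for a batch compression job from $/MTok prices.
# The listed output prices are $2.15/MTok (R1 0528) and $5.00/MTok
# (Claude Haiku 4.5); the workload numbers are hypothetical.

def output_cost_usd(total_output_tokens: int, price_per_mtok: float) -> float:
    """Cost in USD of generating a given number of output tokens."""
    return total_output_tokens / 1_000_000 * price_per_mtok

# Hypothetical job: 100,000 compressed strings at ~40 output tokens each.
tokens = 100_000 * 40  # 4M output tokens

r1_cost = output_cost_usd(tokens, 2.15)     # R1 0528 output price
haiku_cost = output_cost_usd(tokens, 5.00)  # Claude Haiku 4.5 output price
```

At these rates the same 4M-token batch costs roughly $8.60 on R1 0528 versus $20.00 on Claude Haiku 4.5, which is why output price dominates high-volume compression workloads.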

Practical Examples

  1. High-volume UI text compression (batch subtitles or UI strings): R1 0528 shines. It scored 4 vs 3, and its output cost of $2.15/MTok versus Claude Haiku 4.5's $5.00/MTok lowers per-output expense when compressing many strings. Set a high max completion tokens value to avoid empty responses from R1 0528's empty-response quirk on structured_output and constrained_rewriting.
  2. One-off, long-source compressions (e.g. novel-chapter summaries with strict character limits): Claude Haiku 4.5 is preferable when you want a predictable out-of-the-box run; it has a larger context window (200K tokens) and does not carry R1 0528's empty-response quirk. Expect a lower constrained_rewriting score in our tests (3 vs 4) but fewer runtime adjustments.
  3. Safety-sensitive compression (redacting or compressing user content where refusal calibration matters): R1 0528 scored higher on safety_calibration (4 vs 2). In our testing it handled harmful and edge-case requests with more conservative calibration while still achieving the higher constrained_rewriting score.
  4. Structured output pipelines: both models score 4 on structured_output in our tests, but R1 0528 carries an operational note that it returns empty responses on structured_output unless configured correctly; plan for that in automated pipelines.
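The mitigation described above can be sketched as a request builder that pins a generous max completion tokens budget, plus a validator that treats empty or over-limit replies as failures to retry. This assumes an OpenAI-style chat-completions payload; the model name, token budget, and prompt wording below are illustrative assumptions, not settings taken from our benchmark:

```python
# Sketch: request config + output validation for constrained rewriting.
# Assumes an OpenAI-compatible chat API; "deepseek-reasoner" and the
# 8192-token budget are illustrative, not benchmark-verified values.
from typing import Optional


def build_rewrite_request(source: str, char_limit: int,
                          model: str = "deepseek-reasoner",
                          max_tokens: int = 8192) -> dict:
    """Build a chat-completions payload with a deliberately high
    max_tokens budget to reduce the chance of empty replies."""
    return {
        "model": model,
        "max_tokens": max_tokens,  # generous budget mitigates empty outputs
        "messages": [
            {"role": "system",
             "content": (f"Rewrite the user's text in at most {char_limit} "
                         "characters. Preserve all required facts.")},
            {"role": "user", "content": source},
        ],
    }


def validate_rewrite(reply: Optional[str], char_limit: int) -> Optional[str]:
    """Return the reply if it is non-empty and within the character
    limit; return None so the caller can retry or fall back."""
    text = (reply or "").strip()
    if not text or len(text) > char_limit:
        return None
    return text
```

In an automated pipeline, a None from validate_rewrite would trigger one retry (possibly with a larger max_tokens) and then a fallback model, which covers both the empty-response quirk and over-limit outputs.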

Bottom Line

For Constrained Rewriting, choose R1 0528 if you need the better constrained_rewriting score in our tests (4 vs 3), lower output cost ($2.15 vs $5.00 per MTok), and stronger safety calibration, and you can set a high max completion tokens value to avoid empty responses. Choose Claude Haiku 4.5 if you need a larger context window (200K tokens), multimodal text+image->text input support, or fewer runtime tweaks for guaranteed non-empty outputs, despite the lower constrained_rewriting score.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions