Claude Haiku 4.5 vs Gemini 2.5 Flash Lite for Constrained Rewriting
Gemini 2.5 Flash Lite is the clear winner for constrained rewriting in our testing. On our constrained_rewriting test, Gemini scores 4/5 to Claude Haiku 4.5's 3/5 and ranks 6 of 52 against Claude's 31 of 52. No external benchmark covers this task in our data, so the verdict rests on the internal task score plus supporting proxy metrics (structured_output 4 vs 4, faithfulness 5 vs 5, long_context 5 vs 5). Cost and context size also favor Gemini: input/output pricing is $0.10/$0.40 per MTok versus Claude's $1.00/$5.00 per MTok, and Gemini's context window is 1,048,576 tokens versus Claude's 200,000.
Anthropic
Claude Haiku 4.5
Pricing
Input
$1.00/MTok
Output
$5.00/MTok
modelpicker.net
Gemini 2.5 Flash Lite
Pricing
Input
$0.10/MTok
Output
$0.40/MTok
Task Analysis
Constrained rewriting (compression within hard character limits) demands precise length control, faithfulness to source meaning, predictable structured output, and robust handling of edge cases in long documents. Key capabilities:
1. Constraint adherence: truncate or restructure while preserving semantics to meet hard limits.
2. Faithfulness: avoid hallucination while compressing.
3. Structured output and format compliance: JSON or exact-length outputs.
4. Long-context handling, so the model can locate and compress key passages.
5. Predictable refusal/safety behavior when asked to compress disallowed content.
No external benchmark in our data covers this task, so the primary signal is the internal constrained_rewriting score: Gemini 2.5 Flash Lite 4/5 vs Claude Haiku 4.5 3/5. Supporting scores explain the result: the models tie on faithfulness (5) and structured_output (4), so accuracy and format adherence are comparable, while Gemini's higher constrained_rewriting score and rank (6/52 vs 31/52) indicate better practical compression under hard limits. Claude Haiku 4.5 is stronger at strategic_analysis (5 vs 3) and agentic_planning (5 vs 4), which can help with multi-step rewrite strategies but did not translate into a higher constrained_rewriting score in our tests.
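To make the hard-limit requirement concrete, here is a minimal validation sketch for checking a model's rewrite against a character cap. The JSON response shape and the `validate_rewrite` helper name are illustrative assumptions, not part of our actual test harness:

```python
# Minimal sketch of a constrained-rewriting validator. Assumes the model
# is prompted to return JSON like {"rewrite": "..."} and that the task
# specifies a hard character cap on the rewrite text.
import json

def validate_rewrite(raw_response: str, char_limit: int) -> tuple[bool, str]:
    """Return (ok, reason). A response passes only if it is valid JSON
    with a string 'rewrite' field whose length fits within char_limit."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    text = payload.get("rewrite")
    if not isinstance(text, str):
        return False, "missing 'rewrite' field"
    if len(text) > char_limit:
        return False, f"over limit: {len(text)} > {char_limit}"
    return True, "ok"

# A 14-character rewrite passes a 20-character cap.
ok, reason = validate_rewrite('{"rewrite": "Short summary."}', 20)
```

A check like this is what separates the constrained_rewriting score from the structured_output score: a response can be perfectly formatted JSON and still fail the length constraint.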
Practical Examples
1. Social media character trimming (tight, single-paragraph compressions): Gemini 2.5 Flash Lite (constrained_rewriting 4 vs 3) produces shorter, more reliable rewrites that meet exact character caps; choose Gemini when strict length compliance is required.
2. Enterprise SMS/notification compression where fidelity matters: both models tie on faithfulness (5), so either preserves meaning, but Gemini is more likely to hit the character limit without manual edits.
3. Multi-pass editorial compression (decompose, compress, refine): Claude Haiku 4.5 scores higher on strategic_analysis (5 vs 3) and agentic_planning (5 vs 4), so it can be the better pick when you want the model to propose multi-step compression strategies before applying them, despite its lower single-pass constrained_rewriting score.
4. Very long documents requiring context awareness: both score 5 on long_context, but Gemini's context window is 1,048,576 tokens vs Claude's 200,000, so Gemini can keep more source material inline while compressing.
5. Cost-sensitive bulk rewrites: Gemini costs $0.10/$0.40 per MTok (input/output) vs Claude's $1.00/$5.00 per MTok, making it far more cost-efficient for mass constrained rewriting in production pipelines.
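The bulk-rewrite cost gap can be sanity-checked with simple arithmetic. Only the per-MTok prices below come from the tables above; the workload size and per-job token counts are illustrative assumptions:

```python
# Back-of-envelope cost comparison for bulk constrained rewriting.
# Prices are the quoted (input, output) $/MTok figures; token counts
# per job are assumptions for illustration, not measurements.
PRICES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Gemini 2.5 Flash Lite": (0.10, 0.40),
}

def batch_cost(model: str, jobs: int, in_tok: int, out_tok: int) -> float:
    """Total USD for `jobs` rewrites at in_tok input / out_tok output tokens each."""
    p_in, p_out = PRICES[model]
    return jobs * (in_tok * p_in + out_tok * p_out) / 1_000_000

# Hypothetical workload: 100k rewrites, ~800 input and ~120 output tokens each.
claude = batch_cost("Claude Haiku 4.5", 100_000, 800, 120)        # ≈ $140.00
gemini = batch_cost("Gemini 2.5 Flash Lite", 100_000, 800, 120)   # ≈ $12.80
```

Under these assumptions the batch costs roughly ten times more on Claude Haiku 4.5, which is why pricing dominates the decision for high-volume pipelines.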
Bottom Line
For Constrained Rewriting, choose Gemini 2.5 Flash Lite if you need reliable, low-cost compression that hits strict character limits (scores 4 vs 3; ranks 6/52 vs 31/52; much lower per-token cost). Choose Claude Haiku 4.5 if your workflow benefits from stronger strategic analysis or multi-step, planner-driven rewrite strategies despite a lower single-pass constrained_rewriting score.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.