Claude Sonnet 4.6 vs Gemini 2.5 Pro for Constrained Rewriting

A tie on the core task, but Gemini 2.5 Pro is the practical winner for Constrained Rewriting. In our testing, both Claude Sonnet 4.6 and Gemini 2.5 Pro score 3/5 on the constrained_rewriting test, ranking 31st of 52. Where they differ matters for real projects: Gemini scores 5 on structured_output versus Claude Sonnet 4.6's 4, and it is cheaper ($1.25 vs $3.00 per MTok input; $10.00 vs $15.00 per MTok output). Those two differences, stronger format/length adherence and lower pricing, make Gemini 2.5 Pro the better choice for most constrained-rewriting workflows. Choose Claude Sonnet 4.6 only when its stronger safety calibration (5 vs 1) or other Sonnet strengths are explicitly required alongside rewriting.

anthropic

Claude Sonnet 4.6

Overall
4.67/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
5/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
5/5

External Benchmarks

SWE-bench Verified
75.2%
MATH Level 5
N/A
AIME 2025
85.8%

Pricing

Input

$3.00/MTok

Output

$15.00/MTok

Context Window: 1000K

modelpicker.net

google

Gemini 2.5 Pro

Overall
4.25/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
4/5
Structured Output
5/5
Safety Calibration
1/5
Strategic Analysis
4/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
5/5

External Benchmarks

SWE-bench Verified
57.6%
MATH Level 5
N/A
AIME 2025
84.2%

Pricing

Input

$1.25/MTok

Output

$10.00/MTok

Context Window: 1049K


Task Analysis

What Constrained Rewriting demands: precise compression under hard character limits, reliable adherence to length and format constraints, faithfulness to the source text, and creative rewording that preserves meaning while shortening. Because no external benchmark covers this task, our constrained_rewriting test (one of 12 internal tests) is the primary measure: both models scored 3/5 and rank 31 of 52 in our testing. Supporting signals explain the differences: structured_output measures schema/format compliance (Gemini 2.5 Pro scores 5, Claude Sonnet 4.6 scores 4), while faithfulness (5 each) and long_context (5 each) are equal, indicating both preserve source content and context well. The practical edge: Gemini's higher structured_output score suggests it is more reliable at strict length and format enforcement, while Claude's higher safety_calibration (5 vs 1) matters when rewriting sensitive content that requires refusal or careful moderation.

Practical Examples

  1. Tight marketing meta descriptions (155 characters): Gemini 2.5 Pro is preferable. Both models scored 3/5 on constrained_rewriting, but Gemini's structured_output score of 5 vs 4 means it is likelier to meet exact character counts without extra prompt engineering.
  2. SMS or push-notification compression (<=160 characters): Gemini again has the advantage for reliable format compliance and is cheaper per token ($10.00 vs $15.00 per MTok output), lowering per-message cost at scale.
  3. Sensitive content that must be rewritten and safety-reviewed: choose Claude Sonnet 4.6. Its safety_calibration is 5 vs Gemini's 1 in our testing, so Claude is more likely to refuse or safely transform risky inputs while maintaining faithfulness (5 for both).
  4. Long-source summarization that must preserve precise phrasing: both models score 5 on long_context and faithfulness, so either can retain source detail. Prefer Gemini when strict output format or cost matters; prefer Claude when safety handling is required.
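Whichever model you pick, hard character limits are best enforced in your own code rather than trusted to the model alone. The sketch below shows one minimal pattern: retry the rewrite a few times, then truncate at a word boundary as a last resort. `rewrite` here is a hypothetical callable standing in for your model API wrapper, not a real library function.

```python
def fits_limit(text: str, max_chars: int) -> bool:
    """Return True if the rewrite meets the hard character limit."""
    return len(text) <= max_chars


def enforce_limit(rewrite, source: str, max_chars: int = 160, retries: int = 3) -> str:
    """Ask the model to rewrite until the output fits, else fall back to truncation.

    `rewrite` is any callable (source, max_chars) -> str, e.g. a thin wrapper
    around a model API call that states the limit in the prompt.
    """
    candidate = source
    for _ in range(retries):
        candidate = rewrite(source, max_chars)
        if fits_limit(candidate, max_chars):
            return candidate
    # Fallback: hard-truncate at the last word boundary under the limit.
    return candidate[:max_chars].rsplit(" ", 1)[0]
```

A model with stronger format adherence (higher structured_output) simply exits this loop on the first attempt more often, which is where the per-token price difference compounds at scale.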

Bottom Line

For Constrained Rewriting, choose Claude Sonnet 4.6 if you must prioritize safety calibration and cautious handling of risky input while still getting competent rewriting. Choose Gemini 2.5 Pro if you need stricter format and length enforcement at lower cost: it edges out Sonnet 4.6 in structured_output (5 vs 4) and is cheaper ($1.25/$10.00 vs $3.00/$15.00 per MTok, input/output). Both scored 3/5 on the constrained_rewriting test and rank 31 of 52 in our testing, so the decision hinges on format precision and cost versus safety needs.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions