Claude Sonnet 4.6 vs Gemini 2.5 Pro for Constrained Rewriting

A tie on the core task, but Gemini 2.5 Pro is the practical winner for Constrained Rewriting. In our testing, both Claude Sonnet 4.6 and Gemini 2.5 Pro score 3/5 on the constrained_rewriting test, ranking 31st of 52. Where they differ matters for real projects: Gemini scores 5 on structured_output versus Claude Sonnet 4.6's 4, and it is cheaper ($1.25 vs $3.00 per MTok input; $10.00 vs $15.00 per MTok output). Those two differences, stronger format/length adherence and lower pricing, make Gemini 2.5 Pro the better choice for most constrained-rewriting workflows. Choose Claude Sonnet 4.6 only when its stronger safety calibration (5 vs 1) or other Sonnet strengths are explicitly required alongside rewriting.

anthropic

Claude Sonnet 4.6

Overall
4.67/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
5/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
5/5

External Benchmarks

SWE-bench Verified
75.2%
MATH Level 5
N/A
AIME 2025
85.8%

Pricing

Input

$3.00/MTok

Output

$15.00/MTok

Context Window: 1000K

modelpicker.net

google

Gemini 2.5 Pro

Overall
4.25/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
4/5
Structured Output
5/5
Safety Calibration
1/5
Strategic Analysis
4/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
5/5

External Benchmarks

SWE-bench Verified
57.6%
MATH Level 5
N/A
AIME 2025
84.2%

Pricing

Input

$1.25/MTok

Output

$10.00/MTok

Context Window: 1049K


Task Analysis

What Constrained Rewriting demands: precise compression under hard character limits, reliable adherence to length and format constraints, faithfulness to the source text, and creative rewording that preserves meaning while shortening. Because no external benchmark covers this task, our constrained_rewriting test (one of 12 internal tests) is the primary measure: both models scored 3/5 and rank 31 of 52 in our testing. Supporting signals explain the differences: structured_output measures schema/format compliance (Gemini 2.5 Pro scores 5, Claude Sonnet 4.6 scores 4), while faithfulness (5 each) and long_context (5 each) are equal, indicating both preserve source content and context well. The practical edge: Gemini's higher structured_output score suggests it is more reliable at strict length and format enforcement, while Claude's higher safety_calibration (5 vs 1) matters when rewriting sensitive content that requires refusal or careful moderation.

Practical Examples

  1. Tight marketing meta descriptions (155 characters): Gemini 2.5 Pro is preferable. Both models scored 3/5 on constrained_rewriting, but Gemini's structured_output score of 5 vs 4 means it is likelier to meet exact character counts without extra prompt engineering.
  2. SMS or push-notification compression (<=160 characters): Gemini again has the advantage for reliable format compliance and is cheaper per token ($10.00 vs $15.00 per MTok output), lowering per-message cost at scale.
  3. Sensitive content that must be rewritten and safety-reviewed: choose Claude Sonnet 4.6. Its safety_calibration is 5 vs Gemini's 1 in our testing, so Claude is more likely to refuse or safely transform risky inputs while maintaining faithfulness (5 for both).
  4. Long-source summarization that must preserve precise phrasing: both models score 5 on long_context and faithfulness, so either can retain source detail. Prefer Gemini when strict output format or cost matters; prefer Claude when safety handling is required.
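Whichever model you pick, hard character limits are best enforced in your own code rather than trusted to the model alone. The sketch below shows one minimal pattern: retry the rewrite a few times, then truncate at a word boundary as a last resort. `rewrite` here is a hypothetical callable standing in for your model API wrapper, not a real library function.

```python
def fits_limit(text: str, max_chars: int) -> bool:
    """Return True if the rewrite meets the hard character limit."""
    return len(text) <= max_chars


def enforce_limit(rewrite, source: str, max_chars: int = 160, retries: int = 3) -> str:
    """Ask the model to rewrite until the output fits, else fall back to truncation.

    `rewrite` is any callable (source, max_chars) -> str, e.g. a thin wrapper
    around a model API call that states the limit in the prompt.
    """
    candidate = source
    for _ in range(retries):
        candidate = rewrite(source, max_chars)
        if fits_limit(candidate, max_chars):
            return candidate
    # Fallback: hard-truncate at the last word boundary under the limit.
    return candidate[:max_chars].rsplit(" ", 1)[0]
```

A model with stronger format adherence (higher structured_output) simply exits this loop on the first attempt more often, which is where the per-token price difference compounds at scale.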

Bottom Line

For Constrained Rewriting, choose Claude Sonnet 4.6 if you must prioritize safety calibration and cautious handling of risky input while still getting competent rewriting. Choose Gemini 2.5 Pro if you need stricter format and length enforcement at lower cost: it edges out Sonnet 4.6 in structured_output (5 vs 4) and is cheaper ($1.25/$10.00 vs $3.00/$15.00 per MTok, input/output). Both scored 3/5 on the constrained_rewriting test and rank 31 of 52 in our testing, so the decision hinges on format precision and cost versus safety needs.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions