Claude Sonnet 4.6 vs GPT-5.4 for Constrained Rewriting
Winner: GPT-5.4. In our Constrained Rewriting benchmark GPT-5.4 scores 4/5 vs Claude Sonnet 4.6's 3/5 (taskRank: GPT-5.4 = 6 of 52; Sonnet 4.6 = 31 of 52). That one-point lead and GPT-5.4's top structured_output score (5 vs Sonnet's 4) make it the better choice for reliably compressing content to hard character limits while preserving fidelity.
Claude Sonnet 4.6 (Anthropic)
Pricing: Input $3.00/MTok, Output $15.00/MTok

GPT-5.4 (OpenAI)
Pricing: Input $2.50/MTok, Output $15.00/MTok

modelpicker.net
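As a rough budgeting aid, the listed per-million-token (MTok) prices can be turned into a per-job cost estimate. This is a minimal sketch; the token counts in the example are illustrative assumptions, not benchmark data.

```python
# Listed prices in USD per million tokens (MTok), from the comparison above.
PRICES = {
    "claude-sonnet-4.6": {"input": 3.00, "output": 15.00},
    "gpt-5.4": {"input": 2.50, "output": 15.00},
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a job: tokens scaled to millions times the rate."""
    p = PRICES[model]
    return (input_tokens / 1_000_000) * p["input"] + \
           (output_tokens / 1_000_000) * p["output"]

# Illustrative batch: 10,000 rewrites, ~1,200 input and ~100 output tokens each.
for model in PRICES:
    print(model, round(job_cost(model, 10_000 * 1_200, 10_000 * 100), 2))
```

At those assumed volumes the input-price gap, not the identical output price, drives the difference between the two models.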
Task Analysis
Constrained Rewriting demands exact compression to hard character limits while preserving meaning and required elements. The key capabilities are accurate character/byte budgeting (format and length control), faithfulness to the source content, and reliable structured output when the compressed text must fit a schema; long-context support also helps when compressing large source documents. In our testing, the primary task signal is the constrained_rewriting score (GPT-5.4 = 4, Claude Sonnet 4.6 = 3). Supporting signals: GPT-5.4 scores structured_output = 5 vs Sonnet's 4 (which helps enforce strict length and format rules), both models score faithfulness = 5 (both preserve source material in our tests), and both score long_context = 5 (useful for compressing long inputs). Sonnet scores higher on tool_calling (5 vs GPT-5.4's 4), which can help if your pipeline relies on an external length checker or iterative tool-based compression, but raw on-model constrained-rewriting performance favors GPT-5.4 in our suite.
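Regardless of which model generates the rewrite, the hard constraints described above can be verified deterministically in your own pipeline rather than trusted to the model. A minimal sketch of such a checker (the function name and violation messages are our own, not part of either model's API):

```python
def check_rewrite(text: str, max_chars: int, required: list[str]) -> list[str]:
    """Return a list of constraint violations; an empty list means compliant.

    Checks the two hard constraints from the task definition:
    a character budget and a set of required elements that must survive
    compression.
    """
    problems = []
    if len(text) > max_chars:
        problems.append(f"too long: {len(text)} > {max_chars} chars")
    for phrase in required:
        if phrase not in text:
            problems.append(f"missing required element: {phrase!r}")
    return problems
```

Running this check after every model call turns a soft "please stay under N characters" instruction into a verifiable pass/fail gate, which is also how retries can be triggered.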
Practical Examples
1) Tight marketing copy (exact 280-character ad): GPT-5.4 (4/5) is more likely in our tests to produce compliant, meaning-preserving 280-character output, helped by its structured_output = 5.
2) SMS / push-notification conversion where schema and exact length both matter: GPT-5.4's 4/5 constrained_rewriting and higher structured_output score reduce format retries.
3) Batch-compressing long documents into fixed-size abstracts: both models score long_context = 5, so either can handle long inputs; GPT-5.4 still outscored Sonnet on the constrained task (4 vs 3).
4) Tool-assisted pipelines that call a length-checker function: Claude Sonnet 4.6's tool_calling = 5 and creative_problem_solving = 5 make it a strong choice when you orchestrate iterative external checks (Sonnet showed better tool-calling behavior in our tests).
5) Cost and context: Claude Sonnet 4.6's input price is $3.00/MTok vs GPT-5.4's $2.50/MTok; both charge $15.00/MTok for output, and both offer million-token-class context windows (Sonnet: 1,000,000; GPT-5.4: 1,050,000), so budget and extremely long inputs should be weighed alongside accuracy.
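The tool-assisted pattern in example 4 can be sketched as a simple retry loop: generate, measure, and regenerate until the hard limit is met. The stub below stands in for a real model API call (its truncation behavior is purely illustrative); only the loop structure is the point.

```python
def compress(text: str, max_chars: int, model_call, max_retries: int = 3) -> str:
    """Iteratively ask a model to compress `text` until it fits max_chars.

    `model_call(text, max_chars)` is any callable that attempts a rewrite;
    the deterministic length check here plays the role of the external
    length-checker tool.
    """
    candidate = model_call(text, max_chars)
    for _ in range(max_retries):
        if len(candidate) <= max_chars:
            return candidate
        # Constraint failed: feed the candidate back for another pass.
        candidate = model_call(candidate, max_chars)
    raise ValueError(f"could not fit {max_chars} chars in {max_retries} retries")

def stub_model(text: str, max_chars: int) -> str:
    """Hypothetical stand-in for an API call: truncate at a word boundary."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rsplit(" ", 1)[0]
```

In a real pipeline, `model_call` would re-prompt the model with the measured overage; the loop itself is model-agnostic, which is why strong tool-calling scores matter for this pattern.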
Bottom Line
For Constrained Rewriting, choose GPT-5.4 if you need the best on-model performance for strict character-limit compression and schema adherence (our tests: 4/5 vs 3/5; structured_output 5 vs 4). Choose Claude Sonnet 4.6 if your workflow uses external tool calls or iterative programmatic length checks (Sonnet: tool_calling 5) or you value Sonnet's higher creative/problem-solving capabilities in complex multi-step compression pipelines.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.