R1 0528 vs GPT-5.4 for Constrained Rewriting
GPT-5.4 is the better choice for Constrained Rewriting. In our testing both models scored 4/5 and share rank 6 of 52 on this task, but R1 0528 has a documented quirk: it can return empty responses on constrained_rewriting tasks and requires a large completion budget (min_max_completion_tokens: 1000). That functional failure makes GPT-5.4 the reliable winner despite R1 0528's much lower costs (R1 0528: $0.50/MTok input, $2.15/MTok output; GPT-5.4: $2.50/MTok input, $15.00/MTok output).
deepseek R1 0528
Pricing: Input $0.50/MTok, Output $2.15/MTok
modelpicker.net
openai GPT-5.4
Pricing: Input $2.50/MTok, Output $15.00/MTok
Task Analysis
Constrained Rewriting (defined in our benchmarks as "Compression within hard character limits") demands precise length control, faithfulness to the source, structured-output compliance when a format is required, and predictable completion behavior under tight token budgets. In our testing both R1 0528 and GPT-5.4 scored 4/5 on constrained_rewriting and are tied at rank 6 of 52, so raw task accuracy was comparable. Supporting signals separate them: GPT-5.4 scores higher on structured_output (5 vs R1 0528's 4) and safety_calibration (5 vs 4), which helps ensure format-adherent, policy-safe truncations and refusals. R1 0528 shows strong tool_calling (5) and classification (4) scores and a 163,840-token context window, but its documented quirks are directly relevant failure modes here: "empty_on_structured_output" (it "Returns empty responses on structured_output, constrained_rewriting, and agentic_planning") and "needs_high_max_completion_tokens" both undermine tasks that rely on short, exact outputs.
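The quirk above suggests a defensive wrapper around any R1 0528 call used for short rewrites. A minimal sketch, assuming a hypothetical `call_model` callable standing in for whatever chat-completions client you use; the function name, retry count, and budget-doubling strategy are illustrative, and only the 1000-token floor comes from R1 0528's documented min_max_completion_tokens:

```python
# Hedged sketch: guard against the documented empty-response quirk on
# constrained_rewriting by enforcing a token floor and retrying with a
# larger budget when the model returns nothing.

MIN_MAX_COMPLETION_TOKENS = 1000  # floor documented for R1 0528


def rewrite_with_guard(call_model, prompt, max_completion_tokens=256, retries=1):
    """Return a non-empty rewrite, raising the token floor and retrying
    once (by default) if the model returns an empty completion."""
    budget = max(max_completion_tokens, MIN_MAX_COMPLETION_TOKENS)
    for _attempt in range(retries + 1):
        text = call_model(prompt, max_completion_tokens=budget)
        if text and text.strip():
            return text.strip()
        budget *= 2  # reasoning tokens may have consumed the budget
    raise RuntimeError("model returned empty output after retries")
```

The doubling step reflects the reasoning-token overhead noted above: when the completion budget is exhausted by hidden reasoning, the visible answer can come back empty, so the retry simply grants more room.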
Practical Examples
GPT-5.4 (winner): Rewriting a legal paragraph to a 280-character SMS while preserving mandatory clauses and returning a JSON flag for omitted clauses. In our testing GPT-5.4's structured_output 5/5 and safety_calibration 5/5 make it reliable for strict format and compliance needs, and its 1,050,000-token context window accommodates large source documents.
R1 0528 (cost-efficient alternative): Bulk-compressing long product descriptions across multiple languages, where you can supply a high completion budget and tolerate reasoning-token overhead. R1 0528 is much cheaper ($0.50/MTok input, $2.15/MTok output) and scored 5/5 on tool_calling, persona_consistency, and multilingual in our tests. However, in constrained rewriting workflows that expect short, deterministic outputs, R1 0528 may return empty outputs unless you set a very high max_completion_tokens (it documents min_max_completion_tokens: 1000) and handle its reasoning-token consumption, making it risky for one-shot short compression tasks.
Bottom Line
For Constrained Rewriting, choose GPT-5.4 if you need reliable, format-adherent, and policy-safe compressed outputs out of the box: it avoids R1 0528's empty-output quirk and posts stronger structured-output and safety scores. Choose R1 0528 only if you need a low-cost option for large-scale, multilingual compression and you can set a high max_completion_tokens and tolerate its reasoning-token output behavior.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.