R1 0528 vs GPT-5.4 for Constrained Rewriting

GPT-5.4 is the better choice for Constrained Rewriting. In our testing both models scored 4/5 and share rank 6 of 52 on this task, but R1 0528 has a documented quirk: it can return empty responses on constrained_rewriting tasks and requires a large completion budget (min_max_completion_tokens: 1000). That functional failure makes GPT-5.4 the reliable winner despite R1 0528's much lower pricing (input $0.50/MTok and output $2.15/MTok vs GPT-5.4's $2.50/MTok and $15.00/MTok).

deepseek

R1 0528

Overall
4.50/5 Strong

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
4/5
Strategic Analysis
4/5
Persona Consistency
5/5
Constrained Rewriting
4/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
96.6%
AIME 2025
66.4%

Pricing

Input

$0.50/MTok

Output

$2.15/MTok

Context Window
164K

modelpicker.net

openai

GPT-5.4

Overall
4.58/5 Strong

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
4/5
Classification
3/5
Agentic Planning
5/5
Structured Output
5/5
Safety Calibration
5/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
4/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
76.9%
MATH Level 5
N/A
AIME 2025
95.3%

Pricing

Input

$2.50/MTok

Output

$15.00/MTok

Context Window
1050K


Task Analysis

Constrained Rewriting (defined in our benchmarks as "compression within hard character limits") demands precise length control, faithfulness to the source, structured-output compliance when a format is required, and predictable completion behavior under tight token budgets. In our testing both R1 0528 and GPT-5.4 scored 4/5 on constrained_rewriting and are tied at rank 6 of 52, so raw task accuracy is comparable. The supporting signals are what separate them: GPT-5.4 scores higher on structured_output (5 vs R1's 4) and safety_calibration (5 vs 4), which helps it produce format-adherent, policy-safe truncations and refusals. R1 0528 posts strong tool_calling (5) and classification (4) scores and offers a 163,840-token context window, but its documented quirks ("empty_on_structured_output", the note that it "Returns empty responses on structured_output, constrained_rewriting, and agentic_planning", and "needs_high_max_completion_tokens") are failure modes that hit constrained rewriting directly, since the task depends on short, exact outputs.

Practical Examples

GPT-5.4 (winner): Rewriting a legal paragraph into a 280-character SMS while preserving mandatory clauses and returning a JSON flag for any omitted clauses. In our testing GPT-5.4's structured_output 5/5 and safety_calibration 5/5 make it reliable for strict format and compliance needs, and its 1,050,000-token context window accommodates large source documents.

R1 0528 (cost-efficient alternative): Bulk-compressing long product descriptions across multiple languages, where you can supply a high completion budget and tolerate reasoning-token overhead. R1 0528 is much cheaper (input $0.50/MTok, output $2.15/MTok) and scored 5/5 on tool_calling, persona_consistency, and multilingual in our tests. In constrained rewriting workflows that expect short, deterministic outputs, however, R1 0528 may return empty outputs unless you configure a very high max_completion_tokens budget (it documents min_max_completion_tokens: 1000) and account for its reasoning-token consumption, making it risky for one-shot short compression tasks.
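The SMS example above amounts to a hard-limit check plus a JSON contract. A minimal validation sketch follows; the "text"/"omitted_clauses" schema is a hypothetical prompt contract, not part of either vendor's API.

```python
# Validate a model's constrained-rewrite reply: the prompt (hypothetically)
# asks for JSON with a "text" field holding the <=280-char rewrite and an
# "omitted_clauses" boolean flagging dropped mandatory clauses.
import json

def validate_sms_rewrite(raw: str, limit: int = 280) -> dict:
    """Parse the model's JSON reply and enforce the hard character limit.
    Raises ValueError on a length violation, json.JSONDecodeError on
    malformed JSON, KeyError if "text" is missing."""
    payload = json.loads(raw)
    text = payload["text"]
    if len(text) > limit:
        raise ValueError(f"rewrite is {len(text)} chars, limit is {limit}")
    return {"text": text, "omitted_clauses": bool(payload.get("omitted_clauses"))}
```

A guard like this is worth running regardless of which model you pick, since both scored 4/5 (not 5/5) on constrained_rewriting and neither guarantees the limit is met.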

Bottom Line

For Constrained Rewriting, choose R1 0528 if you need a low-cost option for large-scale, multilingual compression and can set a high max_completion_tokens budget and tolerate its reasoning-token output behavior. Choose GPT-5.4 if you need reliable, format-adherent, policy-safe compressed outputs out of the box: it avoids R1 0528's empty-output quirk and carries stronger structured-output and safety signals.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions