R1 0528 vs GPT-5

For most developer and enterprise use cases, GPT-5 is the better pick: it wins more of our benchmarks (2 vs 1), scoring higher on structured_output and strategic_analysis. R1 0528 is substantially cheaper and wins on safety_calibration (4/5 vs 2/5 in our testing), so choose R1 when cost and safer refusals matter most.

deepseek

R1 0528

Overall
4.50/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
4/5
Strategic Analysis
4/5
Persona Consistency
5/5
Constrained Rewriting
4/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
96.6%
AIME 2025
66.4%

Pricing

Input

$0.50/MTok

Output

$2.15/MTok

Context Window: 164K

modelpicker.net

openai

GPT-5

Overall
4.50/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
5/5
Safety Calibration
2/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
4/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
73.6%
MATH Level 5
98.1%
AIME 2025
91.4%

Pricing

Input

$1.25/MTok

Output

$10.00/MTok

Context Window: 400K


Benchmark Analysis

Across our 12-test suite, GPT-5 wins structured_output and strategic_analysis while R1 0528 wins safety_calibration; the other nine tests are ties.

Structured Output: GPT-5 5 vs R1 4. GPT-5 shows better JSON/schema compliance in tasks that demand strict formatting.

Strategic Analysis: GPT-5 5 vs R1 4. GPT-5 handles nuanced tradeoffs and numeric reasoning better in our tests.

Safety Calibration: R1 4 vs GPT-5 2. R1 refuses harmful requests more reliably in our testing.

Ties (identical scores in our testing): faithfulness 5/5, long_context 5/5, multilingual 5/5, tool_calling 5/5, classification 4/4, agentic_planning 5/5, persona_consistency 5/5, constrained_rewriting 4/4, creative_problem_solving 4/4. In practice, both models are comparable on these capabilities.

External benchmarks (Epoch AI): on MATH Level 5, GPT-5 scores 98.1% (rank 1 of 14) vs R1's 96.6% (rank 5 of 14); on AIME 2025, GPT-5 scores 91.4% (rank 6 of 23) vs R1's 66.4% (rank 16 of 23); on SWE-bench Verified, GPT-5 scores 73.6% (rank 6 of 12), while R1 has no reported score.

One practical quirk: R1's metadata flags occasional empty responses on structured_output tasks and a requirement for large min/max completion-token settings. This can break tight JSON-output pipelines even though its structured_output score is 4 in our testing.

Benchmark | R1 0528 | GPT-5
Faithfulness | 5/5 | 5/5
Long Context | 5/5 | 5/5
Multilingual | 5/5 | 5/5
Tool Calling | 5/5 | 5/5
Classification | 4/5 | 4/5
Agentic Planning | 5/5 | 5/5
Structured Output | 4/5 | 5/5
Safety Calibration | 4/5 | 2/5
Strategic Analysis | 4/5 | 5/5
Persona Consistency | 5/5 | 5/5
Constrained Rewriting | 4/5 | 4/5
Creative Problem Solving | 4/5 | 4/5
Summary | 1 win | 2 wins

Pricing Analysis

R1 0528 is materially cheaper: input $0.50/MTok and output $2.15/MTok vs GPT-5 at $1.25/MTok input and $10.00/MTok output. Output-only costs: 1M tokens → R1 $2.15 vs GPT-5 $10.00; 10M → R1 $21.50 vs GPT-5 $100; 100M → R1 $215 vs GPT-5 $1,000. Adding input costs, a workload of 1M input plus 1M output tokens runs ≈$2.65 on R1 vs ≈$11.25 on GPT-5. High-volume apps (10M+ tokens/month), consumer chatbots, and low-margin products should care: on output tokens, R1 cuts the bill to 21.5% of GPT-5's ($2.15 vs $10.00 per MTok).
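The arithmetic above generalizes to any monthly volume. A quick sketch, with rates taken from the pricing cards (the token volumes are placeholders, not usage data):

```python
# Per-million-token rates in USD, from the pricing cards above.
RATES = {
    "R1 0528": {"input": 0.50, "output": 2.15},
    "GPT-5":   {"input": 1.25, "output": 10.00},
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly bill in USD for a given token volume."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# Example: 1M input + 1M output tokens per month.
r1 = monthly_cost("R1 0528", 1_000_000, 1_000_000)   # 2.65
gpt5 = monthly_cost("GPT-5", 1_000_000, 1_000_000)   # 11.25
```

Scale the token arguments to your own traffic; the ratio between the two bills stays roughly constant as volume grows.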

Real-World Cost Comparison

Task | R1 0528 | GPT-5
Chat response | $0.0012 | $0.0053
Blog post | $0.0046 | $0.021
Document batch | $0.117 | $0.525
Pipeline run | $1.18 | $5.25

Bottom Line

Choose R1 0528 if: you need a high-quality, long-context LLM with strong safety calibration and very low per-token cost — ideal for high-volume chatbots, safety-sensitive moderation, or cost-constrained deployments. Choose GPT-5 if: you need the best structured_output and strategic analysis performance, stronger competition-level math and coding signals (98.1% on MATH Level 5, 91.4% on AIME 2025 per Epoch AI), or the broadest modality support and maximum accuracy for strict JSON/schema tasks.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions