deepseek
R1 0528
R1 0528 is an updated release of DeepSeek's R1 reasoning model, offering meaningfully better performance at a lower price point than its predecessor. At $0.50/M input and $2.15/M output, it undercuts R1's $0.70/$2.50 pricing while also expanding the context window from 64K to 163,840 tokens. In our 12-test benchmark suite, R1 0528 ranks 5th out of 52 active models — a significant jump from R1's rank of 28. It achieves top-tier scores across agentic planning (5/5, tied for 1st with 14 other models), tool calling (5/5, tied for 1st with 16 others), faithfulness (5/5), and multilingual (5/5). It is a text-only, open-weight reasoning model with exposed thinking tokens.
Performance
R1 0528 ranks 5th out of 52 active models in our overall benchmark average. Top strengths: agentic planning (5/5, tied for 1st with 14 other models out of 54), tool calling (5/5, tied for 1st with 16 others out of 54), faithfulness (5/5, tied for 1st with 32 others out of 55), persona consistency (5/5), and multilingual (5/5). Classification improved dramatically from R1's 2/5 to 4/5, now tied for 1st with 29 other models out of 53. Safety calibration also jumped from 1/5 (R1) to 4/5 (rank 6 of 55). On external benchmarks, R1 0528 scored 96.6 on MATH Level 5 (rank 5 of 14 models tested externally by Epoch AI) and 66.4 on AIME 2025 (rank 16 of 23). Strategic analysis (4/5, rank 27 of 54) is one of the few mid-range scores, showing less dominance in pure reasoning compared to its other strengths.
Pricing
R1 0528 costs $0.50 per million input tokens and $2.15 per million output tokens. At 10 million output tokens per month, that is $21.50; at 100 million, $215. This makes it cheaper than its predecessor R1 ($0.70/$2.50) on both dimensions. Among bracket peers, Gemini 3 Flash Preview averages the same 4.5 benchmark score but costs $3.00/M output versus R1 0528's $2.15/M, making R1 0528 one of the more cost-efficient options at this performance tier. Critically, reasoning tokens count toward token consumption: if you set large thinking budgets, actual costs will exceed what the response length alone implies. Budget at least 1,000 tokens for max_completion_tokens.
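To make the reasoning-token caveat concrete, here is a minimal cost sketch at the rates above; the token volumes, including the 5M hidden reasoning tokens, are illustrative assumptions, not measurements.

```python
# Monthly-cost sketch at R1 0528's listed rates ($0.50/M input, $2.15/M output).
# Token volumes below are illustrative assumptions.
INPUT_RATE = 0.50 / 1_000_000   # dollars per input token
OUTPUT_RATE = 2.15 / 1_000_000  # dollars per output token

def monthly_cost(input_tokens, visible_output_tokens, reasoning_tokens):
    # Reasoning (thinking) tokens bill as output tokens, so they add to
    # the visible completion length.
    billable_output = visible_output_tokens + reasoning_tokens
    return input_tokens * INPUT_RATE + billable_output * OUTPUT_RATE

# 10M input + 10M visible output + an assumed 5M of hidden reasoning tokens
print(round(monthly_cost(10_000_000, 10_000_000, 5_000_000), 2))  # → 37.25
```

Note that the reasoning tokens push the bill well above the $21.50 that 10M visible output tokens alone would imply.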
Pricing
Input: $0.50/MTok
Output: $2.15/MTok
Real-World Costs
Pricing vs Performance
Output cost per million tokens (log scale) vs average score across our 12 internal benchmarks
Try It
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="deepseek/deepseek-r1-0528",
    messages=[
        {"role": "user", "content": "Hello, R1 0528!"}
    ],
)

print(response.choices[0].message.content)
Recommendation
R1 0528 is a strong choice for agentic pipelines, tool-use scenarios, and RAG applications where faithfulness to source material matters. Its 5/5 scores in agentic planning and tool calling put it among the best available for orchestrated multi-step workflows. At rank 5 of 52 overall, it is one of the top-performing models in our dataset at mid-tier pricing. The expanded 163K context window makes it viable for longer document tasks compared to R1's 64K. Consider the reasoning token overhead: short tasks may produce empty responses if max_completion_tokens is set too low. For non-reasoning tasks or classification-heavy pipelines without agentic requirements, lower-cost options may be equally effective.
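The empty-response pitfall can be guarded against in code. Below is a minimal sketch, assuming the caller wraps the API request in a `complete(prompt, budget)` function that passes `budget` as max_completion_tokens and returns the visible text; the helper name and the budget-doubling policy are illustrative, not an official pattern.

```python
# Sketch of a retry guard: when a reasoning model spends its entire
# max_completion_tokens budget on thinking tokens, the visible completion
# comes back empty. Doubling the budget and retrying is one simple mitigation.
def complete_with_budget(complete, prompt, budget=1_000, max_retries=2):
    # `complete(prompt, budget)` should call the API with
    # max_completion_tokens=budget and return the visible text ("" if empty).
    for _ in range(max_retries + 1):
        text = complete(prompt, budget)
        if text:
            return text
        budget *= 2  # reasoning likely consumed the allowance; grow it and retry
    raise RuntimeError("no visible output after retries; raise the token budget")
```

Wired to the Try It snippet above, `complete` would call client.chat.completions.create(..., max_completion_tokens=budget) and return response.choices[0].message.content or "".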
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.