deepseek
R1
R1 is DeepSeek's open-source reasoning model, built for tasks that reward extended chain-of-thought processing. It runs 671 billion parameters with 37 billion active per inference pass. In our 12-test benchmark suite, R1 ranks 28th out of 52 active models with standout scores in strategic analysis (5/5), creative problem solving (5/5), faithfulness (5/5), and persona consistency (5/5). It is a text-only model with a 64,000 token context window. At $0.70/M input and $2.50/M output, it competes with Gemini 2.5 Flash ($0.30/M input, $2.50/M output) at the same output price but with a narrower context window and no multimodal support. Its key differentiator is reasoning transparency — thinking tokens are exposed and can be examined, which matters for auditability in analytical workflows.
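That reasoning transparency takes a concrete form in the open-weights release: R1 emits its chain of thought inside `<think>…</think>` tags ahead of the final answer. A minimal sketch for separating the two (the helper name is ours; it assumes the tagged output format of the open-source release):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split an R1 completion into (reasoning, answer).

    The open-weights release wraps chain-of-thought in <think>...</think>;
    everything after the closing tag is the final answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if match is None:
        return "", text.strip()          # no reasoning block present
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()  # final answer follows the tag
    return reasoning, answer

raw = "<think>17 * 3 = 51, so the answer is 51.</think>The answer is 51."
reasoning, answer = split_reasoning(raw)
print(answer)  # The answer is 51.
```

Logging the `reasoning` half alongside the `answer` is what makes audit trails possible in analytical workflows.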
Performance
R1's top three strengths in our testing: strategic analysis (5/5, tied for 1st with 25 other models out of 54 tested), creative problem solving (5/5, tied for 1st with 7 other models out of 54), and faithfulness (5/5, tied for 1st with 32 others out of 55). Persona consistency and multilingual also scored 5/5. On external benchmarks, R1 scored 93.1 on MATH Level 5 (rank 8 of 14 models tested externally) and 53.3 on AIME 2025 (rank 17 of 23). Classification is a significant weakness at 2/5 (rank 51 of 53) — near the bottom of the field. Safety calibration scored 1/5 (rank 32 of 55, median is 2/5), which is below average. Long context scored 4/5 but ranked only 38th of 55 tested, suggesting room for improvement on very long documents.
Pricing
R1 costs $0.70 per million input tokens and $2.50 per million output tokens. At 10 million output tokens per month, that is $25; at 100 million output tokens, $250. For input-heavy workloads (e.g., large document analysis), note that $0.70/M input is higher than Gemini 2.5 Flash ($0.30/M) at the same output price. Within the broader catalog, output pricing ranges from $0.10 to $25 per million tokens, so R1 sits in the mid-tier. One critical pricing consideration: R1's reasoning tokens count toward token consumption. If you enable thinking with large budgets, your actual token usage — and cost — will be significantly higher than the response length alone suggests.
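To see how reasoning tokens inflate the bill, here is a small cost sketch at R1's published rates. The 3:1 reasoning-to-answer ratio below is an illustrative assumption, not a measured figure:

```python
INPUT_PER_MTOK = 0.70    # R1 input price, $/million tokens
OUTPUT_PER_MTOK = 2.50   # R1 output price, $/million tokens

def monthly_cost(input_tokens: int, answer_tokens: int,
                 reasoning_tokens: int = 0) -> float:
    """Reasoning tokens are billed as output, so they stack onto answer tokens."""
    output_tokens = answer_tokens + reasoning_tokens
    return (input_tokens * INPUT_PER_MTOK
            + output_tokens * OUTPUT_PER_MTOK) / 1_000_000

# 10M answer tokens/month with thinking disabled: the $25 figure above
print(monthly_cost(0, 10_000_000))              # 25.0
# Same answers, but thinking emits 3 reasoning tokens per answer token
print(monthly_cost(0, 10_000_000, 30_000_000))  # 100.0
```

The second case quadruples the bill without changing what the user sees, which is why thinking budgets deserve the same scrutiny as prompt length.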
Pricing vs Performance
(Chart: output cost per million tokens, log scale, vs average score across our 12 internal benchmarks.)
Try It
from openai import OpenAI

# Route requests to R1 through OpenRouter's OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[
        {"role": "user", "content": "Hello, R1!"},
    ],
)

print(response.choices[0].message.content)
Recommendation
R1 is well-suited for strategic analysis, multi-step reasoning, and creative problem solving where reasoning transparency matters. Teams that need to audit or review thinking chains will find its exposed reasoning tokens valuable. It also performs strongly on faithfulness tasks, making it a solid choice for RAG and summarization where accuracy to source material is critical. Avoid R1 for classification tasks (2/5, rank 51 of 53) or applications requiring broad safety calibration (1/5). Also consider that the 64K context window is significantly smaller than peers like Gemini 2.5 Flash (1M tokens) or R1 0528. If reasoning depth is needed but classification or safety is also required, evaluate alternatives across those dimensions.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.