DeepSeek V3.1 vs R1 0528

For most production use cases that rely on tool calling, agentic planning, and safety, R1 0528 is the better pick — it wins 6 of 12 benchmarks including tool calling (5 vs 3) and safety (4 vs 1). DeepSeek V3.1 is the cost-efficient choice: it wins structured output and creative problem solving while charging substantially less per token.

DeepSeek V3.1 (deepseek)

Overall: 3.92/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 4/5
Tool Calling: 3/5
Classification: 3/5
Agentic Planning: 4/5
Structured Output: 5/5
Safety Calibration: 1/5
Strategic Analysis: 4/5
Persona Consistency: 5/5
Constrained Rewriting: 3/5
Creative Problem Solving: 5/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.150/MTok
Output: $0.750/MTok

Context Window: 33K


R1 0528 (deepseek)

Overall: 4.50/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 5/5
Classification: 4/5
Agentic Planning: 5/5
Structured Output: 4/5
Safety Calibration: 4/5
Strategic Analysis: 4/5
Persona Consistency: 5/5
Constrained Rewriting: 4/5
Creative Problem Solving: 4/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: 96.6%
AIME 2025: 66.4%

Pricing

Input: $0.500/MTok
Output: $2.15/MTok

Context Window: 164K


Benchmark Analysis

Head-to-head on our 12-test suite: R1 0528 wins constrained rewriting (4 vs 3), tool calling (5 vs 3), classification (4 vs 3), safety calibration (4 vs 1), agentic planning (5 vs 4), and multilingual (5 vs 4). DeepSeek V3.1 wins structured output (5 vs 4) and creative problem solving (5 vs 4). They tie on faithfulness, long context, and persona consistency (5/5 each) and on strategic analysis (4/5 each).

Context and rankings: R1's tool calling score (5) is tied for 1st out of 54 models on that test, while DeepSeek V3.1's 3 places it at rank 47 of 54, a meaningful gap for workflows that pick functions and construct arguments. Safety calibration is another wide gap: R1 ranks 6th of 55 (score 4) versus DeepSeek V3.1's score of 1 at rank 32, so R1 refuses harmful requests far more reliably in our tests. For structured output, DeepSeek V3.1 scores 5 and is tied for 1st (JSON/schema compliance), while R1's 4 is mid-table (rank 26 of 54); expect fewer schema fixes when using DeepSeek V3.1.
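
To make the structured-output point concrete, here is a minimal sketch of the kind of schema check such a benchmark implies, using the jsonschema package. The schema and the sample response are hypothetical illustrations, not artifacts of our actual test suite.

```python
import json
from jsonschema import Draft7Validator  # pip install jsonschema

# Hypothetical extraction schema, purely for illustration.
SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "year": {"type": "integer"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "year"],
    "additionalProperties": False,
}

def schema_errors(raw_model_output: str) -> list[str]:
    """Return human-readable schema violations; an empty list means compliant."""
    try:
        data = json.loads(raw_model_output)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    return [err.message for err in Draft7Validator(SCHEMA).iter_errors(data)]

# A near-miss response: "year" arrives as a string instead of an integer.
print(schema_errors('{"title": "Q3 report", "year": "2024"}'))
```

A higher structured-output score roughly translates to fewer passes through a repair loop like this before the response parses cleanly.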

Other practical signals: both models score 5 on faithfulness and long context (tied for 1st), so both handle source fidelity and very long inputs well in our tests. R1 also posts external math results, scoring 96.6% on MATH Level 5 and 66.4% on AIME 2025 (Epoch AI), consistent with its strong reasoning-heavy scores; DeepSeek V3.1 has no external math results listed. Overall, R1 excels where robust tool orchestration, safety, constrained rewriting, and multilingual classification matter; DeepSeek V3.1 shines for strict structured-output tasks and creative problem solving at a much lower price.

Benchmark                  DeepSeek V3.1   R1 0528
Faithfulness               5/5             5/5
Long Context               5/5             5/5
Multilingual               4/5             5/5
Tool Calling               3/5             5/5
Classification             3/5             4/5
Agentic Planning           4/5             5/5
Structured Output          5/5             4/5
Safety Calibration         1/5             4/5
Strategic Analysis         4/5             4/5
Persona Consistency        5/5             5/5
Constrained Rewriting      3/5             4/5
Creative Problem Solving   5/5             4/5
Summary                    2 wins          6 wins

Pricing Analysis

Per-token pricing (per 1M tokens): DeepSeek V3.1 charges $0.15 for input and $0.75 for output; R1 0528 charges $0.50 for input and $2.15 for output. Assuming a 1:1 split of input to output tokens, a roundtrip of 1M input plus 1M output costs about $0.90 on DeepSeek V3.1 versus about $2.65 on R1 0528, a roughly 2.9x difference. At 10M tokens each way that is ~$9.00 vs ~$26.50; at 100M, ~$90 vs ~$265. The practical takeaway: the 2.9x gap compounds with volume, so high-throughput teams save proportionally with DeepSeek V3.1, while teams prioritizing higher tool-calling accuracy, stronger safety calibration, or multilingual/classification quality may accept the 2.9x higher bill for R1 0528.
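
As a sanity check on the arithmetic above, here is a minimal Python sketch; the per-million rates come straight from the pricing cards, and the 1:1 token split is the same illustrative assumption used in the paragraph.

```python
# Per-million-token rates (USD) from the pricing cards above.
PRICES = {
    "DeepSeek V3.1": {"input": 0.15, "output": 0.75},
    "R1 0528": {"input": 0.50, "output": 2.15},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total inference cost for a workload with the given token counts."""
    rates = PRICES[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# A 1:1 input/output split at three monthly scales.
for each_way in (1_000_000, 10_000_000, 100_000_000):
    v31 = cost_usd("DeepSeek V3.1", each_way, each_way)
    r1 = cost_usd("R1 0528", each_way, each_way)
    print(f"{each_way:>11,} tokens each way: ${v31:,.2f} vs ${r1:,.2f} ({r1 / v31:.1f}x)")
```

At every scale the ratio holds at about 2.9x, so the decision comes down to absolute volume and how much the quality gaps matter for your workload.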

Real-World Cost Comparison

Task              DeepSeek V3.1   R1 0528
Chat response     <$0.001         $0.0012
Blog post         $0.0016         $0.0046
Document batch    $0.041          $0.117
Pipeline run      $0.405          $1.18
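
The per-task figures above are consistent with roughly equal input and output token counts per task; the sketch below approximately reproduces them under that assumption. The token counts themselves are our back-of-envelope estimates, not published workload sizes.

```python
# Per-million-token rates (USD: input, output) from the pricing cards.
PRICES = {"DeepSeek V3.1": (0.15, 0.75), "R1 0528": (0.50, 2.15)}

# Estimated (input, output) tokens per task; assumptions, not measured values.
TASKS = {
    "Chat response": (500, 500),
    "Blog post": (1_800, 1_800),
    "Document batch": (45_000, 45_000),
    "Pipeline run": (450_000, 450_000),
}

for task, (tok_in, tok_out) in TASKS.items():
    cells = []
    for model, (p_in, p_out) in PRICES.items():
        cost = (tok_in * p_in + tok_out * p_out) / 1_000_000
        cells.append(f"{model}: ${cost:.4f}")
    print(f"{task:<15} " + "  ".join(cells))
```

Plugging in your own token estimates is the quickest way to see whether the 2.9x gap amounts to real dollars or rounding error for your workload.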

Bottom Line

Choose DeepSeek V3.1 if: you need top-tier structured output (score 5, tied for 1st), creative problem solving (5), long-context fidelity, and minimal inference cost (input $0.15/MTok, output $0.75/MTok). Choose R1 0528 if: your product depends on reliable tool calling, agentic planning, safety calibration, constrained rewriting, or multilingual/classification accuracy; R1 scores 5 on tool calling and agentic planning and 4 on safety calibration, and posts strong external math scores per Epoch AI. If budget is tight at scale (10M+ tokens/month), favor DeepSeek V3.1; if correctness in tool-based pipelines matters more, accept R1's roughly 2.9x higher bill.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions