R1 0528 vs DeepSeek V3.1 Terminus

R1 0528 is the better choice for agentic, tool-driven, and faithfulness-critical workloads: it wins 7 of 12 benchmarks, including tool calling, faithfulness, and persona consistency. DeepSeek V3.1 Terminus is cheaper ($0.79 vs $2.15 per MTok output) and wins at structured output and strategic analysis, so pick Terminus when budget or strict schema compliance matters.

deepseek

R1 0528

Overall
4.50/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
4/5
Strategic Analysis
4/5
Persona Consistency
5/5
Constrained Rewriting
4/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
96.6%
AIME 2025
66.4%

Pricing

Input

$0.500/MTok

Output

$2.15/MTok

Context Window: 164K

modelpicker.net

deepseek

DeepSeek V3.1 Terminus

Overall
3.75/5 (Strong)

Benchmark Scores

Faithfulness
3/5
Long Context
5/5
Multilingual
5/5
Tool Calling
3/5
Classification
3/5
Agentic Planning
4/5
Structured Output
5/5
Safety Calibration
1/5
Strategic Analysis
5/5
Persona Consistency
4/5
Constrained Rewriting
3/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.210/MTok

Output

$0.790/MTok

Context Window: 164K


Benchmark Analysis

Across our 12-test suite, R1 0528 wins 7 tests, DeepSeek V3.1 Terminus wins 2, and 3 are ties. Test-by-test:

- Tool calling: R1 0528 scores 5 vs Terminus's 3. R1 is tied for 1st (with 16 others out of 54), so expect more accurate function selection and argument sequencing from R1.
- Faithfulness: R1 5 vs Terminus 3. R1 is tied for 1st (one of 33 top-scoring models of 55), indicating fewer hallucinations in our tests.
- Persona consistency: R1 5 vs Terminus 4. R1 is tied for 1st (with 36 others), so it better preserves character and resists injection.
- Agentic planning: R1 5 vs Terminus 4. R1 is tied for 1st (with 14 others) and wins our goal-decomposition and failure-recovery scenarios. Note R1's quirks: it returns empty responses on some structured-output tasks, and its reasoning tokens consume output budget on short tasks, which can interfere with schema tasks despite its high agentic and tool scores.
- Classification: R1 4 vs Terminus 3. R1 is tied for 1st (with 29 others), meaning more reliable routing and categorization.
- Safety calibration: R1 4 vs Terminus 1. R1 ranks 6th of 55 (four models share this score); Terminus ranks 32nd. R1 is significantly better at refusing harmful requests while permitting legitimate ones.
- Constrained rewriting: R1 4 vs Terminus 3. R1 wins; it is better at tight character-limit and constraint compression.
- Structured output: Terminus 5 vs R1 4. Terminus is tied for 1st (with 24 others) and wins our JSON/schema tasks; R1's documented empty-response quirk on structured output explains why Terminus is superior for schema compliance.
- Strategic analysis: Terminus 5 vs R1 4. Terminus is tied for 1st (with 25 others) on the nuanced, numeric tradeoff reasoning where it edged out R1.
- Long context, multilingual, and creative problem solving: ties (5/5, 5/5, and 4/5 respectively), so expect comparable behavior on those tasks.
- External math benchmarks (supplementary): R1 0528 scores 96.6% on MATH Level 5 and 66.4% on AIME 2025. These Epoch AI results indicate strong math capability; Terminus has no published scores on these benchmarks.

Overall, R1 is the stronger agentic and safety-calibrated model; Terminus wins when strict structured output and strategic-analysis scenarios dominate the workload.
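The 7/2/3 tally above can be reproduced directly from the score table; a minimal sketch, with the scores transcribed from this page:

```python
# 12-test suite scores (1-5 scale), transcribed from the comparison above.
r1 = {"Faithfulness": 5, "Long Context": 5, "Multilingual": 5, "Tool Calling": 5,
      "Classification": 4, "Agentic Planning": 5, "Structured Output": 4,
      "Safety Calibration": 4, "Strategic Analysis": 4, "Persona Consistency": 5,
      "Constrained Rewriting": 4, "Creative Problem Solving": 4}
terminus = {"Faithfulness": 3, "Long Context": 5, "Multilingual": 5, "Tool Calling": 3,
            "Classification": 3, "Agentic Planning": 4, "Structured Output": 5,
            "Safety Calibration": 1, "Strategic Analysis": 5, "Persona Consistency": 4,
            "Constrained Rewriting": 3, "Creative Problem Solving": 4}

# Head-to-head tally: count tests each model wins outright, and ties.
r1_wins = sum(r1[t] > terminus[t] for t in r1)
terminus_wins = sum(terminus[t] > r1[t] for t in r1)
ties = sum(r1[t] == terminus[t] for t in r1)
print(r1_wins, terminus_wins, ties)  # 7 2 3
```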

| Benchmark | R1 0528 | DeepSeek V3.1 Terminus |
|---|---|---|
| Faithfulness | 5/5 | 3/5 |
| Long Context | 5/5 | 5/5 |
| Multilingual | 5/5 | 5/5 |
| Tool Calling | 5/5 | 3/5 |
| Classification | 4/5 | 3/5 |
| Agentic Planning | 5/5 | 4/5 |
| Structured Output | 4/5 | 5/5 |
| Safety Calibration | 4/5 | 1/5 |
| Strategic Analysis | 4/5 | 5/5 |
| Persona Consistency | 5/5 | 4/5 |
| Constrained Rewriting | 4/5 | 3/5 |
| Creative Problem Solving | 4/5 | 4/5 |
| Summary | 7 wins | 2 wins |

Pricing Analysis

Rates (per MTok): R1 0528 input $0.50 / output $2.15; DeepSeek V3.1 Terminus input $0.21 / output $0.79. Assuming a 50/50 split of input and output tokens, monthly costs are: 1M tokens — R1: $1.33; Terminus: $0.50 (R1 is +$0.83). 10M tokens — R1: $13.25; Terminus: $5.00 (R1 +$8.25). 100M tokens — R1: $132.50; Terminus: $50.00 (R1 +$82.50). If you bill only output tokens, 1M output tokens cost $2.15 (R1) vs $0.79 (Terminus). The ~2.72× output-price ratio means cost-conscious deployments and high-volume apps (10M+ tokens per month) should strongly prefer V3.1 Terminus; teams that need R1's higher tool-calling fidelity and faithfulness can justify the higher spend at lower volumes or in mission-critical use cases.
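The blended-cost arithmetic above is straightforward to check; a minimal sketch, assuming the published per-MTok rates and a configurable input/output split:

```python
def monthly_cost(total_tokens: float, input_per_mtok: float,
                 output_per_mtok: float, input_share: float = 0.5) -> float:
    """Blended monthly cost in dollars for a given token volume.

    Assumes a fixed fraction of tokens are input (default 50/50 split),
    with per-million-token (MTok) pricing for input and output.
    """
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens * (1 - input_share)
    return (input_tokens * input_per_mtok + output_tokens * output_per_mtok) / 1e6

R1 = (0.50, 2.15)        # $/MTok: input, output
TERMINUS = (0.21, 0.79)

for volume in (1e6, 10e6, 100e6):
    r1_cost = monthly_cost(volume, *R1)
    t_cost = monthly_cost(volume, *TERMINUS)
    print(f"{volume / 1e6:.0f}M tokens: R1 ${r1_cost:,.2f} vs Terminus ${t_cost:,.2f}")
```

At a 50/50 split the blended rates work out to $1.325/MTok for R1 and $0.50/MTok for Terminus, which is where the ~2.65× blended (2.72× output-only) cost gap comes from.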

Real-World Cost Comparison

| Task | R1 0528 | DeepSeek V3.1 Terminus |
|---|---|---|
| Chat response | $0.0012 | <$0.001 |
| Blog post | $0.0046 | $0.0017 |
| Document batch | $0.117 | $0.044 |
| Pipeline run | $1.18 | $0.437 |

Bottom Line

Choose R1 0528 if you build agentic systems, tool-enabled assistants, or applications where faithfulness, persona consistency, tool calling, and safety calibration matter; you'll pay roughly 2.72× more per output token but gain higher tool and safety performance. Choose DeepSeek V3.1 Terminus if you need strict JSON/schema compliance or lower operating cost at scale (it wins structured output and strategic analysis, and costs $0.79 vs $2.15 per MTok output). If you're volume-sensitive (10M+ tokens per month) or your product relies on reliable structured output, pick Terminus; if correctness of tool invocation and refusal behavior is the priority, pick R1 0528.
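The decision guidance above can be condensed into a small routing rule. A hypothetical sketch: the function name, workload flags, and model identifier strings are illustrative, not an API from either vendor.

```python
def pick_model(needs_strict_schema: bool, monthly_tokens: int,
               tool_calling_critical: bool) -> str:
    """Illustrative model router encoding this page's recommendations."""
    if tool_calling_critical and not needs_strict_schema:
        # R1 leads on tool calling, faithfulness, and safety calibration.
        return "deepseek-r1-0528"
    if needs_strict_schema or monthly_tokens >= 10_000_000:
        # Terminus wins structured output and is ~2.7x cheaper per output MTok.
        return "deepseek-v3.1-terminus"
    return "deepseek-r1-0528"

print(pick_model(needs_strict_schema=True, monthly_tokens=1_000_000,
                 tool_calling_critical=False))  # deepseek-v3.1-terminus
```

Note that strict-schema needs take priority over tool-calling needs here, reflecting R1's documented empty-response quirk on structured output.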

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions