DeepSeek V3.1 Terminus vs Devstral Medium

DeepSeek V3.1 Terminus is the better pick for long-context apps, structured outputs, and strategic reasoning: it wins 6 of 12 benchmarks, including 5/5 on Long Context and 5/5 on Structured Output. Devstral Medium beats DeepSeek on Classification (4/5) and Faithfulness (4/5) and is the choice when routing accuracy and fidelity matter, but it costs substantially more.

DeepSeek V3.1 Terminus (DeepSeek)

Overall
3.75/5 (Strong)

Benchmark Scores

Faithfulness: 3/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 3/5
Classification: 3/5
Agentic Planning: 4/5
Structured Output: 5/5
Safety Calibration: 1/5
Strategic Analysis: 5/5
Persona Consistency: 4/5
Constrained Rewriting: 3/5
Creative Problem Solving: 4/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.210/MTok
Output: $0.790/MTok
Context Window: 164K

modelpicker.net

Devstral Medium (Mistral)

Overall
3.17/5 (Usable)

Benchmark Scores

Faithfulness: 4/5
Long Context: 4/5
Multilingual: 4/5
Tool Calling: 3/5
Classification: 4/5
Agentic Planning: 4/5
Structured Output: 4/5
Safety Calibration: 1/5
Strategic Analysis: 2/5
Persona Consistency: 3/5
Constrained Rewriting: 3/5
Creative Problem Solving: 2/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.400/MTok
Output: $2.00/MTok
Context Window: 131K


Benchmark Analysis

Across our 12-test suite, DeepSeek V3.1 Terminus wins 6 benchmarks, Devstral Medium wins 2, and 4 tests tie.

DeepSeek wins:

- Structured Output 5 vs 4 (tied for 1st with 24 others): strong JSON/schema compliance
- Strategic Analysis 5 vs 2 (tied for 1st with 25 others): better at nuanced tradeoff reasoning
- Creative Problem Solving 4 vs 2 (rank 9/54): better at non-obvious but feasible ideas
- Long Context 5 vs 4 (tied for 1st with 36 others): best for retrieval and contexts over 30K tokens
- Persona Consistency 4 vs 3: better at staying in character
- Multilingual 5 vs 4 (tied for 1st with 34 others): stronger non-English parity

Devstral wins:

- Classification 4 vs 3 (tied for 1st with 29 others): best at accurate routing and categorization
- Faithfulness 4 vs 3 (Devstral rank 34/55 vs DeepSeek rank 52/55): measurably better at sticking to source material and avoiding hallucination

Ties:

- Constrained Rewriting 3/3: equal on hard character limits
- Tool Calling 3/3 (both rank 47/54): both moderate at function selection and sequencing
- Safety Calibration 1/1 (both rank 32/55): both score poorly on refusing harmful requests
- Agentic Planning 4/4 (both rank 16/54): both competent at goal decomposition

Practically: choose DeepSeek when your app needs long documents, strict JSON outputs, multilingual coverage, or higher-level strategic reasoning; choose Devstral when you need top-tier classification and better faithfulness for production routing or content fidelity.

| Benchmark | DeepSeek V3.1 Terminus | Devstral Medium |
| --- | --- | --- |
| Faithfulness | 3/5 | 4/5 |
| Long Context | 5/5 | 4/5 |
| Multilingual | 5/5 | 4/5 |
| Tool Calling | 3/5 | 3/5 |
| Classification | 3/5 | 4/5 |
| Agentic Planning | 4/5 | 4/5 |
| Structured Output | 5/5 | 4/5 |
| Safety Calibration | 1/5 | 1/5 |
| Strategic Analysis | 5/5 | 2/5 |
| Persona Consistency | 4/5 | 3/5 |
| Constrained Rewriting | 3/5 | 3/5 |
| Creative Problem Solving | 4/5 | 2/5 |
| Summary | 6 wins | 2 wins |

Pricing Analysis

Per-million-token rates: DeepSeek V3.1 Terminus charges $0.21 (input) and $0.79 (output); Devstral Medium charges $0.40 (input) and $2.00 (output). Assuming a 50% input / 50% output split, the blended cost per 1M tokens is ~$0.50 for DeepSeek vs ~$1.20 for Devstral, so DeepSeek runs at roughly 42% of Devstral's cost. At scale the gap multiplies linearly: 10M tokens/month ≈ $5 (DeepSeek) vs $12 (Devstral); 100M ≈ $50 vs $120. Output-heavy workloads widen the gap further, since the output-rate ratio alone is 0.395 ($0.79 vs $2.00 per MTok). Teams doing high-volume inference (tens to hundreds of millions of tokens per month) should care about this difference; small experiments and low-volume development will feel it far less.
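The blended-cost arithmetic above can be sketched as a small helper. This is an illustrative sketch, not part of any vendor SDK; the rates are the published per-MTok prices and the 50/50 split is the stated assumption:

```python
def blended_cost(total_tokens: int, input_rate: float, output_rate: float,
                 output_frac: float = 0.5) -> float:
    """Dollar cost for `total_tokens` tokens at the given $/MTok rates.

    `output_frac` is the share of tokens that are output
    (0.5 models the 50/50 split assumed in the analysis above).
    """
    input_tokens = total_tokens * (1 - output_frac)
    output_tokens = total_tokens * output_frac
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# 10M tokens/month at the published rates:
deepseek = blended_cost(10_000_000, 0.21, 0.79)   # ≈ $5.00
devstral = blended_cost(10_000_000, 0.40, 2.00)   # ≈ $12.00
```

Raising `output_frac` toward 1.0 pushes the cost ratio from ~0.42 down toward the output-rate ratio of 0.395.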

Real-World Cost Comparison

| Task | DeepSeek V3.1 Terminus | Devstral Medium |
| --- | --- | --- |
| Chat response | <$0.001 | $0.0011 |
| Blog post | $0.0017 | $0.0042 |
| Document batch | $0.044 | $0.108 |
| Pipeline run | $0.437 | $1.08 |

Bottom Line

Choose DeepSeek V3.1 Terminus if you need: long-context retrieval or summarization (>30K tokens), reliable JSON/schema outputs, strategic reasoning, multilingual support, and a much lower per-token cost. Choose Devstral Medium if you need: stronger classification and faithfulness (4/5 each in our tests), or you prioritize routing/categorization fidelity in production despite ~2–2.5x higher per-token cost (depending on I/O split). If you need both, consider using Devstral for critical classification/fidelity paths and DeepSeek for heavy long-context or high-volume output generation.
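The hybrid setup described above can be sketched as a simple task router. The model identifiers and task labels here are hypothetical placeholders, not official API names:

```python
# Hypothetical router: fidelity-critical work goes to Devstral Medium,
# everything else (long context, bulk generation) to the cheaper DeepSeek.
FIDELITY_CRITICAL = {"classification", "routing", "faithful_summary"}

def pick_model(task_type: str, context_tokens: int = 0) -> str:
    """Return a (placeholder) model id for the given task."""
    if task_type in FIDELITY_CRITICAL:
        return "devstral-medium"        # 4/5 classification and faithfulness
    # DeepSeek scores 5/5 on long context and is ~42% of Devstral's cost,
    # so it is the default for long documents and high-volume output.
    return "deepseek-v3.1-terminus"
```

In practice the routing condition would come from whatever task metadata your pipeline already tracks; the point is that the split falls cleanly along the benchmark results rather than requiring per-request judgment.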

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions