DeepSeek·active

DeepSeek V3.1 Terminus

DeepSeek's efficiency model. Context window: 164K tokens.

Overall score: 3.85 / 5.00 · ranked #65
Input: $0.270 per 1M tokens
Output: $0.950 per 1M tokens
Context: 164K tokens
Blended: $0.780 · 3:1 out:in ratio
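
The blended figure can be reproduced from the input and output prices listed above. The sketch below assumes that "3:1 out:in ratio" means a weighted average with three parts output tokens to one part input tokens; the page does not spell out the formula, and the function name `blended_price` is hypothetical.

```python
def blended_price(input_price: float, output_price: float,
                  out_parts: float = 3.0, in_parts: float = 1.0) -> float:
    """Weighted average price per 1M tokens for a given out:in token mix.

    Assumption: the blend is a simple weighted mean of the two list prices.
    """
    total_parts = out_parts + in_parts
    return (in_parts * input_price + out_parts * output_price) / total_parts


# List prices from this page, in USD per 1M tokens.
price = blended_price(input_price=0.270, output_price=0.950)
print(f"${price:.3f}")  # $0.780
```

Under that assumption the result matches the listed blended price. A workload with a different output-to-input mix can be estimated by changing `out_parts` and `in_parts`.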

modelpicker.ai · powered by live benchmark data

Scores by test

Structured Output: 5.0
Strategic Analysis: 5.0
Constrained Rewriting: 3.0
Creative Problem Solving: 4.0
Tool Calling: 3.0
Faithfulness: 3.0
Classification: 3.0
Long Context: 5.0
Safety Calibration: 1.0
Persona Consistency: 4.0
Agentic Planning: 4.0
Multilingual: 5.0
Tabular Data: 5.0
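
As a sanity check, the thirteen per-test scores can be averaged and compared with the overall score. The sketch below assumes an unweighted mean; the scoring methodology is not stated on this page, so treat the agreement as suggestive rather than confirmed.

```python
from statistics import mean

# Per-test scores as listed on this page (each out of 5.0).
scores = {
    "Structured Output": 5.0,
    "Strategic Analysis": 5.0,
    "Constrained Rewriting": 3.0,
    "Creative Problem Solving": 4.0,
    "Tool Calling": 3.0,
    "Faithfulness": 3.0,
    "Classification": 3.0,
    "Long Context": 5.0,
    "Safety Calibration": 1.0,
    "Persona Consistency": 4.0,
    "Agentic Planning": 4.0,
    "Multilingual": 5.0,
    "Tabular Data": 5.0,
}

# Assumption: every test carries equal weight.
overall = mean(scores.values())
print(f"{overall:.2f}")  # 3.85
```

The unweighted mean rounds to the listed overall score of 3.85, which is consistent with equal weighting across tests.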

What you need to know

DeepSeek V3.1 Terminus is strongest on analytical and data-heavy tasks. It scores 5/5 on strategic analysis, structured output, and tabular data, which makes it well suited to turning unstructured information into precise formats. It also scores 5/5 on long context, suggesting it stays coherent across documents that approach its 164K-token window.

The model is priced aggressively, with a blended cost of $0.780/MTok at a 3:1 out:in ratio. Combined with a 5/5 multilingual score and a 4/5 agentic planning score, it offers high utility per dollar for developers who need a capable reasoning engine without the cost of premium frontier models.

However, the model has significant reliability gaps. It scores 1/5 on safety calibration and a middling 3/5 on both faithfulness and tool calling. These scores suggest a higher risk of hallucinated or unfaithful output and weak built-in guardrails, so developers should validate outputs externally and test prompts carefully before relying on the results.
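
One way to apply external validation to structured output is to check every response against an explicit schema before anything downstream consumes it. The sketch below uses only the Python standard library; the field names, allowed values, and the `validate_report` helper are hypothetical and stand in for whatever structure an application actually expects.

```python
import json
from typing import Optional

# Hypothetical schema for illustration: field name -> accepted Python types.
REQUIRED_FIELDS = {
    "title": (str,),
    "risk_level": (str,),
    "confidence": (int, float),
}
ALLOWED_RISK_LEVELS = {"low", "medium", "high"}


def validate_report(raw: str) -> tuple[Optional[dict], list[str]]:
    """Return (report, errors); report is None when any check fails."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value must be an object"]

    errors = []
    for field, types in REQUIRED_FIELDS.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif isinstance(data[field], bool) or not isinstance(data[field], types):
            # bool is a subclass of int, so reject it explicitly.
            errors.append(f"wrong type for field: {field}")

    risk = data.get("risk_level")
    if isinstance(risk, str) and risk not in ALLOWED_RISK_LEVELS:
        errors.append(f"unexpected risk_level: {risk}")

    return (data if not errors else None), errors


report, errors = validate_report(
    '{"title": "Q3 review", "risk_level": "low", "confidence": 0.9}'
)
print(report is not None, errors)
```

A response that fails validation can be retried or routed to human review rather than passed along. This catches malformed or out-of-range output, but it does not verify that the content is factually faithful to the source material.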

Use this model if you need a low-cost option for analyzing long documents and tabular data, generating structured reports, or working through complex strategic planning. Skip it if your application requires high safety standards, strict factual faithfulness, or heavy reliance on autonomous tool calling.

Strengths — Top 3

Structured Output: 5.0/5.0
Strategic Analysis: 5.0/5.0
Long Context: 5.0/5.0

Relative weaknesses — Bottom 3

Safety Calibration: 1.0/5.0
Constrained Rewriting: 3.0/5.0
Tool Calling: 3.0/5.0

Similar models

OpenAI: gpt-oss-120b · $0.145 · 4.08
DeepSeek V3.1 · $0.645 · 4.00
Grok 4.3 · $2.19 · 4.15
Qwen: Qwen3.6 Flash · $0.891 · 4.23