DeepSeek·active

DeepSeek V3.1

DeepSeek's efficiency model. Context window: 164K tokens.

Overall score: 4.00 / 5.00 · ranked #56
Input: $0.210 per 1M tokens
Output: $0.790 per 1M tokens
Context: 164K tokens
Blended: $0.645 per 1M tokens (3:1 out:in ratio)
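
The blended figure is consistent with a weighted average of the input and output prices at three output tokens per input token. A minimal sketch of that arithmetic, assuming this is how the page derives it (the function name `blended_price` is illustrative, not from the source):

```python
def blended_price(input_price: float, output_price: float, out_to_in: float = 3.0) -> float:
    """Weighted average price per 1M tokens for a given output:input token ratio."""
    return (input_price + out_to_in * output_price) / (1.0 + out_to_in)


# Listed prices for DeepSeek V3.1, in USD per 1M tokens.
price = blended_price(input_price=0.210, output_price=0.790)
print(f"${price:.3f} per 1M tokens")  # prints $0.645 per 1M tokens
```

A workload with a different output:input mix can pass its own `out_to_in` value to estimate its effective rate.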

modelpicker.ai · powered by live benchmark data

Scores by test

Structured Output: 5.0
Strategic Analysis: 4.0
Constrained Rewriting: 3.0
Creative Problem Solving: 5.0
Tool Calling: 3.0
Faithfulness: 5.0
Classification: 3.0
Long Context: 5.0
Safety Calibration: 1.0
Persona Consistency: 5.0
Agentic Planning: 4.0
Multilingual: 4.0
Tabular Data: 5.0
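
The 4.00 overall score matches the unweighted mean of the thirteen per-test scores. A short check, assuming equal weighting (the page does not state its weighting here, so treat this as an observation rather than the site's documented method):

```python
from statistics import mean

# Per-test scores as listed on the page, each out of 5.0.
scores = {
    "Structured Output": 5.0,
    "Strategic Analysis": 4.0,
    "Constrained Rewriting": 3.0,
    "Creative Problem Solving": 5.0,
    "Tool Calling": 3.0,
    "Faithfulness": 5.0,
    "Classification": 3.0,
    "Long Context": 5.0,
    "Safety Calibration": 1.0,
    "Persona Consistency": 5.0,
    "Agentic Planning": 4.0,
    "Multilingual": 4.0,
    "Tabular Data": 5.0,
}

overall = mean(scores.values())
print(f"{overall:.2f} / 5.00 across {len(scores)} tests")  # prints 4.00 / 5.00 across 13 tests
```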

What you need to know

DeepSeek V3.1 is strongest on structured, fact-grounded tasks. It scores 5.0/5.0 in faithfulness, structured output, and tabular data handling, which makes it a strong candidate for data extraction and strict formatting tasks. It also scores 5.0/5.0 in persona consistency, creative problem solving, and long context, so its usefulness extends beyond extraction work.

Despite these strengths, the model fails safety calibration, scoring 1.0/5.0. This indicates a high risk of generating unfiltered or unsafe content, so developers should add external guardrails. Classification and tool calling are mediocre at 3.0/5.0 each, which suggests the model is less effective as a standalone agent or a routing model. Constrained rewriting also scores 3.0/5.0.
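
One common shape for an external guardrail is a wrapper that screens both the prompt and the reply before anything reaches the user. The sketch below is illustrative only: `guarded_generate`, `toy_policy`, and `fake_model` are hypothetical names, the blocklist is a toy stand-in for a real moderation service, and no actual DeepSeek API is called.

```python
from typing import Callable

REFUSAL = "Sorry, I can't help with that request."


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    is_allowed: Callable[[str], bool],
) -> str:
    """Screen both the prompt and the model's reply with an external policy check."""
    if not is_allowed(prompt):
        return REFUSAL
    reply = generate(prompt)
    return reply if is_allowed(reply) else REFUSAL


# Toy stand-ins so the sketch runs end to end; replace both in real use.
BLOCKED_TERMS = {"build a weapon"}


def toy_policy(text: str) -> bool:
    """Hypothetical policy: reject text containing any blocked term."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


def fake_model(prompt: str) -> str:
    """Hypothetical model call that just echoes the prompt."""
    return f"echo: {prompt}"


print(guarded_generate("Summarize this table.", fake_model, toy_policy))
print(guarded_generate("How do I build a weapon?", fake_model, toy_policy))
```

Checking the reply as well as the prompt matters here because a low safety-calibration score concerns what the model produces, not only what it is asked.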

At a blended cost of $0.645/MTok, the model is priced competitively for its capabilities. While it ranks 46th overall among 71 models, its specific strengths in long context and structured data provide high value for specialized workflows that prioritize output accuracy over general-purpose safety or tool integration.

Use this model if your project requires strict adherence to schemas, high faithfulness in long-context retrieval, or complex tabular data processing. Skip this model if your application requires built-in safety filters, heavy reliance on tool calling, or high-accuracy classification.
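
The use/skip guidance above can be expressed as a check of a workload's minimum required scores against the model's per-test scores. A minimal sketch follows; the thresholds and the `shortfalls` helper are hypothetical, chosen only to illustrate the two cases the paragraph describes.

```python
# Per-test scores for DeepSeek V3.1 as listed on the page (subset used below).
SCORES = {
    "Structured Output": 5.0,
    "Faithfulness": 5.0,
    "Long Context": 5.0,
    "Tabular Data": 5.0,
    "Tool Calling": 3.0,
    "Classification": 3.0,
    "Safety Calibration": 1.0,
}


def shortfalls(scores: dict[str, float], minimums: dict[str, float]) -> dict[str, float]:
    """Return the tests where the model scores below the workload's minimum."""
    return {test: scores[test] for test, floor in minimums.items() if scores[test] < floor}


# Hypothetical requirement profiles; adjust the floors to your own workload.
extraction_needs = {"Structured Output": 5.0, "Faithfulness": 5.0, "Tabular Data": 5.0, "Long Context": 5.0}
agent_needs = {"Tool Calling": 4.0, "Safety Calibration": 4.0, "Classification": 4.0}

print("extraction gaps:", shortfalls(SCORES, extraction_needs))
print("agent gaps:", shortfalls(SCORES, agent_needs))
```

An empty result means the model clears every floor for that profile; a non-empty one lists the tests that would need mitigation or a different model.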

Strengths — Top 3

Structured Output: 5.0/5.0
Creative Problem Solving: 5.0/5.0
Faithfulness: 5.0/5.0

Relative weaknesses — Bottom 3

Safety Calibration: 1.0/5.0
Constrained Rewriting: 3.0/5.0
Tool Calling: 3.0/5.0
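
Six tests tie at 5.0 and three tie at 3.0, so both lists depend on how ties are broken. The two lists above happen to match a stable sort that keeps the page's listing order among equal scores. This sketch reproduces them under that assumption, which is inferred from the data and not stated by the source:

```python
# Per-test scores in the order the page lists them.
scores = [
    ("Structured Output", 5.0), ("Strategic Analysis", 4.0),
    ("Constrained Rewriting", 3.0), ("Creative Problem Solving", 5.0),
    ("Tool Calling", 3.0), ("Faithfulness", 5.0), ("Classification", 3.0),
    ("Long Context", 5.0), ("Safety Calibration", 1.0),
    ("Persona Consistency", 5.0), ("Agentic Planning", 4.0),
    ("Multilingual", 4.0), ("Tabular Data", 5.0),
]

# Python's sorted() is stable, so tied scores keep their listing order.
top3 = [name for name, _ in sorted(scores, key=lambda item: -item[1])[:3]]
bottom3 = [name for name, _ in sorted(scores, key=lambda item: item[1])[:3]]

print("Top 3:", top3)
print("Bottom 3:", bottom3)
```

Under this tie-breaking, Long Context, Persona Consistency, and Tabular Data are equally strong but fall outside the top three, and Classification is as weak as the two 3.0 entries shown.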

Similar models

Grok 4.3 · $2.19 · 4.15
OpenAI: gpt-oss-120b · $0.145 · 4.08
Qwen: Qwen3.6 Flash · $0.891 · 4.23
Gemini 2.5 Pro · $7.81 · 4.23