deepseek

DeepSeek V3.1 Terminus

DeepSeek V3.1 Terminus is an updated release that builds on DeepSeek V3.1, specifically targeting reported issues with language consistency and agent capabilities. At $0.21 input / $0.79 output per million tokens, it is one of the most affordable models in our test pool, though its output price sits above DeepSeek V3.2's ($0.38) on a different release track. In our testing it ranked 36th out of 52 models with an average score of 3.75, delivering strong strategic analysis, multilingual output, and structured data formatting. It supports tool calling, structured outputs, and reasoning parameters.

Performance

DeepSeek V3.1 Terminus's top three benchmark scores in our testing are strategic analysis (5/5, tied for 1st with 25 other models out of 54 tested), multilingual quality (5/5, tied for 1st with 34 other models out of 55 tested), and structured output (5/5, tied for 1st with 24 other models out of 54 tested). Long context also scored 5/5. Notable weaknesses: faithfulness scored 3/5 (rank 52 of 55 — near the bottom), tool calling scored 3/5 (rank 47 of 54), and safety calibration scored 1/5 (rank 32 of 55). The faithfulness weakness means it is not well-suited for strict RAG applications where hallucination is unacceptable. Tool calling at 3/5 also limits its reliability for agentic function-invocation pipelines. Overall rank: 36 out of 52 tested models.

Pricing

DeepSeek V3.1 Terminus costs $0.21 per million input tokens and $0.79 per million output tokens, at the very low end of the tested pool (output range: $0.10–$25). At 1 million output tokens per month, that is $0.79; at 10 million output tokens, $7.90. Within the DeepSeek lineup it sits at nearly the same price as DeepSeek V3.1 ($0.75 output, avg 3.92) as a different release variant, and well below R1 0528 ($2.15 output, avg 4.5). For workflows that need strong multilingual and strategic reasoning at the lowest possible cost, the price-to-performance ratio is competitive.
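The arithmetic above can be sketched as a small cost helper. This is a minimal sketch: the rates are the published per-million-token prices, and the usage figures are illustrative.

```python
# Published DeepSeek V3.1 Terminus rates, USD per million tokens.
INPUT_RATE = 0.21
OUTPUT_RATE = 0.79

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Blended USD cost for a month's token usage at the rates above."""
    return input_tokens * INPUT_RATE / 1e6 + output_tokens * OUTPUT_RATE / 1e6

# 1M output tokens per month (ignoring input for the moment)
print(f"${monthly_cost(0, 1_000_000):.2f}")   # $0.79
# 10M output tokens per month
print(f"${monthly_cost(0, 10_000_000):.2f}")  # $7.90
```

In practice input tokens usually dominate the count but not the bill: at these rates, a 10:1 input-to-output ratio still leaves output as the larger line item.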

DeepSeek V3.1 Terminus

Overall
3.75/5 (Strong)

Benchmark Scores

Faithfulness: 3/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 3/5
Classification: 3/5
Agentic Planning: 4/5
Structured Output: 5/5
Safety Calibration: 1/5
Strategic Analysis: 5/5
Persona Consistency: 4/5
Constrained Rewriting: 3/5
Creative Problem Solving: 4/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.210/MTok
Output: $0.790/MTok
Context Window: 164K


Real-World Costs

Chat response: <$0.001
Blog post: $0.0017
Document batch: $0.044
Pipeline run: $0.437

Pricing vs Performance

Output cost per million tokens (log scale) vs average score across our 12 internal benchmarks


Try It

from openai import OpenAI

# DeepSeek V3.1 Terminus is served through OpenRouter's
# OpenAI-compatible API, so the standard OpenAI SDK works
# once base_url points at OpenRouter.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # replace with your OpenRouter API key
)

response = client.chat.completions.create(
    model="deepseek/deepseek-v3.1-terminus",
    messages=[
        {"role": "user", "content": "Hello, DeepSeek V3.1 Terminus!"}
    ],
)

print(response.choices[0].message.content)
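Since the model supports tool calling, a request can attach function schemas in the standard OpenAI tools format. The sketch below only builds the request payload; the get_weather tool and its parameters are invented for illustration, and the kwargs would be passed to the client from the snippet above.

```python
# Sketch of a tool-calling request payload for DeepSeek V3.1 Terminus via
# OpenRouter. get_weather is a hypothetical example tool, not a real API.
def build_tool_call_request(user_message: str) -> dict:
    return {
        "model": "deepseek/deepseek-v3.1-terminus",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Look up current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }

request = build_tool_call_request("What's the weather in Oslo?")
# Then: response = client.chat.completions.create(**request)
# and check response.choices[0].message.tool_calls for invocations.
print(request["tools"][0]["function"]["name"])
```

Given the 3/5 tool-calling score, it is worth validating tool_calls output against your schemas rather than trusting it blindly.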

Recommendation

DeepSeek V3.1 Terminus is a strong fit for budget-conscious teams running multilingual content pipelines, strategic analysis tasks, or structured data extraction, particularly where output cost is a meaningful constraint. The 5/5 scores on strategic analysis, multilingual quality, and structured output at under $1/MTok output make it compelling for high-volume batch workloads. Avoid it for RAG applications: faithfulness scored 3/5, ranking near the bottom of the field. Also avoid complex agentic workflows that require reliable tool calling (3/5, rank 47 of 54). For those use cases, DeepSeek V3.2 ($0.38/MTok output, avg 4.25) delivers better benchmark coverage at less than half the output price.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions