deepseek
DeepSeek V3.1 Terminus
DeepSeek V3.1 Terminus is an updated release that builds on DeepSeek V3.1, specifically targeting reported issues with language consistency and agent capabilities. At $0.21 input / $0.79 output per million tokens, it is one of the most affordable models in our test pool, though its output rate sits above DeepSeek V3.2's ($0.38 output), which occupies a different release track. In our testing it ranked 36th out of 52 models with an average score of 3.75, delivering strong strategic analysis, multilingual output, and structured data formatting. It supports tool calling, structured outputs, and reasoning parameters.
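Since the model accepts structured-output requests through OpenAI-compatible clients, a request that pins the reply to a JSON schema can be sketched as follows. This is a minimal sketch: the schema and field names are illustrative, and the `response_format` shape is assumed to follow the OpenAI-compatible convention that OpenRouter forwards to supporting models.

```python
import json

# Illustrative schema (field names are hypothetical, not from the review).
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["vendor", "total", "currency"],
    "additionalProperties": False,
}

# OpenAI-compatible request body; response_format asks the model to
# emit JSON conforming to the schema above.
request_body = {
    "model": "deepseek/deepseek-v3.1-terminus",
    "messages": [
        {"role": "user", "content": "Extract the invoice fields from: ..."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "strict": True,
            "schema": invoice_schema,
        },
    },
}

print(json.dumps(request_body, indent=2))
```

The same body can be passed to `client.chat.completions.create(**request_body)` with the OpenRouter client shown in the Try It section below.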
Performance
DeepSeek V3.1 Terminus's top three benchmark scores in our testing are strategic analysis (5/5, tied for 1st with 25 other models out of 54 tested), multilingual quality (5/5, tied for 1st with 34 other models out of 55 tested), and structured output (5/5, tied for 1st with 24 other models out of 54 tested). Long context also scored 5/5. Notable weaknesses: faithfulness scored 3/5 (rank 52 of 55 — near the bottom), tool calling scored 3/5 (rank 47 of 54), and safety calibration scored 1/5 (rank 32 of 55). The faithfulness weakness means it is not well-suited for strict RAG applications where hallucination is unacceptable. Tool calling at 3/5 also limits its reliability for agentic function-invocation pipelines. Overall rank: 36 out of 52 tested models.
Pricing
DeepSeek V3.1 Terminus costs $0.21 per million input tokens and $0.79 per million output tokens, at the very low end of the tested model pool (output range: $0.10–$25). At 1 million output tokens per month, that is $0.79; at 10 million output tokens, $7.90. Within the DeepSeek lineup, it is priced almost identically to DeepSeek V3.1 ($0.75 output, avg 3.92), of which it is a revised variant, and well below R1 0528 ($2.15 output, avg 4.5). For workflows that need strong multilingual and strategic reasoning at the lowest possible cost, the price-to-performance ratio is competitive.
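The per-volume figures above are simple linear arithmetic. A quick sketch that reproduces them, with the rates taken from this review:

```python
# Rates from the review: $0.21 input / $0.79 output per million tokens.
INPUT_PER_MTOK = 0.21
OUTPUT_PER_MTOK = 0.79

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Total USD cost for a month's token volume at Terminus rates."""
    return (input_tokens / 1e6) * INPUT_PER_MTOK + (output_tokens / 1e6) * OUTPUT_PER_MTOK

# Output-only figures quoted in the text:
print(round(monthly_cost(0, 1_000_000), 2))   # 0.79
print(round(monthly_cost(0, 10_000_000), 2))  # 7.9
```

Input tokens add little at typical chat ratios; a workload with 10M input and 10M output tokens still comes to $10.00/month at these rates.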
[Pricing sidebar: input $0.210/MTok, output $0.790/MTok]
[Chart: Pricing vs Performance. Output cost per million tokens (log scale) vs average score across our 12 internal benchmarks.]
Try It
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so the standard client works.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="deepseek/deepseek-v3.1-terminus",
    messages=[
        {"role": "user", "content": "Hello, DeepSeek V3.1 Terminus!"}
    ],
)

print(response.choices[0].message.content)
Recommendation
DeepSeek V3.1 Terminus is a strong fit for budget-conscious teams running multilingual content pipelines, strategic analysis tasks, or structured data extraction — particularly those where $0.79/MTok output is a meaningful constraint. The 5/5 scores on strategic analysis, multilingual, and structured output at under $1/MTok output make it compelling for high-volume batch workloads. Avoid it for RAG applications — faithfulness scored 3/5, ranking near the bottom of the field. Also avoid for complex agentic workflows requiring reliable tool calling (3/5, rank 47 of 54). For those use cases, DeepSeek V3.2 ($0.38/MTok, avg 4.25) delivers better benchmark coverage at a similar price.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.