DeepSeek V3.1
DeepSeek V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes, selected via its prompt template. At $0.15 input / $0.75 output per million tokens, it is one of the most affordable production models in our tested field. In our 12-benchmark suite it ranked 31st out of 52 models with an average score of 3.92, a middling overall rank that hides top-tier results on creative problem solving, faithfulness, structured output, long context, and persona consistency. The 32,768-token context window is relatively small compared to peers. The model supports tool calling, structured outputs, and reasoning parameters.
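The thinking / non-thinking split is toggled per request rather than by choosing a different model. A minimal sketch of how the two modes might be selected through OpenRouter's OpenAI-compatible API, assuming OpenRouter's `reasoning` request field (the exact field shape is an assumption here; check the provider docs):

```python
# Hypothetical request payloads for the two modes, assuming OpenRouter's
# `reasoning` field toggles DeepSeek V3.1's thinking mode.
def build_payload(prompt: str, thinking: bool) -> dict:
    payload = {
        "model": "deepseek/deepseek-chat-v3.1",
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking:
        # Ask the router to enable the model's reasoning/thinking mode.
        payload["reasoning"] = {"enabled": True}
    return payload

fast = build_payload("Summarize this paragraph.", thinking=False)
deep = build_payload("Prove this claim step by step.", thinking=True)
```

Non-thinking requests keep latency and output cost down; the thinking variant trades both for harder problems.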
Performance
DeepSeek V3.1's top benchmark scores in our testing include: creative problem solving (5/5, tied for 1st with 7 other models out of 54 tested), faithfulness (5/5, tied for 1st with 32 other models out of 55 tested), structured output (5/5, tied for 1st with 24 other models out of 54 tested), persona consistency (5/5, tied for 1st with 36 other models out of 53 tested), and long context (5/5, tied for 1st with 36 other models out of 55 tested). Notable weaknesses: tool calling scored 3/5 (rank 47 of 54) and safety calibration scored 1/5 (rank 32 of 55). The tool calling weakness is significant for agentic workflows that rely on reliable function invocation. Overall rank: 31 out of 52 tested models.
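Given the below-median tool-calling score, agentic pipelines that do use DeepSeek V3.1 may want to validate each function invocation before executing it. A hedged sketch of one defensive pattern (the `tool_call` shape below follows the common OpenAI-style response format; the validation rules are illustrative, not part of our benchmark):

```python
import json

def safe_parse_tool_call(tool_call: dict, known_tools: set):
    """Return (name, args) if the call is well-formed, else None.

    A malformed call (unknown tool, unparseable JSON arguments) is a
    signal to re-prompt the model rather than execute blindly.
    """
    name = tool_call.get("function", {}).get("name")
    if name not in known_tools:
        return None
    try:
        args = json.loads(tool_call["function"].get("arguments", "{}"))
    except json.JSONDecodeError:
        return None
    return name, args

# A well-formed call parses; a truncated-JSON call is rejected.
ok = safe_parse_tool_call(
    {"function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'}},
    known_tools={"get_weather"},
)
bad = safe_parse_tool_call(
    {"function": {"name": "get_weather", "arguments": '{"city": '}},
    known_tools={"get_weather"},
)
```

Rejecting and re-prompting on a `None` result costs one extra round trip but avoids executing garbage arguments.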
Pricing
DeepSeek V3.1 costs $0.15 per million input tokens and $0.75 per million output tokens, among the lowest in the tested model pool (range: $0.10–$25 output). At 1 million output tokens per month that is $0.75; at 10 million, $7.50. Within the DeepSeek lineup it undercuts R1 ($2.50/MTok output, avg 4.0) while scoring similarly, and is significantly cheaper than R1 0528 ($2.15/MTok output, avg 4.5). It is priced nearly identically to DeepSeek V3.1 Terminus ($0.79/MTok output) but outscores it on our benchmarks (3.92 vs. 3.75 avg). For developers prioritizing cost per unit of quality, DeepSeek V3.1 is one of the strongest value options in the tested field.
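The monthly figures above follow directly from the per-token rates. A quick sketch of the arithmetic using the listed $0.15 / $0.75 per-million-token prices (a simple estimator, not an official billing formula):

```python
INPUT_PER_MTOK = 0.15   # USD per million input tokens
OUTPUT_PER_MTOK = 0.75  # USD per million output tokens

def monthly_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate a month's bill from raw token counts."""
    return (input_tokens / 1_000_000) * INPUT_PER_MTOK + \
           (output_tokens / 1_000_000) * OUTPUT_PER_MTOK

print(monthly_cost_usd(0, 1_000_000))    # 0.75 for 1M output tokens
print(monthly_cost_usd(0, 10_000_000))   # 7.5 for 10M output tokens
```

Input tokens barely move the total at these rates; output volume dominates the bill.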
[Chart: Pricing vs Performance. Output cost per million tokens (log scale) vs average score across our 12 internal benchmarks.]
Try It
from openai import OpenAI

# DeepSeek V3.1 is served through OpenRouter's OpenAI-compatible API.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",
    messages=[
        {"role": "user", "content": "Hello, DeepSeek V3.1!"}
    ],
)

print(response.choices[0].message.content)

Recommendation
DeepSeek V3.1 is an excellent value pick for developers and teams who need strong creative, faithful, and structured output at one of the lowest price points in the field. At $0.75/MTok output with 5/5 on faithfulness, creative problem solving, structured output, and persona consistency, it matches models costing 3–10x more on those specific dimensions. Avoid it for agentic workflows that depend on tool calling: it scored 3/5 (rank 47 of 54), so function-invocation reliability is below the field median. The 32,768-token context window also limits it for long-document tasks; models with larger contexts are better suited to document-level retrieval work.
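To stay inside the 32,768-token window it helps to estimate a prompt's token count before sending it. A rough sketch using the common ~4-characters-per-token heuristic (an approximation only; a real tokenizer for this model may count noticeably differently):

```python
CONTEXT_WINDOW = 32_768  # DeepSeek V3.1's context size, in tokens

def fits_in_context(prompt: str, reserved_for_output: int = 4_096) -> bool:
    """Rough check that a prompt plus an output budget fits the window.

    Uses the ~4 chars/token heuristic; swap in a real tokenizer for
    accurate counts.
    """
    est_tokens = len(prompt) // 4
    return est_tokens + reserved_for_output <= CONTEXT_WINDOW

print(fits_in_context("hello " * 1000))   # ~1.5k tokens: True
print(fits_in_context("x" * 200_000))     # ~50k tokens: False
```

Reserving room for the completion matters: a prompt that "fits" with no output budget will still be truncated or refused once generation starts.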
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.