DeepSeek

DeepSeek V3.2

DeepSeek V3.2 is DeepSeek's high-context model, optimized for retrieval-augmented generation (role: rag) and structured-output workflows. It sits between low-cost inference siblings (the DeepSeek V3.1 family) and expensive top-tier bracket peers (Claude Sonnet 4.6, GPT-5.2) by offering an unusually large 163,840-token context window at a low per-token price. In our testing it trades peak multi-task averages for standout JSON/schema compliance, long-context retrieval, and multilingual fidelity, making it a fit for teams building RAG-heavy apps, document-to-JSON pipelines, or multilingual extraction and transformation workloads that need lots of context without high output bills.

Performance

All scores below are from our 12-test suite.

Top strengths:
- Structured output: 5/5, tied for 1st with 24 other models for JSON/schema compliance in our testing.
- Long context: 5/5, tied for 1st with 36 other models for retrieval accuracy at 30K+ tokens.
- Multilingual, faithfulness, and persona consistency: all 5/5 (multilingual: tied for 1st with 34 others; faithfulness: tied with 32 others; persona consistency: tied with 36 others), meaning V3.2 reliably preserves source material and maintains character across languages.
- Strategic analysis and agentic planning: both 5/5, tied for 1st on those tasks.

Notable weaknesses:
- Tool calling: 3/5, ranking 47 of 54 (shared with 5 others); function selection and sequencing are middling in our tests.
- Classification: 3/5 (rank 31 of 53).
- Safety calibration: 2/5 (rank 12 of 55 in our testing); V3.2 is less conservative on harmful-content refusals than many peers.

Overall, DeepSeek V3.2 places 15 of 52 in our overall ranking: strong specialty performance, but not the top average scorer across all tasks.

Pricing

DeepSeek V3.2 charges $0.26 per million input tokens (MTok) and $0.38 per million output tokens. Real-world examples: 100k input tokens = $0.026; 100k output tokens = $0.038; combined 100k in + 100k out = $0.064. Scale to 1M in + 1M out = $0.64; 10M in + 10M out = $6.40. Compared with its bracket peers, V3.2 is dramatically cheaper than Claude Sonnet 4.6 ($15/MTok out) and GPT-5.2 ($14/MTok out), and price-matched with Gemma 4 31B ($0.38/MTok out). It is also less expensive than DeepSeek's own V3.1 ($0.75/MTok out). For frequent large-context runs, those per-MTok savings compound into substantial monthly cost differences.
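The per-request arithmetic is easy to wire into a budget check. A minimal sketch at the list prices of $0.26/MTok in and $0.38/MTok out:

```python
IN_PER_MTOK = 0.26   # $ per million input tokens
OUT_PER_MTOK = 0.38  # $ per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one DeepSeek V3.2 request at list prices."""
    return (input_tokens * IN_PER_MTOK + output_tokens * OUT_PER_MTOK) / 1_000_000

print(round(request_cost(100_000, 100_000), 3))       # 0.064
print(round(request_cost(1_000_000, 1_000_000), 2))   # 0.64
print(round(request_cost(10_000_000, 10_000_000), 2)) # 6.4
```

At these rates even a 10M-in/10M-out month stays in single-digit dollars, which is where the gap against $14–$15/MTok-class peers becomes material.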

DeepSeek V3.2

Overall
4.25/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
3/5
Classification
3/5
Agentic Planning
5/5
Structured Output
5/5
Safety Calibration
2/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
4/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Roles

Pricing

Input

$0.260/MTok

Output

$0.380/MTok

Context Window

164K


Real-World Costs

Chat response: <$0.001
Blog post: <$0.001
Document batch: $0.024
Pipeline run: $0.242

Pricing vs Performance

Output cost per million tokens (log scale) vs average score across our 12 internal benchmarks


Try It

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="deepseek/deepseek-v3.2",
    messages=[
        {"role": "user", "content": "Hello, DeepSeek V3.2!"}
    ],
)

print(response.choices[0].message.content)
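Before sending a large document through a call like the one above, a rough context-fit check avoids silent truncation. This is a heuristic sketch, not the real tokenizer: the 4-characters-per-token ratio is an assumption (actual ratios vary by language and content), and `reserved_output` is a hypothetical budget for the reply.

```python
CONTEXT_WINDOW = 163_840  # DeepSeek V3.2 context window, in tokens
CHARS_PER_TOKEN = 4       # rough heuristic; real tokenizers vary

def fits_in_context(document: str, reserved_output: int = 4_096) -> bool:
    """Rough check that a document plus an output budget fits the window."""
    estimated_tokens = len(document) // CHARS_PER_TOKEN
    return estimated_tokens + reserved_output <= CONTEXT_WINDOW

print(fits_in_context("x" * 400_000))  # ~100k tokens: True
print(fits_in_context("x" * 800_000))  # ~200k tokens: False
```

For production use, swap the heuristic for an exact count from the provider's tokenizer.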

Recommendation

Use DeepSeek V3.2 if you need:
- Large-context RAG pipelines that ingest and reason across 100k+ token documents (163,840-token context window; long context 5/5).
- Reliable schema/JSON extraction and format adherence (structured output 5/5, tied for 1st).
- Multilingual extraction or translation-aware pipelines where faithfulness matters (multilingual and faithfulness both 5/5).

Avoid V3.2 for:
- Tool-heavy orchestration or multi-step API function sequencing; tool calling is 3/5 and ranks low (47/54).
- Safety-critical moderation or content-filtering enforcement where stricter refusal behavior is required (safety calibration 2/5).

If you need similar long-context capability plus stronger tool orchestration, evaluate other bracket peers; if budget is the dominant constraint, V3.2 offers high-context capability at low per-MTok cost compared with $15/MTok-class competitors.
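Even with top-tier schema compliance, document-to-JSON pipelines benefit from a cheap local guard against the occasional malformed reply. A minimal sketch using only the standard library, with hypothetical field names (`title`, `language`, `summary`) standing in for your own schema:

```python
import json

REQUIRED_FIELDS = {"title", "language", "summary"}  # hypothetical schema fields

def validate_extraction(raw: str) -> dict:
    """Parse a model reply and verify the expected keys are present."""
    data = json.loads(raw)  # raises ValueError on non-JSON output
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data

# Example of a schema-compliant reply:
reply = '{"title": "Q3 Report", "language": "en", "summary": "Revenue up 4%."}'
print(validate_extraction(reply)["title"])  # prints "Q3 Report"
```

Rejected replies can be retried or routed to a fallback model before they enter downstream storage.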

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions