Anthropic · active

Claude Sonnet 4.6

Anthropic's flagship model. Long-context specialist with a 1M-token context window.

Overall score: 4.69 / 5.00 · ranked #2
Input: $3.00 per 1M tokens
Output: $15.00 per 1M tokens
Context: 1M tokens
Blended: $12.00 per 1M tokens (3:1 out:in ratio)

modelpicker.ai · powered by live benchmark data

Scores by test

Internal tests (score out of 5.0)

Structured Output: 4.0
Strategic Analysis: 5.0
Constrained Rewriting: 3.0
Creative Problem Solving: 5.0
Tool Calling: 5.0
Faithfulness: 5.0
Classification: 4.0
Long Context: 5.0
Safety Calibration: 5.0
Persona Consistency: 5.0
Agentic Planning: 5.0
Multilingual: 5.0
Tabular Data: 5.0

External benchmarks

SWE-bench Verified: 75.2%
AIME 2025: 85.8
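
As a rough consistency check, the 4.69 overall score can be reproduced from the thirteen five-point tests listed above. The sketch below assumes the overall score is an unweighted mean of those tests and that SWE-bench Verified and AIME 2025 are excluded; the page does not state the formula, so treat that as an assumption rather than the site's documented method.

```python
from statistics import mean

# Five-point test scores copied from the list above.
internal_scores = {
    "Structured Output": 4.0,
    "Strategic Analysis": 5.0,
    "Constrained Rewriting": 3.0,
    "Creative Problem Solving": 5.0,
    "Tool Calling": 5.0,
    "Faithfulness": 5.0,
    "Classification": 4.0,
    "Long Context": 5.0,
    "Safety Calibration": 5.0,
    "Persona Consistency": 5.0,
    "Agentic Planning": 5.0,
    "Multilingual": 5.0,
    "Tabular Data": 5.0,
}

# Assumption: overall score = unweighted mean of the five-point tests only.
overall = mean(list(internal_scores.values()))

# Three lowest-scoring tests; sorted() is stable, so ties keep list order.
weakest = sorted(internal_scores, key=internal_scores.get)[:3]

print(f"overall: {overall:.2f}")  # overall: 4.69
print("weakest:", weakest)
```

Under that assumption the mean is 61 / 13, which rounds to 4.69, and the three lowest entries match the weaknesses the page reports.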

What you need to know

Claude Sonnet 4.6 is a generalist model ranked second out of 71 evaluated models, distinguished primarily by its reliability in complex reasoning and autonomous tasks. It scores 5.0/5.0 on the internal agentic planning, tool calling, and faithfulness tests and 75.2% on SWE-bench Verified. Together, these results suggest a model well suited to high-autonomy software engineering and strategic analysis with a low rate of hallucination.

The model supports a 1M-token context window and scores 5.0/5.0 on both the long-context and multilingual tests. Its weakest result is constrained rewriting (3.0/5.0), followed by structured output and classification (4.0/5.0 each). Developers should expect it to be less dependable at enforcing strict formatting constraints or simple categorical labeling than at strategic reasoning.

At a blended cost of $12.00 per million tokens (3:1 output-to-input weighting), this model sits in a premium price tier. Its versatility and 4.69/5.0 average internal score make that price easier to justify for complex workflows than for simple, high-volume API calls.
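
The $12.00 blended figure is consistent with weighting the output and input prices 3:1. The formula below is inferred from the page's "3:1 out:in ratio" label rather than taken from documented methodology, and `blended_price` is a hypothetical helper name used only for illustration.

```python
def blended_price(input_price: float, output_price: float,
                  out_weight: float = 3.0, in_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens.

    Assumption: the blended price weights output tokens 3:1 over input
    tokens, as suggested by the "3:1 out:in ratio" label on the page.
    """
    total_weight = out_weight + in_weight
    return (out_weight * output_price + in_weight * input_price) / total_weight


# Listed prices: $3.00 input, $15.00 output, per 1M tokens.
print(f"${blended_price(3.00, 15.00):.2f}")  # $12.00
```

A workload with a different output-to-input mix can pass its own weights; an input-heavy job such as large-document analysis would land closer to the $3.00 input price.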

Use this model for agentic workflows, complex coding tasks, and large-document analysis where faithfulness is critical. Skip this model for high-volume, low-complexity classification tasks or projects requiring strict adherence to rigid rewriting constraints.

Strengths — Top 3

Strategic Analysis: 5.0/5.0
Creative Problem Solving: 5.0/5.0
Tool Calling: 5.0/5.0

Relative weaknesses — Bottom 3

Constrained Rewriting: 3.0/5.0
Structured Output: 4.0/5.0
Classification: 4.0/5.0

Similar models

GLM-4.7 · $1.41 · score 4.69
Xiaomi: MiMo-V2-Pro · $2.50 · score 4.54
Qwen: Qwen3.6 Max Preview · $4.94 · score 4.85
GPT-5.2 · $10.94 · score 4.69