Claude Sonnet 4.6
Anthropic's flagship model. Long-context specialist with a 1M-token context window.
Scores by test
What you need to know
Claude Sonnet 4.6 is a top-tier generalist model ranking second out of 71 evaluated models, distinguished primarily by its reliability in complex reasoning and autonomous tasks. It achieves perfect internal scores in agentic planning, tool calling, and faithfulness, complemented by a strong 75.2% on SWE-bench Verified. These metrics indicate a model capable of high-autonomy software engineering and strategic analysis with minimal hallucination.
The model supports a 1M-token context window and maintains perfect internal scores across long-context and multilingual tasks. While it excels at high-level problem solving, it is relatively weak at constrained rewriting and basic classification. Developers should expect lower precision when enforcing strict formatting constraints or performing simple categorical labeling than they see in its strategic reasoning.
At a blended cost of $12.00 per million tokens, this model sits in a premium price tier. The price is justified by its versatility and its high average internal score of 4.69/5.0, which make it better suited to complex workflows than to cheap, simple API calls.
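To make the price tier concrete, here is a minimal sketch of the arithmetic, assuming the $12.00 blended rate applies uniformly to every token. The profile does not say how input and output prices are weighted in the blend, so treat the result as a rough estimate; the function name and the example workloads are hypothetical.

```python
# Blended rate quoted in this profile, in USD per million tokens.
BLENDED_USD_PER_MILLION_TOKENS = 12.00


def estimate_cost_usd(
    total_tokens: int,
    rate_per_million: float = BLENDED_USD_PER_MILLION_TOKENS,
) -> float:
    """Estimate spend in USD for a token count at a flat blended rate.

    Assumption: one flat rate for all tokens. Real billing usually prices
    input and output tokens differently, so the true figure depends on the mix.
    """
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return total_tokens / 1_000_000 * rate_per_million


# Hypothetical workloads, for illustration only.
full_context_call = estimate_cost_usd(1_000_000)  # one call filling the 1M window
short_label_call = estimate_cost_usd(500)         # one short classification call

print(f"Full-context call: ${full_context_call:.2f}")
print(f"Short call:        ${short_label_call:.4f}")
```

Under this assumption a single call that fills the whole context window costs about as much as two thousand short calls, so high-volume, low-complexity traffic is where the premium rate adds up.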
Use this model for agentic workflows, complex coding tasks, and large-document analysis where faithfulness is critical. Skip this model for high-volume, low-complexity classification tasks or projects requiring strict adherence to rigid rewriting constraints.
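The recommendation above can be sketched as a simple routing rule. This is an illustrative sketch only: the task categories mirror the wording of this profile, while `TaskKind`, `pick_model`, and both model labels are hypothetical placeholders, not real API identifiers.

```python
from enum import Enum


class TaskKind(Enum):
    """Task categories taken from the use/skip guidance in this profile."""
    AGENTIC_WORKFLOW = "agentic_workflow"
    COMPLEX_CODING = "complex_coding"
    LARGE_DOCUMENT_ANALYSIS = "large_document_analysis"
    SIMPLE_CLASSIFICATION = "simple_classification"
    CONSTRAINED_REWRITING = "constrained_rewriting"


# Tasks the profile recommends this model for.
PREMIUM_TASKS = {
    TaskKind.AGENTIC_WORKFLOW,
    TaskKind.COMPLEX_CODING,
    TaskKind.LARGE_DOCUMENT_ANALYSIS,
}


def pick_model(
    kind: TaskKind,
    premium: str = "premium-model-placeholder",   # hypothetical label
    fallback: str = "cheaper-model-placeholder",  # hypothetical label
) -> str:
    """Return the premium model for recommended tasks, else the fallback."""
    return premium if kind in PREMIUM_TASKS else fallback


print(pick_model(TaskKind.COMPLEX_CODING))
print(pick_model(TaskKind.SIMPLE_CLASSIFICATION))
```

A static lookup like this is the simplest possible design. A real deployment would likely also weigh volume, latency, and how strict the output format has to be.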
Strengths — Top 3
Relative weaknesses — Bottom 3
Similar models