Claude Opus 4.6
Anthropic's flagship model. Long-context specialist with a 1M-token context window.
What you need to know
Claude Opus 4.6 is a high-reasoning model optimized for complex agentic workflows and long-context processing. It is exceptionally proficient at tool calling, agentic planning, and strategic analysis, each of which scores 5/5 on internal tests. On external benchmarks it is particularly strong in technical domains, scoring 78.7% on SWE-bench Verified and 94.4% on AIME 2025, which indicates a high capacity for software engineering and mathematical reasoning.
The model sits at a premium price point, with a blended cost of $20.00/MTok. Although expensive, this cost is consistent with its rank as the 4th strongest model out of 71. The model also holds 5/5 internal scores for long-context performance and faithfulness, which suggests it makes effective use of its 1M-token context window.
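To make the pricing concrete, the sketch below estimates spend from the blended $20.00/MTok rate quoted above. The function name and the example workloads are hypothetical, and because the input/output mix behind the blended rate is not stated here, the results are rough approximations rather than billing figures.

```python
# Blended rate quoted in the text; the underlying input/output split is not given.
BLENDED_USD_PER_MTOK = 20.00


def estimate_cost_usd(total_tokens: int,
                      usd_per_mtok: float = BLENDED_USD_PER_MTOK) -> float:
    """Approximate cost of processing total_tokens at a blended per-MTok rate."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return total_tokens / 1_000_000 * usd_per_mtok


if __name__ == "__main__":
    # Hypothetical workload 1: a single request filling the 1M-token window.
    print(f"${estimate_cost_usd(1_000_000):.2f}")      # $20.00
    # Hypothetical workload 2: 1,000 requests of 50,000 tokens each.
    print(f"${estimate_cost_usd(1_000 * 50_000):.2f}")  # $1000.00
```

At this rate, a job that repeatedly fills the full context window costs about $20 per pass, which is why volume matters more than per-request size when budgeting.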
Despite its reasoning strength, the model is comparatively weaker on rigid formatting and categorization tasks. It scores 3/5 in both classification and constrained rewriting, and 4/5 in structured output. This suggests a tendency to deviate from strict templates or narrow labeling requirements.
Use this model for autonomous agents, complex codebase analysis, and high-stakes strategic planning where reasoning quality outweighs cost. Skip this model for high-volume classification tasks or applications requiring strict adherence to constrained rewriting formats.
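The guidance above can be expressed as a simple fit check. This is only an illustrative sketch: the scores are the internal 5-point ratings quoted in this profile, while the task keys, the `is_good_fit` helper, and the default threshold of 4 are assumptions made for the example.

```python
# Internal 5-point scores as quoted in this profile; the key names are illustrative.
INTERNAL_SCORES = {
    "tool_calling": 5,
    "agentic_planning": 5,
    "strategic_analysis": 5,
    "long_context": 5,
    "faithfulness": 5,
    "structured_output": 4,
    "classification": 3,
    "constrained_rewriting": 3,
}


def is_good_fit(task_type: str, min_score: int = 4) -> bool:
    """Return True if the quoted score for task_type meets the (assumed) threshold."""
    if task_type not in INTERNAL_SCORES:
        raise ValueError(f"no internal score recorded for {task_type!r}")
    return INTERNAL_SCORES[task_type] >= min_score


if __name__ == "__main__":
    for task in ("agentic_planning", "structured_output", "classification"):
        verdict = "use" if is_good_fit(task) else "skip"
        print(f"{task}: {verdict}")
```

Raising `min_score` to 5 would also exclude structured output, which may be appropriate where strict schema adherence matters.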