Claude Sonnet 4.6 vs GPT-5
For most teams balancing capability and cost, GPT-5 is the pragmatic choice: it delivers top structured-output and math performance at lower per-token prices. Choose Claude Sonnet 4.6 when safety calibration and creative problem-solving matter most despite a 1.5× price premium.
Anthropic
Claude Sonnet 4.6
Pricing: $3.00/MTok input · $15.00/MTok output
modelpicker.net
OpenAI
GPT-5
Pricing: $1.25/MTok input · $10.00/MTok output
Benchmark Analysis
Across our 12-test suite the matchup is close: the two models tie on 8 tests (strategic_analysis, tool_calling, faithfulness, classification, long_context, persona_consistency, agentic_planning, multilingual). Claude Sonnet 4.6 wins creative_problem_solving (5 vs 4) and safety_calibration (5 vs 2); safety_calibration is the biggest differentiator, with Sonnet tied for 1st of 55 models (alongside 4 others) while GPT-5 ranks 12th of 55. GPT-5 wins structured_output (5 vs 4) and constrained_rewriting (4 vs 3), and its structured_output result is a top score (tied for 1st of 54).

In practical terms, Sonnet's 5/5 safety_calibration means it more reliably refuses harmful requests while permitting legitimate ones in our tests, and its 5/5 creative_problem_solving yields more non-obvious yet feasible ideas. GPT-5's 5/5 structured_output and higher constrained_rewriting score mean it adheres to schemas and compresses text into tight limits more reliably.

On external benchmarks, Sonnet scores 75.2% on SWE-bench Verified (Epoch AI), ranking 4th of 12, while GPT-5 scores 73.6%, ranking 6th of 12. For advanced math, GPT-5 posts 98.1% on MATH Level 5 (Epoch AI), ranking 1st of 14, and 91.4% on AIME 2025 (Epoch AI) versus Sonnet's 85.8%. These external measures corroborate GPT-5's stronger raw math performance and Sonnet's edge on our safety and creative benchmarks.
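The head-to-head accounting above can be sketched with a small comparison helper. The scores below are the four differing results reported in our testing; the eight tied tests are omitted because their individual scores are not listed here.

```python
def head_to_head(a: dict[str, int], b: dict[str, int]) -> tuple[list[str], list[str], list[str]]:
    """Split the tests both models took into wins for a, wins for b, and ties."""
    shared = set(a) & set(b)
    wins_a = sorted(t for t in shared if a[t] > b[t])
    wins_b = sorted(t for t in shared if a[t] < b[t])
    ties = sorted(t for t in shared if a[t] == b[t])
    return wins_a, wins_b, ties

# 1-5 judge scores for the four tests where the models differ (from the analysis above)
sonnet = {"creative_problem_solving": 5, "safety_calibration": 5,
          "structured_output": 4, "constrained_rewriting": 3}
gpt5 = {"creative_problem_solving": 4, "safety_calibration": 2,
        "structured_output": 5, "constrained_rewriting": 4}

sonnet_wins, gpt5_wins, ties = head_to_head(sonnet, gpt5)
print(sonnet_wins)  # ['creative_problem_solving', 'safety_calibration']
print(gpt5_wins)    # ['constrained_rewriting', 'structured_output']
```

Each model wins two of the differing tests; the split in the verdict comes from how much each win matters for a given workload, not from raw win counts.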
Pricing Analysis
Claude Sonnet 4.6 charges $3.00 input + $15.00 output per MTok (1 MTok = 1 million tokens), a combined $18.00/MTok; GPT-5 charges $1.25 input + $10.00 output per MTok, a combined $11.25/MTok. At example volumes (equal input and output): 1M tokens each way costs $18.00 on Sonnet vs $11.25 on GPT-5; 10M each, $180 vs $112.50; 100M each, $1,800 vs $1,125. That ~60% combined-price gap matters for high-volume SaaS, search, or large-batch inference: at 100M tokens each way per month the difference is $675. Lower-volume teams, or workflows that put safety and creativity first, may accept the Sonnet premium; cost-sensitive product teams should prefer GPT-5 for routine inference and large-scale deployments.
Real-World Cost Comparison
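As a worked example, the traffic figures here are illustrative assumptions, not measured usage: suppose a product serves 50,000 requests per month averaging 1,500 input and 500 output tokens each. At the per-MTok prices listed above:

```python
# $/MTok (1 MTok = 1,000,000 tokens), as listed above
PRICES = {
    "claude-sonnet-4.6": {"input": 3.00, "output": 15.00},
    "gpt-5": {"input": 1.25, "output": 10.00},
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost for a month's traffic at the per-million-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Assumed workload: 50,000 requests x (1,500 input + 500 output) tokens
requests = 50_000
in_tok, out_tok = requests * 1_500, requests * 500  # 75M input, 25M output

print(monthly_cost("claude-sonnet-4.6", in_tok, out_tok))  # 600.0
print(monthly_cost("gpt-5", in_tok, out_tok))              # 343.75
```

Under this assumed workload GPT-5's bill is roughly 57% of Sonnet's. The exact ratio depends on each application's input/output mix, since the models' input prices differ more (2.4×) than their output prices (1.5×).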
Bottom Line
Choose Claude Sonnet 4.6 if you prioritize safety calibration, iterative/agentic workflows, or creative problem-solving and are willing to pay ~1.5× per token. Examples: moderated chatbots, safety-first internal tools, or projects requiring strong refusal behavior and generative ideation.

Choose GPT-5 if you need lower per-token cost, best-in-class structured output and constrained rewriting, or top-tier competition-math performance. Examples: high-volume API products, schema-driven pipelines, automated document formatting, or math-heavy applications.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.