Claude Haiku 4.5 vs Claude Sonnet 4.6 for Math

Claude Sonnet 4.6 is the better choice for Math in our testing. Both models tie at 5/5 on strategic_analysis and 4/5 on structured_output, but Sonnet outperforms Claude Haiku 4.5 on creative_problem_solving (5 vs 4) and safety_calibration (5 vs 2), differences that matter for complex contest-style reasoning and reliable refusal/acceptance behavior. Sonnet also has external results (85.8% on AIME 2025 and 75.2% on the coding-focused SWE-bench Verified, both from Epoch AI), while Claude Haiku 4.5 has no external benchmark scores in the payload. The trade-off is cost: Haiku is cheaper ($1.00/$5.00 per MTok for input/output vs Sonnet's $3.00/$15.00 per MTok) and lower-latency per the description, so Haiku remains attractive for high-volume, cost-sensitive pipelines where the few-point quality gap is acceptable.

anthropic

Claude Haiku 4.5

Overall: 4.33/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 5/5
Classification: 4/5
Agentic Planning: 5/5
Structured Output: 4/5
Safety Calibration: 2/5
Strategic Analysis: 5/5
Persona Consistency: 5/5
Constrained Rewriting: 3/5
Creative Problem Solving: 4/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $1.00/MTok
Output: $5.00/MTok

Context Window: 200K tokens

modelpicker.net

anthropic

Claude Sonnet 4.6

Overall: 4.67/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 5/5
Classification: 4/5
Agentic Planning: 5/5
Structured Output: 4/5
Safety Calibration: 5/5
Strategic Analysis: 5/5
Persona Consistency: 5/5
Constrained Rewriting: 3/5
Creative Problem Solving: 5/5

External Benchmarks

SWE-bench Verified: 75.2%
MATH Level 5: N/A
AIME 2025: 85.8%

Pricing

Input: $3.00/MTok
Output: $15.00/MTok

Context Window: 1M tokens


Task Analysis

What Math demands: accurate stepwise reasoning, symbolic manipulation, long derivations, consistent structured outputs (for solution steps and final answers), tool calling or function use for calculators, and faithfulness to avoid hallucinated steps. In our data the canonical internal signals are strategic_analysis (reasoning), structured_output (format adherence), tool_calling, long_context, faithfulness, and creative_problem_solving (non-obvious but correct approaches).

No primary MATH Level 5 external score is provided for either model in the payload, so we rely on our internal benchmarks as the comparator. In our testing both models score 5/5 on strategic_analysis and 4/5 on structured_output, meaning they handle multi-step reasoning and output schemas equivalently. Sonnet's advantages are a 5/5 creative_problem_solving score (Haiku: 4/5) and a much higher safety_calibration score (5/5 vs Haiku's 2/5), which improves robustness on adversarial or ambiguous math prompts. Additionally, Sonnet reports 85.8% on AIME 2025 and 75.2% on SWE-bench Verified (Epoch AI), supplementary external evidence of strong contest-style math and coding/problem-solving performance.

Haiku offers a cost and efficiency advantage ($1 vs $3 input and $5 vs $15 output per MTok) and still ties Sonnet on most core benchmarks, so it's viable where budget and latency matter more than the top creative/math edge.
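The structured_output demand above can be made concrete: a math pipeline typically requires solutions as machine-checkable JSON. A minimal validation sketch, where the field names (`steps`, `final_answer`) and the `check_solution` helper are illustrative assumptions, not part of either model's API:

```python
# Hypothetical validator for a stepwise math-solution schema, of the
# kind our structured_output benchmark exercises. Field names are
# illustrative assumptions, not a modelpicker.net or Anthropic spec.
import json


def check_solution(raw: str) -> bool:
    """Return True if `raw` is valid JSON with a list of step strings
    and a final answer field."""
    try:
        sol = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(sol, dict)
        and isinstance(sol.get("steps"), list)
        and all(isinstance(s, str) for s in sol["steps"])
        and "final_answer" in sol
    )


good = '{"steps": ["Let x = 3", "Then x^2 = 9"], "final_answer": "9"}'
bad = '{"steps": "not a list"}'
print(check_solution(good), check_solution(bad))  # True False
```

A grader can retry or reject a model turn that fails this check, which is why the 4/5 structured_output tie matters equally for both models.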

Practical Examples

  1. Competition math / AIME-style problems: Sonnet shines. In our testing it scored 5/5 on creative_problem_solving, and it also posts 85.8% on AIME 2025 (Epoch AI). Choose Sonnet when you need higher success on nonstandard, contest-level reasoning.
  2. Multi-step derivations with strict formatting (solutions for publications or graders): both models tie at 5/5 strategic_analysis and 4/5 structured_output in our testing, so either model can produce stepwise solutions and valid JSON/structured answers.
  3. Adversarial or ambiguous prompts (requests that probe boundary conditions or try to induce incorrect permissive answers): Sonnet's 5/5 safety_calibration vs Haiku's 2/5 makes Sonnet much more reliable at correct refusals and safe behavior in our tests.
  4. High-volume tutoring or interactive chat where latency and cost matter: Haiku is cheaper ($1/$5 per MTok for input/output) and is described as faster and more efficient. Pick Haiku when you need many cheap turns and can accept a small drop in creative_problem_solving and safety.
  5. Large-context walkthroughs or project-scale math (long derivations, massive context): both models score 5/5 on long_context, but Sonnet's larger context window (1,000,000 tokens vs Haiku's 200,000) and higher max output tokens (128K vs 64K) favor Sonnet for extremely long proofs or notebooks.
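The cost trade-off in example 4 follows directly from the listed per-MTok prices. A sketch of the arithmetic, where the token counts are illustrative assumptions (real bills may also reflect caching or batch discounts not covered here):

```python
# Per-request cost from the listed prices ($/MTok = dollars per
# million tokens). Token counts below are illustrative assumptions.

PRICES = {  # (input $/MTok, output $/MTok) from the cards above
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000


# Hypothetical tutoring turn: 2,000 input tokens, 800 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 800):.4f}/turn")
# Haiku: $0.0060/turn, Sonnet: $0.0180/turn (3x), so at high volume
# the gap compounds quickly.
```

At both price points the ratio is exactly 3x, so the break-even question is whether Sonnet's +1 creative_problem_solving and +3 safety_calibration edge is worth tripling the bill for your workload.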

Bottom Line

For Math, choose Claude Haiku 4.5 if you need lower-cost, lower-latency inference for many short-to-medium math tasks and can accept a small drop in creative problem solving and safety behavior. Choose Claude Sonnet 4.6 if you need stronger contest-level reasoning, safer handling of adversarial or ambiguous math prompts, or longer-context/larger-output proofs: Sonnet leads Haiku by +1 on creative_problem_solving and +3 on safety_calibration in our testing, and also posts 85.8% on AIME 2025 and 75.2% on SWE-bench Verified (Epoch AI).

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.
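The overall scores on the cards above are consistent with a plain average of the twelve 1-5 benchmark scores. A sketch that reproduces them (the unweighted-mean aggregation is an assumption; the suite may weight benchmarks differently):

```python
# Reproduce each card's overall score as the mean of its twelve 1-5
# benchmark scores, in card order. Unweighted averaging is an
# assumption about the aggregation method.

SCORES = {
    "Claude Haiku 4.5": [5, 5, 5, 5, 4, 5, 4, 2, 5, 5, 3, 4],
    "Claude Sonnet 4.6": [5, 5, 5, 5, 4, 5, 4, 5, 5, 5, 3, 5],
}

for model, s in SCORES.items():
    overall = sum(s) / len(s)
    print(f"{model}: {overall:.2f}/5")
# Prints 4.33/5 for Haiku and 4.67/5 for Sonnet, matching the cards.
```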

For math tasks, we supplement our benchmark suite with MATH/AIME scores from Epoch AI, an independent research organization.

Frequently Asked Questions