anthropic

Claude Haiku 4.5

Claude Haiku 4.5 is Anthropic's fastest and most efficient model in our dataset, described as delivering near-frontier intelligence at a fraction of the cost and latency of larger, pricier alternatives. It ranked 10th overall out of 52 tested models — the same rank as Grok 4.20 and GPT-5.4 Mini — making it one of the highest-scoring models at its price point. At $1/M input and $5/M output, it undercuts GPT-5 ($10/M output), Claude Sonnet 4.6 ($15/M output), and o3 ($8/M output) by a wide margin while delivering a competitive average benchmark score. It supports a 200K context window with up to 64K output tokens. No quirks are documented in our payload.

Performance

In our 12-test benchmark suite, Claude Haiku 4.5's top strengths are agentic planning (5/5, tied for 1st with 14 others out of 54 tested), tool calling (5/5, tied for 1st with 16 others out of 54 tested), and faithfulness (5/5, tied for 1st with 32 others out of 55 tested). It also scores 5/5 on strategic analysis, multilingual, long context, and persona consistency — a broad set of top scores across diverse capability dimensions. Safety calibration scored 2/5 (rank 12 of 55): a low absolute score, though the rank indicates that most tested models fared no better. Constrained rewriting is the clearest relative weakness at 3/5 (rank 31 of 53). Claude Haiku 4.5 has no external benchmark data (SWE-bench, MATH, AIME) in our payload.

Pricing

Claude Haiku 4.5 costs $1 per million input tokens and $5 per million output tokens. At 1M input + 500K output, total cost is about $3.50. At 10M input / 5M output per month, expect $35/month; at 100M input / 50M output, roughly $350/month. This makes it one of the most affordable models in the top-10 overall ranking. Among peers in the same performance bracket, Grok 4.20 costs $6/M output (20% more), GPT-5.4 Mini $4.50/M output (10% less), and o4 Mini $4.40/M output (12% less). For teams prioritizing per-token cost while maintaining high benchmark scores, Claude Haiku 4.5 occupies an unusually strong value position: few models ranked this high cost this little.
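The bracket arithmetic above can be sketched as a small helper. Prices are the list rates quoted above; the function name is illustrative, not part of any SDK:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float = 1.00, output_price: float = 5.00) -> float:
    """Dollar cost at Claude Haiku 4.5's list prices ($ per million tokens)."""
    return (input_tokens / 1_000_000) * input_price \
         + (output_tokens / 1_000_000) * output_price

# The three volume brackets quoted above:
print(estimate_cost(1_000_000, 500_000))       # 3.5
print(estimate_cost(10_000_000, 5_000_000))    # 35.0
print(estimate_cost(100_000_000, 50_000_000))  # 350.0
```

Swap in another model's prices via the keyword arguments to compare brackets directly.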

Claude Haiku 4.5

Overall
4.33/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$1.00/MTok

Output

$5.00/MTok

Context Window

200K


Real-World Costs

Chat response: $0.0027
Blog post: $0.011
Document batch: $0.270
Pipeline run: $2.70

Pricing vs Performance

Output cost per million tokens (log scale) vs average score across our 12 internal benchmarks


Try It

from openai import OpenAI

# Claude Haiku 4.5 is reachable through OpenRouter's OpenAI-compatible API.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="anthropic/claude-haiku-4.5",
    messages=[
        {"role": "user", "content": "Hello, Claude Haiku 4.5!"}
    ],
)

print(response.choices[0].message.content)

Recommendation

Claude Haiku 4.5 is the strongest value pick for teams that need top-bracket performance at moderate cost. Its 5/5 agentic planning and tool calling scores in our testing make it an excellent foundation for agentic pipelines, while its 5/5 faithfulness ensures reliability in summarization and document analysis. The $5/M output price is considerably lower than Claude Sonnet 4.6 ($15/M output) while ranking at the same overall position as much pricier models. It is particularly well-suited for high-volume applications — customer support automation, multilingual content pipelines, classification at scale — where cost efficiency directly impacts margins. Who should look elsewhere: if constrained rewriting or compression tasks are primary use cases, Claude Haiku 4.5's 3/5 score (rank 31 of 53) means other models in our testing outperform it there. If math or coding benchmark verification matters, note that Claude Haiku 4.5 lacks external benchmark data in our payload — models like GPT-5 or Gemini 3 Flash Preview have verified external scores from Epoch AI.
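Since tool calling is one of its top scores, here is a minimal local sketch of the tool-calling loop around the same OpenAI-compatible endpoint. The schema follows OpenAI's standard `tools` shape; the `get_time` function, its stubbed return value, and the `dispatch` helper are all illustrative assumptions, not from our payload:

```python
import json

# Illustrative local function the model may request via tool calling.
def get_time(timezone: str) -> str:
    # Stubbed for the example; a real implementation would look up the zone.
    return f"12:00 in {timezone}"

# JSON schema advertising the tool, in the OpenAI tools format
# accepted by OpenRouter's chat completions endpoint.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Get the current time in a timezone.",
        "parameters": {
            "type": "object",
            "properties": {"timezone": {"type": "string"}},
            "required": ["timezone"],
        },
    },
}]

def dispatch(tool_call: dict) -> str:
    """Route a model-emitted tool call to the matching local function."""
    if tool_call["name"] == "get_time":
        args = json.loads(tool_call["arguments"])
        return get_time(**args)
    raise ValueError(f"unknown tool: {tool_call['name']}")

# A model response would carry tool calls shaped like this:
print(dispatch({"name": "get_time", "arguments": '{"timezone": "UTC"}'}))  # 12:00 in UTC
```

To run this live, pass `tools=TOOLS` to `client.chat.completions.create(...)` from the Try It snippet and feed each dispatched result back as a `role: "tool"` message.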

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.