Claude Haiku 4.5 vs Devstral Medium

In our testing Claude Haiku 4.5 is the clear winner for most production use cases, taking 9 of 12 benchmark categories (tool calling, long context, faithfulness, agentic planning). Devstral Medium matches Haiku on structured output and classification and provides a lower-cost option: at listed prices Haiku costs 2.5× more per token, so choose Devstral if unit cost is the primary constraint.

Anthropic

Claude Haiku 4.5

Overall
4.33/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$1.00/MTok

Output

$5.00/MTok

Context Window: 200K

modelpicker.net

Mistral

Devstral Medium

Overall
3.17/5 (Usable)

Benchmark Scores

Faithfulness
4/5
Long Context
4/5
Multilingual
4/5
Tool Calling
3/5
Classification
4/5
Agentic Planning
4/5
Structured Output
4/5
Safety Calibration
1/5
Strategic Analysis
2/5
Persona Consistency
3/5
Constrained Rewriting
3/5
Creative Problem Solving
2/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.400/MTok

Output

$2.00/MTok

Context Window: 131K


Benchmark Analysis

Across our 12-test suite Haiku wins 9 categories, Devstral wins 0, and 3 are ties. Breakdown (Haiku score vs Devstral score, with ranking context and task meaning):

  • Strategic analysis: 5 vs 2 — Haiku tied for 1st of 54, sharing the top spot with 25 other models. This means better nuanced tradeoff reasoning with numbers in planning/finance scenarios.
  • Creative problem solving: 4 vs 2 — Haiku ranks 9 of 54. Expect more specific, feasible ideas from Haiku.
  • Tool calling: 5 vs 3 — Haiku tied for 1st of 54. Haiku selects functions, arguments, and sequencing more accurately for agentic workflows and tool integrations.
  • Faithfulness: 5 vs 4 — Haiku tied for 1st of 55. Haiku is less likely to hallucinate and sticks closer to source material for factual tasks.
  • Long context: 5 vs 4 — Haiku tied for 1st of 55. Better retrieval and coherence when working with 30K+ token documents.
  • Safety calibration: 2 vs 1 — Haiku (rank 12/55) refuses harmful requests more appropriately than Devstral (rank 32/55), though neither scores high in absolute terms.
  • Persona consistency: 5 vs 3 — Haiku tied for 1st of 53. Stronger at maintaining character and resisting prompt injection.
  • Agentic planning: 5 vs 4 — Haiku tied for 1st of 54. Better at goal decomposition and recovery in multi-step agents.
  • Multilingual: 5 vs 4 — Haiku tied for 1st of 55. Higher-quality non-English outputs in our tests.

Ties (no clear winner):

  • Structured output: 4 vs 4 — both rank 26th (of 54 and 53 entries, respectively) for JSON schema adherence.
  • Constrained rewriting: 3 vs 3 — both handle compression within limits equally well.
  • Classification: 4 vs 4 — both tied for 1st of 53, so routing/categorization performance is comparable.

Practical meaning: Haiku is superior for agent-driven apps, long-document work, and safety- or fidelity-sensitive tasks. Devstral wins no benchmark in our suite but matches Haiku on structured output and classification while offering a lower unit price.
Benchmark                 | Claude Haiku 4.5 | Devstral Medium
Faithfulness              | 5/5              | 4/5
Long Context              | 5/5              | 4/5
Multilingual              | 5/5              | 4/5
Tool Calling              | 5/5              | 3/5
Classification            | 4/5              | 4/5
Agentic Planning          | 5/5              | 4/5
Structured Output         | 4/5              | 4/5
Safety Calibration        | 2/5              | 1/5
Strategic Analysis        | 5/5              | 2/5
Persona Consistency       | 5/5              | 3/5
Constrained Rewriting     | 3/5              | 3/5
Creative Problem Solving  | 4/5              | 2/5
Summary                   | 9 wins           | 0 wins
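The win/tie tally above can be reproduced from the score table with a few lines of Python (scores copied from our results; category keys are informal labels, not API identifiers):

```python
# Head-to-head tally from the 12 benchmark scores: (Haiku, Devstral) per category.
scores = {
    "faithfulness":             (5, 4),
    "long_context":             (5, 4),
    "multilingual":             (5, 4),
    "tool_calling":             (5, 3),
    "classification":           (4, 4),
    "agentic_planning":         (5, 4),
    "structured_output":        (4, 4),
    "safety_calibration":       (2, 1),
    "strategic_analysis":       (5, 2),
    "persona_consistency":      (5, 3),
    "constrained_rewriting":    (3, 3),
    "creative_problem_solving": (4, 2),
}

haiku_wins = sum(h > d for h, d in scores.values())
devstral_wins = sum(d > h for h, d in scores.values())
ties = sum(h == d for h, d in scores.values())

print(haiku_wins, devstral_wins, ties)  # prints: 9 0 3
```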

Pricing Analysis

The listed prices are per million tokens (MTok): Claude Haiku 4.5 is $1.00 input + $5.00 output, and Devstral Medium is $0.40 input + $2.00 output, a 2.5× ratio on both sides. For a workload with equal input and output volume, that scales as follows: 1M tokens each way costs Haiku $6.00 vs Devstral $2.40; 100M tokens each way, $600 vs $240; 1B tokens each way, $6,000 vs $2,400. Who should care: high-volume products, SaaS startups, and cost-optimized pipelines will feel the 2.5× gap as token volume climbs into the billions per month. Teams prioritizing top-tier tool calling, long-context handling, multilingual fidelity, or agentic workflows may justify Haiku's premium; cost-sensitive bulk processing or prototyping favors Devstral Medium.
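The arithmetic above can be sketched as a small cost calculator. This is a minimal sketch using the listed per-MTok rates; the equal input/output split is an assumption, and real workloads should plug in their own volumes:

```python
# Monthly cost comparison at the listed $/MTok (per-million-token) rates.
PRICES = {  # model -> (input $/MTok, output $/MTok)
    "Claude Haiku 4.5": (1.00, 5.00),
    "Devstral Medium":  (0.40, 2.00),
}

def monthly_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Dollar cost for the given monthly volumes, expressed in MTok (millions of tokens)."""
    inp, out = PRICES[model]
    return input_mtok * inp + output_mtok * out

# Assumed workload: equal input and output token volume.
for mtok in (1, 100, 1000):  # 1M, 100M, 1B tokens each way
    h = monthly_cost("Claude Haiku 4.5", mtok, mtok)
    d = monthly_cost("Devstral Medium", mtok, mtok)
    print(f"{mtok:>5} MTok each way: Haiku ${h:,.2f} vs Devstral ${d:,.2f} ({h / d:.1f}x)")
```

Because both input and output rates differ by exactly 2.5×, the ratio holds regardless of the input/output mix.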

Real-World Cost Comparison

Task            | Claude Haiku 4.5 | Devstral Medium
Chat response   | $0.0027          | $0.0011
Blog post       | $0.011           | $0.0042
Document batch  | $0.270           | $0.108
Pipeline run    | $2.70            | $1.08
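Per-task figures like these follow directly from the per-MTok rates once you assume a token count per task. A minimal sketch (the ~700 input / ~400 output chat-turn counts are illustrative assumptions, not measured values from our tests):

```python
# Per-task cost estimator at the listed $/MTok rates.
PRICES = {  # model -> (input $/MTok, output $/MTok)
    "Claude Haiku 4.5": (1.00, 5.00),
    "Devstral Medium":  (0.40, 2.00),
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task, given raw token counts (rates are per million tokens)."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Hypothetical chat turn: ~700 input tokens (prompt + history), ~400 output tokens.
for model in PRICES:
    print(f"{model}: ${task_cost(model, 700, 400):.4f}")
```

At those assumed counts the estimates land near the chat-response row above; swap in your own token counts to model other task shapes.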

Bottom Line

Choose Claude Haiku 4.5 if you need best-in-class tool calling, long-context handling, agentic planning, multilingual fidelity, or higher faithfulness; Haiku won 9 of 12 categories in our testing. Choose Devstral Medium if you need a lower-cost engine for high-volume classification or schema-driven output, or if budget at scale (2.5× cheaper per token at listed prices) is the primary constraint. Note: Devstral's provider description positions it for code and agentic reasoning, but on our 12-test bench Haiku outperforms it across most measured dimensions.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions