Claude Haiku 4.5 vs Devstral Small 1.1 for Multilingual

Winner: Claude Haiku 4.5. In our testing Claude Haiku 4.5 scores 5/5 on Multilingual versus Devstral Small 1.1's 4/5 (task rank 1 of 52 vs 36 of 52). Haiku's higher multilingual score is backed by top marks in Long Context (5 vs 4), Faithfulness (5 vs 4), Persona Consistency (5 vs 2), and Tool Calling (5 vs 4). Devstral Small 1.1 is substantially cheaper per token ($0.10 vs $1.00 input, $0.30 vs $5.00 output per MTok) and still delivers solid multilingual output (4/5), but for equivalent non-English quality and robust context handling Haiku is the clearer choice.

Anthropic

Claude Haiku 4.5

Overall: 4.33/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 5/5
Classification: 4/5
Agentic Planning: 5/5
Structured Output: 4/5
Safety Calibration: 2/5
Strategic Analysis: 5/5
Persona Consistency: 5/5
Constrained Rewriting: 3/5
Creative Problem Solving: 4/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $1.00/MTok
Output: $5.00/MTok
Context Window: 200K


Mistral

Devstral Small 1.1

Overall: 3.08/5 (Usable)

Benchmark Scores

Faithfulness: 4/5
Long Context: 4/5
Multilingual: 4/5
Tool Calling: 4/5
Classification: 4/5
Agentic Planning: 2/5
Structured Output: 4/5
Safety Calibration: 2/5
Strategic Analysis: 2/5
Persona Consistency: 2/5
Constrained Rewriting: 3/5
Creative Problem Solving: 2/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.10/MTok
Output: $0.30/MTok
Context Window: 131K


Task Analysis

Multilingual demands producing equivalent-quality output across non-English languages: accurate grammar, idiomatic phrasing, consistent persona, and robustness across long documents or multimodal inputs. Our primary signal here is the internal Multilingual test: Claude Haiku 4.5 scored 5/5 and Devstral Small 1.1 scored 4/5. Supporting benchmarks from our suite show the Haiku strengths that matter for multilingual tasks: Long Context 5 vs 4 (helps with large translated documents and long multi-turn conversations), Faithfulness 5 vs 4 (reduces hallucinated translations), and Persona Consistency 5 vs 2 (keeps tone and register steady across languages). Haiku additionally supports text+image->text modality, useful when multilingual tasks include images; Devstral is text->text only. Neither model has an external third-party multilingual benchmark on record, so the winner is based on our internal task scores and related proxy metrics.
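If you want to sanity-check the multilingual gap on your own prompts, the sketch below sends the same non-English task to both models through their official Python SDKs. It is a minimal sketch, not our test harness: the model IDs (claude-haiku-4-5, devstral-small-2507) and environment variables are assumptions to verify against each provider's documentation.

```python
import os

import anthropic
from mistralai import Mistral

# Same prompt for both models: a translation that must preserve register,
# the dimension where the two models diverge most in our testing.
PROMPT = (
    "Translate the following release note into formal German, "
    "preserving the product names and the neutral, technical tone:\n\n"
    "The scheduler now retries failed jobs up to three times."
)

# Claude Haiku 4.5 via the anthropic SDK (reads ANTHROPIC_API_KEY).
# Model ID is an assumption; check Anthropic's current model list.
haiku = anthropic.Anthropic()
haiku_reply = haiku.messages.create(
    model="claude-haiku-4-5",
    max_tokens=512,
    messages=[{"role": "user", "content": PROMPT}],
)
print("Haiku:", haiku_reply.content[0].text)

# Devstral Small 1.1 via the mistralai SDK (v1.x).
# Model ID is an assumption; check Mistral's current model list.
devstral = Mistral(api_key=os.environ["MISTRAL_API_KEY"])
devstral_reply = devstral.chat.complete(
    model="devstral-small-2507",
    messages=[{"role": "user", "content": PROMPT}],
)
print("Devstral:", devstral_reply.choices[0].message.content)
```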

Practical Examples

Where Claude Haiku 4.5 shines:
- Translating and editing a 100K-token technical manual into multiple languages while preserving tone and citations (Long Context 5, Faithfulness 5).
- Multilingual customer support flows that must keep persona and register consistent across languages (Persona Consistency 5).
- Image-embedded multilingual extraction, via Haiku's text+image->text modality.

Where Devstral Small 1.1 shines:
- Cost-sensitive batch translations or classification across several languages where perfect parity with English is not required (Multilingual 4/5) and token costs dominate ($0.10 vs $1.00 input, $0.30 vs $5.00 output per MTok); see the cost sketch below.
- Structured output or classification at scale, where Devstral ties with Haiku (both 4/5 on Structured Output and Classification).
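To make the cost tradeoff concrete, here is a rough back-of-the-envelope sketch using the list prices from the scorecards above. The workload size is an illustrative assumption, not a measurement.

```python
# Per-MTok list prices from the scorecards above (USD per million tokens).
PRICES = {
    "Claude Haiku 4.5": {"input": 1.00, "output": 5.00},
    "Devstral Small 1.1": {"input": 0.10, "output": 0.30},
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one batch job at list prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Illustrative workload: translating 5M input tokens into 5M output tokens.
for model in PRICES:
    print(f"{model}: ${job_cost(model, 5_000_000, 5_000_000):.2f}")
# Claude Haiku 4.5: $30.00
# Devstral Small 1.1: $2.00
```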

Bottom Line

For Multilingual, choose Claude Haiku 4.5 if you need the highest non-English output quality, reliable long-context handling, and multimodal (image) support; it scores 5 vs 4 in our testing. Choose Devstral Small 1.1 if you prioritize lower token cost ($0.10 vs $1.00 input, $0.30 vs $5.00 output per MTok) and still-good multilingual performance for large-volume or budget-constrained workflows.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.
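For readers curious what 1-5 LLM-judge scoring looks like in practice, here is a minimal sketch of the pattern. The rubric text and the choice of judge model are illustrative assumptions; our actual harness and prompts are described in the methodology linked above.

```python
import re

import anthropic

# Illustrative rubric; real rubrics are task-specific.
RUBRIC = (
    "Score the candidate answer from 1 (unusable) to 5 (excellent) for "
    "multilingual quality: grammar, idiomatic phrasing, and fidelity to "
    "the source. Reply with the score as a single digit on the last line."
)

def judge(source: str, candidate: str) -> int:
    """Ask an LLM judge for a 1-5 score; 0 if no score is found."""
    client = anthropic.Anthropic()  # judge model choice is an assumption
    reply = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=256,
        messages=[{
            "role": "user",
            "content": f"{RUBRIC}\n\nSource:\n{source}\n\nCandidate:\n{candidate}",
        }],
    )
    # Take the last standalone digit 1-5 in the reply as the score.
    scores = re.findall(r"\b[1-5]\b", reply.content[0].text)
    return int(scores[-1]) if scores else 0
```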

Frequently Asked Questions