Claude Haiku 4.5 vs Devstral Small 1.1 for Structured Output

No single winner: Claude Haiku 4.5 and Devstral Small 1.1 tie for Structured Output in our testing (both 4/5, rank 26 of 52). Choose Haiku when strict schema fidelity across very long or multimodal contexts and stronger tool calling matter; choose Devstral when you need the same structured-output quality at far lower cost ($5.00 vs $0.30 per MTok output, a 16.7× difference).

anthropic

Claude Haiku 4.5

Overall
4.33/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$1.00/MTok

Output

$5.00/MTok

Context Window: 200K

modelpicker.net

mistral

Devstral Small 1.1

Overall
3.08/5 (Usable)

Benchmark Scores

Faithfulness
4/5
Long Context
4/5
Multilingual
4/5
Tool Calling
4/5
Classification
4/5
Agentic Planning
2/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
2/5
Persona Consistency
2/5
Constrained Rewriting
3/5
Creative Problem Solving
2/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.10/MTok

Output

$0.30/MTok

Context Window: 131K


Task Analysis

Structured Output in our suite measures JSON schema compliance and format adherence. Key capabilities: exact response_format/structured_outputs support, adherence to field types and required keys, deterministic formatting, long-context handling when schemas span many tokens, and reliable tool calling when outputs depend on external data. Both models explicitly support structured_outputs and scored 4/5 on our structured_output test, tying at rank 26 of 52. Supporting proxy metrics: Claude Haiku 4.5 surpasses Devstral Small 1.1 on tool_calling (5 vs 4), long_context (5 vs 4), and faithfulness (5 vs 4), attributes that reduce schema drift in large, multi-step generation. Devstral Small 1.1 matches Haiku on the core structured_output score (4/5) while being text-only and much cheaper (input $0.10 vs $1.00/MTok; output $0.30 vs $5.00/MTok). With no external benchmark scores available for this task, our internal structured_output score is the primary signal.

Practical Examples

  1. Large multimodal invoice extraction and JSON generation: Claude Haiku 4.5 (200K context window, long_context 5/5, text+image input) is the better fit when the schema must integrate OCR results and preserve exact keys and types across 50k+ token contexts.
  2. High-throughput APIs returning compact JSON for millions of requests: Devstral Small 1.1 delivers the same structured_output score (4/5) at far lower cost (output $0.30 vs $5.00/MTok), reducing per-response output spend by ~16.7×.
  3. Tool-driven pipelines where the model selects functions and formats their outputs into a strict schema: Claude Haiku 4.5 (tool_calling 5/5) is likelier to choose the correct functions and format arguments reliably.
  4. Simple text-only schema generation (forms, webhooks, short responses): Devstral Small 1.1 matches structured compliance while saving compute and cost.
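To see how the pricing gap plays out at volume, a quick back-of-envelope helper; the token volumes below are hypothetical, while the per-MTok prices come from the cards above. Note the blended ratio lands below the 16.7× output-only figure because input tokens are billed at a smaller 10× gap:

```python
def cost_usd(tokens_in: int, tokens_out: int,
             in_per_mtok: float, out_per_mtok: float) -> float:
    """Total spend in USD for a given token volume at per-MTok prices."""
    return tokens_in / 1e6 * in_per_mtok + tokens_out / 1e6 * out_per_mtok

# Hypothetical monthly volume: 1B input tokens, 200M output tokens
haiku = cost_usd(1_000_000_000, 200_000_000, 1.00, 5.00)     # $2000.00
devstral = cost_usd(1_000_000_000, 200_000_000, 0.10, 0.30)  # $160.00
print(f"Haiku ${haiku:.2f} vs Devstral ${devstral:.2f} "
      f"({haiku / devstral:.1f}x blended)")
```

At this input-heavy mix the blended gap is 12.5×; output-heavy workloads push it closer to 16.7×.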

Bottom Line

For Structured Output, choose Claude Haiku 4.5 if you need multimodal input, a very large context window (200K tokens), stronger tool calling (5 vs 4), or higher faithfulness for complex schemas, and you can accept the higher cost. Choose Devstral Small 1.1 if you need the same core structured-output quality (both score 4/5 in our tests) for text-only workloads and want far lower output cost ($0.30 vs $5.00 per MTok).

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions