Claude Haiku 4.5 vs Devstral Medium for Structured Output
Tie: Claude Haiku 4.5 and Devstral Medium both score 4/5 on Structured Output in our testing and share rank 26 of 52; neither model outscored the other on the structured_output test itself. Choose between them based on tradeoffs: Claude Haiku 4.5 provides stronger supporting capabilities (tool_calling 5 vs 3, long_context 5 vs 4, faithfulness 5 vs 4) and multimodal input, while Devstral Medium is materially cheaper ($2.00/MTok output vs Haiku's $5.00/MTok) with the same structured_output score.
Anthropic: Claude Haiku 4.5
Pricing: $1.00/MTok input, $5.00/MTok output
modelpicker.net
Mistral: Devstral Medium
Pricing: $0.40/MTok input, $2.00/MTok output
Task Analysis
Structured Output demands precise JSON schema compliance and strict format adherence (our structured_output benchmark definition). In our testing both models scored 4/5 on that task, so core structured-output capability is equivalent. The differentiating capabilities for reliably producing and validating structured output are tool calling and function-argument accuracy (to assemble arguments and invoke validators), long-context handling (to keep large schemas and examples in context), faithfulness (to avoid hallucinated fields), modality support (image→text when extracting structured data from images), and cost/throughput for production use. In our scores Claude Haiku 4.5 outperforms Devstral Medium on tool_calling (5 vs 3), long_context (5 vs 4), and faithfulness (5 vs 4), which explains why Haiku is likely to be more robust on complex, stateful, or multimodal schema tasks. Devstral Medium matches Haiku on the structured_output metric itself (4/5) and on classification, but offers lower per-token pricing ($0.40 vs $1.00 input and $2.00 vs $5.00 output per MTok) and a 131,072-token context window, which is sufficient for many schema tasks.
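Whichever model you pick, a validation step downstream of generation catches the residual schema violations a 4/5 score implies. The sketch below shows a minimal stdlib-only check; the `EXPECTED_FIELDS` schema and the `validate_output` helper are illustrative assumptions, not part of either model's API.

```python
# Minimal sketch: verify a model's JSON reply before using it.
# EXPECTED_FIELDS is a hypothetical example schema (field -> allowed types).
import json

EXPECTED_FIELDS = {"invoice_id": str, "total": (int, float)}

def validate_output(raw: str) -> tuple[bool, list[str]]:
    """Return (is_valid, error messages) for a model's raw JSON reply."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return False, ["top-level value must be a JSON object"]
    errors = []
    for field, typ in EXPECTED_FIELDS.items():
        if field not in data:
            errors.append(f"missing required field: {field}")
        elif not isinstance(data[field], typ):
            errors.append(f"wrong type for field: {field}")
    return (not errors), errors
```

In production you would typically replace this hand-rolled check with a full JSON Schema validator and feed the error messages back to the model as a retry prompt.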
Practical Examples
1) Complex nested JSON with validator calls: Claude Haiku 4.5 is preferable because tool_calling=5 and long_context=5 help it select functions and keep large schema examples in context; expect fewer manual fixes when orchestrating validation steps.
2) High-volume templated JSON generation (API responses, logs): Devstral Medium is attractive because it ties Haiku on structured_output (4/5) but has lower output cost ($2.00/MTok vs $5.00/MTok), yielding ~2.5x cheaper outputs per token in bulk.
3) Image-to-structured-data extraction (receipts, forms): Claude Haiku 4.5 supports text+image→text modality, making it a better fit for multimodal extraction where schema fidelity matters.
4) Small schemas and single-shot responses: Devstral Medium (structured_output 4/5, long_context 4) is a cost-efficient choice when tool calling or multimodal input is not required.
All examples reflect our internal scores (structured_output 4/5 each; supporting scores cited above).
Bottom Line
For Structured Output, choose Claude Haiku 4.5 if you need stronger tool-calling, larger context capacity, higher faithfulness, or image→text extraction and can accept higher cost. Choose Devstral Medium if you need the same structured_output quality at significantly lower token cost and your workflows don’t require advanced tool orchestration or multimodal input.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.