Claude Haiku 4.5 vs Devstral Small 1.1 for Research

Winner: Claude Haiku 4.5. In our Research testing (deep analysis, literature review, synthesis), Haiku scores 5.0 versus Devstral Small 1.1's 3.33. Haiku outperforms on all three Research tests: strategic_analysis (5 vs 2), faithfulness (5 vs 4), and long_context (5 vs 4). It ranks 1 of 52 for this task versus Devstral's rank of 47. Those gaps indicate Haiku produces more reliable, context-aware research output. Devstral Small 1.1 is far cheaper ($0.10 vs $1.00 input; $0.30 vs $5.00 output per MTok) and remains useful for shorter, cost-sensitive workflows, but it loses on every core Research capability we measured.

Anthropic

Claude Haiku 4.5

Overall
4.33/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$1.00/MTok

Output

$5.00/MTok

Context Window: 200K

modelpicker.net

Mistral

Devstral Small 1.1

Overall
3.08/5 (Usable)

Benchmark Scores

Faithfulness
4/5
Long Context
4/5
Multilingual
4/5
Tool Calling
4/5
Classification
4/5
Agentic Planning
2/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
2/5
Persona Consistency
2/5
Constrained Rewriting
3/5
Creative Problem Solving
2/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.100/MTok

Output

$0.300/MTok

Context Window: 131K


Task Analysis

What Research demands: deep tradeoff reasoning, faithful use of sources, and accurate synthesis across very long contexts. The task tests are strategic_analysis, faithfulness, and long_context. With no external benchmark in the payload, the verdict rests on our internal task scores: Claude Haiku 4.5 scores 5 on all three Research axes (strategic_analysis=5, faithfulness=5, long_context=5), while Devstral Small 1.1 scores strategic_analysis=2, faithfulness=4, long_context=4. Supporting signals: Haiku also scores higher on tool_calling (5 vs 4), agentic_planning (5 vs 2), and persona_consistency (5 vs 2), which matter for running retrieval/agent pipelines, maintaining an analytic voice, and decomposing research goals. Both models tie on structured_output (4), so neither has a systematic format advantage. Haiku's 200,000-token context window and 64,000-token max output explicitly favor very long literature reviews and aggregated syntheses; Devstral's context window is 131,072 tokens, with no max output listed. Cost tradeoff: Haiku's input/output pricing ($1.00 / $5.00 per MTok) is materially higher than Devstral's ($0.10 / $0.30), so budget and throughput matter when choosing.
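The task and overall scores above can be reproduced from the per-test tables, assuming each is an unweighted mean (an assumption, but one that matches every published figure on this page). A minimal sketch:

```python
# Per-test scores copied from the benchmark tables above (1-5 scale).
haiku = {
    "faithfulness": 5, "long_context": 5, "multilingual": 5,
    "tool_calling": 5, "classification": 4, "agentic_planning": 5,
    "structured_output": 4, "safety_calibration": 2,
    "strategic_analysis": 5, "persona_consistency": 5,
    "constrained_rewriting": 3, "creative_problem_solving": 4,
}
devstral = {
    "faithfulness": 4, "long_context": 4, "multilingual": 4,
    "tool_calling": 4, "classification": 4, "agentic_planning": 2,
    "structured_output": 4, "safety_calibration": 2,
    "strategic_analysis": 2, "persona_consistency": 2,
    "constrained_rewriting": 3, "creative_problem_solving": 2,
}

# The three tests that make up the Research task.
RESEARCH_TESTS = ["strategic_analysis", "faithfulness", "long_context"]

def task_score(scores, tests):
    """Unweighted mean over the tests that define a task, rounded to 2 dp."""
    return round(sum(scores[t] for t in tests) / len(tests), 2)

def overall_score(scores):
    """Unweighted mean over all 12 benchmarks, rounded to 2 dp."""
    return round(sum(scores.values()) / len(scores), 2)

print(task_score(haiku, RESEARCH_TESTS))     # 5.0
print(task_score(devstral, RESEARCH_TESTS))  # 3.33
print(overall_score(haiku))                  # 4.33
print(overall_score(devstral))               # 3.08
```

The computed values (5.0 vs 3.33 for Research, 4.33 vs 3.08 overall) agree with the headline numbers, which is why the unweighted-mean assumption seems safe.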

Practical Examples

  1. Large-scale literature synthesis (200k+ tokens): Use Claude Haiku 4.5. In our tests, Haiku's long_context score (5 vs 4) and larger context window (200,000 vs 131,072 tokens) make it better at retrieving and integrating long documents.
  2. Methodological tradeoff write-ups and recommendations: Use Claude Haiku 4.5. Its strategic_analysis score is 5 versus Devstral's 2, so it produces more nuanced numeric tradeoffs in our testing.
  3. Citation-accurate summarization and source-faithful extraction: Use Claude Haiku 4.5. A faithfulness score of 5 vs 4 means Haiku adheres more closely to source material in our benchmarks.
  4. High-volume, short research tasks (many short literature scans, classification, or JSON outputs): Use Devstral Small 1.1 to save cost. Devstral ties on structured_output (4), matches classification (4), and at $0.10 / $0.30 per MTok is roughly 16.7x cheaper per output token than Haiku, making it the pragmatic choice for budget-limited, shorter-context pipelines.
  5. Image-to-text research inputs: Use Claude Haiku 4.5 when you need image-to-text handling; Haiku's modality is text+image->text, while Devstral is text->text.
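The cost tradeoff in these examples is easy to estimate per request from the published per-MTok prices. A minimal sketch, using a hypothetical literature-scan workload of 50k input tokens and 2k output tokens (the model keys and workload sizes are illustrative, not part of either API):

```python
# Published pricing from this comparison, in USD per million tokens.
PRICING = {
    "claude-haiku-4.5":   {"input": 1.00, "output": 5.00},
    "devstral-small-1.1": {"input": 0.10, "output": 0.30},
}

def job_cost(model, input_tokens, output_tokens):
    """Estimated USD cost of one request at the listed per-MTok rates."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical workload: one 50k-token scan producing a 2k-token summary.
haiku_cost = job_cost("claude-haiku-4.5", 50_000, 2_000)
devstral_cost = job_cost("devstral-small-1.1", 50_000, 2_000)

print(f"Haiku:    ${haiku_cost:.4f}")    # $0.0600
print(f"Devstral: ${devstral_cost:.4f}")  # $0.0056
print(f"Ratio:    {haiku_cost / devstral_cost:.1f}x")  # 10.7x
```

Note the blended ratio (about 10.7x here) sits between the 10x input gap and the 16.7x output gap, shifting toward 16.7x as responses get longer relative to prompts.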

Bottom Line

For Research, choose Claude Haiku 4.5 if you need the strongest long-context synthesis, highest faithfulness, and advanced strategic analysis (task score 5.0 vs 3.33; task rank 1 of 52). Choose Devstral Small 1.1 if budget and throughput matter more than top-tier reasoning: it is far cheaper ($0.10 vs $1.00 input; $0.30 vs $5.00 output per MTok), ties on structured output, and works well for shorter, repeatable research tasks.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions