Claude Haiku 4.5 vs Devstral Small 1.1 for Coding
Winner: Claude Haiku 4.5. In our testing on the Coding task (structured_output and tool_calling), Claude Haiku 4.5 beats Devstral Small 1.1: it scores higher on tool_calling (5 vs 4) while both models score 4 on structured_output. Haiku also outperforms on long_context (5 vs 4) and faithfulness (5 vs 4), which matter for multi-file reasoning, debugging, and avoiding hallucinated code. Devstral Small 1.1 is far cheaper (input $0.10/MTok, output $0.30/MTok) but does not match Haiku on the key coding interaction metric, tool_calling, in our tests.
Pricing
Claude Haiku 4.5 (Anthropic): input $1.00/MTok, output $5.00/MTok
Devstral Small 1.1 (Mistral): input $0.10/MTok, output $0.30/MTok
modelpicker.net
Task Analysis
What Coding demands: code generation, debugging, and code review require (1) reliable tool calling (function selection, correct arguments, sequencing) to orchestrate compilers, linters, and test runners; (2) structured output (JSON/schema compliance) for machine-parsable patches and CI integration; (3) long-context handling for multi-file projects and large diffs; and (4) faithfulness to avoid hallucinated APIs or incorrect code. An external SWE-bench Verified field (Epoch AI) is present in the payload but contains no scores for either model, so we base the winner on our internal task tests. On the two Coding test axes in our suite, Claude Haiku 4.5 scores 5 on tool_calling vs Devstral Small 1.1's 4, while both score 4 on structured_output. Supporting advantages for Haiku in our testing include long_context (5 vs 4) and faithfulness (5 vs 4), both directly relevant to complex code tasks. Devstral's strengths are cost-efficiency and adequate structured_output (4/5), making it suitable for straightforward codegen where orchestration and multi-file context are less critical.
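To make the tool_calling and structured_output criteria concrete, here is a minimal sketch of the kind of check they imply: validating that a model-emitted tool call is well-formed JSON, names the right tool, and supplies correctly typed arguments. The `run_tests` tool and its fields are invented for this illustration and do not correspond to any real API.

```python
import json

# Hypothetical tool schema: what a coding agent must match when it
# calls a test runner. Names here are illustrative, not any real API.
RUN_TESTS_SCHEMA = {
    "name": "run_tests",
    "required": {"path": str, "timeout_s": int},
}

def validate_tool_call(call_json: str, schema: dict) -> list[str]:
    """Return a list of problems with a model-emitted tool call."""
    errors = []
    try:
        call = json.loads(call_json)
    except json.JSONDecodeError as e:
        return [f"not valid JSON: {e}"]
    if call.get("name") != schema["name"]:
        errors.append(f"wrong tool: {call.get('name')!r}")
    args = call.get("arguments", {})
    for key, typ in schema["required"].items():
        if key not in args:
            errors.append(f"missing argument: {key}")
        elif not isinstance(args[key], typ):
            errors.append(f"{key} should be {typ.__name__}")
    return errors

# A well-formed call passes; a malformed one is caught.
good = '{"name": "run_tests", "arguments": {"path": "tests/", "timeout_s": 60}}'
bad = '{"name": "run_tests", "arguments": {"path": "tests/"}}'
print(validate_tool_call(good, RUN_TESTS_SCHEMA))  # []
print(validate_tool_call(bad, RUN_TESTS_SCHEMA))   # ['missing argument: timeout_s']
```

A model with stronger tool_calling produces fewer calls that trip checks like these; in a CI loop, every failed call costs a retry round-trip.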
Practical Examples
Where Claude Haiku 4.5 shines (based on our scores):
- Multi-file refactor and cross-file analysis: Haiku’s long_context 5 vs 4 helps keep definitions and call sites in context across large codebases.
- Tool-driven debugging and CI orchestration: tool_calling 5 vs 4 means Haiku is better in our tests at selecting and sequencing build/test/lint tools and producing correct arguments.
- Code review with fidelity: faithfulness 5 vs 4 reduces the risk of plausible but incorrect patches in our testing.
Where Devstral Small 1.1 shines (based on our scores and pricing):
- Budget-conscious codegen and templated snippets: structured_output 4 vs 4 indicates it matches Haiku on schema compliance for generated patches and JSON outputs.
- Low-cost automation at scale: input $0.10/MTok and output $0.30/MTok make Devstral far cheaper for high-volume, simple codegen jobs where advanced tool orchestration or very large context windows are not required.
Note: modality data in the payload shows Claude Haiku 4.5 accepts text+image->text (useful for code screenshots), while Devstral is text->text; that can affect workflows that need image-based code extraction.
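The pricing gap can be made concrete with a back-of-envelope calculation at the listed $/MTok rates. The token counts in this sketch are illustrative assumptions, not measurements from our tests.

```python
# Back-of-envelope cost comparison using the listed prices ($/MTok).
PRICES = {
    "Claude Haiku 4.5":   {"input": 1.00, "output": 5.00},
    "Devstral Small 1.1": {"input": 0.10, "output": 0.30},
}

def job_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at per-million-token pricing."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Illustrative job: an 8k-token prompt and a 2k-token completion.
for model in PRICES:
    print(model, round(job_cost_usd(model, 8_000, 2_000), 5))
# Haiku: $0.018 per job; Devstral: $0.0014 per job (~13x cheaper).
```

At high volume that ratio compounds: a million such jobs would cost roughly $18,000 on Haiku versus $1,400 on Devstral, which is why the quality-vs-cost trade-off dominates this comparison.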
Bottom Line
For Coding, choose Claude Haiku 4.5 if you need stronger tool calling (5 vs 4 in our tests), better long-context reasoning (5 vs 4), and higher faithfulness for debugging, multi-file refactors, or toolchain orchestration, and can accept higher costs (input $1.00/MTok, output $5.00/MTok). Choose Devstral Small 1.1 if you prioritize cost (input $0.10/MTok, output $0.30/MTok) and need solid structured outputs (4/5) for straightforward code generation, templated patches, or high-volume low-complexity tasks.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.
For coding tasks, we supplement our benchmark suite with SWE-bench scores from Epoch AI, an independent research organization.