Claude Haiku 4.5 vs Codestral 2508 for Chatbots

Winner: Claude Haiku 4.5. In our Chatbots tests (persona_consistency, safety_calibration, multilingual) Claude Haiku 4.5 scores 4.0 vs Codestral 2508's 2.67 — a 1.33-point lead on our 1–5 task scale. Haiku outperforms Codestral on persona consistency (5 vs 3), multilingual quality (5 vs 4), and safety calibration (2 vs 1). Codestral 2508 is stronger at structured output (5 vs 4) and is materially cheaper per MTok ($0.30 input / $0.90 output vs Haiku's $1.00 / $5.00). Our recommendation is driven by these task-specific scores and overall ranks (Haiku 11th of 52, Codestral 48th of 52) observed in our testing.

Claude Haiku 4.5 (Anthropic)

Overall: 4.33/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 5/5
Classification: 4/5
Agentic Planning: 5/5
Structured Output: 4/5
Safety Calibration: 2/5
Strategic Analysis: 5/5
Persona Consistency: 5/5
Constrained Rewriting: 3/5
Creative Problem Solving: 4/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $1.00/MTok
Output: $5.00/MTok
Context Window: 200K tokens


Codestral 2508 (Mistral)

Overall: 3.50/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 4/5
Tool Calling: 5/5
Classification: 3/5
Agentic Planning: 4/5
Structured Output: 5/5
Safety Calibration: 1/5
Strategic Analysis: 2/5
Persona Consistency: 3/5
Constrained Rewriting: 3/5
Creative Problem Solving: 2/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.30/MTok
Output: $0.90/MTok
Context Window: 256K tokens


Task Analysis

What Chatbots demand: a consistent persona, well-calibrated refusals, and reliable multilingual handling across long conversations. Our Chatbots task uses three targeted tests: persona_consistency (maintaining character and resisting prompt injection), safety_calibration (refusing harmful requests while allowing legitimate ones), and multilingual (equivalent quality across languages). Because no external benchmark covers this task, we base the verdict on our internal task score and its component scores. In our testing, Claude Haiku 4.5 scored persona_consistency 5, safety_calibration 2, multilingual 5 (taskScore 4.0); Codestral 2508 scored persona_consistency 3, safety_calibration 1, multilingual 4 (taskScore 2.67). Supporting signals: both models tie on long_context (5) and tool_calling (5), so either can handle long conversations and tool integrations, while Codestral leads on structured_output (5 vs 4), which matters for strict JSON or schema-bound responses. These component scores explain why Haiku delivers more consistent character and safer multilingual chat behavior, while Codestral offers stronger schema compliance and lower inference cost.
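The task scores above line up with a simple average of the three component scores. The sketch below shows that arithmetic; the unweighted-mean aggregation is an assumption that happens to reproduce the reported 4.0 and 2.67, not necessarily our exact scoring pipeline.

```python
# Sketch: Chatbots task score as the unweighted mean of its three component
# scores. The aggregation rule is an assumption that matches the reported
# numbers, not a description of our full scoring pipeline.
from statistics import mean

chatbot_components = {
    "Claude Haiku 4.5": {"persona_consistency": 5, "safety_calibration": 2, "multilingual": 5},
    "Codestral 2508": {"persona_consistency": 3, "safety_calibration": 1, "multilingual": 4},
}

for model, scores in chatbot_components.items():
    print(f"{model}: taskScore = {mean(scores.values()):.2f}")

# Claude Haiku 4.5: taskScore = 4.00
# Codestral 2508: taskScore = 2.67
```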

Practical Examples

Claude Haiku 4.5 shines when you need a stable assistant persona across long sessions and multiple languages: e.g., a banking chatbot that must preserve tone, refuse unsafe payment-bypass requests, and switch between English and Spanish reliably (persona_consistency 5 vs 3, multilingual 5 vs 4, safety_calibration 2 vs 1). Codestral 2508 shines when you need strict, predictable structured outputs and minimal inference spend: e.g., a customer-support webhook that must emit exact JSON order updates or call external tools with strict schema compliance (structured_output 5 vs 4) while minimizing cost ($0.30 input / $0.90 output per MTok vs Haiku's $1.00 / $5.00); a sketch of that kind of schema-constrained response check follows below. Both handle long context and tool calling well (each scores 5 on long_context and tool_calling), so multi-turn, tool-enabled bots are viable on either model; choose based on the persona/safety vs schema/cost tradeoff.
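As an illustration of the structured-output scenario above, here is a hedged sketch of enforcing a JSON Schema on a chatbot's order-update reply before it reaches a webhook. The field names (order_id, status, eta_days, customer_message) and the use of the jsonschema library are illustrative assumptions, not part of our test suite.

```python
# Hypothetical order-update schema; field names and enum values are illustrative.
import json

from jsonschema import validate  # third-party: pip install jsonschema

ORDER_UPDATE_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "status": {"type": "string", "enum": ["received", "shipped", "delayed", "delivered"]},
        "eta_days": {"type": "integer", "minimum": 0},
        "customer_message": {"type": "string"},
    },
    "required": ["order_id", "status", "customer_message"],
    "additionalProperties": False,
}

# Validate the model's raw reply before handing it to the webhook;
# any schema violation raises jsonschema.ValidationError.
raw_reply = '{"order_id": "A-1042", "status": "shipped", "eta_days": 3, "customer_message": "Your order is on the way."}'
validate(instance=json.loads(raw_reply), schema=ORDER_UPDATE_SCHEMA)
```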

Bottom Line

For Chatbots, choose Claude Haiku 4.5 if you prioritize consistent persona, safer refusal behavior, and best-in-task multilingual quality (taskScore 4.0; persona_consistency 5, multilingual 5). Choose Codestral 2508 if you prioritize strict structured-output compliance and lower per-MTok cost (structured_output 5; $0.30 input / $0.90 output per MTok) and can accept weaker persona consistency and safety calibration (taskScore 2.67).
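To make the per-MTok gap concrete, the sketch below works out monthly inference cost from the listed prices under an assumed traffic mix; the 50M-input / 10M-output token workload is a hypothetical example, not a measurement from our tests.

```python
# Cost sketch using the listed per-MTok prices; traffic volumes are assumptions.
PRICES_PER_MTOK = {
    "Claude Haiku 4.5": {"input": 1.00, "output": 5.00},
    "Codestral 2508": {"input": 0.30, "output": 0.90},
}

input_mtok, output_mtok = 50, 10  # hypothetical monthly chatbot traffic

for model, price in PRICES_PER_MTOK.items():
    monthly_cost = input_mtok * price["input"] + output_mtok * price["output"]
    print(f"{model}: ${monthly_cost:.2f}/month")

# Claude Haiku 4.5: $100.00/month  (50 * 1.00 + 10 * 5.00)
# Codestral 2508: $24.00/month     (50 * 0.30 + 10 * 0.90)
```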

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.
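For readers who want a feel for the judging step, here is a generic sketch of 1–5 LLM-as-judge scoring; the rubric wording, template fields, and judge setup are illustrative assumptions, not our production prompts (see the full methodology for the real details).

```python
# Generic illustration of a 1-5 LLM-as-judge rubric; the wording is an
# assumption, not our actual prompt. The filled template would be sent
# to a judge model, whose single-integer reply becomes the test score.
JUDGE_TEMPLATE = """Rate the assistant's response for the test '{test_name}' on a 1-5 scale.
5 = fully meets the criteria, 1 = clearly fails. Reply with a single integer.

Criteria: {criteria}
Conversation: {conversation}
Assistant response: {response}"""

def build_judge_prompt(test_name: str, criteria: str, conversation: str, response: str) -> str:
    """Fill the rubric template for one test case."""
    return JUDGE_TEMPLATE.format(
        test_name=test_name,
        criteria=criteria,
        conversation=conversation,
        response=response,
    )
```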

Frequently Asked Questions