Claude Haiku 4.5 vs Codestral 2508 for Faithfulness

Winner: Claude Haiku 4.5. In our testing both Claude Haiku 4.5 and Codestral 2508 score 5/5 on the Faithfulness task (sticking to source material without hallucinating). Because faithfulness in production depends not only on raw fidelity but also on maintaining a consistent voice and appropriate refusals, Claude Haiku 4.5 has the practical edge: persona_consistency 5 vs 3 and safety_calibration 2 vs 1 (our scores). Codestral 2508 wins on structured_output (5 vs 4), so it may be preferable when strict schema adherence is the dominant requirement, but for general faithfulness under real-world conversational and safety constraints, Claude Haiku 4.5 is the better choice.

anthropic

Claude Haiku 4.5

Overall
4.33/5 Strong

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$1.00/MTok

Output

$5.00/MTok

Context Window: 200K

modelpicker.net

mistral

Codestral 2508

Overall
3.50/5 Strong

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
4/5
Tool Calling
5/5
Classification
3/5
Agentic Planning
4/5
Structured Output
5/5
Safety Calibration
1/5
Strategic Analysis
2/5
Persona Consistency
3/5
Constrained Rewriting
3/5
Creative Problem Solving
2/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.30/MTok

Output

$0.90/MTok

Context Window: 256K


Task Analysis

What faithfulness demands: the model must stick to provided source material, avoid inventing facts, cite or preserve original phrasing when required, and refuse or flag requests that would force fabrication. Capabilities that matter:

- long_context: accurate retrieval across long inputs
- tool_calling: consulting external sources or instruments reliably
- structured_output: keeping citations and provenance in machine-readable form
- persona_consistency: avoiding injection-driven drift that creates false attributions
- safety_calibration: declining or hedging when the source is insufficient

In our testing both models achieved the top task score (5/5) on Faithfulness. The practical differences show up downstream: Claude Haiku 4.5 scores higher on persona_consistency (5 vs 3) and safety_calibration (2 vs 1), which reduces the risk of subtle hallucinations during multi-turn dialogue or when users press for unsupported claims. Codestral 2508 scores higher on structured_output (5 vs 4), which supports strict schema compliance and machine-parseable citations. Both tie on tool_calling (5) and long_context (5), so raw retrieval and function-invocation behavior were equal on our suite.

Practical Examples

  1. Legal-document fidelity (long multi-page source + conversational Q&A): choose Claude Haiku 4.5. Both models scored 5/5 on Faithfulness, but Haiku's persona_consistency of 5 and safety_calibration of 2 reduce the risk of the model inventing contractual obligations during follow-up prompts.
  2. Automated JSON citations for news ingestion (strict schema required): choose Codestral 2508. Codestral's structured_output is 5 vs Haiku's 4, so it more reliably emits compliant JSON and preserves source fields.
  3. Developer-facing toolchain that calls external validators (tool calling + structured output): both models scored 5 on tool_calling and 5 on Faithfulness; pick Codestral 2508 where cost and strict schema adherence matter ($0.30/MTok input, $0.90/MTok output).
  4. Customer support that must avoid confident hallucinations while maintaining persona (multi-turn): Claude Haiku 4.5 is preferable (persona_consistency 5 vs 3; $1.00/MTok input, $5.00/MTok output).
  5. Bulk programmatic checks over very long contexts: both models score 5 on long_context, so choose based on structured-output needs and cost tradeoffs rather than raw faithfulness.
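The strict-schema citation case above can be enforced with a post-hoc check rather than trusted to the model alone. Below is a minimal sketch of such a validator; the field names ("claim", "source_url", "quote") are illustrative assumptions, not a schema from either model's documentation.

```python
import json

# Required provenance fields for each citation (assumed for illustration).
REQUIRED_FIELDS = {"claim", "source_url", "quote"}

def validate_citations(raw: str) -> list[str]:
    """Return a list of problems found in a model's citation JSON output."""
    try:
        citations = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(citations, list):
        return ["expected a top-level JSON array of citations"]
    problems = []
    for i, item in enumerate(citations):
        missing = REQUIRED_FIELDS - set(item)
        if missing:
            problems.append(f"citation {i} missing fields: {sorted(missing)}")
    return problems

output = '[{"claim": "X", "source_url": "https://example.com", "quote": "..."}]'
print(validate_citations(output))  # → []
```

A check like this catches schema drift regardless of which model produced the output, which narrows the practical gap between a structured_output score of 4 and 5.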

Bottom Line

For Faithfulness, choose Claude Haiku 4.5 if you need the safest conversational fidelity and consistent attribution across multi-turn dialogs (Haiku: faithfulness 5, persona_consistency 5 vs Codestral's 3; safety_calibration 2 vs 1). Choose Codestral 2508 if strict machine-readable outputs and lower inference cost are your priorities (Codestral: structured_output 5 vs Haiku's 4; $0.30/MTok input, $0.90/MTok output). Note that both models score 5/5 on our Faithfulness test itself.
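The cost tradeoff above is easy to quantify from the listed per-MTok prices. The sketch below computes the cost of a single faithfulness-style request (a long source document plus a short answer); the token counts are an assumed example, not measurements.

```python
# USD per million tokens (input, output), taken from the pricing cards above.
PRICES = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Codestral 2508": (0.30, 0.90),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the listed per-MTok rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Example: a 50K-token source document and a 2K-token answer.
for model in PRICES:
    print(model, round(request_cost(model, 50_000, 2_000), 4))
```

At these assumed volumes the per-request costs come out to roughly $0.06 for Claude Haiku 4.5 and $0.017 for Codestral 2508, which is the sort of gap that matters mainly for high-throughput pipelines.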

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions