Claude Haiku 4.5 vs Codestral 2508 for Persona Consistency

Winner: Claude Haiku 4.5. In our Persona Consistency testing, Claude Haiku 4.5 scores 5/5 versus Codestral 2508's 3/5, a clear 2-point advantage. Haiku also ties for 1st place on persona_consistency (with 36 other models) out of 53; Codestral ranks 45th of 53 (tied with 5 others). There is no external benchmark for this task, so this verdict rests on our internal persona_consistency score and supporting proxy metrics.

Anthropic

Claude Haiku 4.5

Overall
4.33/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$1.00/MTok

Output

$5.00/MTok

Context Window: 200K

modelpicker.net

Mistral

Codestral 2508

Overall
3.50/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
4/5
Tool Calling
5/5
Classification
3/5
Agentic Planning
4/5
Structured Output
5/5
Safety Calibration
1/5
Strategic Analysis
2/5
Persona Consistency
3/5
Constrained Rewriting
3/5
Creative Problem Solving
2/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.300/MTok

Output

$0.900/MTok

Context Window: 256K


Task Analysis

Persona Consistency demands maintaining a character or role across turns and resisting prompt injection. The key capabilities are contextual memory at scale (long_context), strict adherence to the persona (persona_consistency), resistance to adversarial prompts (safety_calibration), and structured behavior under format constraints (structured_output). With no external benchmark for this task, our taskScore is the primary evidence: Claude Haiku 4.5 scores 5/5; Codestral 2508 scores 3/5. Supporting internal metrics: both models score 5/5 on long_context and tool_calling, so both can hold large conversation histories and call functions reliably. Haiku edges out Codestral on safety_calibration (2/5 vs 1/5, though both are weak in absolute terms) and leads on persona_consistency, while Codestral leads on structured_output (5/5 vs Haiku's 4/5) and is substantially cheaper ($0.90/MTok output vs Haiku's $5.00/MTok). The persona_consistency score is the decisive signal for this task.

Practical Examples

  1. Long-form roleplay with adversarial edits: Claude Haiku 4.5 (persona_consistency 5/5, long_context 5/5, safety_calibration 2/5) will more reliably stay in character and refuse injection attempts; expect fewer persona breaks in our tests.
  2. High-volume, cost-sensitive assistants with a strict output schema: Codestral 2508 (persona_consistency 3/5, structured_output 5/5) is the better fit when you need JSON-first, low-cost inference ($0.30/MTok input, $0.90/MTok output), but add persona-checking prompts or external validation because its persona consistency is weaker.
  3. Hybrid product: use Haiku for customer-facing, persona-critical flows (onboarding, therapeutic or legal roleplay) where the 2-point persona gap matters, and Codestral for backend generation where structured output and lower cost outweigh occasional persona drift.
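The external validation mentioned in example 2 can be as simple as a post-hoc check on each reply. Below is a minimal sketch assuming a weaker-persona model (e.g. Codestral 2508 at 3/5); the persona markers, break patterns, and threshold are illustrative assumptions, not part of our benchmark.

```python
import re

# Hypothetical persona definition: phrases the character should use,
# and out-of-character phrases that signal a persona break.
PERSONA_MARKERS = ["Captain Vale", "aboard this ship", "matey"]
BREAK_PATTERNS = [
    r"\bas an ai\b",
    r"\blanguage model\b",
    r"\bi cannot roleplay\b",
]

def persona_score(reply: str) -> float:
    """Fraction of expected persona markers present in the reply."""
    text = reply.lower()
    hits = sum(1 for m in PERSONA_MARKERS if m.lower() in text)
    return hits / len(PERSONA_MARKERS)

def breaks_persona(reply: str) -> bool:
    """True if the reply contains a known out-of-character phrase."""
    text = reply.lower()
    return any(re.search(p, text) for p in BREAK_PATTERNS)

def validate(reply: str, min_score: float = 0.34) -> bool:
    """Accept the reply only if it stays in character; otherwise the
    caller can retry, or fall back to a stronger-persona model."""
    return persona_score(reply) >= min_score and not breaks_persona(reply)
```

A production version would likely replace the keyword heuristics with an LLM-judge call, but the retry-or-fallback control flow stays the same.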

Bottom Line

For Persona Consistency, choose Claude Haiku 4.5 if you need the model to reliably maintain character and resist prompt injection (5/5 in our testing, tied for 1st). Choose Codestral 2508 if you prioritize structured-output fidelity and much lower runtime cost ($0.90/MTok output) and can either tolerate weaker persona guarantees (3/5 in our testing) or add external persona checks.
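The hybrid option reduces to a small routing decision per request. A minimal sketch, assuming illustrative flow names and model identifiers (the IDs below are placeholders, not real API strings):

```python
# Flows where staying in character is user-visible and therefore
# worth Haiku's higher output price.
PERSONA_CRITICAL_FLOWS = {"onboarding", "roleplay", "support_chat"}

def pick_model(flow: str) -> str:
    """Route persona-critical flows to the stronger-persona model;
    send everything else to the cheaper, schema-strong model."""
    if flow in PERSONA_CRITICAL_FLOWS:
        return "claude-haiku-4.5"   # persona_consistency 5/5
    return "codestral-2508"         # structured_output 5/5, ~6x cheaper output
```

The router keeps the persona gap where users see it and the cost savings where they don't.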

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions