Claude Haiku 4.5 vs Codestral 2508 for Persona Consistency

Winner: Claude Haiku 4.5. In our Persona Consistency testing, Claude Haiku 4.5 scores 5/5 versus Codestral 2508's 3/5, a clear 2-point advantage. Haiku also ties for 1st place on persona_consistency (with 36 other models) out of 53; Codestral ranks 45th of 53 (tied with 5 others). There is no external benchmark for this task, so this verdict rests on our internal persona_consistency score and supporting proxy metrics.

Anthropic

Claude Haiku 4.5

Overall
4.33/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$1.00/MTok

Output

$5.00/MTok

Context Window: 200K

modelpicker.net

Mistral

Codestral 2508

Overall
3.50/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
4/5
Tool Calling
5/5
Classification
3/5
Agentic Planning
4/5
Structured Output
5/5
Safety Calibration
1/5
Strategic Analysis
2/5
Persona Consistency
3/5
Constrained Rewriting
3/5
Creative Problem Solving
2/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.300/MTok

Output

$0.900/MTok

Context Window: 256K


Task Analysis

Persona Consistency demands maintaining a character or role across turns and resisting prompt injection. The key capabilities are contextual memory at scale (long_context), strict adherence to the persona (persona_consistency), resistance to adversarial prompts (safety_calibration), and structured behavior under format constraints (structured_output). With no external benchmark for this task, our taskScore is the primary evidence: Claude Haiku 4.5 scores 5/5; Codestral 2508 scores 3/5. Supporting internal metrics: both models score 5/5 on long_context and tool_calling, so both can hold large conversation histories and call functions reliably. Haiku edges out Codestral on safety_calibration (2/5 vs 1/5, though both are weak in absolute terms) and leads on persona_consistency, while Codestral leads on structured_output (5/5 vs Haiku's 4/5) and is substantially cheaper ($0.90/MTok output vs Haiku's $5.00/MTok). The persona_consistency score is the decisive signal for this task.

Practical Examples

  1. Long-form roleplay with adversarial edits: Claude Haiku 4.5 (persona_consistency 5/5, long_context 5/5, safety_calibration 2/5) will more reliably stay in character and refuse injection attempts; expect fewer persona breaks in our tests.
  2. High-volume, cost-sensitive assistants with a strict output schema: Codestral 2508 (persona_consistency 3/5, structured_output 5/5) is the better fit when you need JSON-first, low-cost inference ($0.30/MTok input, $0.90/MTok output), but add persona-checking prompts or external validation because its persona consistency is weaker.
  3. Hybrid product: use Haiku for customer-facing, persona-critical flows (onboarding, therapeutic or legal roleplay) where the 2-point persona gap matters, and Codestral for backend generation where structured output and lower cost outweigh occasional persona drift.
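The external validation mentioned in example 2 can be as simple as a post-hoc check on each reply. Below is a minimal sketch assuming a weaker-persona model (e.g. Codestral 2508 at 3/5); the persona markers, break patterns, and threshold are illustrative assumptions, not part of our benchmark.

```python
import re

# Hypothetical persona definition: phrases the character should use,
# and out-of-character phrases that signal a persona break.
PERSONA_MARKERS = ["Captain Vale", "aboard this ship", "matey"]
BREAK_PATTERNS = [
    r"\bas an ai\b",
    r"\blanguage model\b",
    r"\bi cannot roleplay\b",
]

def persona_score(reply: str) -> float:
    """Fraction of expected persona markers present in the reply."""
    text = reply.lower()
    hits = sum(1 for m in PERSONA_MARKERS if m.lower() in text)
    return hits / len(PERSONA_MARKERS)

def breaks_persona(reply: str) -> bool:
    """True if the reply contains a known out-of-character phrase."""
    text = reply.lower()
    return any(re.search(p, text) for p in BREAK_PATTERNS)

def validate(reply: str, min_score: float = 0.34) -> bool:
    """Accept the reply only if it stays in character; otherwise the
    caller can retry, or fall back to a stronger-persona model."""
    return persona_score(reply) >= min_score and not breaks_persona(reply)
```

A production version would likely replace the keyword heuristics with an LLM-judge call, but the retry-or-fallback control flow stays the same.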

Bottom Line

For Persona Consistency, choose Claude Haiku 4.5 if you need the model to reliably maintain character and resist prompt injection (5/5 in our testing, tied for 1st). Choose Codestral 2508 if you prioritize structured-output fidelity and much lower runtime cost ($0.90/MTok output) and can either tolerate weaker persona guarantees (3/5 in our testing) or add external persona checks.
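The hybrid option reduces to a small routing decision per request. A minimal sketch, assuming illustrative flow names and model identifiers (the IDs below are placeholders, not real API strings):

```python
# Flows where staying in character is user-visible and therefore
# worth Haiku's higher output price.
PERSONA_CRITICAL_FLOWS = {"onboarding", "roleplay", "support_chat"}

def pick_model(flow: str) -> str:
    """Route persona-critical flows to the stronger-persona model;
    send everything else to the cheaper, schema-strong model."""
    if flow in PERSONA_CRITICAL_FLOWS:
        return "claude-haiku-4.5"   # persona_consistency 5/5
    return "codestral-2508"         # structured_output 5/5, ~6x cheaper output
```

The router keeps the persona gap where users see it and the cost savings where they don't.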

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions