Claude Haiku 4.5 vs Devstral Small 1.1 for Persona Consistency

Winner: Claude Haiku 4.5. In our testing Claude Haiku 4.5 scores 5/5 on Persona Consistency vs Devstral Small 1.1 at 2/5 — a clear 3-point margin. Haiku 4.5 is tied for 1st on this task (rank 1 of 52) while Devstral Small 1.1 ranks 51 of 52. There is no external benchmark for this comparison, so our internal persona_consistency score and task ranks are the primary basis for the verdict.

anthropic

Claude Haiku 4.5

Overall
4.33/5Strong

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$1.00/MTok

Output

$5.00/MTok

Context Window200K

modelpicker.net

mistral

Devstral Small 1.1

Overall
3.08/5Usable

Benchmark Scores

Faithfulness
4/5
Long Context
4/5
Multilingual
4/5
Tool Calling
4/5
Classification
4/5
Agentic Planning
2/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
2/5
Persona Consistency
2/5
Constrained Rewriting
3/5
Creative Problem Solving
2/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.100/MTok

Output

$0.300/MTok

Context Window131K

modelpicker.net

Task Analysis

Persona Consistency requires an LLM to maintain a specified character voice across turns, follow role constraints, and resist prompt-injection attempts that try to change or break the persona. Key capabilities that matter: long-context handling (to remember persona details), faithfulness to the given persona template, resistance to injection or instruction overrides, and reliable structured output when a role requires formalized responses. In our testing Claude Haiku 4.5 scored 5 on persona_consistency, plus 5 on long_context and 5 on faithfulness — supporting its ability to remember, adhere to, and not deviate from persona specs. Devstral Small 1.1 scored 2 on persona_consistency with lower supporting signals (long_context 4, faithfulness 4), which explains its weaker resistance to injection and lapses in maintaining persona over extended contexts.

Practical Examples

Scenario A — Multi-session roleplay with subtle injection attempts: Claude Haiku 4.5 (persona 5/5, long_context 5, faithfulness 5) keeps a character's backstory, tone, and constraints over long sessions and ignores injected prompts that try to override role rules. Devstral Small 1.1 (persona 2/5, long_context 4, faithfulness 4) may drift in voice or follow an injected instruction that changes the persona. Scenario B — Customer support agent with strict brand voice and formatted replies: Haiku 4.5’s persona score plus structured_output 4 supports consistent branded responses and predictable formatting; Devstral Small 1.1 matches structured_output 4 but its lower persona_consistency increases risk of voice drift. Scenario C — Low-cost, high-volume automation where occasional persona drift is tolerable: Devstral Small 1.1 is far cheaper (input_cost_per_mtok: 0.1, output_cost_per_mtok: 0.3) versus Claude Haiku 4.5 (input_cost_per_mtok: 1, output_cost_per_mtok: 5). If strict persona fidelity is not required, Small 1.1 can be a pragmatic cost-saving choice despite the weaker persona scores.

Bottom Line

For Persona Consistency, choose Claude Haiku 4.5 if you need robust character maintenance and strong resistance to prompt injection (Haiku 4.5: 5/5, tied for 1st). Choose Devstral Small 1.1 if cost sensitivity and throughput matter more than strict persona fidelity (Devstral Small 1.1: 2/5) and you can tolerate occasional voice drift.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions