Claude Haiku 4.5 vs Devstral Medium for Persona Consistency
Claude Haiku 4.5 is the clear winner for Persona Consistency in our testing. It scores 5 versus Devstral Medium's 3 (a 2-point gap). In our suite Haiku ranks 1st for this task (rank 1 of 52) while Devstral ranks 45th of 52. Haiku's stronger supporting capabilities — long-context (5 vs 4), tool-calling (5 vs 3), and faithfulness (5 vs 4) — explain its ability to maintain character and resist prompt injection. Devstral Medium is cheaper per output token (2 vs 5) but is weaker on the specific robustness and retention measures that matter for persona consistency.
anthropic
Claude Haiku 4.5
Benchmark Scores
External Benchmarks
Pricing
Input
$1.00/MTok
Output
$5.00/MTok
modelpicker.net
mistral
Devstral Medium
Benchmark Scores
External Benchmarks
Pricing
Input
$0.400/MTok
Output
$2.00/MTok
modelpicker.net
Task Analysis
Persona Consistency demands: (1) sustained memory of a character or role across turns, (2) resistance to prompt-injection or style-drift, and (3) accurate, on-character responses under long contexts. Key capabilities from our benchmarks that matter: long_context (retrieval and coherence across 30K+ tokens), tool_calling (correct sequencing and argument boxing when tools or persona state are exposed), faithfulness (sticking to source persona without hallucination), and safety_calibration (refusing harmful persona-altering injections). In our testing, Claude Haiku 4.5 scores 5 on persona_consistency and also scores 5 on long_context, tool_calling, and faithfulness — a profile that directly supports persona retention and injection resistance. Devstral Medium scores 3 on persona_consistency with lower tool_calling (3) and slightly lower long_context and faithfulness (4 each), which helps explain its weaker result. There is no external benchmark for this task in the payload, so our internal taskScore (5 vs 3) is the primary signal.
Practical Examples
Scenario 1 — Multi-session character assistant: You need a role-playing customer support persona that preserves backstory over a 100K-token transcript. Claude Haiku 4.5 (persona 5, long_context 5) is better positioned to keep details consistent across long history. Scenario 2 — Agent interacting with tools while keeping persona: If prompts call external tools or inject new instructions, Haiku's tool_calling 5 and persona 5 reduce the risk of the agent adopting injected roles. Scenario 3 — Cost-sensitive bulk generation: If you must generate many persona-consistent messages at minimum cost and can tolerate occasional drift, Devstral Medium is cheaper (output cost 2 per mTok vs Haiku 5) and may be acceptable for short-turn, lower-risk personas. Scenario 4 — Tight prompt-control with structured outputs: Both models support structured_outputs; Haiku's higher faithfulness (5 vs 4) means JSON or schema outputs are less likely to include off-persona content in our tests. All example comparisons are grounded in our measured scores: persona_consistency 5 vs 3, tool_calling 5 vs 3, long_context 5 vs 4, faithfulness 5 vs 4, and output cost 5 vs 2 (Haiku vs Devstral).
Bottom Line
For Persona Consistency, choose Claude Haiku 4.5 if you need robust, long-context character maintenance and resistance to prompt injection (scores: persona 5, long_context 5, tool_calling 5). Choose Devstral Medium if your primary constraint is cost or short-turn throughput and you can accept weaker persona robustness (scores: persona 3, lower tool_calling and long-context), noting Devstral’s lower output cost per mTok (2 vs Haiku's 5).
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.