Claude Haiku 4.5 vs Claude Sonnet 4.6 for Persona Consistency

Claude Sonnet 4.6 is the better choice for Persona Consistency. In our testing both Claude Haiku 4.5 and Claude Sonnet 4.6 scored 5/5 on the persona_consistency test and share the top rank (tied for 1st of 52). That parity on the direct task hides important differences in supporting capabilities: Sonnet 4.6 has a safety_calibration score of 5 versus Haiku 4.5's 2, a larger context_window (1,000,000 vs 200,000), and stronger creative_problem_solving (5 vs 4). Those advantages make Sonnet more robust against injection and long-running character drifts. The tradeoff is cost: Haiku is cheaper (input $1 / mTok, output $5 / mTok) vs Sonnet (input $3 / mTok, output $15 / mTok).

anthropic

Claude Haiku 4.5

Overall
4.33/5Strong

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
2/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$1.00/MTok

Output

$5.00/MTok

Context Window200K

modelpicker.net

anthropic

Claude Sonnet 4.6

Overall
4.67/5Strong

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
5/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
5/5

External Benchmarks

SWE-bench Verified
75.2%
MATH Level 5
N/A
AIME 2025
85.8%

Pricing

Input

$3.00/MTok

Output

$15.00/MTok

Context Window1000K

modelpicker.net

Task Analysis

What Persona Consistency demands: maintaining a character across turns and resisting prompt injection requires (1) reliable safety calibration to refuse or ignore malicious instructions, (2) long-context retention so the model can keep persona state over many tokens, (3) faithfulness and structured output to avoid silent drift or malformed persona data, and (4) robust reasoning to handle adversarial or ambiguous prompts. In our testing both models achieve the maximum task score (5/5) and tie for rank 1 on persona_consistency, so they both meet the baseline. To choose between them, look at the supporting metrics: Sonnet 4.6's safety_calibration is 5 (Haiku 4.5 = 2), both have faithfulness 5 and tool_calling 5, and both support structured outputs. Sonnet also offers a much larger context window (1,000,000 tokens vs Haiku's 200,000) and higher max_output_tokens (128,000 vs 64,000), which matters for very long interactions. These supporting scores explain why Sonnet is more robust in adversarial and long-running persona scenarios even though the direct persona_consistency task score is equal.

Practical Examples

  1. Adversarial chat assistant: A user repeatedly tries to trick the assistant into breaking character. Both models scored 5/5 on persona_consistency, but Sonnet's safety_calibration 5 vs Haiku's 2 means Sonnet is more likely to refuse or ignore injection attempts in our tests. 2) Serialized roleplay across massive context: For a roleplay campaign spanning hundreds of thousands of tokens, Sonnet's 1,000,000-token context_window and 128,000 max output tokens preserve persona state longer than Haiku's 200,000 / 64,000. 3) Cost-sensitive customer service bot: If you need strong persona behavior but must minimize cost, Haiku 4.5 offers the same 5/5 persona_consistency score at lower cost (input $1 / mTok, output $5 / mTok) compared with Sonnet (input $3 / mTok, output $15 / mTok). 4) Complex persona with creative constraints: Sonnet's creative_problem_solving 5 vs Haiku's 4 helps when the persona demands inventive, non-obvious behavior while still resisting injection.

Bottom Line

For Persona Consistency, choose Claude Haiku 4.5 if you need the top task score at the lowest cost (input $1 / mTok, output $5 / mTok) and your interactions are short-to-moderate in length. Choose Claude Sonnet 4.6 if you require stronger adversarial resistance and very long-lived personas — Sonnet adds safety_calibration 5 (vs Haiku 2), a 1,000,000-token context window (vs 200,000), and higher creative problem solving at a higher price (input $3 / mTok, output $15 / mTok).

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions