Claude Sonnet 4.6 vs Gemini 2.5 Pro for Persona Consistency

Winner: Claude Sonnet 4.6. Both models score 5/5 on persona_consistency in our testing and tie for first place, but Claude Sonnet 4.6 scores 5 on safety_calibration versus Gemini 2.5 Pro's 1. That large safety gap makes Sonnet 4.6 more reliable at resisting persona injection and unsafe persona switches, so we name Claude Sonnet 4.6 the practical winner for Persona Consistency.

Anthropic

Claude Sonnet 4.6

Overall
4.67/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
5/5
Structured Output
4/5
Safety Calibration
5/5
Strategic Analysis
5/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
5/5

External Benchmarks

SWE-bench Verified
75.2%
MATH Level 5
N/A
AIME 2025
85.8%

Pricing

Input

$3.00/MTok

Output

$15.00/MTok

Context Window: 1,000K tokens

modelpicker.net

Google

Gemini 2.5 Pro

Overall
4.25/5 (Strong)

Benchmark Scores

Faithfulness
5/5
Long Context
5/5
Multilingual
5/5
Tool Calling
5/5
Classification
4/5
Agentic Planning
4/5
Structured Output
5/5
Safety Calibration
1/5
Strategic Analysis
4/5
Persona Consistency
5/5
Constrained Rewriting
3/5
Creative Problem Solving
5/5

External Benchmarks

SWE-bench Verified
57.6%
MATH Level 5
N/A
AIME 2025
84.2%

Pricing

Input

$1.25/MTok

Output

$10.00/MTok

Context Window: 1,049K tokens


Task Analysis

Persona Consistency demands (1) maintaining a stable character voice and memory across turns, (2) resisting injection or adversarial prompts that try to change the persona, and (3) producing reliably formatted or structured persona outputs when required. Our task definition (persona_consistency = "Maintains character and resists injection") maps directly to those requirements.

Both models score 5/5 on the persona_consistency test in our testing and are tied for first. To break the tie, supporting capabilities matter: safety_calibration (refusing or permitting requests correctly) is critical for resisting injection; structured_output matters when you enforce persona via JSON schemas or programmatic checkpoints; and long_context and faithfulness support consistent recall of persona details across long conversations.

Claude Sonnet 4.6 shows a strong safety profile (safety_calibration 5 in our testing), whereas Gemini 2.5 Pro scores 1 on safety_calibration. Conversely, Gemini 2.5 Pro scores 5 for structured_output versus Sonnet's 4, indicating stronger adherence to programmatic persona schemas. The two models tie on tool_calling (5), faithfulness (5), long_context (5), and persona_consistency (5), so the decisive factor for robust, adversarial-resistant persona behavior in our tests is Sonnet's safety_calibration advantage.
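The schema-based persona enforcement mentioned above can be sketched as a minimal validator on the application side. This is an illustrative sketch, not part of either model's API: the field names (`persona_name`, `tone`, `reply`) and the `validate_persona_output` helper are hypothetical, assuming you prompt the model to reply in JSON each turn.

```python
import json

# Hypothetical persona schema: the fields the brand bot must emit each turn.
REQUIRED_FIELDS = {"persona_name": str, "tone": str, "reply": str}

def validate_persona_output(raw: str) -> dict:
    """Parse a model reply and verify it matches the persona schema.

    Raises ValueError on any deviation, so the caller can retry the
    request or fall back to a canned in-persona response.
    """
    data = json.loads(raw)
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], ftype):
            raise ValueError(f"wrong type for {field}")
    return data

# A schema-compliant reply passes...
ok = validate_persona_output(
    '{"persona_name": "Ava", "tone": "friendly", "reply": "Happy to help!"}'
)

# ...while a reply that dropped the persona wrapper is caught.
try:
    validate_persona_output('{"reply": "Sure."}')
except ValueError as err:
    caught = str(err)
```

A checkpoint like this is where a higher structured_output score pays off: the fewer schema violations the model emits, the fewer retries this guard has to trigger.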

Practical Examples

  1. Brand-moderated customer support bot: Sonnet 4.6 is preferable. Both models hit 5/5 on persona_consistency in our testing, but Sonnet's safety_calibration = 5 vs Gemini's 1, so Sonnet better resists malicious prompts that try to subvert brand rules.
  2. Programmatic persona enforcement (JSON schemas): Gemini 2.5 Pro is preferable. Gemini scores 5 on structured_output in our testing vs Sonnet's 4, so it will more reliably emit exact persona JSON and schema-compliant fields for downstream systems.
  3. Long-running multi-turn assistant: Either model can maintain persona (both score 5/5 on persona_consistency and long_context), but trade off cost and tooling: Sonnet's context window is 1,000,000 tokens with a 128,000-token max output; Gemini's is 1,048,576 tokens with a 65,536-token max output.
  4. Cost-sensitive deployment: Gemini 2.5 Pro is materially cheaper: $1.25/MTok input and $10.00/MTok output versus Claude Sonnet 4.6's $3.00/MTok input and $15.00/MTok output. Choose Gemini when budget and exact schema output matter more than maximal safety calibration.
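The cost gap in the last example is easy to quantify per request. The 2,000-input / 500-output token workload below is an assumed example; the per-million-token prices are the ones listed above.

```python
def cost_per_turn(input_tokens: int, output_tokens: int,
                  in_price: float, out_price: float) -> float:
    """Dollar cost of one request, given per-million-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Assumed workload: 2,000 input tokens and 500 output tokens per turn.
sonnet = cost_per_turn(2000, 500, 3.00, 15.00)   # Claude Sonnet 4.6 pricing
gemini = cost_per_turn(2000, 500, 1.25, 10.00)   # Gemini 2.5 Pro pricing

# sonnet -> $0.0135 per turn; gemini -> $0.0075 per turn
```

At this workload Gemini costs roughly 56% of Sonnet per turn, a gap that compounds quickly in high-volume deployments.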

Bottom Line

For Persona Consistency, choose Claude Sonnet 4.6 if you need the safest, most injection-resistant persona behavior (Sonnet: safety_calibration 5 vs Gemini: 1). Choose Gemini 2.5 Pro if you need programmatic, schema-accurate persona outputs at lower cost (Gemini: structured_output 5 vs Sonnet's 4; Gemini input/output pricing $1.25/$10.00 per MTok vs Sonnet's $3.00/$15.00). Both models score 5/5 on persona_consistency in our testing, so match the choice to your safety vs structured-output and cost trade-offs.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions