Claude Sonnet 4.6 vs GPT-5.4 for Persona Consistency
Tie — In our testing both Claude Sonnet 4.6 and GPT-5.4 achieve the top Persona Consistency score (5/5) and share rank 1 of 52. Neither model outscored the other on the persona_consistency test; choose based on secondary strengths (tool calling, structured output, file modality, and pricing) rather than persona score itself.
Claude Sonnet 4.6 (Anthropic)
Pricing: Input $3.00/MTok, Output $15.00/MTok

GPT-5.4 (OpenAI)
Pricing: Input $2.50/MTok, Output $15.00/MTok
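To see what the pricing above means for a real bill, here is a quick cost calculation. Prices come from the table; the monthly token volumes are hypothetical placeholders, not measured usage:

```python
# Per-million-token (MTok) prices from the comparison above, in USD.
PRICES = {
    "claude-sonnet-4.6": {"input": 3.00, "output": 15.00},
    "gpt-5.4": {"input": 2.50, "output": 15.00},
}

def monthly_cost(model: str, input_mtok: float, output_mtok: float) -> float:
    """Estimated monthly spend in USD for a given token volume."""
    p = PRICES[model]
    return input_mtok * p["input"] + output_mtok * p["output"]

# Hypothetical input-heavy workload: 100 MTok in, 10 MTok out per month.
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 100, 10):,.2f}")
# claude-sonnet-4.6: $450.00
# gpt-5.4: $400.00
```

For input-dominated traffic the $0.50/MTok input gap compounds, while output-heavy traffic costs the same on both models.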
Task Analysis
Persona Consistency, as defined in our benchmark descriptions, demands that a model maintain character and resist prompt injection. The key capabilities are: safety_calibration (refusing harmful or irrelevant persona overrides), long_context (tracking persona across 30K+ tokens), faithfulness (sticking to the defined character), tool_calling (preserving persona during tool interactions), and structured_output (ensuring persona-aligned fields in schemas). With no external benchmark available, our internal task scores are the primary evidence: both models scored 5/5 on persona_consistency and are tied at rank 1. Supporting internal signals do differ in ways that matter for deployment decisions: Sonnet 4.6 scores 5 on tool_calling (helpful when agent/tool sequences must preserve persona), while GPT-5.4 scores 5 on structured_output (helpful when strict schema compliance must reflect persona). Both models score 5 on safety_calibration, faithfulness, and long_context in our tests, all core abilities for resisting persona injection.
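As a sketch of what "persona-aligned fields in schemas" means in practice, the check below validates that a model's JSON reply is well-formed and stays in character. The schema, persona name, and sample replies are illustrative assumptions, not output from either model:

```python
import json

# Hypothetical schema: every reply must carry these persona-bound fields.
REQUIRED_PERSONA_FIELDS = {"speaker", "tone", "message"}
EXPECTED_SPEAKER = "Captain Reyes"  # persona defined in the system prompt (made up here)

def persona_fields_ok(raw_reply: str) -> bool:
    """Return True only if the reply parses, matches the schema, and stays in persona."""
    try:
        reply = json.loads(raw_reply)
    except json.JSONDecodeError:
        return False  # structured-output failure
    if not REQUIRED_PERSONA_FIELDS <= reply.keys():
        return False  # schema drift: a required persona field is missing
    return reply["speaker"] == EXPECTED_SPEAKER  # persona drift check

# A compliant reply vs. one where the model broke character.
print(persona_fields_ok('{"speaker": "Captain Reyes", "tone": "gruff", "message": "Aye."}'))   # True
print(persona_fields_ok('{"speaker": "AI assistant", "tone": "neutral", "message": "Sure!"}'))  # False
```

A check like this is how a structured_output score translates into production: models that keep persona fields intact under strict schemas fail this gate less often.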
Practical Examples
1) Multi-step agent with persona-bound tool calls: Sonnet 4.6 shines. Persona_consistency 5 plus tool_calling 5 means it kept character across function selection and arguments in our tests.
2) API that must return strict JSON with persona fields: GPT-5.4 shines. Both models are 5/5 on persona_consistency, but GPT-5.4 scores 5 on structured_output (vs Sonnet's 4), so it adhered to schemas better while preserving persona.
3) Long chat history and role-play: both models perform equally (persona_consistency 5, long_context 5).
4) File-based onboarding of persona documents: prefer GPT-5.4, whose modality is text+image+file->text (Sonnet is text+image->text).
5) Cost-sensitive, input-heavy workloads: GPT-5.4 has the lower input price ($2.50/MTok vs Sonnet's $3.00/MTok); output price is equal ($15.00/MTok).

Use these concrete score differences when mapping to your product flow.
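For the multi-step agent case, one common pattern is to keep the persona system prompt pinned at the head of the message list for the entire tool loop, so accumulating tool results can never displace it. This is a provider-agnostic sketch with a stubbed model call, not either vendor's SDK:

```python
PERSONA_SYSTEM_PROMPT = (
    "You are Captain Reyes, a gruff starship captain. Never break character."
)

def fake_llm(messages: list[dict]) -> dict:
    """Stub standing in for a real chat-completion API call."""
    return {"role": "assistant", "content": f"[in character] context has {len(messages)} messages"}

def run_tool_loop(user_msg: str, tool_results: list[str]) -> list[dict]:
    # The persona prompt stays at index 0 of every request; each tool
    # result and assistant turn is appended after it, never before it.
    messages = [
        {"role": "system", "content": PERSONA_SYSTEM_PROMPT},
        {"role": "user", "content": user_msg},
    ]
    for result in tool_results:
        messages.append({"role": "tool", "content": result})
        messages.append(fake_llm(messages))
    return messages

history = run_tool_loop("Status report?", ["engine: nominal", "shields: 80%"])
assert history[0]["content"] == PERSONA_SYSTEM_PROMPT  # persona still anchors the context
```

Tool-calling scores reward exactly this behavior: character held through function selection, arguments, and results, turn after turn.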
Bottom Line
For Persona Consistency, choose Claude Sonnet 4.6 if you need best-in-class tool calling while preserving persona (Sonnet: persona_consistency 5, tool_calling 5). Choose GPT-5.4 if you need strict structured output or file-based persona onboarding (GPT-5.4: persona_consistency 5, structured_output 5, modality includes file inputs). Both models tie on core persona metrics in our tests, so pick by integration, schema, tool workflow, or input-cost tradeoffs.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.