GPT-5.4
OpenAI's mid-tier model. Long-context specialist with a 1.1M-token context window.
What you need to know
GPT-5.4 is optimized for high-reliability autonomous workflows, distinguished by perfect internal scores in agentic planning, structured output, and faithfulness. Its complex-reasoning performance is supported by a 95.3% score on AIME 2025 and a 76.9% success rate on SWE-bench Verified, making it a top-tier choice for software engineering and strategic analysis tasks.
The 1.1M-token context window lets the model take in very large inputs, and it maintains perfect scores for long-context retrieval and persona consistency. However, it is less effective for simple classification tasks, where it scores significantly lower than in its primary reasoning categories.
At a blended cost of $11.88 per million tokens, this is a premium-priced model. The high output cost of $15.00/MTok reflects its positioning as a high-intelligence engine rather than a cost-efficient utility for high-volume, simple tasks.
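To make the pricing arithmetic concrete, here is a minimal sketch of how a blended per-million-token price and a workload cost could be computed. Only the $15.00/MTok output price comes from the text above. The input price and the 1:3 input-to-output weighting are assumptions chosen for illustration; the page does not state either, and the function names are hypothetical.

```python
# Output price per million tokens, as stated in the text.
OUTPUT_PRICE_PER_MTOK = 15.00

# ASSUMPTION: the input price is not given in the text; this value is illustrative only.
ASSUMED_INPUT_PRICE_PER_MTOK = 2.50


def blended_price(input_price, output_price, input_weight=1.0, output_weight=3.0):
    """Weighted average price per million tokens.

    ASSUMPTION: the default 1:3 input:output weighting is illustrative,
    not a documented methodology.
    """
    total_weight = input_weight + output_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return (input_price * input_weight + output_price * output_weight) / total_weight


def workload_cost(input_tokens, output_tokens, input_price, output_price):
    """Dollar cost of a workload, given per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price


if __name__ == "__main__":
    blended = blended_price(ASSUMED_INPUT_PRICE_PER_MTOK, OUTPUT_PRICE_PER_MTOK)
    print(f"Blended price under these assumptions: ${blended:.3f}/MTok")

    # Example: a long-context job with 800k input tokens and 20k output tokens.
    cost = workload_cost(800_000, 20_000, ASSUMED_INPUT_PRICE_PER_MTOK, OUTPUT_PRICE_PER_MTOK)
    print(f"Estimated job cost: ${cost:.2f}")
```

Under these assumed inputs the weighted average lands close to the blended figure quoted above, but a different input price or weighting would give a different number, so treat the result as an estimate to check against the provider's published pricing.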
Use this model for complex agentic orchestration, large-scale codebase analysis, and tasks requiring strict adherence to structured formats. Skip this model for basic text classification or high-throughput applications where cost efficiency is prioritized over deep reasoning.