OpenAI · active

GPT-5

OpenAI's mid-tier model. Context window: 400K tokens.

Overall score: 4.54 / 5.00 · ranked #12
Input: $1.25 per 1M tokens
Output: $10.00 per 1M tokens
Context: 400K tokens
Blended: $7.81 per 1M tokens (3:1 out:in ratio)
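
The page does not spell out how the blended price is derived. A weighted average that counts three parts output to one part input reproduces the listed figure, so the sketch below assumes that formula. The function name `blended_price` and its parameters are illustrative, not part of any published API.

```python
def blended_price(input_per_mtok: float, output_per_mtok: float,
                  out_parts: float = 3.0, in_parts: float = 1.0) -> float:
    """Weighted average price per 1M tokens for an output:input token mix.

    Assumption: "3:1 out:in ratio" means 3 output tokens per input token.
    """
    total_parts = out_parts + in_parts
    return (out_parts * output_per_mtok + in_parts * input_per_mtok) / total_parts


# Listed GPT-5 prices: $1.25 input, $10.00 output per 1M tokens.
gpt5_blended = blended_price(1.25, 10.00)
print(f"${gpt5_blended:.2f} per 1M tokens")  # (3 * 10.00 + 1.25) / 4 = 7.8125
```

Workloads dominated by input tokens (for example, summarizing long documents) will see a much lower effective rate than the blended figure, so it is worth recomputing with your own ratio.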

modelpicker.ai · powered by live benchmark data

Scores by test

Structured Output: 5.0
Strategic Analysis: 5.0
Constrained Rewriting: 4.0
Creative Problem Solving: 4.0
Tool Calling: 5.0
Faithfulness: 5.0
Classification: 4.0
Long Context: 5.0
Safety Calibration: 2.0
Persona Consistency: 5.0
Agentic Planning: 5.0
Multilingual: 5.0
Tabular Data: 5.0
SWE-bench Verified: 73.6%
MATH Level 5: 98.1%
AIME 2025: 91.4%
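
The page does not say how the 4.54 overall score is computed. As a sanity check, the unweighted mean of the 13 tests scored out of 5.0 (leaving out the three external benchmarks, which are percentages) lands on the same number. The sketch below assumes that simple average; the site's actual methodology may weight tests differently.

```python
from statistics import mean

# The 13 tests scored on a 0-5 scale, as listed on the page.
internal_scores = {
    "Structured Output": 5.0,
    "Strategic Analysis": 5.0,
    "Constrained Rewriting": 4.0,
    "Creative Problem Solving": 4.0,
    "Tool Calling": 5.0,
    "Faithfulness": 5.0,
    "Classification": 4.0,
    "Long Context": 5.0,
    "Safety Calibration": 2.0,
    "Persona Consistency": 5.0,
    "Agentic Planning": 5.0,
    "Multilingual": 5.0,
    "Tabular Data": 5.0,
}

# Assumption: overall score = unweighted mean of the 0-5 tests.
overall = mean(internal_scores.values())
print(round(overall, 2))  # 59 / 13, rounded to two decimals
```

Under this assumption the single 2.0 on Safety Calibration accounts for most of the gap between the overall score and a perfect 5.00.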

What you need to know

GPT-5 is a high-performance model optimized for precision and complex reasoning, ranking 7th out of 71 evaluated models. It is strongest on mathematical benchmarks, scoring 98.1% on MATH Level 5 and 91.4% on AIME 2025, alongside 73.6% on SWE-bench Verified. Its main practical advantage is reliability in structured workflows: it earns the maximum 5.0/5.0 on tool calling, faithfulness, and structured output.

The model sits at a premium price point, with a blended cost of $7.81/MTok. That cost is easiest to justify for developers who need the 400K-token context window and high-accuracy agentic planning. However, the model shows a significant deficit in safety calibration, scoring only 2.0/5.0, which may indicate a higher likelihood of unfiltered or non-compliant responses.

While it excels at strategic analysis and tabular data (5.0/5.0 each), it is slightly weaker at creative problem solving and classification (4.0/5.0 each). This suggests the model is better suited to deterministic, logic-heavy applications than to open-ended generative tasks or simple categorization.

Use this model if your application requires high-stakes mathematical accuracy, complex agentic orchestration, or the processing of very large documents. Skip this model if you are operating on a tight budget or if your use case requires strict, well-calibrated safety guardrails.
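
To weigh the budget side of that recommendation, it can help to price a single large-document request at the listed rates. The sketch below is a rough estimate only: the token counts are hypothetical, and it assumes input and output tokens share the 400K window, which the page does not confirm.

```python
INPUT_PER_MTOK = 1.25     # listed input price, USD per 1M tokens
OUTPUT_PER_MTOK = 10.00   # listed output price, USD per 1M tokens
CONTEXT_LIMIT = 400_000   # listed context window, in tokens


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed per-token rates."""
    # Assumption: prompt and completion together must fit in the window.
    if input_tokens + output_tokens > CONTEXT_LIMIT:
        raise ValueError("request exceeds the 400K-token context window")
    input_cost = input_tokens / 1_000_000 * INPUT_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PER_MTOK
    return input_cost + output_cost


# Hypothetical workload: a 300K-token document with a 4K-token summary.
cost = request_cost(300_000, 4_000)
print(f"estimated cost per request: ${cost:.4f}")
```

For this hypothetical request the input side dominates (about 37.5 cents of roughly 41.5 cents), so long-document workloads are far cheaper per token than the 3:1 blended figure implies, while output-heavy agentic loops are more expensive.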

Strengths — Top 3

Structured Output: 5.0/5.0
Strategic Analysis: 5.0/5.0
Tool Calling: 5.0/5.0

Relative weaknesses — Bottom 3

Safety Calibration: 2.0/5.0
Constrained Rewriting: 4.0/5.0
Creative Problem Solving: 4.0/5.0

Similar models

Qwen 3.7 Max · $6.25 · 4.62
Gemma 4 31B · $0.307 · 4.38
Qwen: Qwen3.6 Plus · $1.54 · 4.54
NVIDIA: Nemotron 3 Super · $0.360 · 4.46