OpenAI · active

GPT-5.1

OpenAI's mid-tier model. Context window: 400K tokens.

Overall score: 4.23/5.00 · ranked #38
Input: $1.25 per 1M tokens
Output: $10.00 per 1M tokens
Context: 400K tokens
Blended: $7.81 per 1M tokens (3:1 output:input ratio)
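The blended figure is consistent with a weighted average that counts three output tokens for every input token: (3 × $10.00 + 1 × $1.25) / 4 = $7.8125, which rounds to $7.81. The sketch below reproduces that arithmetic. The function name `blended_price` is hypothetical, and the weighting rule is inferred from the numbers on this page rather than documented by it.

```python
def blended_price(input_price: float, output_price: float,
                  output_parts: int = 3, input_parts: int = 1) -> float:
    """Weighted-average price per 1M tokens, assuming `output_parts`
    output tokens for every `input_parts` input tokens (assumed rule)."""
    total_parts = output_parts + input_parts
    return (output_parts * output_price + input_parts * input_price) / total_parts


# GPT-5.1 list prices from this page, in USD per 1M tokens.
blended = blended_price(input_price=1.25, output_price=10.00)
print(f"${blended:.2f} per 1M tokens")  # → $7.81 per 1M tokens
```

Changing `output_parts` and `input_parts` shows how sensitive the blended number is to the assumed traffic mix.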


modelpicker.ai · powered by live benchmark data

Scores by test

Structured Output: 4.0/5.0
Strategic Analysis: 5.0/5.0
Constrained Rewriting: 4.0/5.0
Creative Problem Solving: 4.0/5.0
Tool Calling: 4.0/5.0
Faithfulness: 5.0/5.0
Classification: 4.0/5.0
Long Context: 5.0/5.0
Safety Calibration: 2.0/5.0
Persona Consistency: 5.0/5.0
Agentic Planning: 4.0/5.0
Multilingual: 5.0/5.0
Tabular Data: 4.0/5.0
SWE-bench Verified: 68.0%
AIME 2025: 88.6%

What you need to know

GPT-5.1 is strongest at high-fidelity retrieval and complex reasoning over large inputs, with perfect (5/5) internal scores in faithfulness, long context handling, and persona consistency. Its 400K-token context window is backed by the 5/5 long context rating, making it a reliable choice for applications that require strict adherence to provided source material with minimal hallucination.

The model shows strong technical capability in mathematics and coding, with an 88.6% score on AIME 2025 and 68% on SWE-bench Verified. However, it scores only 2/5 on the internal safety calibration test, which suggests a higher risk of generating unfiltered or non-compliant content than other models in its class.

At a blended cost of $7.81/MTok, this model sits at a premium price point. It ranks #34 of 71 overall, and its value is concentrated in strategic analysis and multilingual tasks (both 5/5) rather than classification or structured output (both 4/5), where it performs adequately but not exceptionally.

Use this model if your workflow requires high factual accuracy, complex strategic planning, or the processing of massive documents. Skip this model if you require strict safety guardrails or a cost-effective solution for simple classification and structured data extraction.
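The $7.81 blended rate assumes three output tokens per input token, so input-heavy work such as long-document analysis can cost well below it per token. A minimal sketch of per-request cost at the listed prices, assuming no prompt-caching or batch discounts; the function name and the 200K-token example job are hypothetical.

```python
# GPT-5.1 list prices from this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 10.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at list prices.

    Assumes no prompt-caching or batch discounts are applied.
    """
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical long-document job: a 200K-token document, a 2K-token answer.
cost = request_cost(input_tokens=200_000, output_tokens=2_000)
effective_rate = cost / (200_000 + 2_000) * 1_000_000  # USD per 1M tokens

print(f"request: ${cost:.3f}, effective rate: ${effective_rate:.2f}/MTok")
# → request: $0.270, effective rate: $1.34/MTok
```

Because output tokens cost eight times as much as input tokens here, the effective rate tracks the output share of a workload far more than its total size.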

Strengths — Top 3

Strategic Analysis: 5.0/5.0
Faithfulness: 5.0/5.0
Long Context: 5.0/5.0

Relative weaknesses — Bottom 3

Safety Calibration: 2.0/5.0
Structured Output: 4.0/5.0
Constrained Rewriting: 4.0/5.0
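Several tests tie: Persona Consistency and Multilingual also score 5.0, and seven tests sit at 4.0, so which ones appear in the top-3 and bottom-3 lists depends on how ties are broken. The page does not state its rule. One rule that reproduces both lists is a stable sort that keeps tied tests in the order they are listed under "Scores by test". A sketch under that assumption (the names `SCORES` and `top_and_bottom` are hypothetical), using only the 5-point internal tests since SWE-bench Verified and AIME 2025 are percentages:

```python
# Internal 5-point test scores from this page, in the order the page lists them.
SCORES = [
    ("Structured Output", 4.0),
    ("Strategic Analysis", 5.0),
    ("Constrained Rewriting", 4.0),
    ("Creative Problem Solving", 4.0),
    ("Tool Calling", 4.0),
    ("Faithfulness", 5.0),
    ("Classification", 4.0),
    ("Long Context", 5.0),
    ("Safety Calibration", 2.0),
    ("Persona Consistency", 5.0),
    ("Agentic Planning", 4.0),
    ("Multilingual", 5.0),
    ("Tabular Data", 4.0),
]


def top_and_bottom(scores, n=3):
    """Return the n highest- and n lowest-scoring tests.

    sorted() is stable (also with reverse=True), so tied scores keep
    their listed order; this tie-break is an assumption.
    """
    top = sorted(scores, key=lambda item: item[1], reverse=True)[:n]
    bottom = sorted(scores, key=lambda item: item[1])[:n]
    return top, bottom


top, bottom = top_and_bottom(SCORES)
print([name for name, _ in top])
# → ['Strategic Analysis', 'Faithfulness', 'Long Context']
print([name for name, _ in bottom])
# → ['Safety Calibration', 'Structured Output', 'Constrained Rewriting']
```

Under a different tie-break (alphabetical, for example), Persona Consistency or Multilingual could just as well appear among the strengths.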

Similar models

Mistral Medium 3.5 · $6.00 · score 4.15
Qwen: Qwen3 235B A22B Instruct 2507 · $0.093 · score 4.08
Gemma 4 26B A4B · $0.263 · score 4.23
Gemma 4 31B · $0.307 · score 4.38