OpenAI · active

GPT-4.1

OpenAI's mid-tier model. Long-context specialist with a 1.0M-token context window.

Overall score: 4.23/5.00 · ranked #35
Input: $2.00 per 1M tokens
Output: $8.00 per 1M tokens
Context: 1.0M tokens
Blended: $6.50 per 1M tokens (3:1 out:in ratio)
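
The blended figure is consistent with a weighted average of the input and output prices at the stated 3:1 output-to-input ratio. A minimal sketch, assuming that is how the page derives it (the function name `blended_price` is hypothetical and not part of any API):

```python
def blended_price(input_price: float, output_price: float, out_to_in: float = 3.0) -> float:
    """Weighted average price per 1M tokens.

    Assumes `out_to_in` output tokens are consumed for every input token.
    """
    return (input_price + out_to_in * output_price) / (1.0 + out_to_in)


# Listed prices: $2.00 input and $8.00 output per 1M tokens.
price = blended_price(2.00, 8.00)
print(f"${price:.2f} per 1M tokens")  # prints "$6.50 per 1M tokens"
```

A workload with a different output-to-input mix can pass its own ratio as `out_to_in`; input-heavy long-context jobs will land closer to the $2.00 input price.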

modelpicker.ai · powered by live benchmark data

Scores by test

Structured Output: 4.0/5.0
Strategic Analysis: 5.0/5.0
Constrained Rewriting: 5.0/5.0
Creative Problem Solving: 3.0/5.0
Tool Calling: 5.0/5.0
Faithfulness: 5.0/5.0
Classification: 4.0/5.0
Long Context: 5.0/5.0
Safety Calibration: 1.0/5.0
Persona Consistency: 5.0/5.0
Agentic Planning: 4.0/5.0
Multilingual: 5.0/5.0
Tabular Data: 4.0/5.0
SWE-bench Verified: 48.5%
MATH Level 5: 83.0%
AIME 2025: 38.3%
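
The overall score of 4.23 matches the unweighted mean of the thirteen 5-point test scores above, with the three external benchmarks (SWE-bench Verified, MATH Level 5, AIME 2025) left out. Whether the site actually computes it this way is an assumption; the sketch below only checks that the numbers agree.

```python
from statistics import mean

# The thirteen 5-point test scores listed on the page.
scores = {
    "Structured Output": 4.0,
    "Strategic Analysis": 5.0,
    "Constrained Rewriting": 5.0,
    "Creative Problem Solving": 3.0,
    "Tool Calling": 5.0,
    "Faithfulness": 5.0,
    "Classification": 4.0,
    "Long Context": 5.0,
    "Safety Calibration": 1.0,
    "Persona Consistency": 5.0,
    "Agentic Planning": 4.0,
    "Multilingual": 5.0,
    "Tabular Data": 4.0,
}

# Unweighted mean: 55 / 13.
overall = mean(list(scores.values()))
print(f"{overall:.2f}")  # prints "4.23"
```

Under this reading, the single 1.0 on Safety Calibration pulls the average down noticeably: without it, the remaining twelve scores average 4.50.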

What you need to know

GPT-4.1 is optimized for high-precision, long-context tasks, with a 1.0M-token window and perfect 5/5 internal scores for faithfulness, persona consistency, and strategic analysis. It also scores 5/5 on constrained rewriting and tool calling, which makes it a stable choice for complex pipelines where output drift cannot be tolerated.

The model is priced at a premium, with a blended cost of $6.50/MTok ($2.00 input, $8.00 output). It ranks 35th overall out of 71 models, so its value lies in specific strengths rather than general utility. It performs strongly in quantitative domains, scoring 83% on MATH Level 5, though its 38.3% on AIME 2025 suggests a ceiling in elite-level competition mathematics.

A critical weakness is its safety calibration, which scored 1/5. That points to either under-refusal of harmful requests or over-restrictive filtering of legitimate ones, and both are problematic for public-facing applications. Its creative problem solving is also mediocre at 3/5, suggesting it is better suited to analytical rigor than to generative novelty.

Use this model if you require a high-fidelity agent for long-document analysis, structured tool use, or multilingual strategic planning. Skip this model if you are on a tight budget, need a model with strong safety guardrails, or require high levels of creative intuition.

Strengths — Top 3

Strategic Analysis: 5.0/5.0
Constrained Rewriting: 5.0/5.0
Tool Calling: 5.0/5.0

Relative weaknesses — Bottom 3

Safety Calibration: 1.0/5.0
Creative Problem Solving: 3.0/5.0
Structured Output: 4.0/5.0
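
Both lists involve ties: seven tests score 5.0 and four score 4.0. A minimal sketch, assuming the page breaks ties by listing order (an assumption, though it happens to reproduce both lists), shows which other tests share the third-place weakness score:

```python
# The thirteen 5-point test scores, in the order the page lists them.
scores = {
    "Structured Output": 4.0,
    "Strategic Analysis": 5.0,
    "Constrained Rewriting": 5.0,
    "Creative Problem Solving": 3.0,
    "Tool Calling": 5.0,
    "Faithfulness": 5.0,
    "Classification": 4.0,
    "Long Context": 5.0,
    "Safety Calibration": 1.0,
    "Persona Consistency": 5.0,
    "Agentic Planning": 4.0,
    "Multilingual": 5.0,
    "Tabular Data": 4.0,
}

# Python's sort is stable, so equal scores keep their listing order.
top3 = [name for name, _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]]
bottom3 = [name for name, _ in sorted(scores.items(), key=lambda kv: kv[1])[:3]]

# Tests that tie with the last entry of the bottom three.
cutoff = scores[bottom3[-1]]
tied = [name for name, value in scores.items() if value == cutoff]

print(top3)
print(bottom3)
print(tied)
```

Structured Output, Classification, Agentic Planning, and Tabular Data all sit at 4.0, so the third "weakness" is interchangeable among them.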

Similar models

Mistral Medium 3.1 · $1.60 · 4.23
xAI: Grok Build 0.1 · $1.75 · 4.31
R1 0528 · $1.74 · 4.46
Mistral Medium 3.5 · $6.00 · 4.15