Google · active

Gemini 2.5 Pro

Google's mid-tier model. Long-context specialist with a 1.0M-token context window.

Overall score: 4.23 / 5.00 (ranked #36)
Input: $1.25 per 1M tokens
Output: $10.00 per 1M tokens
Context: 1.0M tokens
Blended: $7.81 (3:1 out:in ratio)
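The blended figure can be reproduced from the input and output prices. The sketch below assumes "3:1 out:in ratio" means three output tokens for every input token and that the blend is a simple weighted average; the function name `blended_price` is illustrative and does not come from the page.

```python
def blended_price(input_price: float, output_price: float, out_to_in: float = 3.0) -> float:
    """Weighted average price per 1M tokens for a given output:input token ratio.

    Assumption: the blend weights output tokens `out_to_in` times as heavily
    as input tokens, matching the "3:1 out:in ratio" label on the page.
    """
    return (input_price + out_to_in * output_price) / (1.0 + out_to_in)


# Prices as listed on this page, in USD per 1M tokens.
blended = blended_price(input_price=1.25, output_price=10.00)
print(f"${blended:.2f} per 1M tokens")  # $7.81 per 1M tokens
```

Under that assumption, (1.25 + 3 × 10.00) / 4 = 7.8125, which rounds to the $7.81 shown.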

modelpicker.ai · powered by live benchmark data

Scores by test

Structured Output: 5.0/5.0
Strategic Analysis: 4.0/5.0
Constrained Rewriting: 3.0/5.0
Creative Problem Solving: 5.0/5.0
Tool Calling: 5.0/5.0
Faithfulness: 5.0/5.0
Classification: 4.0/5.0
Long Context: 5.0/5.0
Safety Calibration: 1.0/5.0
Persona Consistency: 5.0/5.0
Agentic Planning: 4.0/5.0
Multilingual: 5.0/5.0
Tabular Data: 4.0/5.0
SWE-bench Verified: 57.6%
AIME 2025: 84.2%
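The page does not say how the overall score is computed. As an observation only, the unweighted mean of the 13 internal test scores listed above (leaving out the two external benchmarks, SWE-bench Verified and AIME 2025) comes out to the 4.23 shown. The sketch below checks that arithmetic; treating the overall score as a plain average is an assumption, not the site's documented methodology.

```python
# Internal test scores as listed on this page (each out of 5.0).
internal_scores = {
    "Structured Output": 5.0,
    "Strategic Analysis": 4.0,
    "Constrained Rewriting": 3.0,
    "Creative Problem Solving": 5.0,
    "Tool Calling": 5.0,
    "Faithfulness": 5.0,
    "Classification": 4.0,
    "Long Context": 5.0,
    "Safety Calibration": 1.0,
    "Persona Consistency": 5.0,
    "Agentic Planning": 4.0,
    "Multilingual": 5.0,
    "Tabular Data": 4.0,
}

# Assumption: overall score = unweighted mean of the internal tests.
overall = sum(internal_scores.values()) / len(internal_scores)
print(f"{overall:.2f} / 5.00")  # 4.23 / 5.00
```

The scores sum to 55 across 13 tests, and 55 / 13 ≈ 4.2308.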

What you need to know

Gemini 2.5 Pro is defined by its 1.0M-token context window and its reliability on structured tasks. It earns perfect internal scores (5.0/5.0) for long context, tool calling, and structured output, which makes it well suited to complex RAG pipelines and agentic workflows that require strict adherence to schemas. On external benchmarks it scores 57.6% on SWE-bench Verified (coding) and 84.2% on AIME 2025 (mathematics).

The model sits at a premium price point: output costs $10.00 per million tokens, and the blended cost is $7.81 at a 3:1 output-to-input ratio. That expense may be justified for developers who need faithfulness and persona consistency (both 5.0/5.0) across very large inputs. The clear failure is safety calibration, where the model scores 1.0/5.0, which points to weak built-in guardrails.
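To make the price point concrete, here is a rough per-request cost estimate. It assumes the flat list prices shown on this page apply to every token; a provider's actual billing may differ (for example by prompt size or caching), and the token counts below are made-up illustrations.

```python
# List prices from this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = 1.25
OUTPUT_PRICE_PER_M = 10.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at flat per-token list prices (an assumption)."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical long-context request: a full 1.0M-token prompt with a short answer.
full_context = request_cost(input_tokens=1_000_000, output_tokens=2_000)

# Hypothetical output-heavy request: three output tokens per input token.
output_heavy = request_cost(input_tokens=10_000, output_tokens=30_000)

print(f"full-context request: ${full_context:.2f}")
print(f"output-heavy request: ${output_heavy:.4f}")
```

Under these assumptions, filling the context window is cheap relative to generating text: the input side of a full 1.0M-token prompt costs $1.25, while every 125,000 output tokens cost the same amount.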

The model is strong on multilingual tasks and creative problem solving (both 5.0/5.0) but weaker at constrained rewriting (3.0/5.0). With an overall rank of 36 out of 71, it is a specialized tool rather than a general-purpose leader.

Use this model if your application requires a massive context window, precise tool integration, or high-fidelity structured data. Skip this model if your use case requires strict safety filtering or cost-efficient high-volume output.

Strengths — Top 3

Structured Output: 5.0/5.0
Creative Problem Solving: 5.0/5.0
Tool Calling: 5.0/5.0

Relative weaknesses — Bottom 3

Safety Calibration: 1.0/5.0
Constrained Rewriting: 3.0/5.0
Strategic Analysis: 4.0/5.0
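Several tests tie at 5.0 and at 4.0, so the top-3 and bottom-3 lists depend on how ties are broken. One reading that reproduces both lists is a stable sort over the scores in the order the page lists them, so that tied tests keep their page order. This is an inference from the numbers, not a documented rule.

```python
# Internal test scores in the order they appear under "Scores by test".
scores = [
    ("Structured Output", 5.0),
    ("Strategic Analysis", 4.0),
    ("Constrained Rewriting", 3.0),
    ("Creative Problem Solving", 5.0),
    ("Tool Calling", 5.0),
    ("Faithfulness", 5.0),
    ("Classification", 4.0),
    ("Long Context", 5.0),
    ("Safety Calibration", 1.0),
    ("Persona Consistency", 5.0),
    ("Agentic Planning", 4.0),
    ("Multilingual", 5.0),
    ("Tabular Data", 4.0),
]

# sorted() is stable, including with reverse=True, so ties keep page order.
top3 = sorted(scores, key=lambda item: item[1], reverse=True)[:3]
bottom3 = sorted(scores, key=lambda item: item[1])[:3]

print("Top 3:", [name for name, _ in top3])
print("Bottom 3:", [name for name, _ in bottom3])
```

Read this way, Strategic Analysis appears among the weaknesses only because it is the first of four tests tied at 4.0; Classification, Agentic Planning, and Tabular Data score the same.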

Similar models

Gemini 3 Flash Preview: $2.38, score 4.46
xAI: Grok Build 0.1: $1.75, score 4.31
Qwen 3.7 Max: $6.25, score 4.62
Qwen: Qwen3 235B A22B Instruct 2507: $0.093, score 4.08