xAI·active

Grok 4.20

xAI's efficiency model: a long-context specialist with a 2M-token context window.

Overall score: 4.00/5.00 · ranked #57
Input: $1.25 per 1M tokens
Output: $2.50 per 1M tokens
Context: 2M tokens
Blended: $2.19 per 1M tokens (3:1 out:in ratio)
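
The blended figure can be reproduced from the input and output prices. The sketch below assumes the blend is a token-weighted average at the stated 3:1 output-to-input ratio; the page does not spell out the formula, and the function name `blended_price` is only illustrative.

```python
def blended_price(input_price: float, output_price: float,
                  out_to_in_ratio: float = 3.0) -> float:
    """Weighted average price per 1M tokens.

    Assumption: for every 1 input token there are `out_to_in_ratio`
    output tokens, and the blend is the token-weighted mean price.
    """
    total_weight = 1.0 + out_to_in_ratio
    return (input_price + out_to_in_ratio * output_price) / total_weight


# Prices listed on this page, in USD per 1M tokens.
price = blended_price(input_price=1.25, output_price=2.50)
print(f"${price:.2f} per 1M tokens")
```

With the listed prices this gives (1.25 + 3 × 2.50) / 4 = 2.1875, which rounds to the $2.19 shown above.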


modelpicker.ai · powered by live benchmark data

Scores by test

Structured Output: 5.0
Strategic Analysis: 5.0
Constrained Rewriting: 4.0
Creative Problem Solving: 4.0
Tool Calling: 5.0
Faithfulness: 5.0
Classification: 4.0
Long Context: 5.0
Safety Calibration: 1.0
Persona Consistency: 5.0
Agentic Planning: 4.0
Multilingual: 5.0
Tabular Data

What you need to know

Grok 4.20 is defined by high reliability in structured tasks and massive context handling. With a 2M token window and perfect internal scores in tool calling, faithfulness, and structured output, it is engineered for precision-heavy workflows where hallucination must be minimized and large datasets processed in a single pass.
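
For a sense of what a single long-context pass costs, here is a minimal estimate using the per-token prices listed on this page. It assumes flat pricing across the whole window (no long-context surcharge) and that input and output tokens share the 2M window; neither is confirmed by the page, and `request_cost` is a hypothetical helper, not part of any xAI SDK.

```python
INPUT_PRICE_PER_MTOK = 1.25    # USD per 1M input tokens (from this page)
OUTPUT_PRICE_PER_MTOK = 2.50   # USD per 1M output tokens (from this page)
CONTEXT_WINDOW = 2_000_000     # tokens (from this page)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumptions: flat per-token pricing, and input plus output
    tokens must fit within the context window together.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError("request exceeds the 2M-token context window")
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK)


# A 1.6M-token document with a 4K-token answer.
print(f"${request_cost(1_600_000, 4_000):.2f}")
```

Under these assumptions, input tokens dominate the bill for long-document workloads, so the $1.25 input price matters more than the blended figure.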

The blended cost is $2.19/MTok at a 3:1 output-to-input ratio ($1.25 input, $2.50 output), which places it between the cheaper and the more expensive entries listed under Similar models. Its top scores in strategic analysis and multilingual tasks support that price. However, developers should note the safety calibration score of 1/5, the lowest result on this page, which suggests weaker built-in guardrails than other models at a similar rank.

The model ranks #21 of 71, placing it in the upper tier of general capability but lagging behind the absolute leaders. Its strengths are concentrated in execution and analysis rather than creative flexibility or agentic planning, where it performs well but not perfectly.

Use this model if you require a high-faithfulness engine for complex tool integration, strategic data analysis, or processing extremely long documents. Skip this model if your application requires strict safety filtering or if you are operating on a tight budget.

Strengths — Top 3

Structured Output: 5.0/5.0
Strategic Analysis: 5.0/5.0
Tool Calling: 5.0/5.0

Relative weaknesses — Bottom 3

Safety Calibration: 1.0/5.0
Constrained Rewriting: 4.0/5.0
Creative Problem Solving: 4.0/5.0

Similar models

Claude Haiku 4.5 · $4.00 · 4.00
GPT-5.4 Mini · $3.56 · 4.15
Qwen: Qwen3 30B A3B Instruct 2507 · $0.247 · 3.62
GPT-5.4 Nano · $0.988 · 3.92