xai
Grok 4.20
Grok 4.20 is xAI's flagship model, described in our data as offering industry-leading speed and agentic tool calling capabilities with a low hallucination rate and strict prompt adherence. It ranked 10th overall out of 52 tested models, making it one of the highest-performing models in our dataset. At $2/M input and $6/M output, it is priced more aggressively than most models in its performance bracket: Claude Sonnet 4.6 charges $15/M output, GPT-5.2 charges $14/M, and Claude Opus 4.6 charges $25/M — all with comparable or slightly higher average scores. Grok 4.20's 2M token context window is the largest in our dataset, enabling processing of extremely long documents. The model has no documented quirks in our payload.
Performance
In our 12-test benchmark suite, Grok 4.20's strongest dimensions are tool calling (5/5, tied for 1st with 16 others out of 54 tested), faithfulness (5/5, tied for 1st with 32 others out of 55), and structured output (5/5, tied for 1st with 24 others out of 54). It also scores 5/5 on multilingual, strategic analysis, persona consistency, and long context, an exceptionally broad spread of top scores. Grok 4.20 does not have external benchmark data (SWE-bench, MATH, or AIME) in our payload. The primary weakness is safety calibration, which scored 1/5 (rank 32 of 55), the lowest possible score and a weakness it shares with several other tested models. Agentic planning and constrained rewriting both scored 4/5, solid but not among the very highest.
Pricing
Grok 4.20 costs $2 per million input tokens and $6 per million output tokens. At typical usage — 1M input + 500K output — total cost is about $5. At 10M input / 5M output per month, expect $50/month. At 100M input / 50M output, roughly $500/month. Compared to bracket peers: Claude Sonnet 4.6 ($15/M output) is 2.5x more expensive per output token. GPT-5.4 ($15/M output) and Claude Opus 4.6 ($25/M output) are even pricier. Among high-performers near rank 10, Grok 4.20's $6/M output is one of the lowest — comparable to R1 0528 ($2.15/M output, avg 4.5) on price, but Grok 4.20 ranks higher overall. No other model in the top-15 overall positions offers output at $6/M or below except R1 0528 and Gemini 3 Flash Preview ($3/M output, rank 5).
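The cost arithmetic above can be sketched as a small helper. The rates are this page's published pricing; the function name and tier list are illustrative:

```python
# Sketch: estimating Grok 4.20 API spend from monthly token volumes.
# Rates come from the pricing above; tiers are the usage examples quoted.
INPUT_RATE = 2.00   # USD per million input tokens
OUTPUT_RATE = 6.00  # USD per million output tokens

def monthly_cost(input_mtok: float, output_mtok: float) -> float:
    """Cost in USD for a month's usage, given token counts in millions."""
    return input_mtok * INPUT_RATE + output_mtok * OUTPUT_RATE

for in_mtok, out_mtok in [(1, 0.5), (10, 5), (100, 50)]:
    print(f"{in_mtok}M in / {out_mtok}M out -> ${monthly_cost(in_mtok, out_mtok):,.2f}")
```

Running this reproduces the three figures quoted above: $5, $50, and $500 per month.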
[Sidebar charts: Benchmark Scores · External Benchmarks · Real-World Costs · Pricing vs Performance (output cost per million tokens, log scale, vs average score across our 12 internal benchmarks) (modelpicker.net)]

Pricing: $2.00/MTok input, $6.00/MTok output
Try It
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible API, so the standard client works.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="x-ai/grok-4.20",
    messages=[
        {"role": "user", "content": "Hello, Grok 4.20!"}
    ],
)

print(response.choices[0].message.content)

Recommendation
Grok 4.20 is a compelling choice for teams that need top-tier performance without top-tier pricing. Its 5/5 scores on tool calling, faithfulness, multilingual, structured output, and long context in our testing cover the most common enterprise use cases: agentic pipelines, document analysis, multilingual content, and structured data extraction. The $6/M output price is well below what comparable-performing models charge, making it especially attractive for high-volume production workloads. The 2M token context window is the largest in our dataset — useful for processing complete codebases, lengthy legal documents, or long research corpora. Who should look elsewhere: if safety calibration is a first-order requirement (Grok 4.20 scored 1/5), seek alternatives with higher scores. If math/coding benchmark performance is a deciding factor, note that Grok 4.20 has no external benchmark data in our payload — models like GPT-5 (MATH Level 5: 98.1) or Gemini 3 Flash Preview (AIME 2025: 92.8, per Epoch AI) offer verified external scores.
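Since the tool-calling score is the headline strength for agentic pipelines, here is a minimal sketch of the tool-definition payload such a pipeline would send through the OpenAI-compatible endpoint shown in Try It. The get_weather tool and the prompt are hypothetical illustrations; the tools/tool_choice fields follow the standard chat-completions request shape:

```python
import json

# Hypothetical tool definition in the standard chat-completions format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not a real API
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# The request body the client from Try It would serialize and send.
request_body = {
    "model": "x-ai/grok-4.20",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
print(json.dumps(request_body, indent=2))
```

In a real pipeline you would pass `tools=tools` and `tool_choice="auto"` to `client.chat.completions.create(...)` and inspect `response.choices[0].message.tool_calls` for any requested invocations.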
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.