xai

Grok 4

Grok 4 is xAI's reasoning model featuring a 256,000-token context window, multimodal input (text, image, and file), parallel tool calling, and structured output support. It sits at the top of xAI's lineup, above Grok 3, Grok 4.20, and Grok 4.1 Fast, and is designed for tasks that require sustained reasoning over long documents, rigorous analysis, and high-fidelity output.

In the broader market, Grok 4's $15/M output-token price puts it directly alongside Claude Sonnet 4.6 ($15/M output) and GPT-5.4 ($15/M output), both of which score higher on our overall benchmarks (4.67 and 4.58 average respectively, versus Grok 4's 4.08, rank 27 of 52 overall). It is not the top scorer at this price point, which makes the pricing case harder to make on raw averages alone. Where Grok 4 does distinguish itself is in specific high-value capabilities: strategic analysis, faithfulness, and multilingual output, each scoring 5/5 in our testing.

Performance

Grok 4 earns an overall grade of 4.08/5 ("Strong") in our suite, ranking 27 of 52, and its individual test scores paint a clear picture of where it excels and where it falls short.

Top strengths:

  1. Strategic analysis (5/5): Tied for 1st (a 26-way tie) among the 54 models tested in our nuanced tradeoff reasoning tasks. This is Grok 4's most distinctive result: it handles scenarios requiring real-number reasoning and multi-factor tradeoffs at the highest level we measure.

  2. Faithfulness (5/5): Tied for 1st (a 33-way tie) among the 55 models tested. In our testing, Grok 4 sticks closely to source material without hallucinating, a critical property for document summarization, legal review, and any task where accuracy to source content matters.

  3. Multilingual (5/5): Tied for 1st (a 35-way tie) among the 55 models tested. Grok 4 produces equivalent-quality output across non-English languages in our benchmark, making it suitable for global deployment scenarios.

Long context (5/5, tied 1st of 55) and persona consistency (5/5, tied 1st of 53) round out strong scores, with structured output (4/5), tool calling (4/5), classification (4/5), and constrained rewriting (4/5) all at or above the field median.

Notable weaknesses:

  • Agentic planning (3/5): Ranked 42 of 54 in goal decomposition and failure recovery — a significant gap relative to its other scores. Models intended for autonomous multi-step workflows should be evaluated carefully here.
  • Creative problem solving (3/5): Ranked 30 of 54. Non-obvious ideation is not a strength.
  • Safety calibration (2/5): Grok 4's lowest score. A 2/5 places it in the bottom quartile of the 55 tested models for correctly refusing harmful requests while permitting legitimate ones.

Overall rank is 27 of 52, which reflects how the weaker scores on agentic planning and safety pull down an otherwise strong top-end performance profile.

Pricing

Grok 4 costs $3.00 per million input tokens and $15.00 per million output tokens via the API.

At typical developer usage volumes, that translates to roughly:

  • Light use (500K tokens/month, 80% input): ~$2.70/month
  • Moderate use (5M tokens/month, 70% input): ~$33.00/month
  • Heavy use (50M tokens/month, 60% input): ~$390/month
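These estimates fall straight out of the per-token rates. A minimal sketch of the arithmetic (the `monthly_cost` helper is illustrative, not part of any SDK; the tier volumes and input/output splits are the ones above):

```python
# Blended monthly cost from total token volume and the input/output split.
# Rates from this review: $3.00 per million input, $15.00 per million output.
INPUT_RATE = 3.00 / 1_000_000
OUTPUT_RATE = 15.00 / 1_000_000

def monthly_cost(total_tokens: int, input_share: float) -> float:
    """Dollar cost for a month of usage, given the fraction of tokens that are input."""
    input_tokens = total_tokens * input_share
    output_tokens = total_tokens - input_tokens
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

print(round(monthly_cost(500_000, 0.80), 2))     # light    → 2.7
print(round(monthly_cost(5_000_000, 0.70), 2))   # moderate → 33.0
print(round(monthly_cost(50_000_000, 0.60), 2))  # heavy    → 390.0
```

Note how input-heavy workloads stay cheap: at an 80% input split, most tokens are billed at the $3/M rate rather than the $15/M rate.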

The output cost of $15/M is shared with Claude Sonnet 4.6 and GPT-5.4, both of which score higher on our average benchmark. Claude Opus 4.6 is more expensive at $25/M output. On the cheaper end within this performance bracket, Grok 4.20 (a sibling model from xAI) costs only $6/M output and scores 4.33 average — substantially less per token if you don't need Grok 4's specific strengths.

Grok 4 uses reasoning tokens, which means complex tasks will consume additional tokens beyond the visible prompt and completion. Developers should budget for this overhead, particularly on multi-step reasoning or agentic workflows, as actual costs can exceed naive estimates based on prompt length alone.
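To see the overhead concretely: on OpenAI-compatible APIs, billed completion tokens generally include any hidden reasoning tokens, so the `usage` object on a response, not the visible answer length, is what to meter. A sketch of the accounting (the `call_cost` helper and the specific token counts are illustrative assumptions, not measured figures):

```python
# Cost of a single call from the usage numbers the API reports.
# On reasoning models, billed completion tokens typically include hidden
# reasoning tokens, so billed output can far exceed the visible answer.
def call_cost(prompt_tokens: int, completion_tokens: int,
              input_rate: float = 3.00, output_rate: float = 15.00) -> float:
    """Dollar cost of one call; rates are per million tokens."""
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000

# Illustrative: a 1,200-token prompt and a 300-token visible answer, plus an
# assumed 2,000 reasoning tokens billed alongside the completion.
naive = call_cost(1_200, 300)           # estimate from visible text only
actual = call_cost(1_200, 300 + 2_000)  # what the usage object would report
print(f"${naive:.4f} naive vs ${actual:.4f} actual")  # $0.0081 vs $0.0381
```

Under these assumptions the real cost is roughly 4.7x the naive estimate, which is why budgeting from prompt length alone is risky on reasoning-heavy workloads.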

Overall: 4.08/5 (Strong)

Benchmark Scores

  • Faithfulness: 5/5
  • Long Context: 5/5
  • Multilingual: 5/5
  • Tool Calling: 4/5
  • Classification: 4/5
  • Agentic Planning: 3/5
  • Structured Output: 4/5
  • Safety Calibration: 2/5
  • Strategic Analysis: 5/5
  • Persona Consistency: 5/5
  • Constrained Rewriting: 4/5
  • Creative Problem Solving: 3/5

External Benchmarks

  • SWE-bench Verified: N/A
  • MATH Level 5: N/A
  • AIME 2025: N/A

Pricing

  • Input: $3.00/MTok
  • Output: $15.00/MTok
  • Context Window: 256K


Real-World Costs

  • Chat response: $0.0081
  • Blog post: $0.032
  • Document batch: $0.810
  • Pipeline run: $8.10

Pricing vs Performance

Output cost per million tokens (log scale) vs average score across our 12 internal benchmarks


Try It

from openai import OpenAI

# Grok 4 is served through OpenAI-compatible endpoints; this example
# routes through OpenRouter.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # replace with your own key
)

response = client.chat.completions.create(
    model="x-ai/grok-4",  # OpenRouter model slug for Grok 4
    messages=[
        {"role": "user", "content": "Hello, Grok 4!"}
    ],
)

print(response.choices[0].message.content)

Recommendation

Use Grok 4 if:

  • Your primary tasks involve deep strategic or analytical reasoning — Grok 4 scores 5/5 on strategic analysis in our testing, making it a strong fit for business intelligence, research synthesis, and scenario modeling.
  • You need faithful document processing at scale. A 5/5 faithfulness score and a 256K context window together make Grok 4 well-suited for legal document review, contract analysis, and long-form summarization where hallucination is unacceptable.
  • You're building multilingual applications. The 5/5 multilingual score puts it among the top performers for non-English output quality.
  • You're already working within the xAI ecosystem and want the highest-capability model in the lineup.

Look elsewhere if:

  • You need strong agentic behavior. Grok 4's 3/5 agentic planning score (rank 42 of 54) is a real liability for autonomous task execution. Claude Sonnet 4.6 (avg 4.67, same $15/M output cost) or GPT-5.4 (avg 4.58, same $15/M output) are better choices for agent-heavy workloads.
  • Safety calibration matters for your deployment. A 2/5 safety calibration score is among the lower results in our test suite — this is a meaningful concern for consumer-facing or regulated applications.
  • You want maximum value per dollar at this capability tier. Grok 4.20 at $6/M output scores 4.33 average across our benchmarks; if Grok 4's specific 5/5 strengths don't map directly to your use case, the sibling model offers substantially lower cost.
  • Creative brainstorming or idea generation is central to your workflow. A 3/5 creative problem solving score (rank 30 of 54) means better options are available at lower prices.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions