xAI
Grok 3 Mini
Grok 3 Mini is xAI's lightweight, reasoning-first text model designed for fast, logic-oriented tasks that need long-context recall. It sits below xAI's larger Grok siblings (Grok 3 and Grok 4.20) as a lower-cost, lower-capacity option that exposes raw thinking traces. Compared with its bracket peers, it trades top-tier average benchmark scores for a much lower output cost ($0.50 per MTok) while excelling at long-context retrieval, tool calling, and faithfulness.
Performance
Summary of our 12 benchmark scores: long context 5, tool calling 5, faithfulness 5, persona consistency 5, structured output 4, classification 4, constrained rewriting 4, multilingual 4, creative problem solving 3, agentic planning 3, strategic analysis 3, safety calibration 2. Top strengths:
- Long-context retrieval (score 5, tied for 1st with 36 other models of 55 tested): excellent for tasks requiring accurate retrieval at 30K+ tokens.
- Tool calling (score 5, tied for 1st with 16 other models of 54 tested): reliable function selection, argument formation, and sequencing in our tests.
- Faithfulness (score 5, tied for 1st): resists hallucination and sticks to source material in our benchmarks.

Where it ranks overall: Grok 3 Mini places 31st of 52 tested models. It shares top-tier scores on several dimensions (persona consistency, faithfulness, tool calling, long context, classification) but has no reported average bench grade.

Notable weaknesses: safety calibration is low (score 2; rank 12 of 55, a score shared by 20 models), and agentic planning and strategic analysis are middling (score 3). In practice, Grok 3 Mini is strong at faithful, long-context, tool-backed workloads but less reliable at nuanced refusal behavior and advanced multi-step planning.
Pricing
Costs per MTok (million tokens): input $0.30, output $0.50. Practical examples (combined input+output cost, assuming equal input and output volumes):
- 1 MTok in + 1 MTok out = $0.80 total
- 10 MTok in + 10 MTok out = $8.00 total
- 100 MTok in + 100 MTok out = $80.00 total

If your workload is output-heavy (e.g., 1 MTok input + 5 MTok output), expect $0.30 + $2.50 = $2.80 for that set. Among its bracket peers, Grok 3 Mini's $0.50 output cost is far below high-cost peers such as Claude Opus 4.6 ($25 output) or GPT-5.4 ($15), and slightly above the cheapest peers (Gemma 4 31B and DeepSeek V3.2, both at $0.38). Use Grok 3 Mini when long-context and tool workflows dominate and you need predictable, low per-MTok pricing.
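The arithmetic above can be sketched as a small helper, with the listed rates hard-coded; the function name is ours, not part of any API:

```python
# Rates from the listed pricing (USD per million tokens).
INPUT_RATE = 0.30
OUTPUT_RATE = 0.50

def cost_usd(input_mtok: float, output_mtok: float) -> float:
    """Total USD cost for the given volumes, in millions of tokens."""
    return input_mtok * INPUT_RATE + output_mtok * OUTPUT_RATE

print(cost_usd(1, 1))    # 1 MTok in + 1 MTok out: $0.80
print(cost_usd(10, 10))  # $8.00
print(cost_usd(1, 5))    # output-heavy example: $2.80
```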
[Chart: Pricing vs Performance, plotting output cost per million tokens (log scale) against average score across our 12 internal benchmarks]
Try It
from openai import OpenAI

# Grok 3 Mini is available through OpenRouter's OpenAI-compatible API.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="x-ai/grok-3-mini",
    messages=[
        {"role": "user", "content": "Hello, Grok 3 Mini!"},
    ],
)

print(response.choices[0].message.content)

Recommendation
Use Grok 3 Mini if you need:
- Production-grade retrieval over very long documents (long context 5, tied for 1st).
- Cost-efficient tool integration where accurate function selection matters (tool calling 5). Example: orchestration layer that chooses APIs and formats arguments.
- Workflows that require strict adherence to source material or persona (faithfulness 5; persona consistency 5). Example: automated summarization of legal excerpts where faithfulness and consistent tone matter.

Avoid Grok 3 Mini if you need:
- Strong safety calibration and fine-grained refusal behavior for user-facing moderation-sensitive flows (safety calibration 2).
- Heavy agentic planning or deep strategic analysis (agentic planning 3; strategic analysis 3); choose a higher-ranked peer for complex multi-step planning.

Because Grok 3 Mini has no reported average bench grade, prefer it when its measured strengths (long context, tool calling, faithfulness) match your task and you want lower per-MTok costs.
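To illustrate the tool-integration use case above, here is a minimal offline sketch of the orchestration pattern: an OpenAI-style tool schema plus a local dispatcher that routes a model-issued tool call to the matching function. The `get_weather` tool and the registry are hypothetical examples of ours; only the schema format follows the OpenAI tools convention that OpenRouter forwards to the model.

```python
import json

# Hypothetical local tool; name and signature are our own invention.
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 21}

# OpenAI-style tool schema, passed as `tools=` in the chat completion call.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Maps tool names the model may emit to local implementations.
REGISTRY = {"get_weather": get_weather}

def dispatch(tool_name: str, arguments_json: str) -> str:
    """Route a model-issued tool call to the matching local function
    and return the JSON result to send back as a role="tool" message."""
    fn = REGISTRY[tool_name]
    result = fn(**json.loads(arguments_json))
    return json.dumps(result)

# Simulated model output: a tool call with JSON-encoded arguments.
print(dispatch("get_weather", '{"city": "Austin"}'))
```

In a real loop you would read `response.choices[0].message.tool_calls`, dispatch each call as above, and append the results before re-invoking the model.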
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.