mistral

Devstral 2 2512

Devstral 2 2512 is an open-source agentic coding model built on a 123-billion-parameter dense transformer. It accepts text-only inputs and specializes in code generation and agentic software workflows. At $0.40/M input and $2.00/M output with a 262,144 token context window, it offers one of the larger context windows in its price bracket. In our 12-test benchmark suite, Devstral 2 2512 ranks 28th out of 52 active models — a solid middle-tier position. Its general-purpose strengths are constrained rewriting (5/5, tied for 1st with 4 other models), structured output (5/5, tied for 1st with 24 others), long context (5/5), and multilingual (5/5). These results reflect our general benchmarks, not code-specific evaluations.

Performance

Devstral 2 2512 ranks 28th out of 52 active models overall. Top strengths in our testing: constrained rewriting (5/5, tied for 1st with 4 other models out of 53), structured output (5/5, tied for 1st with 24 others out of 54), long context (5/5, tied for 1st with 36 others out of 55), and multilingual (5/5, tied for 1st with 34 others). Agentic planning scored 4/5 (rank 16 of 54), tool calling 4/5 (rank 18 of 54), and creative problem solving 4/5 (rank 9 of 54). Weaker areas: safety calibration scored 1/5 (rank 32 of 55, well below the field median of 2/5), classification 3/5 (rank 31 of 53), and persona consistency 4/5 but ranked 38 of 53. These benchmark results are from our general-purpose test suite and may not fully reflect performance on code-specific tasks.
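A 5/5 structured-output score is most useful when the pipeline actually checks the JSON it gets back. A minimal sketch of that validation step (the function and field names here are hypothetical illustrations, not part of our benchmark):

```python
import json

def validate_reply(reply: str, required: set) -> dict:
    """Parse a model's JSON reply and confirm the required keys are present."""
    data = json.loads(reply)
    missing = required - data.keys()
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return data

# Hypothetical schema-constrained reply from the model
reply = '{"language": "Python", "framework": "Flask"}'
parsed = validate_reply(reply, {"language", "framework"})
print(parsed["framework"])
```

A check like this turns a high structured-output score into a guarantee your downstream code can rely on.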

Pricing

Devstral 2 2512 costs $0.40 per million input tokens and $2.00 per million output tokens. At 10 million output tokens per month, that is $20; at 100 million output tokens, $200, before input costs. This pricing is identical to Devstral Medium and Mistral Medium 3.1, but Devstral 2 2512 ranks 28th of 52 on our general benchmarks — significantly better than Devstral Medium (rank 50) and somewhat below Mistral Medium 3.1 (rank 15). Because the weights are open, teams can also self-host rather than use API access, giving pricing flexibility beyond the $2.00/M output rate. The 262K context window is twice Mistral Medium 3.1's 131K, which matters for large-codebase ingestion.
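The arithmetic above generalizes to any usage mix. A small helper, with the per-million-token rates from this section hardcoded as assumptions:

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 input_rate: float = 0.40, output_rate: float = 2.00) -> float:
    """USD cost at Devstral 2 2512's quoted per-million-token rates."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

print(monthly_cost(0, 10_000_000))   # 20.0, the $20/month figure above
print(monthly_cost(0, 100_000_000))  # 200.0
```

Change the rate defaults to compare against other models at the same volume.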


Overall
4.00/5 (Strong)

Benchmark Scores

Faithfulness
4/5
Long Context
5/5
Multilingual
5/5
Tool Calling
4/5
Classification
3/5
Agentic Planning
4/5
Structured Output
5/5
Safety Calibration
1/5
Strategic Analysis
4/5
Persona Consistency
4/5
Constrained Rewriting
5/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.40/MTok

Output

$2.00/MTok

Context Window

262K

modelpicker.net

Real-World Costs

Chat response: $0.0011
Blog post: $0.0042
Document batch: $0.108
Pipeline run: $1.08

Pricing vs Performance

Output cost per million tokens (log scale) vs average score across our 12 internal benchmarks


Try It

from openai import OpenAI

# Devstral 2 2512 is reachable through OpenRouter's OpenAI-compatible API.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # replace with your OpenRouter API key
)

response = client.chat.completions.create(
    model="mistralai/devstral-2512",
    messages=[
        {"role": "user", "content": "Hello, Devstral 2 2512!"}
    ],
)

print(response.choices[0].message.content)
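Tool calling (4/5 in our testing) uses the same endpoint. A hedged sketch of an OpenAI-style tool definition you could pass via the `tools` parameter of `chat.completions.create` — `run_tests` is a hypothetical helper for an agentic coding loop, not a real API:

```python
# Hypothetical tool schema in the OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical helper, named for illustration
        "description": "Run the project's test suite and return pass/fail output.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Directory to test"},
            },
            "required": ["path"],
        },
    },
}]

# Would be passed alongside messages, e.g.:
# client.chat.completions.create(model="mistralai/devstral-2512",
#                                messages=messages, tools=tools)
print(tools[0]["function"]["name"])
```

The model decides when to emit a tool call; your harness executes it and feeds the result back as a `tool` role message.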

Recommendation

Devstral 2 2512 is a solid choice for code-focused agentic pipelines, structured output generation, and long-context code analysis. Its 5/5 scores on constrained rewriting and structured output make it one of the stronger options at $2.00/M output for JSON generation and schema-constrained tasks. Open-weight availability is a meaningful differentiator for teams with compliance or self-hosting requirements. Avoid it for safety-critical applications (1/5 safety calibration) or use cases where classification accuracy is critical (3/5, rank 31 of 53). For general-purpose text tasks at the same price, Mistral Medium 3.1 scores higher across more benchmark dimensions.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions