Mistral · active

Devstral 2 2512

Mistral's efficiency model. Context window: 262K tokens.

Overall score: 3.92/5.00 · ranked #59
Input: $0.400 per 1M tokens
Output: $2.00 per 1M tokens
Context: 262K tokens
Blended: $1.60 per 1M tokens (3:1 output:input ratio)
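The blended figure is a weighted average of the input and output prices. The following is a minimal sketch, assuming "3:1 out:in" means three output tokens for every input token, which is the reading that reproduces the listed $1.60: (1 × $0.40 + 3 × $2.00) / 4 = $1.60. The function name `blended_price` is hypothetical and is not part of any site API.

```python
def blended_price(input_per_mtok: float, output_per_mtok: float,
                  out_to_in: float = 3.0) -> float:
    """Weighted average price in $ per 1M tokens.

    Assumption: `out_to_in` output tokens are generated for every input token.
    """
    total_parts = out_to_in + 1.0
    return (input_per_mtok * 1.0 + output_per_mtok * out_to_in) / total_parts


price = blended_price(0.40, 2.00)
print(f"${price:.2f} per 1M tokens")  # $1.60 per 1M tokens
```

Reading the ratio the other way (three input tokens per output token) would give $0.80. That is why the assumption matters when you compare blended prices across sites.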


modelpicker.ai · powered by live benchmark data

Scores by test

Structured Output: 5.0
Strategic Analysis: 4.0
Constrained Rewriting: 5.0
Creative Problem Solving: 4.0
Tool Calling: 4.0
Faithfulness: 4.0
Classification: 3.0
Long Context: 5.0
Safety Calibration: 1.0
Persona Consistency: 4.0
Agentic Planning: 4.0
Multilingual: 5.0
Tabular Data: 3.0
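The page does not state how the overall score is derived. As a consistency check, the unweighted mean of the 13 per-test scores (51/13 ≈ 3.92) matches the headline overall score of 3.92. This is an observation, not the site's documented methodology. The real formula may weight tests differently and only happen to agree here.

```python
from statistics import mean

# Per-test scores (out of 5.0) copied from the "Scores by test" list.
scores = {
    "Structured Output": 5.0,
    "Strategic Analysis": 4.0,
    "Constrained Rewriting": 5.0,
    "Creative Problem Solving": 4.0,
    "Tool Calling": 4.0,
    "Faithfulness": 4.0,
    "Classification": 3.0,
    "Long Context": 5.0,
    "Safety Calibration": 1.0,
    "Persona Consistency": 4.0,
    "Agentic Planning": 4.0,
    "Multilingual": 5.0,
    "Tabular Data": 3.0,
}

# Assumption: every test carries equal weight in the overall score.
overall = mean(scores.values())
print(f"{overall:.2f}/5.00")  # 3.92/5.00
```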

What you need to know

Devstral 2 2512 performs best at high-precision formatting and long-range retrieval. It scores 5/5 on structured output, constrained rewriting, and long context, which makes it a reliable choice for developers who need strict schema adherence or who process long documents within its 262K-token context window. It also scores 5/5 on multilingual tasks, suggesting consistent performance across languages.

The pricing is competitive for a specialized model, with a blended cost of $1.60 per 1M tokens. It ranks in the middle of the pack overall (#48 of 71), so its value lies in its specific technical strengths rather than in general-purpose reasoning. Its clear weak point is safety calibration, where it scores 1/5. This suggests weak built-in guardrails or a tendency to ignore safety constraints.

Performance is middling on tabular data and simple classification tasks, both of which score 3/5. Developers should not rely on this model to extract data from tables or to label items with high accuracy. These areas lag well behind its strengths in structured generation.

Use this model if your workflow requires strict JSON/schema adherence, long-context processing, or complex constrained rewriting. Skip it if your application needs rigorous safety filtering, high-precision classification, or tabular analysis.

Strengths — Top 3

Structured Output: 5.0/5.0
Constrained Rewriting: 5.0/5.0
Long Context: 5.0/5.0

Relative weaknesses — Bottom 3

Safety Calibration: 1.0/5.0
Classification: 3.0/5.0
Tabular Data: 3.0/5.0
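The top-3 and bottom-3 lists can be reproduced from the per-test scores with a stable sort. This is a sketch under one assumption: ties are broken by the order in which the page lists the tests. Four tests score 5.0, and only the first three listed (Structured Output, Constrained Rewriting, Long Context) appear as strengths, which is consistent with that assumption but is not confirmed by the page.

```python
# Per-test scores in the order the page lists them.
scores = [
    ("Structured Output", 5.0),
    ("Strategic Analysis", 4.0),
    ("Constrained Rewriting", 5.0),
    ("Creative Problem Solving", 4.0),
    ("Tool Calling", 4.0),
    ("Faithfulness", 4.0),
    ("Classification", 3.0),
    ("Long Context", 5.0),
    ("Safety Calibration", 1.0),
    ("Persona Consistency", 4.0),
    ("Agentic Planning", 4.0),
    ("Multilingual", 5.0),
    ("Tabular Data", 3.0),
]

# sorted() is stable (also with reverse=True), so tied scores keep their
# listed order. Assumption: the page breaks ties the same way.
top3 = sorted(scores, key=lambda item: item[1], reverse=True)[:3]
bottom3 = sorted(scores, key=lambda item: item[1])[:3]

for name, score in top3:
    print(f"Strength: {name}: {score}/5.0")
for name, score in bottom3:
    print(f"Weakness: {name}: {score}/5.0")
```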

Similar models

Qwen: Qwen3 235B A22B Instruct 2507 · $0.093 · score 4.08
GPT-4.1 Mini · $1.30 · score 3.92
OpenAI: gpt-oss-20b · $0.113 · score 3.54
Grok 4.3 · $2.19 · score 4.15