mistral
Devstral 2 2512
Devstral 2 2512 is an open-source agentic coding model built on a 123-billion-parameter dense transformer. It accepts text-only inputs and specializes in code generation and agentic software workflows. At $0.40/M input and $2.00/M output with a 262,144-token context window, it offers one of the larger context windows in its price bracket. In our 12-test benchmark suite, Devstral 2 2512 ranks 28th out of 52 active models, a solid middle-tier position. Its general-purpose strengths are constrained rewriting (5/5, tied for 1st with 4 other models), structured output (5/5, tied for 1st with 24 others), long context (5/5), and multilingual (5/5). These results reflect our general benchmarks, not code-specific evaluations.
Performance
Devstral 2 2512 ranks 28th out of 52 active models overall. Top strengths in our testing: constrained rewriting (5/5, tied for 1st with 4 other models out of 53), structured output (5/5, tied for 1st with 24 others out of 54), long context (5/5, tied for 1st with 36 others out of 55), and multilingual (5/5, tied for 1st with 34 others). Agentic planning scored 4/5 (rank 16 of 54), tool calling 4/5 (rank 18 of 54), and creative problem solving 4/5 (rank 9 of 54). Weaker areas: safety calibration scored 1/5 (rank 32 of 55, well below the field median of 2/5), classification 3/5 (rank 31 of 53), and persona consistency 4/5 but ranked 38 of 53. These benchmark results are from our general-purpose test suite and may not fully reflect performance on code-specific tasks.
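As a rough cross-check, the scores quoted above can be averaged directly. This is a minimal sketch: the names `stated_scores` and `mean_score` are illustrative, and because only ten of the twelve benchmark scores are listed in this section, the result is not the 12-benchmark average reported elsewhere on the page.

```python
from statistics import mean

# Scores (out of 5) as quoted in the Performance section above.
# Only ten of the twelve benchmarks are listed there, so this is
# a partial picture, not the official 12-benchmark average.
stated_scores = {
    "constrained rewriting": 5,
    "structured output": 5,
    "long context": 5,
    "multilingual": 5,
    "agentic planning": 4,
    "tool calling": 4,
    "creative problem solving": 4,
    "safety calibration": 1,
    "classification": 3,
    "persona consistency": 4,
}


def mean_score(scores: dict[str, int]) -> float:
    """Return the unweighted mean of the given 1-5 scores."""
    return float(mean(scores.values()))


print(f"{len(stated_scores)} stated scores, mean {mean_score(stated_scores):.2f}")
```

An unweighted mean treats every benchmark equally; a single low score such as safety calibration pulls the figure down noticeably when most other scores are 4 or 5.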
Pricing
Devstral 2 2512 costs $0.40 per million input tokens and $2.00 per million output tokens. At 10 million output tokens per month, output costs come to $20; at 100 million output tokens, $200. This pricing is identical to Devstral Medium and Mistral Medium 3.1, but on our general benchmarks Devstral 2 2512 ranks 28th of 52, well ahead of Devstral Medium (rank 50) and somewhat below Mistral Medium 3.1 (rank 15). Because the model is open source, teams can also self-host instead of using the API, which gives pricing flexibility beyond the $2.00/M output rate. The 262K context window is twice Mistral Medium 3.1's 131K, which matters for large codebase ingestion.
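The arithmetic above can be sketched as a small cost estimator. The rates are the ones quoted in this section; the function name `monthly_cost` and the mixed workload at the end are hypothetical, and real bills may differ depending on the provider.

```python
# Rates quoted in the Pricing section above (USD per million tokens).
INPUT_USD_PER_MTOK = 0.40
OUTPUT_USD_PER_MTOK = 2.00


def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate API cost in USD for a given monthly token volume."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Output-only volumes from the text: 10M and 100M output tokens per month.
print(f"${monthly_cost(0, 10_000_000):.2f}")
print(f"${monthly_cost(0, 100_000_000):.2f}")

# Hypothetical mixed workload: 50M input tokens plus 10M output tokens.
print(f"${monthly_cost(50_000_000, 10_000_000):.2f}")
```

For agentic coding workloads, input volume often dominates because the context is re-sent on every step, so it is worth estimating both sides rather than output alone.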
Pricing summary: input $0.400/MTok, output $2.00/MTok.
modelpicker.net
Chart: Pricing vs Performance, plotting output cost per million tokens (log scale) against average score across our 12 internal benchmarks.
Try It
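from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint, so the OpenAI client works here.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # replace with your own key
)

# Send a single-turn chat request to Devstral 2 2512.
response = client.chat.completions.create(
    model="mistralai/devstral-2512",
    messages=[
        {"role": "user", "content": "Hello, Devstral 2 2512!"}
    ],
)

# Print the text of the first completion.
print(response.choices[0].message.content)

Recommendation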
Devstral 2 2512 is a solid choice for code-focused agentic pipelines, structured output generation, and long-context code analysis. Its 5/5 scores on constrained rewriting and structured output make it one of the stronger options at $2.00/M output for JSON generation and schema-constrained tasks. Open-weight availability is a meaningful differentiator for teams with compliance or self-hosting requirements. Avoid it for safety-critical applications (1/5 safety calibration) or use cases where classification accuracy is critical (3/5, rank 31 of 53). For general-purpose text tasks at the same price, Mistral Medium 3.1 scores higher across more benchmark dimensions.
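For JSON generation and schema-constrained tasks, it can help to validate replies before passing them downstream, whatever the model's benchmark score. The following is a minimal sketch and not part of our test suite: the function name `parse_structured_reply`, the required keys, and the sample reply are all hypothetical, and it relies only on Python's standard `json` module.

```python
import json


def parse_structured_reply(reply_text: str, required_keys: set[str]) -> dict:
    """Parse a model reply as a JSON object and check that required keys exist.

    Raises ValueError if the reply is not a JSON object or is missing keys.
    """
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply is valid JSON but not an object")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"reply is missing keys: {sorted(missing)}")
    return data


# Hypothetical reply text, standing in for response.choices[0].message.content.
sample_reply = '{"language": "python", "summary": "Adds retry logic"}'
print(parse_structured_reply(sample_reply, {"language", "summary"}))
```

A check like this only confirms the shape of the reply. It says nothing about whether the values are correct, so it complements rather than replaces task-specific evaluation.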
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.