mistral
Devstral Medium
Devstral Medium is a code generation and agentic reasoning model. It accepts text-only inputs and targets software development workflows. At $0.40/M input and $2.00/M output with a 131,072 token context window, its pricing matches Mistral Medium 3.1. In our 12-test benchmark suite, Devstral Medium ranks 50th out of 52 active models — near the bottom overall. This reflects our benchmark composition: our tests favor general reasoning, multilingual performance, and safety calibration. Devstral Medium is a specialized coding model, so these results should be interpreted in that context. Classification (4/5, tied for 1st) and agentic planning (4/5, rank 16 of 54) are its strongest areas in our testing.
Performance
In our 12-test general-purpose benchmark suite, Devstral Medium ranks 50th out of 52 active models. Its strongest areas are classification (4/5, tied for 1st with 29 other models out of 53), agentic planning (4/5, rank 16 of 54), faithfulness (4/5, rank 34 of 55), and structured output (4/5, rank 26 of 54). Weaker areas include tool calling (3/5, rank 47 of 54), creative problem solving (2/5, rank 47 of 54), strategic analysis (2/5, rank 44 of 54), and safety calibration (1/5, rank 32 of 55). Persona consistency scored 3/5 (rank 45 of 53) and constrained rewriting 3/5 (rank 31 of 53). These results reflect our general-purpose test suite, not code-specific evaluations.
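The per-test scores listed above can be collected into a quick summary. A minimal sketch, noting that only ten of the twelve tests are named in the text, so the computed average is partial:

```python
# Per-test scores (1-5) as reported above; two of the twelve
# tests are not named in the text, so this average is partial.
scores = {
    "classification": 4,
    "agentic_planning": 4,
    "faithfulness": 4,
    "structured_output": 4,
    "tool_calling": 3,
    "creative_problem_solving": 2,
    "strategic_analysis": 2,
    "safety_calibration": 1,
    "persona_consistency": 3,
    "constrained_rewriting": 3,
}

average = sum(scores.values()) / len(scores)
print(f"partial average: {average:.1f}")  # partial average: 3.0
```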
Pricing
Devstral Medium costs $0.40 per million input tokens and $2.00 per million output tokens. At 10 million output tokens per month, that is $20. At 100 million tokens, $200. At the same $2.00/M output price, Mistral Medium 3.1 scores significantly higher on our general-purpose benchmarks (rank 15 vs rank 50 of 52). For teams where general-purpose benchmark performance matters, Mistral Medium 3.1 offers more breadth at the same cost. For code-specific workflows, Devstral Medium's positioning as a specialized code model may justify the trade-off.
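The cost arithmetic above can be sketched as a small helper. This is an illustrative function (the name `monthly_cost` and the token volumes are assumptions, not part of any API), using the published $0.40/M input and $2.00/M output prices:

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 input_price: float = 0.40, output_price: float = 2.00) -> float:
    """Estimate monthly spend in dollars from per-million-token prices."""
    return (input_tokens / 1e6) * input_price + (output_tokens / 1e6) * output_price

# 10M output tokens (ignoring input) -> $20; 100M -> $200.
print(monthly_cost(0, 10_000_000))   # 20.0
print(monthly_cost(0, 100_000_000))  # 200.0
```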
Benchmark Scores
External Benchmarks
Pricing
Input
$0.40/MTok
Output
$2.00/MTok
Real-World Costs
Pricing vs Performance
Output cost per million tokens (log scale) vs average score across our 12 internal benchmarks
Try It
from openai import OpenAI

# Point the OpenAI client at OpenRouter's API.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

# Devstral Medium is served under the mistralai/devstral-medium slug.
response = client.chat.completions.create(
    model="mistralai/devstral-medium",
    messages=[
        {"role": "user", "content": "Hello, Devstral Medium!"}
    ],
)

print(response.choices[0].message.content)
Recommendation
Devstral Medium is positioned for code generation and coding agent use cases. Our general-purpose benchmarks show it at rank 50 of 52, so teams evaluating it for general text tasks should look at higher-ranking options. If your workflow is specifically code-focused — particularly around agentic coding assistants or code review pipelines — its classification (4/5) and agentic planning (4/5) scores are more relevant. For the same $2.00/M output price, Mistral Medium 3.1 delivers significantly broader performance across our benchmark suite.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.