Devstral Medium
Mistral's efficiency model. Context window: 131K tokens.
Scores by test
What you need to know
Devstral Medium is best suited to technical execution tasks that demand reliable output formatting and data integrity. It performs strongly in structured output, faithfulness, and agentic planning, which makes it a capable choice for pipeline automation and classification. Its high long-context score suggests it can use most of its 131K context window on large inputs without significant loss of accuracy.
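As a rough illustration of working within the 131K-token window, the sketch below estimates whether a document is likely to fit. It assumes roughly 4 characters per token, a common rule of thumb for English text rather than the model's actual tokenizer, and it treats "131K" as 131,000 tokens. The function name and the `reserved_output_tokens` parameter are hypothetical.

```python
# Context window as quoted above; the exact limit may differ slightly.
CONTEXT_WINDOW_TOKENS = 131_000
# Assumption: ~4 characters per token for English text (heuristic only).
CHARS_PER_TOKEN = 4


def fits_in_context(text: str, reserved_output_tokens: int = 4_000) -> bool:
    """Estimate whether `text` plus a reserved output budget fits in the window."""
    estimated_input_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_input_tokens + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    sample = "x" * 400_000  # about 100,000 estimated tokens
    print(fits_in_context(sample))
```

For anything near the limit, counting tokens with the provider's own tokenizer is safer than this estimate.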
The model struggles with high-level cognitive tasks, specifically strategic analysis and creative problem solving. It also shows a critical weakness in safety calibration, scoring 1/5, which indicates a lack of robust guardrails. Developers should expect poor performance when the use case requires nuanced reasoning or strict content filtering.
At a blended cost of $1.60/MTok, the model is moderately priced, but its low overall rank (#69 of 71) suggests poor value relative to the current market. Although its input costs are low, the trade-offs in reasoning and safety make it hard to justify for general-purpose use.
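To put the blended rate in concrete terms, the sketch below converts a token volume into an estimated spend. It applies the $1.60/MTok figure as a flat rate; the workload numbers are hypothetical, and real bills depend on the actual split between input and output tokens.

```python
# Blended rate quoted above, in USD per million tokens.
BLENDED_USD_PER_MTOK = 1.60


def estimate_cost_usd(total_tokens: int,
                      usd_per_mtok: float = BLENDED_USD_PER_MTOK) -> float:
    """Return the estimated spend in USD at a flat per-million-token rate."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return total_tokens / 1_000_000 * usd_per_mtok


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests of 5,000 tokens each.
    requests = 10_000
    tokens_per_request = 5_000
    total = requests * tokens_per_request
    print(f"${estimate_cost_usd(total):.2f}")  # prints $80.00
```

At this rate, a single request that fills the 131K window costs roughly $0.21.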
Use this model if you need a reliable tool for structured data extraction, classification, or agentic workflows within a large context. Skip this model if your application requires creative synthesis, complex strategic planning, or strict safety compliance.
Strengths — Top 3
Relative weaknesses — Bottom 3