Mistral Small 4
Mistral's efficiency model. Context window: 262K tokens.
Scores by test
What you need to know
Mistral Small 4 is optimized for high-precision formatting and multilingual deployment. It achieves perfect scores in structured output, multilingual capabilities, and persona consistency, making it a reliable choice for applications requiring strict schema adherence or consistent brand voice across different languages.
The model provides a 262K-token context window at a low price point, with a blended cost of $0.487 per million tokens (MTok). This makes it a cost-effective option for processing large documents, though its overall performance rank (#52 of 71) suggests it is a utility model rather than a frontier-class reasoning engine.
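To put the price in concrete terms, here is a rough sketch of the arithmetic for a single request that fills the whole context window. It assumes "262K" means 262,000 tokens and that the blended rate can be applied as a flat per-token price; actual billing normally separates input and output tokens, so treat the result as an estimate only. The function name `estimate_cost_usd` is hypothetical and used purely for illustration.

```python
# Rough cost estimate from the figures quoted above.
BLENDED_USD_PER_MTOK = 0.487      # blended cost per million tokens (from the text)
CONTEXT_WINDOW_TOKENS = 262_000   # assumption: "262K" read as 262,000 tokens


def estimate_cost_usd(tokens: int, usd_per_mtok: float = BLENDED_USD_PER_MTOK) -> float:
    """Apply a flat per-million-token rate to a total token count."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens / 1_000_000 * usd_per_mtok


# Cost of one request that uses the entire context window.
full_window_cost = estimate_cost_usd(CONTEXT_WINDOW_TOKENS)
print(f"Full-window request: ${full_window_cost:.4f}")
```

Under these assumptions, a request that fills the full window comes to about $0.13, which is why the model is positioned here as a budget option for large documents.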
Performance is inconsistent across logic tasks. While it handles agentic planning and strategic analysis well, it struggles significantly with basic classification and safety calibration. Developers should expect poor results when using this model for sentiment analysis, labeling tasks, or environments requiring strict safety guardrails.
Use this model if you need an affordable, long-context engine for generating structured data or maintaining a specific persona in multiple languages. Skip this model if your primary use case is data classification or if you require high safety calibration.
Strengths — Top 3
Relative weaknesses — Bottom 3
Similar models