mistral
Mistral Small 3.1 24B
Mistral Small 3.1 24B is an upgraded variant of the Mistral Small 3 family with 24 billion parameters and multimodal capability (text and image inputs). At $0.35 input / $0.56 output per million tokens, it is modestly priced — but in our 12-benchmark suite, it ranked 52nd out of 52 active models with an average score of 2.92. It does not support tool calling, which significantly limits its utility for agentic workflows. Its strongest result was long context (5/5), but most other benchmarks fell at or below the field median, with several near the bottom of the ranked field.
Performance
Mistral Small 3.1 24B's only standout score in our testing is long context (5/5, tied for 1st with 36 other models out of the 55 tested). Every other benchmark lands at or below the field median. Tool calling scored 1/5 (rank 53 of 54, near last), persona consistency 2/5 (rank 51 of 53), creative problem solving 2/5 (rank 47 of 54), and safety calibration 1/5 (rank 32 of 55). Faithfulness (4/5, rank 34 of 55) and multilingual (4/5, rank 36 of 55) came closest to mid-tier. Overall rank: 52 of 52 active models tested. The model does not support tool calling at all (a documented quirk), so its 1/5 tool calling score is consistent with its capability profile.
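The per-benchmark figures above can be collected into a small table, which makes the below-median pattern easy to verify programmatically. A minimal sketch (the scores, ranks, and field sizes are copied from the paragraph above; the `below_median` helper is an illustrative name, not part of our tooling):

```python
# Per-benchmark results for Mistral Small 3.1 24B, as reported above:
# (score out of 5, rank, number of models ranked on that benchmark)
results = {
    "long context": (5, 1, 55),
    "tool calling": (1, 53, 54),
    "persona consistency": (2, 51, 53),
    "creative problem solving": (2, 47, 54),
    "safety calibration": (1, 32, 55),
    "faithfulness": (4, 34, 55),
    "multilingual": (4, 36, 55),
}

def below_median(rank, field):
    # A model sits in the bottom half when its rank is past the midpoint.
    return rank > field / 2

for name, (score, rank, field) in results.items():
    half = "bottom half" if below_median(rank, field) else "top half"
    print(f"{name}: {score}/5 (rank {rank}/{field}, {half})")
```

Running this flags six of the seven listed benchmarks as bottom-half results, with long context the lone exception.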
Pricing
Mistral Small 3.1 24B costs $0.35 per million input tokens and $0.56 per million output tokens. At 1 million output tokens/month, that is $0.56; at 10 million output tokens, $5.60. While the price is low, the newer Mistral Small 4 (avg 3.83, $0.60/MTok output) offers substantially better benchmark performance at just $0.04 more per MTok output. Ministral 3 14B 2512 ($0.20/MTok output, avg 3.75) and Ministral 3 8B 2512 ($0.15/MTok output, avg 3.67) both score higher at lower prices. In the current mistral model lineup, Mistral Small 3.1 24B is outperformed by several newer models at comparable or lower cost.
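The monthly arithmetic above generalizes to any output volume. A quick sketch using the output prices quoted in this section (the 10M-token monthly volume is a hypothetical example, and this ignores input-token costs):

```python
# Output price per million tokens (MTok), as quoted in this article.
output_price_per_mtok = {
    "Mistral Small 3.1 24B": 0.56,
    "Mistral Small 4": 0.60,
    "Ministral 3 14B 2512": 0.20,
    "Ministral 3 8B 2512": 0.15,
}

def monthly_output_cost(model, output_mtok_per_month):
    """Estimated monthly spend on output tokens alone."""
    return output_price_per_mtok[model] * output_mtok_per_month

for model in output_price_per_mtok:
    cost = monthly_output_cost(model, 10)  # 10M output tokens/month
    print(f"{model}: ${cost:.2f}/month")
```

At that volume, both Ministral variants come in at a fraction of Mistral Small 3.1 24B's output bill while scoring higher in our suite.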
Benchmark Scores
[Chart: per-benchmark scores across our 12 internal benchmarks]
External Benchmarks
[Chart: external benchmark results]
Pricing
Input: $0.350/MTok · Output: $0.560/MTok
Real-World Costs
Pricing vs Performance
[Chart: output cost per million tokens (log scale) vs average score across our 12 internal benchmarks]
Try It
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="mistralai/mistral-small-3.1-24b-instruct",
    messages=[
        {"role": "user", "content": "Hello, Mistral Small 3.1 24B!"}
    ],
)

print(response.choices[0].message.content)
Recommendation
Mistral Small 3.1 24B is not recommended for most use cases in its current state. It ranks last among active models in our 12-test suite, does not support tool calling, and scores near the bottom on persona consistency and creative problem solving. Its one clear niche is very long-context retrieval (5/5 on long context), but even there, newer models in the same price range perform better overall. Teams evaluating affordable mistral models should prioritize Mistral Small 4 (avg 3.83, $0.60/MTok output) or Ministral 3 14B 2512 (avg 3.75, $0.20/MTok output) instead.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.
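An overall figure like the 2.92 average reported above is, presumably, a simple mean of the twelve per-benchmark judge scores. A minimal sketch of that aggregation (the 12-score list below is made up for illustration, not a real model's results):

```python
def average_score(scores):
    """Mean of per-benchmark judge scores (each on a 1-5 scale)."""
    if len(scores) != 12:
        raise ValueError("expected one score per benchmark in the 12-test suite")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("judge scores must be in the 1-5 range")
    return sum(scores) / len(scores)

# Hypothetical 12-benchmark scorecard:
example = [5, 1, 2, 2, 1, 4, 4, 3, 3, 3, 4, 3]
print(round(average_score(example), 2))
```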