mistral

Mistral Small 3.2 24B

Mistral Small 3.2 24B is a 24-billion-parameter model from Mistral, priced at $0.075 per million input tokens and $0.20 per million output tokens, with a 128,000-token context window. In our testing across 12 benchmarks, it ranked 49th of 52 models overall, placing it near the bottom of the tested set. Within the Mistral family it scores below Ministral 3 14B 2512 (avg 3.75, $0.20/M output) despite its larger parameter count, and below Mistral Small 4 (avg 3.83, $0.60/M output). Its closest price competitor is Ministral 3 14B 2512, which delivers better benchmark performance at the same output cost. Where Mistral Small 3.2 24B does stand out is agentic planning (4/5, rank 16 of 54) and constrained rewriting (4/5, rank 6 of 53), two areas where it performs above expectation given its overall rank.

Performance

In our 12-benchmark suite, Mistral Small 3.2 24B's strongest areas are constrained rewriting, tool calling, and agentic planning. On constrained rewriting it scored 4/5 at rank 6 of 53, a top-tier result for compression tasks. On tool calling it scored 4/5 at rank 18 of 54, demonstrating reliable function selection and argument accuracy. On agentic planning it scored 4/5 at rank 16 of 54, above the median for goal decomposition and failure recovery. Further 4/5 scores on multilingual (rank 36 of 55), structured output (rank 26 of 54), long context (rank 38 of 55), and faithfulness (rank 34 of 55) round out a mixed but decent middle tier. The weaknesses are notable: creative problem solving scored 2/5 at rank 47 of 54, near the bottom of the tested set, and strategic analysis scored 2/5 at rank 44 of 54. Safety calibration scored 1/5 at rank 32 of 55. These three low scores pull the average down enough that the model ranks 49 of 52 overall.
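As a sanity check, the 3.25 overall average follows directly from the twelve per-benchmark scores listed in the scorecard on this page:

```python
# The twelve per-benchmark scores from the scorecard (each judged 1-5).
scores = {
    "Faithfulness": 4, "Long Context": 4, "Multilingual": 4,
    "Tool Calling": 4, "Classification": 3, "Agentic Planning": 4,
    "Structured Output": 4, "Safety Calibration": 1,
    "Strategic Analysis": 2, "Persona Consistency": 3,
    "Constrained Rewriting": 4, "Creative Problem Solving": 2,
}

# Unweighted mean across all 12 benchmarks.
average = sum(scores.values()) / len(scores)
print(average)  # 3.25
```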

Pricing

At $0.075/M input and $0.20/M output, Mistral Small 3.2 24B is among the cheaper models in the tested set. At 10 million output tokens per month, output cost is $2.00; at 100 million, $20.00. Its flat output price matches Ministral 3 14B 2512 ($0.20/M), which outscores it overall (avg 3.75 vs 3.25). Compared with Mistral Small 4 at $0.60/M output, Mistral Small 3.2 24B is meaningfully cheaper but scores lower (3.25 vs 3.83 average). For teams that need this model's specific strengths, particularly agentic planning at low cost, the pricing is attractive, but pure cost-per-score comparisons within the Mistral family favor Ministral 3 14B 2512.
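The arithmetic above can be sketched in a few lines (a minimal sketch using the published per-million-token rates; the helper name and token volumes are illustrative):

```python
# Published rates for Mistral Small 3.2 24B, in dollars per million tokens.
INPUT_PER_MTOK = 0.075
OUTPUT_PER_MTOK = 0.200

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the total monthly cost in dollars for the given token usage."""
    return (input_tokens / 1e6) * INPUT_PER_MTOK + (output_tokens / 1e6) * OUTPUT_PER_MTOK

# 10M output tokens/month -> about $2.00 in output cost alone.
print(monthly_cost(0, 10_000_000))
# 100M output tokens/month -> about $20.00.
print(monthly_cost(0, 100_000_000))
```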


Overall
3.25/5 (Usable)

Benchmark Scores

Faithfulness: 4/5
Long Context: 4/5
Multilingual: 4/5
Tool Calling: 4/5
Classification: 3/5
Agentic Planning: 4/5
Structured Output: 4/5
Safety Calibration: 1/5
Strategic Analysis: 2/5
Persona Consistency: 3/5
Constrained Rewriting: 4/5
Creative Problem Solving: 2/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.075/MTok
Output: $0.200/MTok
Context Window: 128K


Real-World Costs

Chat response: <$0.001
Blog post: <$0.001
Document batch: $0.011
Pipeline run: $0.115

Pricing vs Performance

Output cost per million tokens (log scale) vs average score across our 12 internal benchmarks


Try It

from openai import OpenAI

# Mistral Small 3.2 24B is served through OpenRouter's OpenAI-compatible API.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # replace with your own OpenRouter key
)

response = client.chat.completions.create(
    model="mistralai/mistral-small-3.2-24b-instruct",
    messages=[
        {"role": "user", "content": "Hello, Mistral Small 3.2 24B!"}
    ],
)

print(response.choices[0].message.content)

Recommendation

Mistral Small 3.2 24B is a reasonable pick for teams with narrow needs in agentic automation and tool calling at minimal cost. Its 4/5 agentic planning score (rank 16 of 54) and 4/5 tool calling score (rank 18 of 54) make it capable in constrained agent workflows where budget is tight. Constrained rewriting tasks, such as compressing text within character limits, also benefit from its rank-6 score on that benchmark. Avoid this model for creative ideation, strategic analysis, or any use case requiring nuanced reasoning: its 2/5 scores on creative problem solving (rank 47 of 54) and strategic analysis (rank 44 of 54) place it near the bottom of the tested set on those dimensions. Teams comparing it directly to Ministral 3 14B 2512 at the same output price should benchmark both: Ministral 3 14B 2512 scores higher overall, but Mistral Small 3.2 24B has the stronger agentic planning score.
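For the constrained agent workflows mentioned above, tool calling works through the standard OpenAI-style `tools` parameter. The sketch below builds a hypothetical weather-lookup tool schema (the function name and fields are illustrative, not part of any real API); it would be passed as `tools=[weather_tool]` to the `client.chat.completions.create` call shown in the Try It snippet:

```python
import json

# Hypothetical tool definition -- the model only ever sees this JSON schema
# and decides whether to emit a call to it with matching arguments.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative function name
        "description": "Look up the current weather for a city.",
        "parameters": {  # JSON Schema describing the call arguments
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

print(json.dumps(weather_tool, indent=2))
```

When the model chooses to invoke the tool, the call name and JSON-encoded arguments appear on `response.choices[0].message.tool_calls`; the benchmark's tool-calling score reflects how reliably those names and arguments match the schema.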

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.