Ministral 3 8B 2512 vs Mistral Small 3.2 24B

In our testing, Ministral 3 8B 2512 is the better all-around pick — it wins 5 of 12 benchmarks (classification, constrained rewriting, persona consistency, creative problem solving, strategic analysis). Mistral Small 3.2 24B wins agentic planning and may be the better fit for input-heavy or instruction-following workloads, given its cheaper input pricing and the vendor's description of improved function calling.

mistral

Ministral 3 8B 2512

Overall
3.67/5 (Strong)

Benchmark Scores

Faithfulness
4/5
Long Context
4/5
Multilingual
4/5
Tool Calling
4/5
Classification
4/5
Agentic Planning
3/5
Structured Output
4/5
Safety Calibration
1/5
Strategic Analysis
3/5
Persona Consistency
5/5
Constrained Rewriting
5/5
Creative Problem Solving
3/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.150/MTok

Output

$0.150/MTok

Context Window: 262K

modelpicker.net

mistral

Mistral Small 3.2 24B

Overall
3.25/5 (Usable)

Benchmark Scores

Faithfulness
4/5
Long Context
4/5
Multilingual
4/5
Tool Calling
4/5
Classification
3/5
Agentic Planning
4/5
Structured Output
4/5
Safety Calibration
1/5
Strategic Analysis
2/5
Persona Consistency
3/5
Constrained Rewriting
4/5
Creative Problem Solving
2/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.075/MTok

Output

$0.200/MTok

Context Window: 128K

Benchmark Analysis

Summary of our 12-test suite results (scores are our 1–5 proxies; ranks show position among the ~50 models tested, 53–54 depending on benchmark):

  • Classification: Ministral 3 8B 2512 scores 4 vs Mistral Small 3.2 24B's 3. In our testing Ministral is tied for 1st with 29 others out of 53 (strong for routing and accurate categorization).
  • Constrained rewriting: Ministral 5 vs Mistral 4; Ministral is tied for 1st with 4 others (best for tight-length compression and SMS/summary limits).
  • Persona consistency: Ministral 5 vs Mistral 3; Ministral tied for 1st with 36 others (better at maintaining character and resisting injection in our tests).
  • Creative problem solving: Ministral 3 vs Mistral 2; Ministral ranks 30 of 54 vs Mistral 47 of 54 (Ministral gives more non-obvious, feasible ideas in our probes).
  • Strategic analysis: Ministral 3 vs Mistral 2; Ministral ranks 36 of 54 vs Mistral 44 of 54 (Ministral better at nuanced tradeoff reasoning in our scenarios).
  • Agentic planning: Mistral Small wins 4 vs Ministral 3; Mistral ranks 16 of 54 (tied) vs Ministral rank 42 — Mistral is noticeably stronger on goal decomposition and recovery in our tests.
  • Ties (no clear winner in our testing): structured output 4 vs 4 (both rank 26), tool calling 4 vs 4 (both rank 18), faithfulness 4 vs 4 (both rank 34), long context 4 vs 4 (both rank 38), safety calibration 1 vs 1 (both rank 32), multilingual 4 vs 4 (both rank 36).

What this means for real tasks: choose Ministral when you need top-tier constrained rewriting, reliable classification/routing, persona stability, and better creative and strategic outputs. Choose Mistral Small when agentic planning (tool sequencing, multi-step goal decomposition) is primary or when lower input cost materially reduces your bill. Both models match on tool calling, long-context retrieval, structured output, and faithfulness in our suite.
| Benchmark | Ministral 3 8B 2512 | Mistral Small 3.2 24B |
|---|---|---|
| Faithfulness | 4/5 | 4/5 |
| Long Context | 4/5 | 4/5 |
| Multilingual | 4/5 | 4/5 |
| Tool Calling | 4/5 | 4/5 |
| Classification | 4/5 | 3/5 |
| Agentic Planning | 3/5 | 4/5 |
| Structured Output | 4/5 | 4/5 |
| Safety Calibration | 1/5 | 1/5 |
| Strategic Analysis | 3/5 | 2/5 |
| Persona Consistency | 5/5 | 3/5 |
| Constrained Rewriting | 5/5 | 4/5 |
| Creative Problem Solving | 3/5 | 2/5 |
| Summary | 5 wins | 1 win |
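
The head-to-head tally follows mechanically from the per-benchmark scores. A minimal sketch (scores copied from the table above; the dict layout is ours, not the site's data format):

```python
# Tally head-to-head wins from the 1-5 benchmark scores above.
# Tuple order: (Ministral 3 8B 2512, Mistral Small 3.2 24B)
scores = {
    "Faithfulness":             (4, 4),
    "Long Context":             (4, 4),
    "Multilingual":             (4, 4),
    "Tool Calling":             (4, 4),
    "Classification":           (4, 3),
    "Agentic Planning":         (3, 4),
    "Structured Output":        (4, 4),
    "Safety Calibration":       (1, 1),
    "Strategic Analysis":       (3, 2),
    "Persona Consistency":      (5, 3),
    "Constrained Rewriting":    (5, 4),
    "Creative Problem Solving": (3, 2),
}

ministral_wins = sum(a > b for a, b in scores.values())
small_wins     = sum(b > a for a, b in scores.values())
ties           = sum(a == b for a, b in scores.values())

print(ministral_wins, small_wins, ties)  # 5 1 6
```

Half the suite is tied, so the "5 wins vs 1 win" headline rests on six benchmarks where the models actually diverge.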

Pricing Analysis

Costs per million input/output tokens (from payload): Ministral 3 8B 2512 charges $0.15 input / $0.15 output; Mistral Small 3.2 24B charges $0.075 input / $0.20 output.

For a balanced workload (equal input and output tokens):

  • 1M input + 1M output: Ministral $0.30 vs Mistral Small $0.275 (Mistral Small cheaper by $0.025)
  • 10M + 10M: $3.00 vs $2.75 (save $0.25)
  • 100M + 100M: $30.00 vs $27.50 (save $2.50)

For output-heavy apps (long model replies), Ministral is cheaper per output token ($0.15 vs $0.20), saving $0.05 per 1M output tokens. For input-heavy apps (heavy ingestion or retrieval context), Mistral Small is cheaper on input ($0.075 vs $0.15), saving $0.075 per 1M input tokens. Who should care: high-throughput retrieval/ingestion pipelines and search-indexing teams should favor Mistral Small for its lower input cost; chatbots and summarizers that emit large outputs should favor Ministral for its lower output cost. The absolute dollar gaps are small at low volumes but scale linearly: at 100M balanced input+output tokens, the gap is $2.50 at payload rates.
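
The arithmetic above can be reproduced with a small cost helper. A minimal sketch using the payload rates quoted on this page (the dict keys are our shorthand labels, not API model identifiers):

```python
# USD per million tokens, from the payload rates quoted above.
RATES = {
    "ministral-3-8b-2512":   {"input": 0.150, "output": 0.150},
    "mistral-small-3.2-24b": {"input": 0.075, "output": 0.200},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a given token profile at the quoted per-MTok rates."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# Balanced 1M-in + 1M-out example from the text:
print(cost_usd("ministral-3-8b-2512", 1_000_000, 1_000_000))    # ≈ $0.30
print(cost_usd("mistral-small-3.2-24b", 1_000_000, 1_000_000))  # ≈ $0.275
```

Feeding in your own input/output token split is the quickest way to see which rate card wins for your workload.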

Real-World Cost Comparison

| Task | Ministral 3 8B 2512 | Mistral Small 3.2 24B |
|---|---|---|
| Chat response | <$0.001 | <$0.001 |
| Blog post | <$0.001 | <$0.001 |
| Document batch | $0.010 | $0.011 |
| Pipeline run | $0.105 | $0.115 |

Bottom Line

Choose Ministral 3 8B 2512 if you need: accurate classification and routing, best-in-class constrained rewriting (SMS and other character-limited outputs), strong persona consistency, generally stronger creative and strategic answers, or the larger 262,144-token context window for very long inputs (payload: 262144 vs 128000). Choose Mistral Small 3.2 24B if you need: stronger agentic planning in our tests, better economics for input-heavy workloads (input $0.075 vs $0.15), or the instruction-following and function-calling improvements noted in its description. If cost is the main factor, pick based on your token profile: Mistral Small reduces input cost; Ministral reduces output cost.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions