Ministral 3 8B 2512 vs Mistral Small 3.2 24B
In our testing, Ministral 3 8B 2512 is the better all-around pick — it wins 5 of 12 benchmarks (classification, constrained rewriting, persona consistency, creative problem solving, strategic analysis). Mistral Small 3.2 24B wins agentic planning, and it may be preferable for input-heavy workloads (cheaper input tokens) or instruction-following/function-calling use cases (per its vendor description).
Pricing (per million tokens):
- Ministral 3 8B 2512 (Mistral): $0.150 input / $0.150 output
- Mistral Small 3.2 24B (Mistral): $0.075 input / $0.200 output
Benchmark Analysis
Summary of our 12-test suite results (scores are our 1–5 proxies, ranks show position among ~50 models):
- Classification: Ministral 3 8B 2512 scores 4 vs Mistral Small 3.2 24B's 3. In our testing Ministral is tied for 1st with 29 others out of 53 (strong for routing and accurate categorization).
- Constrained rewriting: Ministral 5 vs Mistral 4; Ministral is tied for 1st with 4 others (best for tight-length compression and SMS/summary limits).
- Persona consistency: Ministral 5 vs Mistral 3; Ministral tied for 1st with 36 others (better at maintaining character and resisting injection in our tests).
- Creative problem solving: Ministral 3 vs Mistral 2; Ministral ranks 30 of 54 vs Mistral 47 of 54 (Ministral gives more non-obvious, feasible ideas in our probes).
- Strategic analysis: Ministral 3 vs Mistral 2; Ministral ranks 36 of 54 vs Mistral 44 of 54 (Ministral better at nuanced tradeoff reasoning in our scenarios).
- Agentic planning: Mistral Small wins 4 vs Ministral 3; Mistral ranks 16 of 54 (tied) vs Ministral rank 42 — Mistral is noticeably stronger on goal decomposition and recovery in our tests.
- Ties (no clear winner in our testing): structured output 4/4 (both rank 26), tool calling 4/4 (both rank 18), faithfulness 4/4 (both rank 34), long context 4/4 (both rank 38), safety calibration 1/1 (both rank 32), multilingual 4/4 (both rank 36).

What this means for real tasks: choose Ministral when you need top-tier constrained rewriting, reliable classification/routing, persona stability, and stronger creative/strategic outputs. Choose Mistral Small when agentic planning (tool sequencing, multi-step goal decomposition) is primary or when lower input cost materially reduces your bill. Both match on tool calling, long-context retrieval, structured output, and faithfulness in our suite.
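To make that takeaway concrete, here's a minimal routing sketch in Python that maps a task category to the model our results favor. The category names, model ID strings, and the `pick_model` helper are ours, purely for illustration — they are not part of either model's API.

```python
# Hypothetical task router based on our benchmark results.
# Model ID strings and category names are illustrative, not official.

MINISTRAL = "ministral-3-8b-2512"
MISTRAL_SMALL = "mistral-small-3.2-24b"

# Benchmarks each model won in our 12-test suite; ties fall through.
MINISTRAL_WINS = {
    "classification", "constrained_rewriting", "persona",
    "creative_problem_solving", "strategic_analysis",
}
MISTRAL_SMALL_WINS = {"agentic_planning"}

def pick_model(task: str, default: str = MINISTRAL) -> str:
    """Return the model our benchmarks favor for a task category."""
    if task in MINISTRAL_WINS:
        return MINISTRAL
    if task in MISTRAL_SMALL_WINS:
        return MISTRAL_SMALL
    return default  # tied benchmarks: pick by price profile instead

print(pick_model("agentic_planning"))  # -> mistral-small-3.2-24b
```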
Pricing Analysis
Costs per million tokens at the listed rates: Ministral 3 8B 2512 = $0.15 input / $0.15 output; Mistral Small 3.2 24B = $0.075 input / $0.20 output. For a balanced workload of equal input and output volume:
- 1M in + 1M out: Ministral $0.30 vs Mistral Small $0.275 (Mistral Small cheaper by $0.025)
- 10M in + 10M out: $3.00 vs $2.75 (save $0.25)
- 100M in + 100M out: $30.00 vs $27.50 (save $2.50)

For output-heavy apps (long model replies), Ministral is cheaper per output token ($0.15 vs $0.20), saving $0.05 per 1M output tokens. For input-heavy apps (heavy ingestion or retrieval context), Mistral Small is cheaper on input ($0.075 vs $0.15), saving $0.075 per 1M input tokens. Who should care: high-throughput retrieval/ingestion pipelines and search-indexing teams should favor Mistral Small for its lower input cost; chatbots and summarizers that emit long outputs should favor Ministral for its lower output cost. The absolute dollar gaps are small at low volume but scale linearly: at a balanced 100M input + 100M output tokens, the difference is $2.50 at the listed rates.
Real-World Cost Comparison
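To estimate your own bill, here's a minimal cost sketch in Python using the rates above. The `RATES` table and `estimate()` helper are ours for illustration; plug in your actual monthly token volumes.

```python
# Minimal cost estimator using the listed per-million-token rates.
# The RATES table and estimate() helper are illustrative, not an API.

RATES = {  # (input $/MTok, output $/MTok)
    "ministral-3-8b-2512": (0.15, 0.15),
    "mistral-small-3.2-24b": (0.075, 0.20),
}

def estimate(model: str, input_mtok: float, output_mtok: float) -> float:
    """Dollar cost for a given volume, in millions of tokens."""
    rate_in, rate_out = RATES[model]
    return input_mtok * rate_in + output_mtok * rate_out

# Example: an input-heavy RAG workload, 80M tokens in / 10M out per month.
for model in RATES:
    print(f"{model}: ${estimate(model, 80, 10):.2f}")
# ministral-3-8b-2512: $13.50
# mistral-small-3.2-24b: $8.00
```

On an input-heavy mix like this, the gap is much larger than the balanced-workload figures above suggest, so it's worth running the numbers on your real traffic shape.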
Bottom Line
Choose Ministral 3 8B 2512 if you need: accurate classification and routing, best-in-class constrained rewriting (SMS/character-limited outputs), strong persona consistency, generally stronger creative and strategic answers, and the larger context window (262,144 tokens vs 128,000). Choose Mistral Small 3.2 24B if you need: stronger agentic planning in our tests, better economics for input-heavy workloads (input $0.075 vs $0.15), or the instruction-following and function-calling improvements noted in its vendor description. If cost is the main factor, pick by your token profile: Mistral Small reduces input cost; Ministral reduces output cost.
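A quick break-even check at the listed rates: the two models cost the same when 0.15·i + 0.15·o = 0.075·i + 0.20·o, i.e. when output is 1.5× input (o = 1.5·i). Mixes with less output than that per unit of input are cheaper on Mistral Small; more output-heavy mixes are cheaper on Ministral.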
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.