Ministral 3 3B 2512 vs Mistral Small 3.2 24B

Winner for most common use cases: Ministral 3 3B 2512 — it wins 5 of 12 benchmarks and is materially cheaper on many workloads. Mistral Small 3.2 24B wins the one benchmark where agentic planning matters (agentic planning 4 vs 3) and is worth considering when goal decomposition and failure recovery are primary requirements — but expect higher output costs.

Mistral

Ministral 3 3B 2512

Overall
3.58/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 4/5
Multilingual: 4/5
Tool Calling: 4/5
Classification: 4/5
Agentic Planning: 3/5
Structured Output: 4/5
Safety Calibration: 1/5
Strategic Analysis: 2/5
Persona Consistency: 4/5
Constrained Rewriting: 5/5
Creative Problem Solving: 3/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.100/MTok
Output: $0.100/MTok

Context Window: 131K

modelpicker.net

Mistral

Mistral Small 3.2 24B

Overall
3.25/5 (Usable)

Benchmark Scores

Faithfulness: 4/5
Long Context: 4/5
Multilingual: 4/5
Tool Calling: 4/5
Classification: 3/5
Agentic Planning: 4/5
Structured Output: 4/5
Safety Calibration: 1/5
Strategic Analysis: 2/5
Persona Consistency: 3/5
Constrained Rewriting: 4/5
Creative Problem Solving: 2/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.075/MTok
Output: $0.200/MTok

Context Window: 128K


Benchmark Analysis

Across our 12-test suite, Ministral 3 3B 2512 wins 5 benchmarks, Mistral Small 3.2 24B wins 1, and 6 are ties.

Detailed walk-through:

- Faithfulness: Ministral 3 3B 2512 scores 5 vs 4 and is tied for 1st (rank 1 of 55, tied with 32 models). This matters for tasks that need strict adherence to source material (contracts, citations).
- Constrained Rewriting: 5 vs 4 for Ministral 3 3B 2512 (tied for 1st with 4 others). Better at compressing text into hard limits (SMS, UI snippets).
- Classification: 4 vs 3 for Ministral 3 3B 2512 (tied for 1st with 29 others). More reliable routing and labeling.
- Creative Problem Solving: 3 vs 2 for Ministral 3 3B 2512 (rank 30 of 54 vs rank 47). Ministral generates more feasible, non-obvious ideas in our tests.
- Persona Consistency: 4 vs 3 for Ministral 3 3B 2512 (rank 38 vs rank 45). Ministral better resists injection and holds tone and character.
- Agentic Planning: Mistral Small 3.2 24B wins 4 vs 3 and ranks substantially better (rank 16 of 54 vs rank 42); pick it when goal decomposition and failure recovery matter (agents, multi-step orchestration).
- Ties (no clear winner in our tests): Structured Output 4/4 (JSON/schema tasks), Tool Calling 4/4 (function selection and arguments), Long Context 4/4 (30k+ token retrieval), Strategic Analysis 2/2 (nuanced tradeoffs), Safety Calibration 1/1 (refusal/permissiveness balance), Multilingual 4/4.

Practical interpretation: Ministral 3 3B 2512 is the stronger option when you need faithfulness, constrained rewriting, classification, and creative problem solving per token, and it delivers them at a materially lower per-token cost. Mistral Small 3.2 24B stands out when agentic planning quality (rank 16 of 54) is decisive, despite its higher output pricing.

Benchmark | Ministral 3 3B 2512 | Mistral Small 3.2 24B
Faithfulness | 5/5 | 4/5
Long Context | 4/5 | 4/5
Multilingual | 4/5 | 4/5
Tool Calling | 4/5 | 4/5
Classification | 4/5 | 3/5
Agentic Planning | 3/5 | 4/5
Structured Output | 4/5 | 4/5
Safety Calibration | 1/5 | 1/5
Strategic Analysis | 2/5 | 2/5
Persona Consistency | 4/5 | 3/5
Constrained Rewriting | 5/5 | 4/5
Creative Problem Solving | 3/5 | 2/5
Summary | 5 wins | 1 win
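The win/tie tally can be recomputed directly from the scores in the table. This is a minimal sketch that hard-codes the scores as Python dicts; the variable names and data layout are illustrative, not the site's actual data format.

```python
# Benchmark scores (out of 5) copied from the comparison table.
ministral = {
    "faithfulness": 5, "long_context": 4, "multilingual": 4,
    "tool_calling": 4, "classification": 4, "agentic_planning": 3,
    "structured_output": 4, "safety_calibration": 1,
    "strategic_analysis": 2, "persona_consistency": 4,
    "constrained_rewriting": 5, "creative_problem_solving": 3,
}
small = {
    "faithfulness": 4, "long_context": 4, "multilingual": 4,
    "tool_calling": 4, "classification": 3, "agentic_planning": 4,
    "structured_output": 4, "safety_calibration": 1,
    "strategic_analysis": 2, "persona_consistency": 3,
    "constrained_rewriting": 4, "creative_problem_solving": 2,
}

# Tally head-to-head results across the 12 benchmarks.
wins_a = sum(ministral[k] > small[k] for k in ministral)
wins_b = sum(small[k] > ministral[k] for k in ministral)
ties = sum(ministral[k] == small[k] for k in ministral)
print(wins_a, wins_b, ties)  # → 5 1 6
```

Running this reproduces the summary row: 5 wins for Ministral 3 3B 2512, 1 for Mistral Small 3.2 24B, 6 ties.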

Pricing Analysis

Per-token rates from the payload: Ministral 3 3B 2512 charges $0.100/MTok for both input and output. Mistral Small 3.2 24B charges $0.075/MTok for input and $0.200/MTok for output. Using a 50/50 input/output split as an example: for 1M total tokens (500k input + 500k output), Ministral 3 3B 2512 costs $0.10 (0.5 × $0.10 + 0.5 × $0.10 = $0.05 + $0.05), while Mistral Small 3.2 24B costs about $0.14 (0.5 × $0.075 + 0.5 × $0.20 = $0.0375 + $0.10). At 100M tokens/month those totals scale to $10.00 vs $13.75; at 1B tokens/month, $100 vs $137.50. Who should care: output-heavy teams (long generations, transcripts, batch inference) feel Mistral Small 3.2 24B's higher output rate ($0.200/MTok, double Ministral's) the most; the payload's priceRatio of 0.5 reflects that output-rate gap. At a 50/50 split, Ministral 3 3B 2512 works out roughly 27% cheaper overall.
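The blended-cost arithmetic above can be sketched in a few lines; the function name and rate constants below are illustrative, with rates taken from the pricing cards ($/MTok).

```python
def blended_cost(input_mtok: float, output_mtok: float,
                 in_rate: float, out_rate: float) -> float:
    """Total cost in dollars for a workload measured in millions of tokens."""
    return input_mtok * in_rate + output_mtok * out_rate

# 1M total tokens at a 50/50 split: 0.5 MTok input + 0.5 MTok output.
ministral = blended_cost(0.5, 0.5, in_rate=0.100, out_rate=0.100)
small = blended_cost(0.5, 0.5, in_rate=0.075, out_rate=0.200)
print(f"${ministral:.4f} vs ${small:.4f}")  # → $0.1000 vs $0.1375
```

Changing the split shifts the comparison: a heavily input-weighted workload (e.g. 0.9 MTok in, 0.1 MTok out) narrows the gap, since Mistral Small 3.2 24B's input rate is actually the cheaper of the two.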

Real-World Cost Comparison

Task | Ministral 3 3B 2512 | Mistral Small 3.2 24B
Chat response | <$0.001 | <$0.001
Blog post | <$0.001 | <$0.001
Document batch | $0.0070 | $0.011
Pipeline run | $0.070 | $0.115

Bottom Line

Choose Ministral 3 3B 2512 if you need a cost-efficient general-purpose model with best-in-class faithfulness and constrained rewriting; at a 50/50 token split it is roughly 27% cheaper than Mistral Small 3.2 24B, and half price on output tokens, which compounds on output-heavy workloads at scale. Use cases: production chat assistants with tight content fidelity, classification/routing systems, SMS/UX-limited rewriting, and image-to-text tasks that need long context (context window 131,072 tokens). Choose Mistral Small 3.2 24B if agentic planning and multi-step orchestration are central (agentic planning 4 vs 3, rank 16 of 54) and you can absorb the higher output cost ($0.200/MTok). Use cases: agent frameworks and automated workflows that require robust goal decomposition and failure recovery.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions