mistral

Ministral 3 14B 2512

Ministral 3 14B 2512 is the largest model in the Ministral 3 family from mistral, priced at a flat $0.20 per million tokens for both input and output. It accepts text and image inputs and supports a 262,144-token context window. According to the model's description, it offers frontier capabilities and performance comparable to Mistral Small 3.2 24B — a larger model. In our testing, Ministral 3 14B 2512 ranked 36th out of 52 models overall, placing it comfortably in the mid-tier. It outperforms Ministral 3 8B 2512 (avg 3.67, $0.15/M output), Ministral 3 3B 2512 (avg 3.58, $0.10/M output), Mistral Small 3.2 24B (avg 3.25, $0.20/M output), and Mistral Large 3 2512 (avg 3.67, $1.50/M output) — at a price point that makes it attractive for budget-conscious deployments needing more than entry-level capability.

Performance

In our 12-benchmark suite, Ministral 3 14B 2512's three strongest areas are persona consistency, constrained rewriting, and classification. On persona consistency, it scored 5/5, tied for 1st with 36 other models out of 53 tested — maintaining character and resisting injection attacks reliably. On constrained rewriting, it scored 4/5 at rank 6 of 53 (25 models share this score), placing it in the top tier for compression within hard character limits. On classification, it scored 4/5, tied for 1st with 29 other models out of 53 tested. Additional strengths include strategic analysis (4/5, rank 27 of 54), creative problem solving (4/5, rank 9 of 54), tool calling (4/5, rank 18 of 54), and faithfulness (4/5, rank 34 of 55). The main weakness is safety calibration — a score of 1/5 at rank 32 of 55. Agentic planning also trails at 3/5, rank 42 of 54.

Pricing

At $0.20 per million tokens for both input and output, Ministral 3 14B 2512 offers symmetric pricing that's easy to budget. At 10 million output tokens per month, you'd spend $2.00. At 100 million tokens, $20.00. Compare this to Mistral Small 3.2 24B, which costs approximately $0.20/M output but scores lower (avg 3.25 vs 3.75 for Ministral 3 14B 2512). Mistral Large 3 2512 costs $1.50/M output and scores only 3.67 on average — meaning Ministral 3 14B 2512 delivers more value per dollar in this comparison. Cross-provider, Gemma 4 26B A4B scores 4.25 at $0.35/M output, and Ministral 3 8B 2512 scores 3.67 at $0.15/M output — if pure cost optimization is the goal, the 8B is cheaper, but the 14B outscores it overall.

mistral

Ministral 3 14B 2512

Overall
3.75/5Strong

Benchmark Scores

Faithfulness
4/5
Long Context
4/5
Multilingual
4/5
Tool Calling
4/5
Classification
4/5
Agentic Planning
3/5
Structured Output
4/5
Safety Calibration
1/5
Strategic Analysis
4/5
Persona Consistency
5/5
Constrained Rewriting
4/5
Creative Problem Solving
4/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A

Pricing

Input

$0.200/MTok

Output

$0.200/MTok

Context Window262K

modelpicker.net

Real-World Costs

iChat response<$0.001
iBlog post<$0.001
iDocument batch$0.014
iPipeline run$0.140

Pricing vs Performance

Output cost per million tokens (log scale) vs average score across our 12 internal benchmarks

This modelOther models

Try It

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="mistralai/ministral-14b-2512",
    messages=[
        {"role": "user", "content": "Hello, Ministral 3 14B 2512!"}
    ],
)

print(response.choices[0].message.content)

Recommendation

Ministral 3 14B 2512 is a good fit for teams that need reliable persona maintenance, structured content generation, and classification at a low cost. Its 5/5 persona consistency score makes it well-suited for character-driven applications, customer service bots that must stay in role, and content pipelines requiring a consistent voice. Its 4/5 constrained rewriting score (rank 6) is useful for content compression tasks — summaries, rewrites within character limits. For multilingual and long-context tasks, it scores 4/5 on both. Avoid this model for agentic pipelines — its 3/5 agentic planning score at rank 42 of 54 is below the median. Teams that need strong safety calibration should also look elsewhere: its 1/5 safety calibration score is among the weakest in the tested set. For higher general performance with more budget, Mistral Medium 3.1 scores 4.25 average at $2.00/M output.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.