mistral
Ministral 3 14B 2512
Ministral 3 14B 2512 is the largest model in the Ministral 3 family from Mistral, priced at a flat $0.20 per million tokens for both input and output. It accepts text and image inputs and supports a 262,144-token context window. According to the model's description, it offers frontier capabilities and performance comparable to Mistral Small 3.2 24B, a larger model. In our testing, Ministral 3 14B 2512 ranked 36th out of 52 models overall, placing it in the mid-tier. It outperforms Ministral 3 8B 2512 (avg 3.67, $0.15/M output), Ministral 3 3B 2512 (avg 3.58, $0.10/M output), Mistral Small 3.2 24B (avg 3.25, $0.20/M output), and Mistral Large 3 2512 (avg 3.67, $1.50/M output), at a price point that makes it attractive for budget-conscious deployments that need more than entry-level capability.
Performance
In our 12-benchmark suite, Ministral 3 14B 2512's three strongest areas are persona consistency, constrained rewriting, and classification. On persona consistency, it scored 5/5, tied for 1st with 36 other models out of 53 tested — maintaining character and resisting injection attacks reliably. On constrained rewriting, it scored 4/5 at rank 6 of 53 (25 models share this score), placing it in the top tier for compression within hard character limits. On classification, it scored 4/5, tied for 1st with 29 other models out of 53 tested. Additional strengths include strategic analysis (4/5, rank 27 of 54), creative problem solving (4/5, rank 9 of 54), tool calling (4/5, rank 18 of 54), and faithfulness (4/5, rank 34 of 55). The main weakness is safety calibration — a score of 1/5 at rank 32 of 55. Agentic planning also trails at 3/5, rank 42 of 54.
Pricing
At $0.20 per million tokens for both input and output, Ministral 3 14B 2512 offers symmetric pricing that's easy to budget. At 10 million output tokens per month, you'd spend $2.00; at 100 million, $20.00. Compare this to Mistral Small 3.2 24B, which costs the same $0.20/M output but scores lower (avg 3.25 vs 3.75 for Ministral 3 14B 2512). Mistral Large 3 2512 costs $1.50/M output and scores only 3.67 on average, meaning Ministral 3 14B 2512 delivers more value per dollar in this comparison. Cross-provider, Gemma 4 26B A4B scores 4.25 at $0.35/M output, and Ministral 3 8B 2512 scores 3.67 at $0.15/M output; if pure cost optimization is the goal, the 8B is cheaper, but the 14B outscores it overall.
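The budgeting arithmetic above is simple enough to script. A minimal sketch, assuming the flat rate quoted in this review (the function name and structure are ours, not any official SDK):

```python
# Estimate monthly spend for Ministral 3 14B 2512 at its flat rate.
# Rate is taken from this review: $0.20 per million tokens, input and output alike.
PRICE_PER_MTOK = 0.20  # USD per million tokens

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost for a month's token usage at the flat symmetric rate."""
    return (input_tokens + output_tokens) / 1_000_000 * PRICE_PER_MTOK

# The figures quoted in the text:
print(monthly_cost(0, 10_000_000))   # 2.0
print(monthly_cost(0, 100_000_000))  # 20.0
```

Because input and output are priced identically, only the total token count matters, which is what makes this model's bills easy to forecast.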
[Chart: Pricing vs Performance. Output cost per million tokens (log scale) vs average score across our 12 internal benchmarks.]
Try It
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="mistralai/ministral-14b-2512",
    messages=[
        {"role": "user", "content": "Hello, Ministral 3 14B 2512!"}
    ],
)

print(response.choices[0].message.content)
Recommendation
Ministral 3 14B 2512 is a good fit for teams that need reliable persona maintenance, structured content generation, and classification at low cost. Its 5/5 persona consistency score makes it well suited for character-driven applications, customer service bots that must stay in role, and content pipelines requiring a consistent voice. Its 4/5 constrained rewriting score (rank 6) is useful for content compression tasks such as summaries and rewrites within hard character limits. For multilingual and long-context tasks, it scores 4/5 on both. Avoid this model for agentic pipelines: its 3/5 agentic planning score at rank 42 of 54 is below the median. Teams that need strong safety calibration should also look elsewhere, since its 1/5 score is among the weakest in the tested set. For higher general performance with more budget, Mistral Medium 3.1 scores 4.25 average at $2.00/M output.
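For the constrained-rewriting use case above, a hard character limit is worth enforcing client-side no matter which model you call. A minimal sketch (the helper names and the 120-character cap are our illustration, not part of any Mistral or OpenRouter API):

```python
def fits_limit(text: str, max_chars: int) -> bool:
    """True if a candidate rewrite respects the hard character budget."""
    return len(text) <= max_chars

def first_valid(candidates, max_chars):
    """Return the first candidate within the limit, or None so the caller can retry."""
    for text in candidates:
        if fits_limit(text, max_chars):
            return text
    return None

# Hypothetical drafts a model might return for a 120-character cap.
drafts = ["x" * 300, "A concise rewrite that fits comfortably under the cap."]
print(first_valid(drafts, 120))
```

In practice you would generate `candidates` with the chat-completions call shown in the Try It section and loop until `first_valid` accepts one; a model strong at constrained rewriting simply needs fewer retries.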
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.