mistral

Ministral 3 8B 2512

Ministral 3 8B 2512 is an 8-billion-parameter model from Mistral, priced at a flat $0.15 per million tokens for both input and output, with a 262,144-token context window and multimodal input support (text and images). In our testing it ranked 38th of 52 models overall, a mid-tier result that understates its performance on specific benchmarks. Its standout result is constrained rewriting, where it scored 5/5 and tied for 1st with just 4 other models out of 53 tested, a notably rare achievement. It also scored 5/5 on persona consistency. Within the Mistral family it sits below Ministral 3 14B 2512 (avg 3.75, $0.20/M output) in overall average, but outperforms it on two key benchmarks. For teams whose workflows center on compression, rewriting, and persona-consistent generation, Ministral 3 8B 2512 delivers strong results at a lower price than its larger sibling.

Performance

In our 12-benchmark suite, Ministral 3 8B 2512's top three scores are constrained rewriting, persona consistency, and classification. On constrained rewriting, it scored 5/5, tied for 1st with just 4 other models out of 53 tested — this is the standout result for the model and a genuine differentiator at this price tier. On persona consistency, it scored 5/5, tied for 1st with 36 other models out of 53 tested. On classification, it scored 4/5, tied for 1st with 29 other models out of 53 tested. Additional 4/5 scores cover tool calling (rank 18 of 54), multilingual (rank 36 of 55), structured output (rank 26 of 54), long context (rank 38 of 55), and faithfulness (rank 34 of 55). Weaknesses include agentic planning at 3/5 (rank 42 of 54), strategic analysis at 3/5 (rank 36 of 54), creative problem solving at 3/5 (rank 30 of 54), and safety calibration at 1/5 (rank 32 of 55).

Pricing

At $0.15 per million tokens for both input and output, Ministral 3 8B 2512 is one of the most cost-effective models in the tested set. At 10 million output tokens per month, you'd spend $1.50. At 100 million tokens, $15.00 — significantly cheaper than Ministral 3 14B 2512 ($2.00 per 10M tokens) while still delivering 5/5 scores on two benchmarks. The cheapest model in the tested set is Ministral 3 3B 2512 at $0.10/M output, but it scores lower overall (avg 3.58 vs 3.67 for Ministral 3 8B 2512). Cross-provider, Llama 4 Scout costs $0.30/M output with a lower average score of 3.33 — meaning Ministral 3 8B 2512 offers better performance at half the output cost.
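The flat-rate arithmetic above is easy to reproduce. A minimal sketch of an output-cost estimator; the `output_cost` helper and the model keys are illustrative, while the per-million-token prices come from this review:

```python
# Output-token spend under flat per-million-token pricing.
# Prices ($/MTok output) are from this review; the helper itself is illustrative.
OUTPUT_PRICE_PER_MTOK = {
    "ministral-3-8b-2512": 0.15,
    "ministral-3-14b-2512": 0.20,
    "ministral-3-3b-2512": 0.10,
}

def output_cost(model: str, output_tokens: int) -> float:
    """Dollar cost for the given number of output tokens, rounded to cents."""
    return round(output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK[model], 2)

print(output_cost("ministral-3-8b-2512", 10_000_000))   # $1.50 at 10M tokens
print(output_cost("ministral-3-8b-2512", 100_000_000))  # $15.00 at 100M tokens
```

The same helper makes the sibling comparison concrete: the 14B model's $0.20/M rate works out to $2.00 per 10M output tokens, as noted above.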

mistral

Ministral 3 8B 2512

Overall
3.67/5 (Strong)

Benchmark Scores

Faithfulness
4/5
Long Context
4/5
Multilingual
4/5
Tool Calling
4/5
Classification
4/5
Agentic Planning
3/5
Structured Output
4/5
Safety Calibration
1/5
Strategic Analysis
3/5
Persona Consistency
5/5
Constrained Rewriting
5/5
Creative Problem Solving
3/5

External Benchmarks

SWE-bench Verified
N/A
MATH Level 5
N/A
AIME 2025
N/A


Pricing

Input

$0.150/MTok

Output

$0.150/MTok

Context Window: 262K

modelpicker.net

Real-World Costs

Chat response: <$0.001
Blog post: <$0.001
Document batch: $0.010
Pipeline run: $0.105

Pricing vs Performance

Output cost per million tokens (log scale) vs average score across our 12 internal benchmarks


Try It

from openai import OpenAI

# OpenRouter serves an OpenAI-compatible API, so the standard client works as-is.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="mistralai/ministral-8b-2512",
    messages=[
        {"role": "user", "content": "Hello, Ministral 3 8B 2512!"}
    ],
)

print(response.choices[0].message.content)

Recommendation

Ministral 3 8B 2512 is the best choice in the ultra-low-cost tier for applications requiring constrained rewriting or persona-consistent generation. Its 5/5 constrained rewriting score, tied for 1st among just 5 of 53 models, makes it a strong pick for content compression, rewrites within strict length limits, and editorial pipelines. Its 5/5 persona consistency is valuable for customer service bots, roleplay applications, and any use case requiring the model to maintain a stable voice or character. Classification (4/5) and tool calling (4/5) are usable strengths that round out its general utility. Avoid Ministral 3 8B 2512 for complex agentic workflows: its 3/5 agentic planning score at rank 42 of 54 is below the median. Strategic analysis and creative problem solving are also weak at 3/5, and the 1/5 safety calibration score warrants caution for user-facing deployments without an external moderation layer. Teams that need the best overall score at this price point should compare with Ministral 3 14B 2512, which costs $0.20/M output but has a higher average score (3.75 vs 3.67).
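For editorial pipelines built around the constrained-rewriting strength, it is worth gating model output with a post-check before it enters downstream steps. A minimal sketch, assuming a whitespace word count is an acceptable proxy for the constraint; `within_word_limit` is a hypothetical helper, not part of any API:

```python
def within_word_limit(text: str, max_words: int) -> bool:
    """True if a rewrite respects a hard word cap (naive whitespace split)."""
    return len(text.split()) <= max_words

# Gate a rewrite before accepting it; on failure, retry with a stricter prompt.
rewrite = "Ministral compressed the draft to one crisp sentence."
if not within_word_limit(rewrite, 10):
    raise ValueError("rewrite exceeded the word limit; request a shorter version")
```

The same pattern extends to other hard constraints (character counts, banned terms), which is how a 5/5 rewriting score translates into a pipeline you can actually trust.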

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.