Codestral 2508 vs Gemini 3 Flash Preview

For most production use cases that need broad reasoning, agentic planning and multilingual strength, Gemini 3 Flash Preview is the winner in our benchmarks. Codestral 2508 is the better pick when cost and low-latency code-centric workloads matter — it is far cheaper but concedes ground on strategic analysis, creative problem solving and persona consistency.

Mistral

Codestral 2508

Overall
3.50/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 4/5
Tool Calling: 5/5
Classification: 3/5
Agentic Planning: 4/5
Structured Output: 5/5
Safety Calibration: 1/5
Strategic Analysis: 2/5
Persona Consistency: 3/5
Constrained Rewriting: 3/5
Creative Problem Solving: 2/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: N/A

Pricing

Input: $0.300/MTok
Output: $0.900/MTok

Context Window: 256K

modelpicker.net

Google

Gemini 3 Flash Preview

Overall
4.50/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 5/5
Classification: 4/5
Agentic Planning: 5/5
Structured Output: 5/5
Safety Calibration: 1/5
Strategic Analysis: 5/5
Persona Consistency: 5/5
Constrained Rewriting: 4/5
Creative Problem Solving: 5/5

External Benchmarks

SWE-bench Verified: 75.4%
MATH Level 5: N/A
AIME 2025: 92.8%

Pricing

Input: $0.500/MTok
Output: $3.00/MTok

Context Window: 1049K


Benchmark Analysis

Summary: Gemini wins 7 benchmarks, Codestral wins 0, and 5 tests tie.

Ties: structured_output (JSON/schema adherence), tool_calling (function selection & sequencing), faithfulness (sticking to source), and long_context (30K+ retrieval) are all 5/5 for both models, while safety_calibration is a low tie at 1/5 each.

Gemini wins: strategic_analysis 5 vs Codestral 2 (Gemini tied for 1st in strategic_analysis vs Codestral rank 44 of 54), creative_problem_solving 5 vs 2 (Gemini tied for 1st vs Codestral rank 47 of 54), constrained_rewriting 4 vs 3 (Gemini rank 6 of 53 vs Codestral rank 31), classification 4 vs 3 (Gemini tied for 1st vs Codestral rank 31), persona_consistency 5 vs 3 (Gemini tied for 1st vs Codestral rank 45), agentic_planning 5 vs 4 (Gemini tied for 1st vs Codestral rank 16), and multilingual 5 vs 4 (Gemini tied for 1st vs Codestral rank 36).

External benchmarks: Gemini also posts 75.4% on SWE-bench Verified (Epoch AI) and 92.8% on AIME 2025 (Epoch AI); Codestral has no external scores in the payload.

What this means in practice: Gemini shows clear superiority in nuanced reasoning, problem ideation, and multilingual/agentic tasks — useful for multi-turn assistants, planning agents, and non-English workflows. Codestral matches Gemini on structured outputs, tool calling, and faithfulness while offering a much lower price and a very large context window (256K) suitable for long code contexts and FIM/code correction.

| Benchmark | Codestral 2508 | Gemini 3 Flash Preview |
| --- | --- | --- |
| Faithfulness | 5/5 | 5/5 |
| Long Context | 5/5 | 5/5 |
| Multilingual | 4/5 | 5/5 |
| Tool Calling | 5/5 | 5/5 |
| Classification | 3/5 | 4/5 |
| Agentic Planning | 4/5 | 5/5 |
| Structured Output | 5/5 | 5/5 |
| Safety Calibration | 1/5 | 1/5 |
| Strategic Analysis | 2/5 | 5/5 |
| Persona Consistency | 3/5 | 5/5 |
| Constrained Rewriting | 3/5 | 4/5 |
| Creative Problem Solving | 2/5 | 5/5 |
| Summary | 0 wins | 7 wins |
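The win/tie tally in the summary can be recomputed directly from the twelve scores in the table above. This is an illustrative sketch (the dictionaries just restate the scores from this page, keyed by the benchmark names the summary uses):

```python
# Scores transcribed from the benchmark table on this page.
codestral = {
    "faithfulness": 5, "long_context": 5, "multilingual": 4,
    "tool_calling": 5, "classification": 3, "agentic_planning": 4,
    "structured_output": 5, "safety_calibration": 1,
    "strategic_analysis": 2, "persona_consistency": 3,
    "constrained_rewriting": 3, "creative_problem_solving": 2,
}
gemini = {
    "faithfulness": 5, "long_context": 5, "multilingual": 5,
    "tool_calling": 5, "classification": 4, "agentic_planning": 5,
    "structured_output": 5, "safety_calibration": 1,
    "strategic_analysis": 5, "persona_consistency": 5,
    "constrained_rewriting": 4, "creative_problem_solving": 5,
}

# Count benchmarks where each model scores strictly higher, and ties.
gemini_wins = sum(gemini[k] > codestral[k] for k in codestral)
codestral_wins = sum(codestral[k] > gemini[k] for k in codestral)
ties = sum(codestral[k] == gemini[k] for k in codestral)
print(gemini_wins, codestral_wins, ties)  # 7 0 5
```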

Pricing Analysis

Costs shown are per MTok (per million tokens). Codestral 2508: input $0.30, output $0.90 per MTok. Gemini 3 Flash Preview: input $0.50, output $3.00 per MTok. Assuming a 50/50 split of input/output tokens: at 1B tokens/month (500 MTok input + 500 MTok output) Codestral totals $600 (500*$0.30 + 500*$0.90), Gemini totals $1,750 (500*$0.50 + 500*$3.00). At 10B tokens/month multiply those by 10: Codestral $6,000 vs Gemini $17,500. At 100B tokens/month: Codestral $60,000 vs Gemini $175,000. Who should care: startups, high-volume API customers, and cost-sensitive teams will see large absolute savings with Codestral; teams that require top-tier reasoning, agentic workflows and multimodal context may justify Gemini's higher spend.
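A minimal sketch of that arithmetic, using the per-MTok prices listed on this page and the same assumed 50/50 input/output split (the helper function name and signature are illustrative, not an API):

```python
def monthly_cost(total_tokens: int, in_price: float, out_price: float,
                 input_share: float = 0.5) -> float:
    """Monthly bill in dollars: prices are per MTok (million tokens)."""
    mtok = total_tokens / 1_000_000
    return mtok * input_share * in_price + mtok * (1 - input_share) * out_price

# 1B tokens/month = 500 MTok input + 500 MTok output at a 50/50 split.
print(monthly_cost(1_000_000_000, 0.30, 0.90))  # Codestral: 600.0
print(monthly_cost(1_000_000_000, 0.50, 3.00))  # Gemini:   1750.0
```

Scaling to 10B or 100B tokens/month just multiplies the result linearly, which is where the absolute gap between the two models widens.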

Real-World Cost Comparison

| Task | Codestral 2508 | Gemini 3 Flash Preview |
| --- | --- | --- |
| Chat response | <$0.001 | $0.0016 |
| Blog post | $0.0020 | $0.0063 |
| Document batch | $0.051 | $0.160 |
| Pipeline run | $0.510 | $1.60 |

Bottom Line

Choose Codestral 2508 if you need a cost-efficient, low-latency coding AI that matches Gemini on structured output, tool calling and faithfulness while keeping costs low (output $0.90/MTok). Choose Gemini 3 Flash Preview if your priority is highest-ranked strategic analysis, creative problem solving, agentic planning and multilingual performance (Gemini wins 7/12 benchmarks and posts 75.4% on SWE-bench Verified and 92.8% on AIME 2025). If budget is the constraint for high-volume inference, Codestral is the pragmatic choice; if multi-turn reasoning, multimodal context and best-in-class planning matter, accept Gemini's higher cost.

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions