Question 1

Is Devstral 2 2512 better than Gemini 3 Flash Preview?

Accepted Answer

In our 12-test suite, Gemini 3 Flash Preview wins 7 tests and Devstral 2 2512 wins 1 (constrained_rewriting), leaving 4 tests without a winner. Gemini outperforms Devstral on tool_calling, agentic_planning, strategic_analysis, faithfulness, classification, persona_consistency, and creative_problem_solving. Devstral is the better pick only when constrained_rewriting or lower cost is the priority.

Question 2

Which model is cheaper to run?

Accepted Answer

Devstral 2 2512 is cheaper: $0.40/M input and $2.00/M output, versus $0.50/M input and $3.00/M output for Gemini 3 Flash Preview. With a 50/50 input/output split, that is a blended $1.20/M for Devstral versus $1.75/M for Gemini, so Gemini costs $0.55 more per M (about 46% more).
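The blended figure depends on the input/output mix, so it can help to recompute it for your own workload. The following is a minimal sketch using only the prices quoted in this answer; the `PRICES` table and the `blended_price` helper are illustrative names, not part of any vendor API.

```python
# Prices in USD per million tokens, as quoted in the answer above.
PRICES = {
    "Devstral 2 2512": {"input": 0.40, "output": 2.00},
    "Gemini 3 Flash Preview": {"input": 0.50, "output": 3.00},
}


def blended_price(model: str, input_share: float = 0.5) -> float:
    """Return the blended USD price per million tokens.

    input_share is the fraction of all tokens that are input tokens;
    the remainder is billed at the output rate.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    price = PRICES[model]
    return price["input"] * input_share + price["output"] * (1.0 - input_share)


if __name__ == "__main__":
    for name in PRICES:
        print(f"{name}: ${blended_price(name):.2f}/M at a 50/50 split")
```

With an input-heavy mix such as 80% input and 20% output, the same helper gives $0.72/M for Devstral and $1.00/M for Gemini, so the gap narrows in absolute terms as the output share falls.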

Question 3

Which model is better for coding and tool workflows?

Accepted Answer

Gemini 3 Flash Preview. It scores 5 on tool_calling, which our rankings list as "tied for 1st with 16 other models". It also scores 5 on agentic_planning and 5 on faithfulness in our tests, and it posts 75.4% on SWE-bench Verified (Epoch AI), which supports its coding and tooling strength.

Question 4

How do they compare on long context and structured outputs?

Accepted Answer

They tie on score. Both models score 5 on long_context and structured_output in our tests, so both are excellent at JSON/schema compliance and retrieval across long contexts in our benchmarks. Their context windows differ, though: Devstral supports 262,144 tokens and Gemini supports 1,048,576 tokens, exactly four times as many.

Question 5

Are there safety differences between the two?

Accepted Answer

No meaningful difference in our tests: both models score 1 on safety_calibration and rank 32 of 55 (24 models share that score). In our suite, neither model reliably balances refusing harmful requests against permitting legitimate ones.
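To read the rank figure, it helps to see which positions a tied group occupies. The sketch below assumes standard competition ranking (tied models share the best position in their group) and assumes the count of 24 includes both models discussed here; neither assumption is stated in the source, and `tied_span` is a hypothetical helper.

```python
def tied_span(rank: int, tied: int) -> tuple[int, int]:
    """Return the first and last positions occupied by a tied group.

    Assumes standard competition ranking: a group of `tied` models that
    shares rank `rank` fills positions rank .. rank + tied - 1.
    """
    if rank < 1 or tied < 1:
        raise ValueError("rank and tied must both be at least 1")
    return rank, rank + tied - 1


if __name__ == "__main__":
    first, last = tied_span(rank=32, tied=24)
    print(f"Tied group occupies positions {first} to {last} of 55")
```

Under those assumptions the tied group runs from position 32 to position 55, meaning a score of 1 is the lowest score any of the 55 ranked models received on this test.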

Question 6

What do the external benchmarks say?

Accepted Answer

Gemini 3 Flash Preview has two external scores in our data: 75.4% on SWE-bench Verified (Epoch AI) and 92.8% on AIME 2025 (Epoch AI). These results supplement our internal scores and point to Gemini's strength on coding and competitive math tasks. Devstral 2 2512 has no external scores in our data, so no external comparison is possible.

Devstral 2 2512 vs Gemini 3 Flash Preview
