Gemini 3.1 Pro Preview vs GPT-5 Mini

Pick Gemini 3.1 Pro Preview if your priority is agentic planning, tool-calling, and creative problem solving — it wins more internal benchmarks (3 vs 2). Choose GPT-5 Mini if cost and classification/safety calibration matter: it is far cheaper per token and wins on classification and safety.

Google

Gemini 3.1 Pro Preview

Overall: 4.33/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 4/5
Classification: 2/5
Agentic Planning: 5/5
Structured Output: 5/5
Safety Calibration: 2/5
Strategic Analysis: 5/5
Persona Consistency: 5/5
Constrained Rewriting: 4/5
Creative Problem Solving: 5/5

External Benchmarks

SWE-bench Verified: N/A
MATH Level 5: N/A
AIME 2025: 95.6%

Pricing

Input: $2.00/MTok
Output: $12.00/MTok
Context Window: 1,049K tokens

modelpicker.net

OpenAI

GPT-5 Mini

Overall: 4.33/5 (Strong)

Benchmark Scores

Faithfulness: 5/5
Long Context: 5/5
Multilingual: 5/5
Tool Calling: 3/5
Classification: 4/5
Agentic Planning: 4/5
Structured Output: 5/5
Safety Calibration: 3/5
Strategic Analysis: 5/5
Persona Consistency: 5/5
Constrained Rewriting: 4/5
Creative Problem Solving: 4/5

External Benchmarks

SWE-bench Verified: 64.7%
MATH Level 5: 97.8%
AIME 2025: 86.7%

Pricing

Input: $0.25/MTok
Output: $2.00/MTok
Context Window: 400K tokens


Benchmark Analysis

Across our 12-test suite (internal scoring 1–5, plus external Epoch AI measures where available), Gemini 3.1 Pro Preview wins 3 tests, GPT-5 Mini wins 2, and the remaining 7 tie.

Head-to-head highlights from our testing:

- Gemini wins creative problem solving 5 vs 4, tied for 1st with 7 other models; it produces more non-obvious, feasible ideas in our prompts.
- Gemini wins tool calling 4 vs 3, ranking 18 of 54 (many models share scores) against GPT-5 Mini's 47 of 54; in practice, Gemini selects and sequences functions more accurately.
- Gemini wins agentic planning 5 vs 4 and is tied for 1st with 14 other models, so it better decomposes goals and recovery steps in our tests.
- GPT-5 Mini wins classification 4 vs 2 (tied for 1st with 29 other models), making it the stronger choice for routing and categorization tasks in our suite.
- GPT-5 Mini also wins safety calibration 3 vs 2 (rank 10 of 55 vs Gemini's 12), so it more reliably refuses harmful prompts while permitting legitimate ones in our scenarios.
- Seven categories tie (structured output, strategic analysis, constrained rewriting, faithfulness, long context, persona consistency, multilingual) at scores of 4–5, meaning both models perform at top levels for schema adherence, nuanced reasoning, long-context retrieval, persona maintenance, and multilingual output.

External benchmarks (Epoch AI): GPT-5 Mini scores 64.7% on SWE-bench Verified, 97.8% on MATH Level 5, and 86.7% on AIME 2025. Gemini 3.1 Pro Preview scores 95.6% on AIME 2025.

In short: Gemini is the better choice for agentic workflows and tool-driven tasks; GPT-5 Mini is the better value, showing stronger classification and safety calibration in our tests, strong math performance (MATH Level 5), and a mid-tier SWE-bench Verified placement for code tasks.

Benchmark                   Gemini 3.1 Pro Preview   GPT-5 Mini
Faithfulness                5/5                      5/5
Long Context                5/5                      5/5
Multilingual                5/5                      5/5
Tool Calling                4/5                      3/5
Classification              2/5                      4/5
Agentic Planning            5/5                      4/5
Structured Output           5/5                      5/5
Safety Calibration          2/5                      3/5
Strategic Analysis          5/5                      5/5
Persona Consistency         5/5                      5/5
Constrained Rewriting       4/5                      4/5
Creative Problem Solving    5/5                      4/5
Summary                     3 wins                   2 wins
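
The win/tie tally above can be reproduced directly from the table's scores. A minimal sketch (scores copied from the comparison table; no external data):

```python
# Head-to-head tally from the internal benchmark table.
# Each entry: (Gemini 3.1 Pro Preview score, GPT-5 Mini score) on a 1-5 scale.
scores = {
    "Faithfulness": (5, 5),
    "Long Context": (5, 5),
    "Multilingual": (5, 5),
    "Tool Calling": (4, 3),
    "Classification": (2, 4),
    "Agentic Planning": (5, 4),
    "Structured Output": (5, 5),
    "Safety Calibration": (2, 3),
    "Strategic Analysis": (5, 5),
    "Persona Consistency": (5, 5),
    "Constrained Rewriting": (4, 4),
    "Creative Problem Solving": (5, 4),
}

gemini_wins = sum(1 for g, m in scores.values() if g > m)
gpt5_mini_wins = sum(1 for g, m in scores.values() if m > g)
ties = sum(1 for g, m in scores.values() if g == m)

print(gemini_wins, gpt5_mini_wins, ties)  # 3 2 7
```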

Pricing Analysis

Actual per-MTok prices: Gemini 3.1 Pro Preview charges $2 input / $12 output; GPT-5 Mini charges $0.25 input / $2 output. Assuming a 50/50 split of input and output tokens, the blended cost per 1M tokens is roughly $7.00 for Gemini (0.5M × $2 + 0.5M × $12) and $1.125 for GPT-5 Mini (0.5M × $0.25 + 0.5M × $2).

At scale: 10M tokens/month costs about $70 on Gemini vs $11.25 on GPT-5 Mini; 100M tokens/month costs about $700 vs $112.50. The gap grows linearly, so organizations generating tens to hundreds of millions of tokens per month should take note: GPT-5 Mini is 6× cheaper on output and 8× cheaper on input, producing large savings for high-volume chat, assistants, or analytics pipelines. Teams that depend on advanced agentic workflows, multimodal long-context reasoning, or very large outputs may justify Gemini's premium; cost-sensitive products and experimentation benefit from GPT-5 Mini's lower price point.
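
The blended-cost arithmetic above can be sketched as a small helper. Prices come from the pricing sections on this page; the 50/50 input/output split is the same assumption used in the analysis:

```python
# Blended cost in dollars for a given token volume, assuming a fixed
# input/output split. Prices are $/MTok from the pricing sections above.
PRICES = {
    "Gemini 3.1 Pro Preview": {"input": 2.00, "output": 12.00},
    "GPT-5 Mini": {"input": 0.25, "output": 2.00},
}

def blended_cost(model: str, total_tokens: float, input_share: float = 0.5) -> float:
    """Dollar cost for `total_tokens` tokens at the given input share."""
    p = PRICES[model]
    in_tok = total_tokens * input_share
    out_tok = total_tokens * (1 - input_share)
    return (in_tok * p["input"] + out_tok * p["output"]) / 1_000_000

for model in PRICES:
    print(model, blended_cost(model, 10_000_000))
# 10M tokens at 50/50: Gemini $70.00, GPT-5 Mini $11.25
```

Changing `input_share` shows how the gap shifts for input-heavy workloads (e.g. retrieval over long documents), where the 8× input-price difference dominates.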

Real-World Cost Comparison

Task             Gemini 3.1 Pro Preview   GPT-5 Mini
Chat response    $0.0064                  $0.0010
Blog post        $0.025                   $0.0041
Document batch   $0.640                   $0.105
Pipeline run     $6.40                    $1.05
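
Each figure in the table is input_tokens × input_price + output_tokens × output_price. The token counts below are assumptions: the page does not publish them, and these values were chosen because they reproduce the table's figures at the listed prices.

```python
# Per-task cost = input_tokens * price_in + output_tokens * price_out.
# Token counts are ASSUMED (not published on this page); they are chosen
# to reproduce the table's dollar figures at the listed per-token prices.
PRICES = {  # ($/token input, $/token output)
    "Gemini 3.1 Pro Preview": (2.00e-6, 12.00e-6),
    "GPT-5 Mini": (0.25e-6, 2.00e-6),
}
TASKS = {  # (input_tokens, output_tokens), assumed
    "Chat response": (200, 500),
    "Blog post": (500, 2_000),
    "Document batch": (20_000, 50_000),
    "Pipeline run": (200_000, 500_000),
}

costs = {}
for task, (in_tok, out_tok) in TASKS.items():
    for model, (p_in, p_out) in PRICES.items():
        costs[(task, model)] = in_tok * p_in + out_tok * p_out
        print(f"{task} / {model}: ${costs[(task, model)]:.4f}")
```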

Bottom Line

Choose Gemini 3.1 Pro Preview if you need:

- Better tool calling and agentic planning (tool calling 4 vs 3; agentic planning 5 vs 4)
- Strong creative problem solving (5 vs 4)
- Multimodal, large-context workflows, and are willing to pay a premium for higher-confidence agentic outputs

Choose GPT-5 Mini if you need:

- Much lower token cost ($0.25 input / $2 output vs Gemini's $2 / $12) and savings on high-volume deployments
- Better classification and safety calibration (classification 4 vs 2; safety calibration 3 vs 2)
- Strong competition-math performance (97.8% on MATH Level 5, per Epoch AI) and a balanced capability-to-price tradeoff

How We Test

We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.

Frequently Asked Questions