openai
GPT-4o-mini
GPT-4o-mini is OpenAI's small, cost-efficient model accepting text, image, and file inputs. At $0.15 input / $0.60 output per million tokens, it is priced at the low end of the tested market — identical to Mistral Small 4's pricing. In our 12-benchmark suite, it ranked 46th out of 52 tested models with an average score of 3.42. It delivers above-median results on safety calibration and classification, but falls well below median on creative problem solving, strategic analysis, and faithfulness. External math benchmarks confirm limited reasoning capability: 52.6 on MATH Level 5 (rank 13 of 14) and 6.9 on AIME 2025 (rank 21 of 23).
Performance
GPT-4o-mini's strongest benchmark in our testing is safety calibration (4/5, rank 6 of 55 — among the top performers in the entire suite). Classification also scored 4/5 (tied for 1st with 29 other models out of 53 tested). Tool calling, multilingual, long context, persona consistency, and structured output all scored 4/5 at mid-tier rankings. Notable weaknesses: creative problem solving scored 2/5 (rank 47 of 54), strategic analysis scored 2/5 (rank 44 of 54), and faithfulness scored 3/5 (rank 52 of 55 — near last place). On external benchmarks, it scored 52.6 on MATH Level 5 (rank 13 of 14) and 6.9 on AIME 2025 (rank 21 of 23), both Epoch AI benchmarks — placing it near the bottom among models with math scores. Overall rank: 46 out of 52 tested models.
Pricing
GPT-4o-mini costs $0.15 per million input tokens and $0.60 per million output tokens. At 1 million output tokens/month, that is $0.60; at 10 million output tokens, $6.00. Within the OpenAI lineup, it is the lowest-priced option we tested — substantially cheaper than GPT-4o ($10/MTok output, avg 3.50). For high-volume inference tasks like classification, extraction, or routing, the cost is highly competitive. The 128,000-token context window and 16,384-token maximum output accommodate most document-length tasks.
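The cost arithmetic above is easy to reproduce. The helper below is an illustrative estimate at the list prices quoted here ($0.15/MTok input, $0.60/MTok output), assuming no caching or batch discounts; the function name and defaults are ours, not part of any API:

```python
def monthly_cost_usd(input_tokens: int, output_tokens: int,
                     input_price: float = 0.15,
                     output_price: float = 0.60) -> float:
    """Estimate spend in USD given token counts and per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# 10M output tokens (input ignored) at list price: $6.00, matching the figure above
print(monthly_cost_usd(0, 10_000_000))
```

Swap in your own input/output ratio to estimate a real workload; classification and routing traffic is typically input-heavy, which favors this model's $0.15 input price.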
[Chart: Pricing vs Performance. Output cost per million tokens (log scale) vs average score across our 12 internal benchmarks.]
Try It
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Hello, GPT-4o-mini!"}
    ],
)
print(response.choices[0].message.content)

Recommendation
GPT-4o-mini is the right choice for high-volume, cost-sensitive classification and routing pipelines where safety calibration matters. At $0.60/MTok output with 4/5 on classification and the 6th-best safety calibration score in our suite, it is well-suited for content moderation, input triage, and structured extraction tasks at scale. It is not recommended for reasoning-intensive tasks — MATH Level 5 (52.6, rank 13 of 14) and AIME 2025 (6.9, rank 21 of 23) place it near the bottom of math-capable models. Faithfulness near last place (rank 52 of 55) also rules it out for strict RAG applications. For comparable pricing with better creative and reasoning performance, Mistral Small 4 (avg 3.83, $0.60/MTok output) outscores it on our benchmarks.
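For the moderation and triage pipelines recommended above, a common pattern is to constrain the model to a fixed label set and validate its reply before acting on it. A minimal sketch of the validation side, assuming a hypothetical label set (`allow`, `review`, `block`) and a prompt of our own devising; the request itself would go through the same client shown in Try It:

```python
ROUTE_LABELS = {"allow", "review", "block"}  # hypothetical triage labels

TRIAGE_PROMPT = (
    "Classify the following user input for moderation. "
    "Reply with exactly one word: allow, review, or block.\n\nInput: {text}"
)

def parse_verdict(reply: str) -> str:
    """Normalize the model's reply; fall back to 'review' on anything unexpected."""
    verdict = reply.strip().lower().rstrip(".")
    return verdict if verdict in ROUTE_LABELS else "review"
```

Defaulting unparseable replies to `review` rather than `allow` keeps the pipeline fail-safe, which matters at the volumes where this model's pricing is most attractive.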
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.