
LLM API pricing explained — what you're actually paying for

Token pricing looks simple until you're trying to forecast a $10K/month AI bill. Here's how it actually works, where the hidden costs are, and how to optimize.

How token pricing works

Every LLM API charges per token — a subword unit that's roughly 4 characters in English. The key insight: input and output tokens are priced differently. Output tokens (what the model generates) typically cost 2–6x more than input tokens (what you send).

Prices are quoted per million tokens (MTok). When you see “$3/MTok input, $15/MTok output”, that means $3 to send a million tokens and $15 for the model to generate a million tokens back.
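In code, a single request's cost is just token counts times per-MTok prices. A minimal sketch, using the illustrative $3/$15 figures above:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_mtok: float, output_per_mtok: float) -> float:
    """Dollar cost of one API call, given per-million-token prices."""
    return (input_tokens * input_per_mtok
            + output_tokens * output_per_mtok) / 1_000_000

# A 2,000-token prompt with a 500-token reply at $3/$15 per MTok:
print(f"${request_cost(2_000, 500, 3.00, 15.00):.4f}")  # → $0.0135
```

Note how the 500 output tokens ($0.0075) cost more than the 2,000 input tokens ($0.0060) despite being a quarter of the volume — that asymmetry is why output pricing dominates most bills.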

Blended cost: the number that actually matters

In practice, input and output volumes differ — a typical workload sends 3–4x more input tokens than it receives back — but output tokens carry a much higher per-token price, so they still account for most of the bill. To compare models with a single number, we use a blended cost formula:

blended = (input_cost + output_cost × 3) / 4

The formula weights output cost 3:1 over input, emphasizing the side of the bill that's usually several times more expensive per token. Use the blended number when comparing models — it's more informative than looking at input and output prices in isolation.
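Applying the formula to the $3/$15 example (a quick sketch; prices are illustrative):

```python
def blended_cost(input_per_mtok: float, output_per_mtok: float) -> float:
    """Blended $/MTok with the 3:1 output weighting described above."""
    return (input_per_mtok + output_per_mtok * 3) / 4

# $3/MTok input, $15/MTok output:
print(blended_cost(3.00, 15.00))  # → 12.0
```

So a "$3 in / $15 out" model compares as $12/MTok blended — a single figure you can rank models by.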

Hidden costs most people miss

  • Thinking tokens. Reasoning models (o1, o3, Claude with extended thinking) generate internal “thinking” tokens that you pay for as output but never see. This can 2–5x your expected cost.
  • Retries. Rate limits, timeouts, and malformed responses mean you often end up calling the API 1.2–1.5x as many times as you planned. Budget for it.
  • Context window waste. Stuffing a 128K context window with “just in case” context is expensive. A 50K-token system prompt sent 1,000x/day is 50M input tokens — that's $150/day at $3/MTok.
  • Prompt caching misses. Anthropic and OpenAI offer prompt caching that reduces costs on repeated prefixes. If your caching isn't working, you're paying full price every time.
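Putting the hidden costs together: here's a sketch that scales a naive daily estimate by thinking-token and retry multipliers. The base figure is the article's 50K-token system prompt example; the 2.0x thinking and 1.3x retry multipliers are illustrative midpoints of the ranges above, not measured values:

```python
def effective_daily_cost(base_daily: float,
                         thinking_multiplier: float = 1.0,
                         retry_multiplier: float = 1.3) -> float:
    """Scale a naive daily cost estimate by hidden-cost multipliers."""
    return base_daily * thinking_multiplier * retry_multiplier

# 50K-token system prompt × 1,000 requests/day = 50M input tokens,
# which at $3/MTok is $150/day before hidden costs.
base = 50_000 * 1_000 / 1_000_000 * 3.00  # 150.0

print(f"${effective_daily_cost(base, thinking_multiplier=2.0):.2f}")  # → $390.00
```

A $150/day line item can quietly become $390/day once reasoning tokens and retries are in the picture — worth building into any forecast.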

Price tiers

The market has settled into four rough price tiers. Here's what each gets you:

Live data · price tiers with top model per tier

Tier       Range              Models   Top Model                             Avg Score
Frontier   $10+/MTok output   13       Claude Sonnet 4.6                     4.69
Strong     $1–10/MTok         22       MoonshotAI: Kimi K2.6                 4.62
Value      $0.10–1/MTok       31       NVIDIA: Nemotron 3 Super              4.46
Budget     < $0.10/MTok       1        Qwen: Qwen3 235B A22B Instruct 2507   4.08
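The tier boundaries above translate directly into a lookup on output price. A sketch — the handling of exact boundary values ($10, $1, $0.10) is an assumption, since the table doesn't specify which side they fall on:

```python
def price_tier(output_per_mtok: float) -> str:
    """Map an output price in $/MTok to one of the four tiers above."""
    if output_per_mtok >= 10.00:
        return "Frontier"
    if output_per_mtok >= 1.00:
        return "Strong"
    if output_per_mtok >= 0.10:   # boundary placement is an assumption
        return "Value"
    return "Budget"

print(price_tier(15.00))  # → Frontier
print(price_tier(0.50))   # → Value
```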

Calculate your costs

Use our calculator to estimate monthly spend based on your usage patterns:

Cost calculator

Example: 500K tokens in and 150K tokens out per day yields:

Model                 Quality   $ / day   $ / month
OpenAI: gpt-oss-20b   3.54      $0.04     $1.08
Ministral 3 3B 2512   3.31      $0.07     $1.95
Qwen: Qwen3.5-9B      4.00      $0.07     $2.17
Ministral 3 8B 2512   3.38      $0.10     $2.93

Cost optimization strategies

  1. Model routing. Use a frontier model for complex tasks and a budget model for simple ones. A router that classifies intent can cut costs 60–80% with minimal quality loss.
  2. Prompt caching. If your system prompt is the same across requests, enable prompt caching. Anthropic caches for 5 minutes; OpenAI caches automatically.
  3. Prompt compression. Trim unnecessary context. A 10K-token prompt that could be 2K is costing you 5x more per request.
  4. Response caching. If you're getting the same questions repeatedly, cache the responses. Redis or even a simple key-value store works.
  5. Batch API. Both Anthropic and OpenAI offer batch APIs with 50% discounts for non-real-time workloads. Use them for data processing, content generation, and analytics.
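Strategy 1 can be sketched in a few lines. This is a toy router — the model names are hypothetical placeholders and the keyword heuristic stands in for a real intent classifier, which would itself typically be a small, cheap model:

```python
FRONTIER = "frontier-model"   # hypothetical model IDs
BUDGET = "budget-model"

# Crude stand-in for an intent classifier: long or "hard-looking"
# prompts go to the frontier model, everything else to the budget one.
HARD_HINTS = ("prove", "debug", "refactor", "multi-step", "analyze")

def route(prompt: str) -> str:
    """Pick a model ID for a prompt based on a cheap difficulty guess."""
    text = prompt.lower()
    if len(prompt) > 2_000 or any(hint in text for hint in HARD_HINTS):
        return FRONTIER
    return BUDGET

print(route("What's the capital of France?"))         # → budget-model
print(route("Debug this race condition in my code"))  # → frontier-model
```

Even a crude router like this captures the core idea: pay frontier prices only for the requests that need frontier quality.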

Full pricing data

See our pricing comparison page for every model's input, output, and blended costs — updated daily across the 67 models with pricing data.