
LLM API pricing explained — what you're actually paying for

Token pricing looks simple until you're trying to forecast a $10K/month AI bill. Here's how it actually works, where the hidden costs are, and how to optimize.

How token pricing works

Every LLM API charges per token — a subword unit that's roughly 4 characters in English. The key insight: input and output tokens are priced differently. Output tokens (what the model generates) typically cost 2–6x more than input tokens (what you send).

Prices are quoted per million tokens (MTok). When you see “$3/MTok input, $15/MTok output”, that means $3 to send a million tokens and $15 for the model to generate a million tokens back.
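In code, a single request's cost is just token counts times per-MTok prices. A minimal sketch, using the illustrative $3/$15 figures above:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_mtok: float, output_per_mtok: float) -> float:
    """Dollar cost of one API call, given per-million-token prices."""
    return (input_tokens * input_per_mtok
            + output_tokens * output_per_mtok) / 1_000_000

# A 2,000-token prompt with a 500-token reply at $3/$15 per MTok:
print(f"${request_cost(2_000, 500, 3.00, 15.00):.4f}")  # → $0.0135
```

Note how the 500 output tokens ($0.0075) cost more than the 2,000 input tokens ($0.0060) despite being a quarter of the volume — that asymmetry is why output pricing dominates most bills.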

Blended cost: the number that actually matters

In practice, input and output volumes differ — a typical workload sends 3–4x more input tokens than it receives back — but output tokens carry a much higher per-token price, so they still account for most of the bill. To compare models with a single number, we use a blended cost formula:

blended = (input_cost + output_cost × 3) / 4

The formula weights output cost 3:1 over input, emphasizing the side of the bill that's usually several times more expensive per token. Use the blended number when comparing models — it's more informative than looking at input and output prices in isolation.
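Applying the formula to the $3/$15 example (a quick sketch; prices are illustrative):

```python
def blended_cost(input_per_mtok: float, output_per_mtok: float) -> float:
    """Blended $/MTok with the 3:1 output weighting described above."""
    return (input_per_mtok + output_per_mtok * 3) / 4

# $3/MTok input, $15/MTok output:
print(blended_cost(3.00, 15.00))  # → 12.0
```

So a "$3 in / $15 out" model compares as $12/MTok blended — a single figure you can rank models by.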

Hidden costs most people miss

  • Thinking tokens. Reasoning models (o1, o3, Claude with extended thinking) generate internal “thinking” tokens that you pay for as output but never see. This can 2–5x your expected cost.
  • Retries. Rate limits, timeouts, and malformed responses mean you often end up calling the API 1.2–1.5x as many times as you planned. Budget for it.
  • Context window waste. Stuffing a 128K context window with “just in case” context is expensive. A 50K-token system prompt sent 1,000x/day is 50M input tokens — that's $150/day at $3/MTok.
  • Prompt caching misses. Anthropic and OpenAI offer prompt caching that reduces costs on repeated prefixes. If your caching isn't working, you're paying full price every time.
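Putting the hidden costs together: here's a sketch that scales a naive daily estimate by thinking-token and retry multipliers. The base figure is the article's 50K-token system prompt example; the 2.0x thinking and 1.3x retry multipliers are illustrative midpoints of the ranges above, not measured values:

```python
def effective_daily_cost(base_daily: float,
                         thinking_multiplier: float = 1.0,
                         retry_multiplier: float = 1.3) -> float:
    """Scale a naive daily cost estimate by hidden-cost multipliers."""
    return base_daily * thinking_multiplier * retry_multiplier

# 50K-token system prompt × 1,000 requests/day = 50M input tokens,
# which at $3/MTok is $150/day before hidden costs.
base = 50_000 * 1_000 / 1_000_000 * 3.00  # 150.0

print(f"${effective_daily_cost(base, thinking_multiplier=2.0):.2f}")  # → $390.00
```

A $150/day line item can quietly become $390/day once reasoning tokens and retries are in the picture — worth building into any forecast.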

Price tiers

The market has settled into four rough price tiers. Here's what each gets you:

Live data · price tiers with top model per tier

Tier       Range              Models   Top Model                             Avg Score
Frontier   $10+/MTok output   13       Claude Sonnet 4.6                     4.69
Strong     $1–10/MTok         22       MoonshotAI: Kimi K2.6                 4.62
Value      $0.10–1/MTok       31       NVIDIA: Nemotron 3 Super              4.46
Budget     < $0.10/MTok       1        Qwen: Qwen3 235B A22B Instruct 2507   4.08
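The tier boundaries above translate directly into a lookup on output price. A sketch — the handling of exact boundary values ($10, $1, $0.10) is an assumption, since the table doesn't specify which side they fall on:

```python
def price_tier(output_per_mtok: float) -> str:
    """Map an output price in $/MTok to one of the four tiers above."""
    if output_per_mtok >= 10.00:
        return "Frontier"
    if output_per_mtok >= 1.00:
        return "Strong"
    if output_per_mtok >= 0.10:   # boundary placement is an assumption
        return "Value"
    return "Budget"

print(price_tier(15.00))  # → Frontier
print(price_tier(0.50))   # → Value
```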

Calculate your costs

Use our calculator to estimate monthly spend based on your usage patterns:

Cost calculator

Example: 500K tokens in and 150K tokens out per day yields:

Model                 Quality   $ / day   $ / month
OpenAI: gpt-oss-20b   3.54      $0.04     $1.08
Ministral 3 3B 2512   3.31      $0.07     $1.95
Qwen: Qwen3.5-9B      4.00      $0.07     $2.17
Ministral 3 8B 2512   3.38      $0.10     $2.93

Cost optimization strategies

  1. Model routing. Use a frontier model for complex tasks and a budget model for simple ones. A router that classifies intent can cut costs 60–80% with minimal quality loss.
  2. Prompt caching. If your system prompt is the same across requests, enable prompt caching. Anthropic caches for 5 minutes; OpenAI caches automatically.
  3. Prompt compression. Trim unnecessary context. A 10K-token prompt that could be 2K is costing you 5x more per request.
  4. Response caching. If you're getting the same questions repeatedly, cache the responses. Redis or even a simple key-value store works.
  5. Batch API. Both Anthropic and OpenAI offer batch APIs with 50% discounts for non-real-time workloads. Use them for data processing, content generation, and analytics.
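Strategy 1 can be sketched in a few lines. This is a toy router — the model names are hypothetical placeholders and the keyword heuristic stands in for a real intent classifier, which would itself typically be a small, cheap model:

```python
FRONTIER = "frontier-model"   # hypothetical model IDs
BUDGET = "budget-model"

# Crude stand-in for an intent classifier: long or "hard-looking"
# prompts go to the frontier model, everything else to the budget one.
HARD_HINTS = ("prove", "debug", "refactor", "multi-step", "analyze")

def route(prompt: str) -> str:
    """Pick a model ID for a prompt based on a cheap difficulty guess."""
    text = prompt.lower()
    if len(prompt) > 2_000 or any(hint in text for hint in HARD_HINTS):
        return FRONTIER
    return BUDGET

print(route("What's the capital of France?"))         # → budget-model
print(route("Debug this race condition in my code"))  # → frontier-model
```

Even a crude router like this captures the core idea: pay frontier prices only for the requests that need frontier quality.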

Full pricing data

See our pricing comparison page for every model's input, output, and blended costs — updated daily across the 67 models with pricing data.