What are tokens?
Tokens are subword units — chunks of text that the model processes internally. They're not characters, not words, but something in between. A token is typically 3–4 characters in English, though this varies by language and content type.
Why tokens instead of words?
LLMs use tokenizers (such as BPE, Byte Pair Encoding) that split text into statistically common chunks learned from training data. Common words like “the” become single tokens. Rare words get split into multiple tokens. “Unbelievable” might become [“Un”, “believ”, “able”].
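You can see a split like this locally with OpenAI's open-source tiktoken library. A minimal sketch, assuming tiktoken is installed; the exact pieces vary by tokenizer, so treat the output as illustrative rather than universal:

```python
# pip install tiktoken
import tiktoken

# cl100k_base is one of OpenAI's published encodings; other providers'
# tokenizers will split the same text into different pieces.
enc = tiktoken.get_encoding("cl100k_base")

for word in ["the", "unbelievable", "antidisestablishmentarianism"]:
    token_ids = enc.encode(word)
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{word!r} -> {len(token_ids)} tokens: {pieces}")
```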
Rules of thumb
| Content Type | Approximate Tokens | Characters |
|---|---|---|
| A short email | ~200 | ~800 |
| A page of text | ~500 | ~2,000 |
| A blog post (1,000 words) | ~1,300 | ~5,000 |
| A code file (200 lines) | ~500–800 | ~3,000 |
| A technical document (10 pages) | ~5,000 | ~20,000 |
| A novel (80,000 words) | ~100,000 | ~400,000 |
Quick math: divide your character count by 4 for a rough token estimate. For code, divide by 3.5 (code leans on rarer tokens). For non-Latin scripts such as Chinese, Japanese, or Korean, divide by 2, since each character typically consumes more tokens than a character of English text does.
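Those divisors drop straight into a quick estimator. A minimal sketch; this is a hypothetical helper, not any provider's API, and the divisors are just the heuristics above:

```python
def estimate_tokens(text: str, kind: str = "prose") -> int:
    """Rough token estimate from character count, using the rules of thumb above."""
    chars_per_token = {
        "prose": 4.0,  # typical English text
        "code": 3.5,   # code leans on rarer tokens
        "cjk": 2.0,    # Chinese/Japanese/Korean: more tokens per character
    }[kind]
    return round(len(text) / chars_per_token)

print(estimate_tokens("A" * 2_000))               # ~500 tokens, roughly a page of text
print(estimate_tokens("x = 1\n" * 500, "code"))   # rough estimate for a short code file
```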
Why tokenizer choice matters
Different models use different tokenizers: Anthropic, OpenAI, and Google each train their own, so the same text produces different token counts across providers. This means:
- A prompt that costs $0.01 on one model might cost $0.012 on another — even at the same $/MTok price.
- Context window limits (e.g., 128K tokens) hold different amounts of text per model.
- For precise counting, use each provider's tokenizer tool. For estimates, the 4-chars-per-token rule works across all of them.
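As an example of provider tooling, here is a minimal sketch of Anthropic's token-counting endpoint via the anthropic Python SDK. It assumes a recent SDK version, an ANTHROPIC_API_KEY in the environment, and a placeholder model name; check the current docs for the exact method:

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

prompt = "Explain byte pair encoding in one paragraph."

# Counts the input tokens this request would use, without generating a response.
count = client.messages.count_tokens(
    model="claude-sonnet-4-5",  # placeholder; use a model you have access to
    messages=[{"role": "user", "content": prompt}],
)

print("exact count:", count.input_tokens)
print("4-chars-per-token estimate:", round(len(prompt) / 4))
```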
Input vs output tokens
Every API call has two token counts: input tokens (your prompt, system message, and any context) and output tokens (the model's response). Output tokens typically cost 2–6x more because they are more expensive to produce: the model generates them one at a time, one forward pass per token, whereas input tokens are processed in parallel.
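Both counts come back on every response, so you can log actual usage instead of estimating it. A minimal sketch with the openai Python SDK, assuming the SDK is installed, OPENAI_API_KEY is set, and the model name is a placeholder:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any chat model works
    messages=[{"role": "user", "content": "Summarize why output tokens cost more."}],
    max_tokens=150,  # cap the more expensive output side
)

usage = response.usage
print("input tokens: ", usage.prompt_tokens)
print("output tokens:", usage.completion_tokens)
```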
What 1M tokens costs across models
| Model | Input $/MTok | Output $/MTok | Blended $/MTok | Quality |
|---|---|---|---|---|
| Qwen: Qwen3 235B A22B Instruct 2507 | $0.071 | $0.100 | $0.09 | 4.08/5.0 |
| Qwen: Qwen3.5-35B-A3B | $0.163 | $1.30 | $1.02 | 3.92/5.0 |
| GPT-5.5 | $5.00 | $30.00 | $23.75 | 4.46/5.0 |
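The blended column is consistent with a usage mix of roughly 25% input and 75% output tokens. If your workload skews differently, recompute it from the per-direction rates; a small sketch of that arithmetic:

```python
def blended_price(price_in: float, price_out: float, input_share: float = 0.25) -> float:
    """Blended $/MTok for a given input/output token mix."""
    return input_share * price_in + (1 - input_share) * price_out

# GPT-5.5 row above: 0.25 * 5.00 + 0.75 * 30.00 = 23.75
print(blended_price(5.00, 30.00))        # 23.75
# A retrieval-heavy workload (90% input tokens) looks very different:
print(blended_price(5.00, 30.00, 0.9))   # 7.5
```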
Calculate your costs
Estimate your monthly cost from your expected usage: requests per day, average input tokens per request, and average output tokens per request, combined with the per-MTok rates from the table above.
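The estimate itself is a few lines of arithmetic. A minimal sketch; the traffic numbers are hypothetical placeholders, so substitute your own:

```python
def monthly_cost(requests_per_day: float,
                 input_tokens_per_request: float,
                 output_tokens_per_request: float,
                 price_in_per_mtok: float,
                 price_out_per_mtok: float,
                 days: int = 30) -> float:
    """Estimated monthly spend in dollars from average per-request token counts."""
    input_mtok = requests_per_day * days * input_tokens_per_request / 1_000_000
    output_mtok = requests_per_day * days * output_tokens_per_request / 1_000_000
    return input_mtok * price_in_per_mtok + output_mtok * price_out_per_mtok

# Hypothetical workload: 1,000 requests/day, 1,500 input + 400 output tokens each,
# at the GPT-5.5 rates from the table ($5 in, $30 out per MTok):
print(f"${monthly_cost(1_000, 1_500, 400, 5.00, 30.00):,.2f}")  # $585.00
```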
Reducing token usage
- Trim your prompts. Remove unnecessary context, examples, and formatting. Every token in your system prompt is charged on every request.
- Set max_tokens. Limit response length when you know a short answer is sufficient. Don't let the model ramble.
- Use prompt caching. Anthropic and OpenAI both offer caching for repeated prompt prefixes, which reduces input token costs (see the sketch after this list).
- Summarize before stuffing. Instead of pasting a 10,000-token document into context, summarize it first with a cheap model.
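A minimal sketch combining a max_tokens cap with explicit prompt caching, using the anthropic Python SDK. It assumes the SDK, an API key in the environment, a placeholder model name, and a placeholder system prompt; the cache_control syntax may vary by SDK version, so check the current docs:

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()

# A long, stable prefix (instructions, reference material) is the part worth caching.
# LONG_SYSTEM_PROMPT is a placeholder for your own reusable instructions.
LONG_SYSTEM_PROMPT = "You are a support assistant. ... (several thousand tokens of guidance) ..."

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model name
    max_tokens=300,             # cap the more expensive output side
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Marks the prefix as cacheable so repeat requests hit the cheaper cached rate.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Where do I reset my password?"}],
)

print(response.content[0].text)
print("input:", response.usage.input_tokens, "output:", response.usage.output_tokens)
```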
For a deeper dive into pricing strategies, see our LLM API Pricing Explained guide. For full pricing data across all models, see the pricing comparison page.