If you've tried to build an AI product in the last year, you've hit the same wall: the best models aren't free, and the free models aren't good enough. That is less true than it used to be. The gap has narrowed from “night and day” to “noticeably worse, but survivable for many use cases.”
This guide catalogs the options and ranks them by what matters for production: quality, rate limits, and the terms you're agreeing to.
What “free” actually means
There are four categories, and the boundaries between them are blurry:
- Free tiers on paid APIs. Google's Gemini API has the most generous one: 1,500 requests per day on Flash. OpenAI's free tier is effectively nonexistent for new accounts.
- Open-weight models hosted for free. Groq, Together, and Cerebras host Llama and Qwen variants with generous (but rate-limited) free tiers.
- DIY self-hosted. Free if you already have a GPU. Otherwise you're paying through your cloud bill.
- Introductory credits. Most providers give $5–25 on signup. Not sustainable, but enough to ship a prototype.
| Model | Provider | Avg | Code | $/out | Context |
|---|---|---|---|---|---|
| R1 0528 | DeepSeek | 4.50 | — | $2.15 | 164K |
| Gemini 3 Flash Preview | Google | 4.50 | — | $3.00 | 1.0M |
| Qwen3.6 Plus | Qwen | 4.50 | — | $1.95 | 1.0M |
| Gemini 3.1 Flash Lite Preview | Google | 4.42 | — | $1.50 | 1.0M |
| Gemma 4 31B | Google | 4.42 | — | $0.38 | 262K |
| Gemini 3.1 Pro Preview | Google | 4.33 | — | $12.00 | 1.0M |
| Qwen3.5-9B | Qwen | 4.27 | — | $0.15 | 262K |
| Gemini 2.5 Pro | Google | 4.25 | — | $10.00 | 1.0M |
| DeepSeek V3.2 | DeepSeek | 4.25 | — | $0.38 | 131K |
| Gemma 4 26B A4B | Google | 4.25 | — | $0.34 | 262K |
| Mistral Medium 3.1 | Mistral | 4.25 | — | $2.00 | 131K |
| Qwen3.5-35B-A3B | Qwen | 4.20 | — | $1.30 | 262K |
The rate-limit reality
Free tiers are advertised in requests-per-minute or tokens-per-day. What the marketing pages don't say is how those caps behave under real load. A “1,500 requests/day” quota that cuts you off at 2pm is useless for a shipped product.
In practice, only Google's free tier scales predictably. Groq's free tier is fast but queues aggressively during peak hours. If you're shipping to users, budget for $5–50/month in overflow paid requests from day one.
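To make the "cut off at 2pm" failure mode concrete, here is a minimal sketch that estimates when a daily cap runs out and how much traffic would spill over to a paid tier. The function names, the assumption that traffic is spread evenly from the quota reset, and the per-request price are all illustrative assumptions, not figures from any provider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotaEstimate:
    exhausted_at_hour: Optional[float]  # hours after the quota reset; None if the cap is never hit
    overflow_requests: int              # requests per day beyond the free cap


def estimate_quota(daily_requests: int, daily_cap: int = 1500,
                   active_hours: float = 24.0) -> QuotaEstimate:
    """Estimate when a daily cap is hit.

    Assumes traffic is spread evenly over `active_hours`, starting at the
    quota reset. Real traffic is burstier, so treat this as an optimistic bound.
    """
    if daily_requests <= daily_cap:
        return QuotaEstimate(None, 0)
    rate_per_hour = daily_requests / active_hours
    return QuotaEstimate(daily_cap / rate_per_hour, daily_requests - daily_cap)


def monthly_overflow_cost(overflow_per_day: int, price_per_request: float,
                          days: int = 30) -> float:
    """Paid-tier spend for overflow traffic. `price_per_request` is hypothetical."""
    return overflow_per_day * price_per_request * days


est = estimate_quota(daily_requests=2500)
print(est)
# Hypothetical price of $0.001 per overflow request.
print(monthly_overflow_cost(est.overflow_requests, price_per_request=0.001))
```

With these illustrative inputs, 2,500 evenly spread requests exhaust a 1,500-request cap about 14.4 hours after the reset, leaving 1,000 requests a day for the paid tier. Plug in your own traffic and pricing before trusting the result.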
What you're trading for “free”
Read the terms. Seriously:
- Training on your data. Most free tiers reserve the right to train on your prompts. Paid tiers generally do not.
- No SLA. Downtime isn't compensated, and you'll be rate-limited the moment the provider needs capacity.
- Region restrictions. Many free tiers block API calls from certain regions or require verification.
Our pick for a real free-tier dev stack
If we were shipping a side-project today and wanted $0 inference costs until ~1,000 DAU:
Primary: Gemini 2.5 Flash via AI Studio free tier. 1,500 req/day covers most prototypes, and it scores well on our benchmarks.
Fallback: Llama 4 Maverick on Groq. Sub-200ms latency for completions where quality isn't the bottleneck.
Escape hatch: Budget $25/month for Claude Haiku or GPT-5 mini. These take over when your free quota runs out.
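The primary / fallback / escape-hatch ordering above can be sketched as a simple provider chain. This is a minimal sketch under stated assumptions: `QuotaExceeded`, `complete_with_fallback`, and the three stub functions are hypothetical names standing in for real SDK calls, and a real client would need to map each provider's own rate-limit error onto `QuotaExceeded`.

```python
from typing import Callable, List, Sequence, Tuple


class QuotaExceeded(Exception):
    """Hypothetical error a provider wrapper raises when its quota or rate limit is hit."""


Provider = Tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str,
                           providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, completion).

    Only QuotaExceeded triggers a fallback; any other error propagates.
    """
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except QuotaExceeded as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers exhausted: " + "; ".join(errors))


# Stub providers for illustration only; none of these call a real API.
def gemini_free(prompt: str) -> str:
    raise QuotaExceeded("daily cap reached")


def groq_free(prompt: str) -> str:
    return f"[groq] {prompt}"


def paid_overflow(prompt: str) -> str:
    return f"[paid] {prompt}"


chain = [
    ("gemini-free", gemini_free),
    ("groq-free", groq_free),
    ("paid", paid_overflow),
]
print(complete_with_fallback("hello", chain))
```

Keeping the paid provider last means it only sees traffic after both free quotas are exhausted, which is what keeps the monthly budget small.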