
Free LLM APIs for developers, ranked by what you actually get

"Free" means a lot of things in LLM-land — generous free tiers, open weights you can self-host, bring-your-own-compute credits. Here's what's actually free, what it costs you in practice, and which free model is worth building on.

If you've tried to build an AI product in the last year, you've hit the same wall: the best models aren't free, and the free models aren't good enough. That used to be true. The gap has narrowed from “night and day” to “noticeably worse, but survivable for many use cases.”

This guide catalogs the options and ranks them by what matters for production: quality, rate limits, and the terms you're agreeing to.

What “free” actually means

There are four categories and they don't overlap cleanly:

  1. Free tiers on paid APIs. Google's Gemini API has the most generous one — 1,500 requests per day on Flash. OpenAI's free tier is basically nonexistent for new accounts.
  2. Open-weight models hosted for free. Groq, Together, and Cerebras host Llama and Qwen variants with generous (but rate-limited) free tiers.
  3. DIY self-hosted. Free if you already have a GPU. Otherwise you're paying through your cloud bill.
  4. Introductory credits. Most providers give $5–25 on signup. Not sustainable, but enough to ship a prototype.
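Most hosted options in categories 1 and 2 expose an OpenAI-compatible chat endpoint, so the request shape is the same across providers. A minimal sketch of that payload (the model name here is an illustrative placeholder, not a real identifier):

```python
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Assemble the JSON body for an OpenAI-compatible chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Placeholder model name; swap in whatever your free-tier host actually serves.
payload = build_chat_request("llama-4-maverick", "Summarize this changelog.")
print(json.dumps(payload, indent=2))
```

Because the shape is shared, switching hosts is usually just a base URL and API key change, which matters once you start juggling multiple free tiers.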
Models with a free tier or sub-$0.30/MTok pricing:

| Model | Provider | Avg score | $/MTok out | Context |
|---|---|---|---|---|
| R1 0528 | DeepSeek | 4.50 | $2.15 | 164K |
| Gemini 3 Flash Preview | Google | 4.50 | $3.00 | 1.0M |
| Qwen: Qwen3.6 Plus | Qwen | 4.50 | $1.95 | 1M |
| Gemini 3.1 Flash Lite Preview | Google | 4.42 | $1.50 | 1.0M |
| Gemma 4 31B | Google | 4.42 | $0.38 | 262K |
| Gemini 3.1 Pro Preview | Google | 4.33 | $12.00 | 1.0M |
| Qwen: Qwen3.5-9B | Qwen | 4.27 | $0.15 | 262K |
| Gemini 2.5 Pro | Google | 4.25 | $10.00 | 1.0M |
| DeepSeek V3.2 | DeepSeek | 4.25 | $0.38 | 131K |
| Gemma 4 26B A4B | Google | 4.25 | $0.34 | 262K |
| Mistral Medium 3.1 | Mistral | 4.25 | $2.00 | 131K |
| Qwen: Qwen3.5-35B-A3B | Qwen | 4.20 | $1.30 | 262K |

The rate-limit reality

Free tiers are advertised in requests-per-minute or tokens-per-day. What the marketing pages don't say is how those caps behave under real load. A “1,500 requests/day” quota that cuts you off at 2pm is useless for a shipped product.

In practice, only Google's free tier scales predictably. Groq's free tier is fast but queues aggressively during peak hours. If you're shipping to users, budget for $5–50/month in overflow paid requests from day one.
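One way to budget for overflow from day one is a thin router that retries the free tier briefly on rate limits, then falls through to a paid provider. A minimal sketch, where the callables and exception are stand-ins for whatever SDK you use, not a real API:

```python
import time

class RateLimited(Exception):
    """Stand-in for an HTTP 429 from the free tier."""

def call_with_overflow(free_call, paid_call, retries=2, backoff=0.5):
    """Try the free tier with exponential backoff; overflow to paid on
    repeated rate limits. free_call and paid_call are any zero-arg
    callables returning a completion (illustrative, not a real SDK)."""
    for attempt in range(retries):
        try:
            return free_call()
        except RateLimited:
            time.sleep(backoff * (2 ** attempt))  # back off before retrying
    return paid_call()  # budgeted overflow, e.g. $5-50/month

# Simulate a free tier that is out of quota for the day:
def exhausted_free():
    raise RateLimited

result = call_with_overflow(exhausted_free, lambda: "paid response", backoff=0.01)
print(result)  # falls through to the paid provider
```

Keeping retries short matters: a user-facing request that spends ten seconds backing off against a hard daily cap is worse than just paying for the overflow call.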

What you're trading for “free”

Read the terms. Seriously:

  • Training on your data. Most free tiers reserve the right to train on your prompts. Paid tiers generally do not.
  • No SLA. Downtime is not compensable. You'll get rate-limited the moment the provider needs capacity.
  • Region restrictions. Many free tiers block API calls from certain regions or require verification.

Our pick for a real free-tier dev stack

If we were shipping a side-project today and wanted $0 inference costs until ~1,000 DAU:

Primary: Gemini 2.5 Flash via AI Studio free tier. 1,500 req/day covers most prototypes, and it scores well on our benchmarks.

Fallback: Llama 4 Maverick on Groq. Sub-200ms latency for completions where quality isn't the bottleneck.

Escape hatch: Budget $25/month for Claude Haiku or GPT-5 mini. These take over when your free quota runs out.
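To see when the escape hatch kicks in, a quick back-of-envelope against the 1,500 requests/day quota mentioned above (the DAU and per-user request figures below are illustrative assumptions, not measurements):

```python
def daily_headroom(dau: int, reqs_per_user: float, free_quota: int = 1500) -> int:
    """Requests/day left on a free quota; negative means paid overflow."""
    return free_quota - round(dau * reqs_per_user)

# At 200 DAU and ~3 requests per user, the free tier still has room:
print(daily_headroom(dau=200, reqs_per_user=3))    # 900 requests of headroom

# At 1,000 DAU the same usage blows well past the quota:
print(daily_headroom(dau=1000, reqs_per_user=3))   # -1500: overflow territory
```

The crossover is why ~1,000 DAU is roughly where this stack stops being $0: the quota only covers about 1.5 requests per user per day at that scale.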