Open source vs open weight
First, the terminology. Most “open source” LLMs are actually open-weight: you can download and run the model weights, but the training data and process aren't fully open. True open-source models (weights + data + training code) are rare.
The license spectrum matters for commercial use:
- Apache 2.0 / MIT — Fully permissive. Use commercially with only minimal obligations (attribution and license notices). Examples: Mistral's smaller models, some Qwen variants.
- Llama License — Free for commercial use as long as you stay under 700M monthly active users; above that, Meta requires a separate license. Covers Meta's Llama family.
- DeepSeek License — Permissive with some restrictions. Check the specific model version.
- Research-only — Some model variants are restricted to non-commercial research. Always check before deploying.
Why open weight matters
- No vendor lock-in. If your API provider raises prices or shuts down, you can move the same model to another host — or run it yourself.
- Fine-tuning. You can train the model further on your data to specialize it for your domain.
- Data privacy. Self-hosting means your data never leaves your infrastructure.
- Cost at scale. At high volume, self-hosting amortizes to significantly less than API costs.
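To make the cost-at-scale point concrete, here is a back-of-the-envelope break-even sketch in Python. Every number in it (API price, GPU rental cost) is an illustrative assumption, not a quote from any provider:

```python
# Back-of-the-envelope break-even: flat self-hosted GPU bill vs.
# per-token API pricing. All figures are illustrative assumptions.

API_PRICE_PER_M_OUT = 2.00   # $ per 1M output tokens (assumed)
GPU_MONTHLY_COST = 1500.00   # $ per month for a rented GPU node (assumed)

def api_cost(tokens_per_month: int) -> float:
    """Monthly API bill for a given output-token volume."""
    return tokens_per_month / 1_000_000 * API_PRICE_PER_M_OUT

def break_even_tokens() -> float:
    """Monthly output tokens at which self-hosting matches the API bill."""
    return GPU_MONTHLY_COST / API_PRICE_PER_M_OUT * 1_000_000

print(f"Break-even at {break_even_tokens():,.0f} output tokens/month")
```

Under these assumed numbers, self-hosting wins past 750 million output tokens a month; your own break-even shifts with GPU utilization, batching, and the model's actual price.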
Top open-weight models
| # | Model | Provider | Avg Score | Output $/1M tokens | Context |
|---|---|---|---|---|---|
| 01 | Qwen: Qwen3.6 Plus | Qwen | 4.54 | $1.95 | 1M |
| 02 | R1 0528 | DeepSeek | 4.46 | $2.15 | 164K |
| 03 | DeepSeek V3.2 | DeepSeek | 4.31 | $0.378 | 131K |
| 04 | DeepSeek V4 Flash | DeepSeek | 4.23 | $0.280 | 1M |
| 05 | Mistral Medium 3.1 | Mistral | 4.23 | $2.00 | 131K |
| 06 | DeepSeek V4 Pro | DeepSeek | 4.15 | $0.870 | 1M |
| 07 | Qwen: Qwen3 235B A22B Instruct 2507 | Qwen | 4.08 | $0.100 | 262K |
| 08 | R1 | DeepSeek | 4.00 | $2.50 | 64K |
| 09 | DeepSeek V3.1 | DeepSeek | 4.00 | $0.750 | 33K |
| 10 | Qwen: Qwen3.5-9B | Qwen | 4.00 | $0.150 | 262K |
Best for specific tasks
Coding: Codestral 2508 (Mistral) leads with a coding composite of 5.00/5.0.
Reasoning: DeepSeek V4 Flash (DeepSeek) leads with a reasoning composite of 5.00/5.0.
General purpose: Qwen: Qwen3.6 Plus (Qwen) has the highest overall score at 4.54/5.0.
How close to proprietary?
The best open-weight model (Qwen: Qwen3.6 Plus, 4.54/5.0) trails the best proprietary model (Claude Sonnet 4.6, 4.69/5.0) by just 0.15 points. That gap has narrowed significantly, and for many use cases open-weight models are now competitive with proprietary ones.
Hosted open-weight options
You don't need a GPU to use open-weight models. Several inference providers host them, some with generous free tiers:
- Groq — Extremely fast inference on custom LPU hardware. Free tier with rate limits. Best for latency-sensitive applications.
- Together AI — Wide model selection, competitive pricing. Good for production workloads.
- Fireworks AI — Optimized serving with function calling support. Strong developer experience.
- OpenRouter — Unified API that routes to multiple providers. Useful for fallback strategies.
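Most of these providers expose an OpenAI-compatible chat-completions endpoint, so switching hosts is often just a base-URL change. A minimal stdlib-only sketch (the OpenRouter URL and model slug below are examples; check each provider's catalog for current names):

```python
import json
import urllib.request

# Sketch of calling an open-weight model through an OpenAI-compatible
# chat-completions endpoint. The base URL and model slug are examples.

def build_request(base_url: str, model: str, prompt: str,
                  api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completions POST request."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Example (requires a real key; the model slug is illustrative):
# req = build_request("https://openrouter.ai/api/v1",
#                     "deepseek/deepseek-chat", "Hello!", "sk-...")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the payload shape is the same across providers, a fallback strategy can be as simple as rebuilding the request with a different base URL when one host errors out.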
Running locally
For local deployment with Ollama, LM Studio, or llama.cpp, see our dedicated Best Local LLMs for Coding guide — it covers hardware requirements, quantization, and tooling in detail.
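For a quick taste before diving into that guide: a locally running Ollama server listens on port 11434 and answers plain HTTP. A minimal sketch, assuming Ollama is installed and a model (here `llama3`; adjust to whatever you've pulled) is available:

```python
import json
import urllib.request

# Query a local Ollama server (default port 11434). Assumes a model has
# been pulled first, e.g. with `ollama pull llama3`.

def ask_local(prompt: str, model: str = "llama3") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a token stream
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# print(ask_local("Explain quantization in one sentence."))
```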