Gemma 4 26B A4B
Gemma 4 26B A4B is a Mixture-of-Experts (MoE) instruction-tuned model from Google DeepMind. Despite 25.2 billion total parameters, only 3.8 billion are active per token during inference, delivering near-31B quality at a fraction of the compute cost. At $0.08/MTok input and $0.35/MTok output, it is one of the most affordable models in our top 15, ranking 15th out of 52 with an average score of 4.25. It supports a 262,144-token context window with a matching 262,144-token maximum output limit, an unusually large generation budget. Multimodal input (text, image, and video) and configurable reasoning make it versatile across use cases. At the $0.35/MTok output price point, no other tested model matches its benchmark performance.
Performance
In our 12-test benchmark suite, Gemma 4 26B A4B ranks 15th out of 52 models with an average score of 4.25. It earns top-tier scores across multiple dimensions: tool calling (5/5, tied for 1st with 16 other models out of 54), strategic analysis (5/5, tied for 1st with 25 others out of 54), faithfulness (5/5, tied for 1st with 32 others out of 55), multilingual (5/5, tied for 1st with 34 others out of 55), long context (5/5, tied for 1st with 36 others out of 55), persona consistency (5/5, tied for 1st with 36 others out of 53), and structured output (5/5, tied for 1st with 24 others out of 54). Classification (4/5, tied for 1st with 29 others) and creative problem solving (4/5, rank 9 of 54) are solid, and agentic planning (4/5, rank 16 of 54) sits above the median. The weak spots are constrained rewriting (3/5, rank 31 of 53) and especially safety calibration (1/5, tied at rank 32 of 55, the lowest safety score in the benchmark, indicating extreme permissiveness).
Pricing
Gemma 4 26B A4B costs $0.08 per million input tokens and $0.35 per million output tokens. At 10 million output tokens monthly, output cost is $3.50; at 100 million output tokens, it is $35. The input cost of $0.08/MTok is among the lowest in the tested set. Among models matching or beating its 4.25 average score, the next-cheapest output alternatives are DeepSeek V3.2 ($0.38/MTok, also avg 4.25) and Gemma 4 31B ($0.38/MTok, avg 4.42), both slightly more expensive. For teams processing high input volumes (e.g., large document ingestion), Gemma 4 26B A4B's $0.08/MTok input cost makes context-heavy workflows more economical than virtually any other model at comparable performance.
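The arithmetic above generalizes to any monthly volume; a minimal sketch with the rates from this page hard-coded (the helper name is ours, not part of any SDK):

```python
# Per-million-token rates quoted on this page for Gemma 4 26B A4B.
INPUT_PER_MTOK = 0.08   # $ per million input tokens
OUTPUT_PER_MTOK = 0.35  # $ per million output tokens

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly spend in dollars, rounded to cents."""
    cost = (input_tokens / 1e6) * INPUT_PER_MTOK \
         + (output_tokens / 1e6) * OUTPUT_PER_MTOK
    return round(cost, 2)

# The two scenarios from the paragraph above (output tokens only):
print(monthly_cost(0, 10_000_000))    # → 3.5
print(monthly_cost(0, 100_000_000))   # → 35.0
```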
modelpicker.net
[Chart: Pricing vs Performance. Output cost per million tokens (log scale) vs average score across our 12 internal benchmarks.]
Try It
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # replace with your OpenRouter API key
)

response = client.chat.completions.create(
    model="google/gemma-4-26b-a4b-it",
    messages=[
        {"role": "user", "content": "Hello, Gemma 4 26B A4B!"}
    ],
)

print(response.choices[0].message.content)

Recommendation
Gemma 4 26B A4B is an outstanding value for teams building document analysis pipelines, multilingual applications, tool-calling workflows, and structured data extraction at minimal cost. The combination of 5/5 tool calling, 5/5 faithfulness, and 5/5 strategic analysis is rare at $0.35/MTok output; most models achieving this profile cost 3x to 40x more. The 262K-token context window and matching 262K output limit enable batch processing of very long documents without chunking, and the MoE architecture delivers high-quality outputs with low computational overhead, making it efficient at scale. Key caution: safety calibration of 1/5 (rank 32 of 55) means this model is among the most permissive in our test set, so consumer-facing deployments require external safety layers. For teams needing stronger agentic planning or willing to pay a small premium for higher overall scores, Gemma 4 31B at $0.38/MTok output (avg 4.42) is the recommended alternative within the same provider.
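The tool-calling strength noted above is exercised through the same OpenRouter endpoint as the "Try It" snippet, via the standard tools= parameter of the OpenAI SDK. A minimal sketch of the request payload; the get_weather function and its schema are illustrative placeholders, not part of our benchmark:

```python
import json

# Hypothetical tool definition in standard JSON-Schema form.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # placeholder function name
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# This dict mirrors what client.chat.completions.create(...) sends on the
# wire when you pass tools=tools alongside model= and messages=.
payload = {
    "model": "google/gemma-4-26b-a4b-it",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": tools,
}
print(json.dumps(payload, indent=2))
```

If the model elects to call the tool, the call comes back in response.choices[0].message.tool_calls, with the arguments serialized as a JSON string.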
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.