Gemma 4 26B A4B
Google's mid-tier model. Context window: 262K tokens.
What you need to know
Gemma 4 26B A4B stands out for its reliability in structured data tasks and long-context processing. It earns perfect scores in structured output, faithfulness, and tool calling, which points to precise adherence to technical schemas. Its 262K-token context window is backed by the maximum score on the internal long-context retrieval test, making it a viable option for analyzing large datasets or extensive codebases.
From a cost perspective, the model is priced competitively for its performance tier. At a blended cost of $0.263/MTok (per million tokens), it provides strong strategic analysis and multilingual support without the premium pricing of the top-ranked frontier models. It currently ranks 30th out of 71 models, which puts it just inside the top half for general utility.
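As a rough illustration of what the blended rate means in practice, the sketch below estimates the cost of one request that fills the entire context window. It assumes the $0.263/MTok blended figure applies uniformly to every token and treats 262K as 262,000 tokens. The name `estimate_cost` is hypothetical, and real billing may price input and output tokens separately.

```python
# Hypothetical helper: cost estimate from a single blended per-token rate.
BLENDED_USD_PER_MTOK = 0.263   # blended rate quoted in this review
CONTEXT_TOKENS = 262_000       # assumption: "262K" taken as 262,000 tokens


def estimate_cost(tokens: int, usd_per_mtok: float = BLENDED_USD_PER_MTOK) -> float:
    """Return the estimated cost in USD for `tokens` tokens."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens / 1_000_000 * usd_per_mtok


if __name__ == "__main__":
    full_window = estimate_cost(CONTEXT_TOKENS)
    print(f"Full context window: ${full_window:.4f}")
```

Under these assumptions, a request that uses the whole window costs roughly 7 cents.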
The model has a critical failure in safety calibration, scoring 1/5. This indicates a significant lack of built-in guardrails, so developers must implement their own filtering and moderation layers. It also shows moderate weakness in constrained rewriting, suggesting it may struggle with strict character or word-count limits.
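One way to picture the external moderation layer mentioned above is a thin wrapper that checks both the prompt and the reply before anything reaches the user. The sketch below is a minimal illustration, not a production filter. `moderated_call`, `keyword_policy`, and the blocklist are hypothetical names, and the `generate` callable stands in for whatever client actually calls the model.

```python
from typing import Callable

REFUSAL = "Request blocked by moderation policy."

# Placeholder blocklist; a real deployment would use a proper classifier.
BLOCKED_TERMS = {"example-banned-term"}


def keyword_policy(text: str) -> bool:
    """Toy policy: allow text unless it contains a blocked term."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


def moderated_call(
    prompt: str,
    generate: Callable[[str], str],
    is_allowed: Callable[[str], bool] = keyword_policy,
) -> str:
    """Screen the prompt, call the model, then screen the reply."""
    if not is_allowed(prompt):
        return REFUSAL          # never send disallowed input to the model
    reply = generate(prompt)
    if not is_allowed(reply):
        return REFUSAL          # never return disallowed output to the user
    return reply


if __name__ == "__main__":
    # Stand-in for a real model call (assumption for illustration).
    fake_model = lambda p: f"echo: {p}"
    print(moderated_call("summarize this report", fake_model))
```

Checking both directions matters here because a model with weak safety calibration cannot be relied on to decline a harmful request on its own.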
Use this model for complex tool-calling pipelines, structured data extraction, and long-document analysis where precision is the priority and safety is handled by an external layer. Skip it if your application requires native safety alignment or highly constrained creative rewriting.