meta
Llama 4 Maverick
Llama 4 Maverick is a high-capacity multimodal model from Meta built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass. At $0.15 input / $0.60 output per million tokens, it offers one of the largest context windows in the tested field (1,048,576 tokens) at a very low price. In our 12-benchmark suite, it ranked 47th out of 52 models with an average score of 3.36 across the 11 benchmarks scored (tool calling was rate-limited during testing), delivering strong persona consistency but weak strategic analysis and creative problem solving.
Performance
Llama 4 Maverick's strongest benchmark in our testing is persona consistency (5/5, tied for 1st with 36 other models out of 53 tested). All other scores fall at or below the field median: faithfulness 4/5 (rank 34 of 55), multilingual 4/5 (rank 36 of 55), structured output 4/5 (rank 26 of 54), long context 4/5 (rank 38 of 55). Weaknesses include strategic analysis (2/5, rank 44 of 54) and agentic planning (3/5, rank 42 of 54). Tool calling was not scored and is excluded from the average: the test hit a rate-limit error, a transient issue per the model's quirks data. Overall rank: 47 of 52 tested models.
Pricing
Llama 4 Maverick costs $0.15 per million input tokens and $0.60 per million output tokens. At 1 million output tokens/month, that is $0.60; at 10 million output tokens, $6.00. The 1,048,576-token context window is a standout feature at this price point, as most models with million-token contexts cost significantly more. Within the Meta lineup, Llama 3.3 70B Instruct ($0.32/MTok output) averages slightly higher on our benchmarks (3.50 vs 3.36) at roughly half the output cost. For applications that require very long context at low cost and can tolerate below-median general benchmark performance, Llama 4 Maverick's pricing is compelling.
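The monthly figures above can be reproduced with a small budgeting sketch. The helper below is hypothetical (not part of any API); it simply applies the listed per-million-token rates to a given volume:

```python
# Estimate monthly spend for Llama 4 Maverick at the listed rates.
INPUT_RATE = 0.15   # USD per million input tokens
OUTPUT_RATE = 0.60  # USD per million output tokens

def monthly_cost(input_mtok: float, output_mtok: float) -> float:
    """Return estimated USD cost for token volumes given in millions."""
    return input_mtok * INPUT_RATE + output_mtok * OUTPUT_RATE

# The $0.60 and $6.00 figures in the text count output tokens only:
print(f"${monthly_cost(0, 1):.2f}")   # $0.60
print(f"${monthly_cost(0, 10):.2f}")  # $6.00
# A mixed workload, e.g. 10M input + 2M output per month:
print(f"${monthly_cost(10, 2):.2f}")
```

Note that real bills also include input tokens, which long-context use cases (the model's main draw) can dominate: filling the 1M-token window once costs about $0.16 in input alone.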
Pricing vs Performance
[Chart: output cost per million tokens (log scale) vs average score across our 12 internal benchmarks]
Try It
from openai import OpenAI
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key="YOUR_OPENROUTER_KEY",
)
response = client.chat.completions.create(
model="meta-llama/llama-4-maverick",
messages=[
{"role": "user", "content": "Hello, Llama 4 Maverick!"}
],
)
print(response.choices[0].message.content)
Recommendation
Llama 4 Maverick is a specialized fit for applications that require extremely long context (1M tokens) at the lowest possible price, where persona consistency matters and where strategic analysis and complex reasoning are not critical requirements. It is not recommended as a general-purpose model: its 3.36 average score across 11 benchmarks places it near the bottom of the tested field. For comparable pricing with stronger benchmark performance, Llama 3.3 70B Instruct ($0.32/MTok output, avg 3.50) offers better general results at even lower cost.
How We Test
We test every model against our 12-benchmark suite covering tool calling, agentic planning, creative problem solving, safety calibration, and more. Each test is scored 1–5 by an LLM judge. Read our full methodology.