OpenAI · active · free tier available

OpenAI: gpt-oss-20b

OpenAI's efficiency model. Context window: 131K tokens.

Overall score: 3.54/5.00 · ranked #92
Input: $0.030 per 1M tokens
Output: $0.140 per 1M tokens
Context: 131K tokens
Blended: $0.113 per 1M tokens (3:1 out:in ratio)

modelpicker.ai · powered by live benchmark data

Scores by test

Structured Output: 5.0
Strategic Analysis: 4.0
Constrained Rewriting: 3.0
Creative Problem Solving: 3.0
Tool Calling: 4.0
Faithfulness: 4.0
Classification: 3.0
Long Context: 5.0
Safety Calibration: 1.0
Persona Consistency: 4.0
Agentic Planning: 3.0
Multilingual: 4.0
Tabular Data: 3.0
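
The overall score of 3.54 is consistent with an unweighted mean of the 13 test scores listed above. The page does not state how the overall score is aggregated, so the sketch below is only a check under that assumption; the names `SCORES` and `unweighted_mean` are illustrative.

```python
# Test scores as listed on this page (each out of 5.0).
SCORES = {
    "Structured Output": 5.0,
    "Strategic Analysis": 4.0,
    "Constrained Rewriting": 3.0,
    "Creative Problem Solving": 3.0,
    "Tool Calling": 4.0,
    "Faithfulness": 4.0,
    "Classification": 3.0,
    "Long Context": 5.0,
    "Safety Calibration": 1.0,
    "Persona Consistency": 4.0,
    "Agentic Planning": 3.0,
    "Multilingual": 4.0,
    "Tabular Data": 3.0,
}


def unweighted_mean(scores):
    """Plain average of the score values.

    Assumption: equal weighting per test; the site's actual
    aggregation method is not confirmed here.
    """
    return sum(scores.values()) / len(scores)


overall = round(unweighted_mean(SCORES), 2)
print(overall)  # 46 / 13 rounds to 3.54
```

If the site weights tests differently, the match may be a coincidence, so treat this as a sanity check rather than a description of the methodology.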

What you need to know

gpt-oss-20b performs best at strict formatting and large-scale data ingestion, scoring 5/5 in both structured output and long context. With a 131K-token context window, it suits tasks that require strict schema adherence or the processing of long documents without losing coherence.

The model is inexpensive. At a blended cost of $0.113/MTok, it is a low-cost option for developers who need solid tool calling and persona consistency (both 4/5) without frontier-class pricing. However, its overall score of 3.54/5.00 and rank of #92 suggest it lacks the general reasoning depth of higher-tier models.
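
The blended figure can be reproduced from the listed input and output prices, assuming the "3:1 out:in ratio" means three output tokens are weighted for every one input token. A minimal sketch follows; the function name and the half-up rounding mode are assumptions, not something the page specifies.

```python
from decimal import Decimal, ROUND_HALF_UP


def blended_price(input_price, output_price, out_to_in=Decimal(3)):
    """Weighted average price per 1M tokens.

    Assumption: `out_to_in` output tokens are counted for each
    input token, matching the page's "3:1 out:in ratio" label.
    """
    return (input_price + out_to_in * output_price) / (1 + out_to_in)


# Prices per 1M tokens as listed on this page.
blended = blended_price(Decimal("0.030"), Decimal("0.140"))

# Round to three decimals for display (rounding mode is an assumption).
rounded = blended.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
print(blended, rounded)
```

Under this weighting the unrounded value is (0.030 + 3 × 0.140) / 4 = 0.1125, which rounds to the $0.113 shown. A workload with a different output-to-input mix would see a different effective price.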

The most significant risk is safety calibration, which scored 1/5. This suggests the model may generate unfiltered or unsafe content, so developers should add external guardrails. It also scores only 3/5 in classification and constrained rewriting, making it less suited to nuanced linguistic transformations.
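
One way to picture an external guardrail is a wrapper that screens both the prompt and the reply before anything reaches the user. The sketch below is hypothetical throughout: `guarded_generate`, `fake_model`, and `keyword_filter` are stand-ins invented for illustration, and a real deployment would substitute an actual model call and a proper moderation check.

```python
REFUSAL_MESSAGE = "This response was withheld by an output filter."


def guarded_generate(prompt, generate, is_unsafe, fallback=REFUSAL_MESSAGE):
    """Run `generate` only if the prompt passes, then screen the reply.

    `generate` and `is_unsafe` are caller-supplied callables
    (hypothetical interface, not any specific vendor API).
    """
    if is_unsafe(prompt):
        return fallback
    reply = generate(prompt)
    if is_unsafe(reply):
        return fallback
    return reply


# Stand-ins so the sketch runs without network access.
def fake_model(prompt):
    return f"echo: {prompt}"


def keyword_filter(text):
    # Toy check; a keyword match is far too weak for production use.
    return "forbidden" in text.lower()


safe = guarded_generate("hello", fake_model, keyword_filter)
blocked = guarded_generate("something FORBIDDEN", fake_model, keyword_filter)
print(safe)
print(blocked)
```

Checking the reply as well as the prompt matters here because a low safety-calibration score concerns what the model produces, not only what it is asked.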

Use this model if you need a cheap, high-capacity window for extracting structured data from large files. Skip this model if your application requires strict safety alignment or high-accuracy text classification.

Strengths — Top 3

Structured Output: 5.0/5.0
Long Context: 5.0/5.0
Strategic Analysis: 4.0/5.0

Relative weaknesses — Bottom 3

Safety Calibration: 1.0/5.0
Constrained Rewriting: 3.0/5.0
Creative Problem Solving: 3.0/5.0

Similar models

Llama 3.3 70B Instruct · $0.265 · 3.46
Mistral Large 3 2512 · $1.25 · 3.69
Qwen: Qwen3 235B A22B Instruct 2507 · $0.098 · 4.08
Ministral 3 14B 2512 · $0.200 · 3.77