What Is a Context Window?

The maximum number of tokens an LLM can process in a single request, including both input and output.

The context window determines how much text you can send to a model in one request. At roughly 0.75 English words per token, a 128K context window holds about 96,000 words of context, and a 1M context window handles roughly 750,000 words, enough to process entire books or large codebases.
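The word estimates above follow from a common rule of thumb, roughly 0.75 English words per token (actual ratios vary by tokenizer and language). A minimal sketch of that conversion:

```python
def tokens_to_words(tokens: int, words_per_token: float = 0.75) -> int:
    """Rough token-to-word estimate; 0.75 words/token is a heuristic,
    not a property of any specific tokenizer."""
    return int(tokens * words_per_token)

print(tokens_to_words(128_000))    # ≈ 96,000 words
print(tokens_to_words(1_000_000))  # ≈ 750,000 words
```

For precise counts, use the tokenizer that ships with the model you are targeting rather than this approximation.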

Larger context windows don't always mean better performance. Many models degrade in quality as the context grows, particularly on retrieval tasks where the answer is buried deep in the input. This is why benchmarks test long-context retrieval (often called "needle in a haystack") specifically.
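A needle-in-a-haystack test works by planting a known fact at varying depths inside filler text and asking the model to retrieve it. A minimal sketch of building such a probe prompt (the filler text and needle here are placeholders, not from any real benchmark):

```python
def build_needle_prompt(needle: str, filler: str,
                        total_chars: int, depth: float) -> str:
    """Embed `needle` at a relative depth (0.0 = start, 1.0 = end)
    inside repeated filler text of approximately `total_chars` length."""
    haystack = (filler * (total_chars // len(filler) + 1))[:total_chars]
    pos = int(total_chars * depth)
    return haystack[:pos] + needle + haystack[pos:]

prompt = build_needle_prompt(
    needle=" The secret code is 7421. ",
    filler="The quick brown fox jumps over the lazy dog. ",
    total_chars=10_000,
    depth=0.5,  # bury the needle in the middle of the context
)
```

Sweeping `depth` from 0.0 to 1.0 and measuring retrieval accuracy at each point is how these benchmarks expose "lost in the middle" degradation.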

Some providers charge more for longer contexts. GPT-5.4 uses tiered pricing: base rates apply up to 272K tokens, then 2x input / 1.5x output for longer contexts. Always check whether the model you're considering has tiered pricing before budgeting for long-context workloads.
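The tiered scheme above can be sketched as a cost function. The per-million-token base rates below are hypothetical placeholders, and the sketch assumes the higher multipliers apply to the whole request once input exceeds the threshold (a common provider convention; check the actual pricing page for the model you use):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 base_in: float = 1.25, base_out: float = 10.00,  # hypothetical $/1M-token rates
                 threshold: int = 272_000) -> float:
    """Estimate request cost in dollars under tiered long-context pricing:
    2x input / 1.5x output once input exceeds the threshold (assumption:
    the multiplier applies to the entire request, not just the excess)."""
    if input_tokens <= threshold:
        in_rate, out_rate = base_in, base_out
    else:
        in_rate, out_rate = base_in * 2, base_out * 1.5
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(request_cost(100_000, 2_000))  # short context, base rates
print(request_cost(300_000, 2_000))  # crosses threshold, tiered rates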
