A context window is the maximum amount of text, measured in tokens, that a language model can process in a single call. It covers both the input you send (system prompt, instructions, retrieved documents, conversation history) and the output the model generates. A token is a sub-word unit; as a rule of thumb, 1,000 tokens is roughly 750 English words. When a request would exceed the window, something has to give: typically the oldest or least relevant content is dropped or summarized.
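
The budgeting described above can be sketched in a few lines of Python. This is a minimal illustration, not any provider's API: `estimate_tokens` and `trim_history` are hypothetical helpers, the word-based estimate is only the 750-words-per-1,000-tokens rule of thumb (a real tokenizer will give different counts), and dropping the oldest turns first is just one of the possible policies.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb: 1,000 tokens is roughly 750 English words


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count (not a real tokenizer)."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def trim_history(system_prompt: str, messages: list[str],
                 window: int, reserved_output: int) -> list[str]:
    """Keep the newest messages that fit once the system prompt and the
    space reserved for the model's output are subtracted from the window."""
    budget = window - reserved_output - estimate_tokens(system_prompt)
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so the most recent turns survive.
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Note that the output reservation comes out of the same budget as the input, which is the point of the definition: the window covers both directions.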

The context window is the model’s working memory for one request. Anything outside it does not exist as far as that call is concerned, which is why long-term memory and retrieval exist: they decide what to load into the window at the right moment.

How large context windows are in 2026

Windows have grown by orders of magnitude. Many frontier model lines now offer a 1,000,000-token window, and Google’s Gemini 2.5 Pro documents an input limit of 1,048,576 tokens with a 65,536-token output limit. Anthropic documents 1M-token windows on several current Claude models, with most models at 200,000 tokens.

A 1M-token window holds roughly 750,000 words, about ten average novels. That is enough to drop an entire codebase or a long deposition into a single prompt.
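
The arithmetic behind that estimate is easy to check. The snippet below is a rough sketch: it applies the 750-words-per-1,000-tokens rule of thumb to the window sizes mentioned above, and it assumes 75,000 words per novel, which is the figure implied by "ten average novels" rather than a measured average.

```python
WORDS_PER_1000_TOKENS = 750  # rule of thumb used in this article
WORDS_PER_NOVEL = 75_000     # assumption implied by "ten average novels"; real novels vary


def tokens_to_words(tokens: int) -> int:
    """Convert a token count to an approximate English word count."""
    return tokens * WORDS_PER_1000_TOKENS // 1000


for label, window in [
    ("200K window", 200_000),
    ("1M window", 1_000_000),
    ("Gemini 2.5 Pro input limit", 1_048_576),
]:
    words = tokens_to_words(window)
    novels = words / WORDS_PER_NOVEL
    print(f"{label}: {window:,} tokens ~ {words:,} words ~ {novels:.1f} novels")
```

The ratio is a heuristic for English prose. Code, tables, and other languages usually tokenize less efficiently, so a codebase will fill the window faster than its word count suggests.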

Bigger is not automatically better

A large window is a ceiling, not a target. Two well-documented effects show that accuracy can fall as you fill the window:

  • Lost in the middle: Liu et al. found a U-shaped curve where models use information best when it sits at the start or end of the input and worse when it sits in the middle, even on long-context models.
  • Context rot: a Chroma study across 18 models found performance degrades non-uniformly as input length grows, even on simple retrieval tasks. Anthropic’s own documentation adopts the term and describes context as “a finite resource with diminishing marginal returns.”
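
One common response to the lost-in-the-middle effect is to order retrieved passages so the strongest ones sit at the edges of the prompt. The sketch below shows that heuristic only; it is not a method taken from either study, and `order_for_edges` is a hypothetical helper that assumes its input is already sorted from most to least relevant.

```python
def order_for_edges(ranked_docs: list[str]) -> list[str]:
    """Arrange documents so the most relevant land at the start and end
    of the prompt and the least relevant end up in the middle.

    `ranked_docs` must be sorted from most to least relevant.
    """
    front: list[str] = []
    back: list[str] = []
    for i, doc in enumerate(ranked_docs):
        # Alternate: ranks 1, 3, 5, ... fill the front; ranks 2, 4, 6, ... fill the back.
        if i % 2 == 0:
            front.append(doc)
        else:
            back.append(doc)
    # Reverse the back half so the second-best document is the very last item.
    return front + back[::-1]


ranked = ["A", "B", "C", "D", "E"]  # "A" is the most relevant
print(order_for_edges(ranked))      # ['A', 'C', 'E', 'D', 'B']
```

Whether this helps on a given model and task is an empirical question, so it is worth measuring against a plain relevance-ordered prompt before adopting it.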

The practical takeaway: what you put in the window matters as much as how much fits. Curating the window is the discipline of context engineering.

How the window relates to cost

You pay per token, so a larger filled window costs more on every call and adds latency. Prompt caching reduces the cost of reusing a long, stable prefix, and retrieval keeps the window focused instead of stuffing everything in.
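
A back-of-the-envelope calculation makes the cost relationship concrete. The sketch below is illustrative only: the `Pricing` values are placeholders, not any provider's real rates, and the model assumes cached prefix tokens are simply billed at a lower per-token rate. Some providers also charge for writing to the cache, which is ignored here.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Prices in dollars per million tokens (placeholder values, not real rates)."""
    input_per_mtok: float
    output_per_mtok: float
    cached_input_per_mtok: float


def call_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
              cached_tokens: int = 0) -> float:
    """Estimate the cost of one call; `cached_tokens` is the part of the
    input served from a prompt cache at the discounted rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (uncached * pricing.input_per_mtok
             + cached_tokens * pricing.cached_input_per_mtok
             + output_tokens * pricing.output_per_mtok)
    return total / 1_000_000


# Hypothetical prices chosen only to make the comparison easy to read.
prices = Pricing(input_per_mtok=2.0, output_per_mtok=8.0, cached_input_per_mtok=0.25)

full = call_cost(prices, input_tokens=500_000, output_tokens=1_000)
cached = call_cost(prices, input_tokens=500_000, output_tokens=1_000,
                   cached_tokens=450_000)
focused = call_cost(prices, input_tokens=20_000, output_tokens=1_000)
print(f"full prompt: ${full:.4f}  with cache: ${cached:.4f}  retrieval-focused: ${focused:.4f}")
```

Under these made-up prices the cached call costs a fraction of the uncached one, and the retrieval-focused call is cheaper still. Caching only reduces what a long prefix costs, while retrieval avoids sending most of it at all.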
