March 14, 2026

Claude 1M Context Now GA: What It Costs and Why It Changes Everything

Anthropic just made 1M context generally available for Claude Opus 4.6 and Sonnet 4.6 at standard pricing — no long-context premium. Here's what it actually costs per request, how it compares to Gemini and GPT-5.2, and when you should (and shouldn't) fill the window.

Tags: anthropic · claude · context-window · pricing-news · cost-analysis · 2026

Anthropic dropped a significant pricing change yesterday: the full 1M token context window is now generally available for both Claude Opus 4.6 and Claude Sonnet 4.6, with no long-context surcharge. A 900K-token request costs exactly the same per-token rate as a 9K one. No beta headers, no special flags, no premium tier — just send your tokens and pay standard rates.

This matters because most providers have historically charged a premium for long-context usage, or gated it behind special access programs. Anthropic is saying the quiet part out loud: large context windows shouldn't cost extra. The model either handles it or it doesn't, and the price should reflect the per-token rate you already agreed to.

In this breakdown, we'll calculate exactly what filling that 1M window costs for both Claude models, compare it against every competitor offering million-token contexts, and help you decide when stuffing the full window makes financial sense versus when you're just burning money.

The new pricing reality

Here's what Anthropic announced on March 13, 2026: Claude Opus 4.6 and Claude Sonnet 4.6 now support the full 1M context window at their standard API rates. No multiplier, no tiered pricing based on context length.

[stat] $0 — The long-context surcharge for Claude's 1M window. There isn't one.

The standard rates remain:

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | 1,000,000 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1,000,000 |

Previously, Opus 4.6 was listed at 200K context, with 1M available only through a beta header. Sonnet 4.6 already had the 1M window but required the header for requests over 200K tokens. Now both models handle the full window natively — no code changes needed.

Additional changes that shipped alongside GA:

  • 600 images or PDF pages per request (up from 100) — a 6x increase in per-request media capacity
  • Full rate limits at every context length — no throttling for large requests
  • Claude Code support — Max, Team, and Enterprise users get 1M context in Opus sessions automatically, meaning fewer compaction events

💡 Key Takeaway: If you're already using Claude's API with the beta header for long context, you can remove it. Everything works the same, and you were never paying a premium anyway. The real news is that the 200K soft cap is gone entirely.


What a full 1M context request actually costs

Let's do the math that matters: how much does it cost to fill the entire window?

Claude Opus 4.6 — full window cost

  • Input (1M tokens): $5.00
  • Output (assume 4K tokens): $0.10
  • Total per request: ~$5.10

If you're making 100 full-context requests per day, that's $510/day, or roughly $15,300/month — $15,000 of it in input costs alone.

Claude Sonnet 4.6 — full window cost

  • Input (1M tokens): $3.00
  • Output (assume 4K tokens): $0.06
  • Total per request: ~$3.06

The same 100 requests/day comes to $306/day or about $9,180/month.
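The arithmetic above is simple enough to encode in a helper. A minimal sketch, using the per-1M-token rates from the table (the function name is our own):

```python
def request_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Dollar cost of one request, given per-1M-token input/output rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Full 1M window with a typical 4K-token output
opus_full = request_cost(1_000_000, 4_000, 5.00, 25.00)    # ~$5.10
sonnet_full = request_cost(1_000_000, 4_000, 3.00, 15.00)  # ~$3.06

# 100 full-context Opus requests/day for a 30-day month
monthly_opus = opus_full * 100 * 30
```

Swapping in your own average input/output sizes gives a realistic per-request figure — most production requests won't fill the full window.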

📊 Quick Math: A single full-context Opus 4.6 request costs $5.10. That's the price of a latte for roughly 1,500 pages of text (about 500 words per page) processed in one shot. Whether that's cheap or expensive depends entirely on whether you actually need all 1M tokens.

$3.06
Sonnet 4.6 full 1M request
vs
$5.10
Opus 4.6 full 1M request

How Claude's 1M pricing stacks up against competitors

Claude isn't the only model with a million-token context window. Here's how the full landscape looks in March 2026:

| Model | Provider | Context Window | Input/1M tokens | Output/1M tokens | Full Window Input Cost |
|---|---|---|---|---|---|
| Claude Opus 4.6 | Anthropic | 1,000,000 | $5.00 | $25.00 | $5.00 |
| Claude Sonnet 4.6 | Anthropic | 1,000,000 | $3.00 | $15.00 | $3.00 |
| GPT-5.2 | OpenAI | 1,000,000 | $1.75 | $14.00 | $1.75 |
| Gemini 3.1 Pro | Google | 1,000,000 | $2.00 | $12.00 | $2.00 |
| Gemini 3 Flash | Google | 1,000,000 | $0.50 | $3.00 | $0.50 |
| Grok 4.1 Fast | xAI | 2,000,000 | $0.20 | $0.50 | $0.40 |
| Llama 4 Maverick | Meta/Together | 1,000,000 | $0.27 | $0.85 | $0.27 |

On raw input cost alone, Claude Opus 4.6 is the most expensive way to fill a million-token window among the current 1M+ models. Grok 4.1 Fast fills twice the context (2M tokens) for $0.40 total input — a per-token rate 25x cheaper than Opus, for double the window.

But pricing isn't the whole story. The question is whether cheaper models can actually use that context effectively.

⚠️ Warning: Don't choose a long-context model on price alone. A model that loses track of information at 500K tokens isn't saving you money — it's giving you wrong answers at a discount. Always benchmark recall accuracy for your specific use case.


Context quality: where Anthropic justifies the premium

Anthropic's key claim is that Opus 4.6 scores 78.3% on MRCR v2 (Multi-Round Coreference Resolution), the highest among frontier models at the 1M context length. This benchmark tests whether a model can retrieve specific details from deep within a long context — the "needle in a haystack" problem at production scale.

This matters because a context window is only useful if the model can actually recall what's in it. A 1M window where the model forgets everything past 200K tokens is functionally a 200K window that you're overpaying for.

Here's the practical breakdown of what this means:

Use cases where 1M context quality matters most

Codebase analysis. Loading an entire repository (200-500 files) into context and asking the model to find bugs, suggest refactors, or understand cross-file dependencies. If the model loses track of files loaded early in the context, it'll miss critical connections.

Legal document review. Multi-hundred-page contracts, depositions, and case files where a detail on page 3 contradicts a claim on page 847. Partial recall means missed contradictions.

Long-running agent sessions. Claude Code and similar coding agents that accumulate tool calls, observations, and intermediate reasoning over hours of work. Without full recall, the agent starts repeating searches and losing track of what it already tried.

Research synthesis. Loading hundreds of papers or datasets and asking for cross-referencing, meta-analysis, or pattern detection across the full corpus.

💡 Key Takeaway: Anthropic is betting that developers will pay a 2-3x premium for context quality. If your use case requires faithful recall at 500K+ tokens, that premium may be justified. If you're just searching for keywords in documents, cheaper models work fine.


The real cost comparison: cost per useful context token

Raw per-token pricing tells only part of the story. What matters is cost per token where the model actually performs well. Let's reframe the comparison.

If Opus 4.6 maintains 78% recall accuracy at 1M tokens, and a competitor maintains only 50% recall at the same length, you effectively need to re-run or chunk your requests with the competitor — doubling or tripling your actual cost.

| Scenario | Opus 4.6 (1 pass) | Gemini 3 Flash ($0.50/MTok, multiple passes) |
|---|---|---|
| 500K token document analysis | $2.50 input | $0.25 × 2 passes = $0.50 input |
| 1M token codebase review | $5.00 input | $0.50 × 4 passes = $2.00 input |
| Agent session (800K accumulated) | $4.00 input | $0.40 × 3 re-runs = $1.20 input |

Even with chunking overhead, the cheaper models still win on pure cost in most scenarios. The tradeoff is developer time and accuracy, not raw API spend. Chunking a legal document across multiple requests means writing code to split, process, and merge results — and hoping nothing falls through the gaps between chunks.
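The "passes" framing reduces to a one-liner. A sketch with illustrative pass counts (the rates are the published per-1M-token input prices; how many passes a cheaper model actually needs depends on your recall benchmarks):

```python
def effective_input_cost(tokens, rate_per_mtok, passes):
    """Input cost when limited recall forces re-runs or chunked passes."""
    return tokens / 1_000_000 * rate_per_mtok * passes

# 500K-token document: one Opus pass vs. two Flash passes
opus_one_pass = effective_input_cost(500_000, 5.00, 1)    # $2.50
flash_two_pass = effective_input_cost(500_000, 0.50, 2)   # $0.50

# 1M-token codebase: four full passes on Flash still undercuts one Opus pass
flash_four_pass = effective_input_cost(1_000_000, 0.50, 4)  # $2.00
```

The crossover only arrives when the pass count gets extreme — at these rates, Flash would need 10 full passes over the same 1M tokens before it matched a single Opus pass on raw spend.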

✅ TL;DR: Claude Opus 4.6 costs 3-25x more than alternatives for filling a 1M window, but its superior recall accuracy means fewer failed passes and more reliable results on complex, recall-heavy tasks. For simple retrieval or search, cheaper models are the better deal.


When to use the full 1M window (and when not to)

Use the full window when:

  1. Your task requires cross-referencing across the entire corpus. Legal review, audit compliance, security analysis where any missed connection is a failure.
  2. You're running long-lived agents. Claude Code sessions, research agents, or any workflow that accumulates state over hours. Compaction (summarizing and discarding old context) loses information.
  3. The cost of a wrong answer exceeds the cost of the API call. If a missed detail in a contract review costs your client $100K, spending $5 on a full-context analysis is cheap insurance.
  4. You're processing multimedia at scale. The new 600-image/PDF limit means you can analyze entire document sets in one pass.

Don't fill the window when:

  1. RAG can handle it. If your documents can be chunked and retrieved via embeddings, you'll spend $0.02-0.10 per query instead of $3-5. Use our cost optimization guide for RAG pricing.
  2. Your task is localized. Summarizing a single document, answering a factual question, or generating content rarely needs more than 50K tokens of context.
  3. You're making high-volume, repetitive calls. Filling 1M tokens 1,000 times per day costs $3,000-5,000 daily. At that scale, even small context reductions compound massively.
  4. Your context is mostly padding. Sending 800K tokens of system prompts and conversation history for a simple question is just waste. Trim aggressively.

📊 Quick Math: Switching from full 1M context to a well-tuned RAG pipeline can reduce per-query costs by 98%. A query that costs $5.00 with full context might cost $0.10 with retrieval — and for many tasks, the accuracy difference is negligible.
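That 98% figure falls out of simple retrieval math. A rough sketch, assuming a hypothetical RAG setup that retrieves 20 chunks of ~1K tokens each and sends them to the same Opus rate (your chunk sizes and top-k will differ):

```python
def rag_query_cost(chunks_retrieved, tokens_per_chunk, rate_per_mtok):
    """Per-query input cost when only retrieved chunks enter the context."""
    return chunks_retrieved * tokens_per_chunk / 1_000_000 * rate_per_mtok

full_context = 1_000_000 / 1_000_000 * 5.00     # $5.00: full window on Opus
rag = rag_query_cost(20, 1_000, 5.00)            # $0.10: 20K retrieved tokens
savings = 1 - rag / full_context                 # 0.98
```

Embedding and vector-store costs add a small fixed overhead on top, but at typical rates they don't change the order of magnitude.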


Impact on Claude Code users

The GA announcement specifically highlights Claude Code improvements. For Max, Team, and Enterprise subscribers, Opus 4.6 sessions now automatically use the 1M window. The practical impact:

Fewer compaction events. Previously, Claude Code would hit context limits and compress older parts of the conversation, losing details about earlier tool calls, file reads, and reasoning steps. With 1M context, a session can hold significantly more history before compaction triggers.

Better debugging loops. Developers reported that pre-1M, complex debugging sessions would "go in circles" — the agent would re-search files it had already read because the earlier results were compacted away. Full context retention fixes this.

Cost implications for teams. Claude Code Max subscriptions absorb API costs, so the 1M window doesn't directly increase your bill. But if you're using Claude Code via API (not through a subscription), be aware that longer sessions mean higher per-session costs. A 2-hour coding session that accumulates 600K tokens of context costs $3.00 in Opus input per completion at that context length.
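The compounding effect is worth making explicit: every completion re-bills the whole accumulated context. A rough sketch, assuming a fixed number of tokens added per turn and no prompt caching (caching would cut this substantially):

```python
def session_input_cost(turns, tokens_per_turn, rate_per_mtok):
    """Cumulative input cost when each completion re-sends the growing context."""
    total = 0.0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn           # tool results, file reads, reasoning
        total += context / 1_000_000 * rate_per_mtok
    return total

# A single completion at 600K accumulated context on Opus: $3.00
one_shot = session_input_cost(1, 600_000, 5.00)

# A 60-turn session adding 10K tokens/turn (ends at 600K context)
long_session = session_input_cost(60, 10_000, 5.00)  # ~$91.50 cumulative
```

The cumulative figure grows quadratically with session length, which is why trimming per-turn context (or caching) matters far more for agents than for one-shot requests.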

For a deeper comparison of coding assistant costs, see our AI coding assistant cost breakdown.


Should you update your models.json context window?

If you're building applications that reference Claude's context limits, update your configurations:

  • Claude Opus 4.6: 200,000 → 1,000,000
  • Claude Sonnet 4.6: Already listed at 1,000,000 (no change)
  • Media limit: 100 → 600 images/PDFs per request
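If your application tracks these limits in a models.json, the updated entries might look like this — the field names are illustrative, so match whatever schema your app already uses:

```json
{
  "claude-opus-4-6": {
    "context_window": 1000000,
    "max_media_per_request": 600
  },
  "claude-sonnet-4-6": {
    "context_window": 1000000,
    "max_media_per_request": 600
  }
}
```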

No API changes are needed. If you were sending the anthropic-beta: context-1m-2025-08-07 header, it's now ignored — but leaving it in won't break anything.

💡 Key Takeaway: This is a backward-compatible change. Existing code works. The only action item is updating any hard-coded context limits in your application logic to take advantage of the full window.


The competitive landscape: who's winning the context war

The race for context window size has largely settled. Most frontier models now offer 1M+ tokens, and Grok 4.1 Fast stretches to 2M. The competition has shifted from "who has the biggest window" to two new battlegrounds:

1. Context quality at scale. Can the model actually use all those tokens? Anthropic is leading here with the MRCR v2 benchmark, but independent evaluations from developers vary. Google's Gemini 3.1 Pro also performs well on long-context tasks, and at $2.00 per million input tokens, it offers a compelling middle ground.

2. Pricing per useful token. Grok 4.1 Fast at $0.20/MTok and Gemini 3 Flash at $0.50/MTok are making the budget argument: for routine long-context tasks, you don't need frontier quality. Good enough is good enough, and 10-25x cheaper.

For developers, the optimal strategy is increasingly model routing — using expensive models like Opus 4.6 only for tasks that require superior recall, and routing everything else to budget models. Check out our model routing guide for implementation strategies.
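A routing policy can be as simple as a few threshold checks. This sketch is illustrative only — the model identifiers and thresholds are our assumptions, not official IDs, and real routers should key off measured recall benchmarks for your workload:

```python
def pick_model(task):
    """Route by recall sensitivity and context size (thresholds illustrative)."""
    if task["recall_critical"] and task["context_tokens"] > 200_000:
        return "claude-opus-4-6"    # pay for frontier recall at long context
    if task["context_tokens"] > 50_000:
        return "gemini-3.1-pro"     # mid-priced default for sizable contexts
    return "gemini-3-flash"         # bulk / routine short-context work

job = {"recall_critical": True, "context_tokens": 800_000}
model = pick_model(job)  # "claude-opus-4-6"
```

The routing function is also the natural place to log context sizes per task — a week of logs usually shows most traffic belongs on the cheap tier.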

$0.20/MTok
Grok 4.1 Fast (2M context)
vs
$5.00/MTok
Claude Opus 4.6 (1M context)

Frequently asked questions

How much does it cost to fill Claude's 1M context window?

Claude Opus 4.6 costs $5.00 in input tokens to fill the full 1M window. Claude Sonnet 4.6 costs $3.00. Output costs are additional — at typical 4K token outputs, add $0.10 (Opus) or $0.06 (Sonnet) per request. There's no long-context surcharge; these are the standard per-token rates applied uniformly regardless of context length.

Is Claude's 1M context window more expensive than GPT-5.2 or Gemini?

Yes. Claude Opus 4.6 at $5.00/MTok input is the most expensive 1M-context model currently available. GPT-5.2 costs $1.75/MTok (65% cheaper), Gemini 3.1 Pro costs $2.00/MTok (60% cheaper), and Gemini 3 Flash costs just $0.50/MTok (90% cheaper). However, Opus 4.6 leads on context recall accuracy benchmarks, which may justify the premium for recall-intensive tasks. Use the AI cost calculator to compare costs for your specific usage pattern.

Do I need to change my API code to use Claude's 1M context?

No. If you were using the beta header for long-context requests, it's now ignored (but won't cause errors). Requests over 200K tokens work automatically. The only change you might want to make is removing any hard-coded 200K context limits in your application logic and updating media limits from 100 to 600 images/PDFs per request.

When should I use 1M context instead of RAG?

Use full context when your task requires cross-referencing information across the entire document set — legal review, codebase analysis, or long-running agent sessions. Use RAG when your queries are localized (searching for specific facts), when you're making high-volume requests (RAG costs 98% less per query), or when your documents exceed 1M tokens total. For most production applications, a hybrid approach works best: RAG for retrieval, full context for synthesis. Read our RAG cost guide for detailed comparisons.

Does Claude Code's 1M context cost extra?

For Claude Code Max, Team, and Enterprise subscribers, the 1M context window is included in the subscription — no additional per-token charges for Opus 4.6 sessions. If you're using Claude Code through the API directly (not via subscription), standard per-token rates apply, and longer sessions will cost more as context accumulates. A typical 2-hour session might accumulate 400-800K tokens of context.


Bottom line: who should care about this announcement

If you're already using Claude's long context beta: Remove the header, update your context limits, and enjoy the 600-image media limit. Your costs don't change.

If you've been waiting for stable long-context support: This is the green light. GA means full rate limits, no beta flags, and a commitment from Anthropic that this is a production-ready feature.

If you're cost-sensitive: Claude's 1M window is the most expensive option on the market. Gemini 3.1 Pro offers comparable quality at 60% less, and Gemini 3 Flash or Grok 4.1 Fast are 90-96% cheaper for tasks where frontier recall isn't critical.

If context quality is non-negotiable: Opus 4.6's 78.3% MRCR v2 score is currently the best in class. For legal, compliance, security audit, and complex codebase analysis, the premium pays for itself in avoided errors.

The best approach for most teams: route to Claude Opus 4.6 for recall-critical tasks, use Gemini 3.1 Pro or GPT-5.2 as your default, and drop to Flash/Grok for bulk processing. Use our calculator to model costs for your specific mix.