Document summarization is one of the most practical AI use cases in production today. Law firms process contracts. Analysts digest earnings reports. Research teams condense papers. Product teams summarize user feedback. Every one of these workflows has a token cost — and at scale, picking the wrong model can burn thousands of dollars per month.
This guide gives you the exact per-page and per-document costs for summarizing different document types across every major AI provider in 2026. Every price comes from current API rates in the AI Cost Calculator.
[stat] $0.0001 vs $0.0035 Cost per page: Gemini 2.0 Flash vs Claude Opus 4.6 — a 35× difference for the same summarization task
How document summarization tokens work
Before we get to costs, you need to understand the token math. A document goes in as input tokens, and the summary comes out as output tokens. The ratio between them determines your cost profile.
Typical token counts by document type:
| Document type | Pages | Input tokens | Summary output tokens | Input:Output ratio |
|---|---|---|---|---|
| Email thread | 1-2 | 500-1,000 | 100-200 | 5:1 |
| Meeting transcript | 5-10 | 3,000-6,000 | 300-600 | 10:1 |
| Business report | 10-20 | 4,000-10,000 | 400-800 | ~12:1 |
| Legal contract | 20-50 | 10,000-25,000 | 800-1,500 | ~15:1 |
| Research paper | 15-30 | 8,000-15,000 | 500-1,000 | ~15:1 |
| Technical manual | 50-100 | 25,000-50,000 | 1,000-2,000 | ~25:1 |
| Full book | 200-400 | 80,000-160,000 | 1,500-3,000 | ~50:1 |
The key insight: summarization is massively input-heavy. You're feeding thousands of tokens and getting back hundreds. This means input pricing matters far more than output pricing for this workload — the opposite of chatbot economics.
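The arithmetic is easy to sanity-check. A minimal sketch (the rates are taken from the tables in this guide; the function name is our own):

```python
def summarization_cost(input_tokens: int, output_tokens: int,
                       input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one summarization call, given per-million-token rates."""
    return (input_tokens / 1e6) * input_per_m + (output_tokens / 1e6) * output_per_m

# A 30-page contract (~15,000 tokens in, ~1,200 out) at $5.00/M in, $25.00/M out:
cost = summarization_cost(15_000, 1_200, 5.00, 25.00)
# 0.075 (input) + 0.030 (output) = $0.105 — input drives ~71% of the bill
```

Note how the input term dominates even though the per-token output rate is 5× higher: that's the input-heavy profile described above.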
💡 Key Takeaway: For summarization, always compare models on input price first. A model with cheap output but expensive input will cost you more than one with balanced pricing.
Per-page summarization costs by model
Here's what it costs to summarize a single page (~450 input tokens, ~50 output tokens for a brief summary) across current models:
| Model | Input $/M | Output $/M | Cost per page | Cost per 100 pages |
|---|---|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 | $0.0001 | $0.006 |
| Mistral Small 4 | $0.15 | $0.60 | $0.0001 | $0.010 |
| DeepSeek V3.2 | $0.28 | $0.42 | $0.0001 | $0.015 |
| GPT-5.4 nano | $0.20 | $1.25 | $0.0002 | $0.015 |
| Gemini 2.5 Flash | $0.30 | $2.50 | $0.0003 | $0.026 |
| GPT-5.4 mini | $0.75 | $4.50 | $0.0006 | $0.056 |
| Gemini 3.1 Pro | $2.00 | $12.00 | $0.0015 | $0.150 |
| GPT-5.4 | $2.50 | $15.00 | $0.0019 | $0.186 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.0021 | $0.210 |
| Claude Opus 4.6 | $5.00 | $25.00 | $0.0035 | $0.350 |
At the per-page level, the differences look tiny. But summarization is a volume game. When you're processing hundreds or thousands of documents per month, these fractions compound fast.
Real-world document summarization scenarios
Let's model four actual use cases with realistic document sizes and see what they cost monthly.
Scenario 1: Legal contract review (law firm)
Workload: 200 contracts/month, average 30 pages each, detailed summary with key clause extraction.
- Input tokens per contract: ~15,000
- Output tokens per contract: ~1,200 (detailed summary + clause list)
- Monthly input: 3,000,000 tokens
- Monthly output: 240,000 tokens
| Model | Monthly cost | Cost per contract |
|---|---|---|
| Gemini 2.0 Flash | $0.40 | $0.002 |
| DeepSeek V3.2 | $0.94 | $0.005 |
| GPT-5.4 mini | $3.33 | $0.017 |
| GPT-5.4 | $11.10 | $0.056 |
| Claude Sonnet 4.6 | $12.60 | $0.063 |
| Claude Opus 4.6 | $21.00 | $0.105 |
📊 Quick Math: A law firm processing 200 contracts/month saves about $241 annually ($20.06/month) choosing DeepSeek V3.2 over Claude Opus 4.6. The question is whether quality differences justify the 22× price premium.
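Each scenario in this section follows the same formula. A sketch that reproduces the table above (the dictionary keys are illustrative labels, not official API model IDs):

```python
RATES = {  # $/M tokens (input, output), from the per-page table in this guide
    "gemini-2.0-flash": (0.10, 0.40),
    "deepseek-v3.2": (0.28, 0.42),
    "claude-opus-4.6": (5.00, 25.00),
}

def monthly_cost(docs_per_month: int, input_tok: int, output_tok: int, model: str) -> float:
    """Monthly spend in dollars for a fixed summarization workload."""
    in_rate, out_rate = RATES[model]
    return (docs_per_month * input_tok / 1e6) * in_rate \
         + (docs_per_month * output_tok / 1e6) * out_rate

# Scenario 1: 200 contracts x ~15,000 input / ~1,200 output tokens
opus = monthly_cost(200, 15_000, 1_200, "claude-opus-4.6")  # -> 21.0
```

Swap in your own document sizes and volumes to model the other scenarios.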
Scenario 2: Earnings report analysis (finance team)
Workload: 50 quarterly earnings reports, 80 pages each, with financial data extraction and sentiment.
- Input tokens per report: ~40,000
- Output tokens per report: ~2,000 (summary + key metrics + sentiment)
- Monthly input: 2,000,000 tokens
- Monthly output: 100,000 tokens
| Model | Monthly cost | Cost per report |
|---|---|---|
| Gemini 2.0 Flash | $0.24 | $0.005 |
| DeepSeek V3.2 | $0.60 | $0.012 |
| GPT-5.4 mini | $1.95 | $0.039 |
| Gemini 3.1 Pro | $5.20 | $0.104 |
| GPT-5.4 | $6.50 | $0.130 |
| Claude Opus 4.6 | $12.50 | $0.250 |
For finance, accuracy matters more than raw cost. A flagship model that correctly identifies a revenue miss is worth the premium over a budget model that glosses over it. The smart play: run budget models for the initial pass, then route flagged reports to GPT-5.4 or Claude for deeper analysis.
Scenario 3: Research paper digests (academic team)
Workload: 500 papers/month, 20 pages average, short abstract-style summaries.
- Input tokens per paper: ~10,000
- Output tokens per paper: ~300 (concise summary)
- Monthly input: 5,000,000 tokens
- Monthly output: 150,000 tokens
| Model | Monthly cost | Cost per paper |
|---|---|---|
| Gemini 2.0 Flash | $0.56 | $0.001 |
| Mistral Small 4 | $0.84 | $0.002 |
| GPT-5.4 nano | $1.19 | $0.002 |
| DeepSeek V3.2 | $1.46 | $0.003 |
| GPT-5.4 mini | $4.43 | $0.009 |
| Claude Haiku 4.5 | $5.75 | $0.012 |
✅ TL;DR: For short summaries of research papers, budget models deliver excellent results at under $2/month for 500 papers. Save flagship models for papers that need deep analysis or cross-referencing.
Scenario 4: Book summarization service (SaaS product)
Workload: 1,000 books/month, 300 pages average, chapter-by-chapter summaries + key takeaways.
- Input tokens per book: ~135,000
- Output tokens per book: ~3,000 (chapter summaries + takeaways)
- Monthly input: 135,000,000 tokens
- Monthly output: 3,000,000 tokens
| Model | Monthly cost | Cost per book |
|---|---|---|
| Gemini 2.0 Flash | $14.70 | $0.015 |
| GPT-5.4 nano | $30.75 | $0.031 |
| DeepSeek V3.2 | $39.06 | $0.039 |
| Gemini 2.5 Flash | $48.00 | $0.048 |
| GPT-5.4 mini | $114.75 | $0.115 |
| GPT-5.4 | $382.50 | $0.383 |
| Claude Opus 4.6 | $750.00 | $0.750 |
[stat] $750 vs $15 Monthly cost of summarizing 1,000 books: Claude Opus 4.6 vs Gemini 2.0 Flash — 50× difference
If you're building a book summary SaaS, this cost spread is your entire margin. At $0.015 per book on Gemini 2.0 Flash, you can charge $1/summary and keep 98.5% margin. At $0.75/book on Claude Opus, your margin drops to 25% at the same price point.
Context window requirements for summarization
Not every model can handle every document in a single pass. Here's what fits where:
| Context window | Models | Max document size | Fits |
|---|---|---|---|
| 128K tokens | GPT-4o, Mistral models, DeepSeek | ~250 pages | Most reports, contracts, papers |
| 200K tokens | Claude models, GPT-4.1 | ~400 pages | Long manuals, short books |
| 500K-1M tokens | GPT-5.x, Claude Opus 4.6, Gemini 3 Flash | ~800-2,000 pages | Full books, legal compilations |
| 2M tokens | Gemini 3 Pro, o4-mini, Grok 4.20 | ~4,000 pages | Multi-book analysis, massive codebases |
For documents that exceed the context window, you need a chunk-and-merge strategy: split the document into overlapping chunks, summarize each chunk separately, then pass all chunk summaries into a final consolidation call. This adds one extra API call but works with any model size.
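The strategy above fits in a few lines. A sketch — `summarize` is a hypothetical stand-in for your actual model call:

```python
def chunk_and_merge(tokens, summarize, chunk_size=100_000, overlap=500):
    """tokens: the tokenized document; summarize: a callable mapping a
    sequence to a summary (hypothetical stand-in for the model API call)."""
    step = chunk_size - overlap
    # Overlapping chunks, so context straddling a boundary appears on both sides
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]
    partials = [summarize(chunk) for chunk in chunks]
    # One extra consolidation call merges the per-chunk summaries
    return summarize(partials)
```

For a document that already fits in one chunk, this degrades gracefully to a single summarize-then-consolidate pass.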
⚠️ Warning: Chunking adds latency and can lose cross-section context. If a contract's indemnity clause in Section 12 references definitions in Section 2, a chunk boundary between them will miss the connection. For documents where cross-referencing matters, pay for a model with a large enough context window to process the whole thing at once.
For a deeper dive on large context pricing, see our 2 million token context window cost comparison.
Prompt caching: the summarization cost killer
If you're summarizing documents that share structure — like quarterly reports from the same company, or contracts using the same template — prompt caching slashes your costs dramatically.
How it works: The first request pays full input price. Subsequent requests with the same prefix (system prompt + template instructions) get cached input pricing — typically 50-90% cheaper.
| Provider | Standard input | Cached input | Savings |
|---|---|---|---|
| OpenAI (GPT-5.4) | $2.50/M | $0.25/M | 90% |
| Anthropic (Claude Opus 4.6) | $5.00/M | $0.50/M | 90% |
| Google (Gemini 3.1 Pro) | $2.00/M | $0.50/M | 75% |
Real impact on the law firm scenario: If your 200 contracts/month use similar templates with a 2,000-token system prompt and extraction instructions, caching that prefix saves:
- Claude Opus 4.6: From $21.00/month → ~$19.20/month (0.4M cached tokens/month at $4.50/M saved)
- GPT-5.4: From $11.10/month → ~$10.20/month
The savings scale with how much of your prompt is reusable. For standardized document processing pipelines, 60-70% of the input is often cacheable (system prompt + format instructions + few-shot examples).
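The blended rate is simple to model. A sketch (the function name is our own; the rates come from the caching table above):

```python
def effective_input_rate(standard: float, cached: float, cache_fraction: float) -> float:
    """Blended $/M input rate when cache_fraction of each prompt hits the cache."""
    return cached * cache_fraction + standard * (1 - cache_fraction)

# Claude Opus 4.6 with 60% of the prompt cacheable:
rate = effective_input_rate(5.00, 0.50, 0.60)  # -> 2.30, a 54% cut
```

Run it with your own cacheable fraction to see where your pipeline lands between the 30-50% savings quoted below.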
💡 Key Takeaway: If you're processing similar document types repeatedly, prompt caching is non-negotiable. It can cut your summarization costs by 30-50% with zero quality impact.
Batch processing: another 50% off
OpenAI's Batch API processes requests asynchronously at 50% off standard pricing. Turnaround is within 24 hours. For summarization workloads that aren't time-sensitive — nightly report processing, weekly digest generation, archival summarization — this is free money.
| Model | Standard cost/M input | Batch cost/M input | Savings |
|---|---|---|---|
| GPT-5.4 | $2.50 | $1.25 | 50% |
| GPT-5.4 mini | $0.75 | $0.375 | 50% |
| GPT-5.4 nano | $0.20 | $0.10 | 50% |
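A batch job starts from a JSONL file with one request per line. A minimal sketch of building that file (model name and prompt are illustrative; check OpenAI's Batch API documentation for the full request schema):

```python
import json

def build_batch_file(documents, path="batch_input.jsonl",
                     model="gpt-5.4",
                     system="Summarize the document in 5 bullet points."):
    """Write one JSONL request line per document (OpenAI Batch API input format)."""
    with open(path, "w") as f:
        for i, doc in enumerate(documents):
            request = {
                "custom_id": f"doc-{i}",        # your key for matching results later
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [
                        {"role": "system", "content": system},
                        {"role": "user", "content": doc},
                    ],
                },
            }
            f.write(json.dumps(request) + "\n")
    return path
```

You then upload the file, create a batch, and poll for results; keeping the system prompt identical across lines also makes it cache-friendly.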
Combining batch + caching on GPT-5.4: Your effective input rate drops from $2.50 to as low as $0.125/M tokens — within striking distance of Gemini 2.0 Flash's standard $0.10/M rate.
For the book summarization SaaS, switching to GPT-5.4 batch pricing brings monthly costs from $382.50 down to $191.25. Caching shaves off a bit more, though with books the bulk of the input is unique document text, so system-prompt caching only helps at the margins. Either way, halving the flagship bill changes the economics entirely.
📊 Quick Math: GPT-5.4 on batch pricing costs roughly $0.19/book — about 13× more than Gemini 2.0 Flash at standard rates ($0.015/book), while delivering significantly better summary quality.
Quality vs cost: when to use flagships for summarization
Budget models handle straightforward summarization well. A Gemini 2.0 Flash summary of a news article is perfectly fine. But there are specific summarization tasks where flagships earn their premium:
Use flagship models (GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro) for:
- Legal documents where missing a clause has real consequences
- Financial reports where numerical accuracy drives decisions
- Technical documents with domain-specific terminology
- Multi-document synthesis (combining insights across several sources)
- Documents requiring structured extraction (tables, key-value pairs, specific fields)
Use budget models (Gemini 2.0 Flash, DeepSeek V3.2, GPT-5.4 nano) for:
- News article summaries and press releases
- Meeting transcript digests
- Email thread summarization
- Content curation and abstract generation
- Any task where a 90% accurate summary is good enough
The hybrid approach is what most production systems use: route documents through a classifier that estimates complexity, then send simple docs to budget models and complex ones to flagships. This typically delivers 80% of flagship quality at 20% of flagship cost.
For more on model routing strategies, check our model routing guide.
Building a document summarization pipeline
Here's a practical architecture for a cost-optimized summarization system:
Step 1: Document ingestion
Parse the document into clean text. PDF extraction tools (like pdf-parse or PyMuPDF) add minimal cost — they're local processing, not API calls.
Step 2: Token counting
Count tokens before sending. If the document exceeds your model's context window, switch to chunk-and-merge.
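A rough heuristic is enough for routing decisions. A sketch (for billing-accurate counts, use your provider's tokenizer, e.g. tiktoken for OpenAI models):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def fits_context(text: str, context_window: int, reserve: int = 2_000) -> bool:
    """Leave headroom for the system prompt and the summary itself."""
    return estimate_tokens(text) + reserve <= context_window
```

If `fits_context` returns False, hand the document to the chunk-and-merge path instead of a single call.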
Step 3: Complexity routing
Run a cheap classifier (GPT-5.4 nano at $0.20/M input) on the first 500 tokens to estimate document complexity. Route accordingly:
- Simple → Gemini 2.0 Flash or DeepSeek V3.2
- Medium → GPT-5.4 mini or Gemini 2.5 Flash
- Complex → GPT-5.4 or Claude Sonnet 4.6
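The routing step can be as simple as a lookup. A sketch, with `classify` standing in for the cheap classifier call (hypothetical; the tier labels mirror the list above):

```python
ROUTES = {  # illustrative tiers -> illustrative model labels
    "simple": "gemini-2.0-flash",
    "medium": "gpt-5.4-mini",
    "complex": "gpt-5.4",
}

def route_document(classify, text: str, head_tokens: int = 500) -> str:
    """classify: callable returning 'simple' | 'medium' | 'complex'
    (hypothetical stand-in for the GPT-5.4 nano classifier call)."""
    head = text[: head_tokens * 4]       # ~4 chars/token heuristic
    tier = classify(head)
    return ROUTES.get(tier, "gpt-5.4")   # fail safe to the strongest model
```

Failing safe to the strongest model means a misbehaving classifier costs you money, never quality.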
Step 4: Summarize with caching
Use a cached system prompt with your output format instructions. This stays constant across all documents of the same type.
Step 5: Quality check (optional)
For high-stakes documents, run a quick validation: does the summary mention key entities from the original? A cheap embedding similarity check catches most hallucinations.
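A dependency-free stand-in for that validation is a simple entity-overlap check — cruder than embeddings, but it flags summaries that drop key names (a sketch; the regex and threshold are up to you):

```python
import re

def entity_overlap(source: str, summary: str, min_len: int = 4) -> float:
    """Fraction of capitalized terms in the source that also appear in the
    summary. A crude stand-in for an embedding similarity check."""
    pattern = r"\b[A-Z][a-zA-Z]{%d,}\b" % (min_len - 1)
    entities = set(re.findall(pattern, source))
    if not entities:
        return 1.0
    found = {e for e in entities if e.lower() in summary.lower()}
    return len(found) / len(entities)
```

Scores well below 1.0 are a signal to re-run the document on a stronger model or route it to human review.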
✅ TL;DR: A well-built summarization pipeline uses token counting, complexity routing, prompt caching, and batch processing to deliver flagship-quality summaries at budget-model prices. The total implementation effort is 2-3 days of engineering work.
Cost comparison: AI summarization vs human summarization
The cost advantage of AI summarization is staggering when you run the numbers:
| Method | Cost per 30-page document | Time | Monthly cost (200 docs) |
|---|---|---|---|
| Human analyst | $50-150 | 2-4 hours | $10,000-30,000 |
| GPT-5.4 | $0.056 | ~10 seconds | $11.10 |
| Claude Opus 4.6 | $0.105 | ~15 seconds | $21.00 |
| DeepSeek V3.2 | $0.005 | ~8 seconds | $0.94 |
Even the most expensive AI model is up to 1,400× cheaper than a human analyst. The question isn't whether to use AI for summarization — it's which AI to use.
[stat] 1,400× Cost advantage of AI document summarization vs human analysts — even using the most expensive model (Claude Opus 4.6)
Of course, AI summaries aren't perfect. They can miss nuance, hallucinate details, or misinterpret ambiguous language. For critical documents, the winning formula is AI-generated first draft + human review — which still cuts costs by 80-90% versus fully manual summarization.
Frequently asked questions
How much does it cost to summarize a 10-page report with AI?
A 10-page report (~5,000 input tokens, ~500 output tokens) costs between roughly $0.0007 (Gemini 2.0 Flash) and $0.038 (Claude Opus 4.6). For most business reports, budget models like Gemini 2.0 Flash or DeepSeek V3.2 deliver perfectly adequate summaries for a fraction of a cent per document. Use the AI Cost Calculator to model your exact workload.
Which AI model gives the best summarization quality?
Claude Opus 4.6 and GPT-5.4 consistently produce the most nuanced, accurate summaries — especially for legal, financial, and technical documents. For straightforward content (news, meeting notes, general reports), Gemini 2.5 Flash and DeepSeek V3.2 deliver 90%+ of the quality at a fraction of the price. Quality differences are most noticeable on documents with complex cross-references or domain-specific terminology.
Is it cheaper to summarize documents in batch or real-time?
Batch processing is significantly cheaper. OpenAI's Batch API gives 50% off standard rates with 24-hour turnaround. Combined with prompt caching, you can get GPT-5.4 summarization at $0.125/M input tokens — cheaper than most budget models at standard rates. If your summarization workflow doesn't need instant results, batch is the obvious choice.
How do I handle documents longer than the context window?
Use a chunk-and-merge strategy: split the document into overlapping segments (with ~500 token overlap to preserve context), summarize each chunk, then pass all chunk summaries into a final consolidation call. The overhead is typically one extra API call. For documents under 250 pages, most modern models (128K+ context) handle the entire document in a single pass. For longer documents, see our large context window cost guide.
Can AI summarize tables and charts in documents?
Models with vision capabilities (GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro) can process document images including tables and charts. However, for structured data extraction, converting tables to text or JSON before sending to the API is more token-efficient and produces more accurate results. Vision-based processing adds multimodal pricing on top of text costs.
The bottom line
Document summarization is one of AI's most cost-effective use cases. Even at enterprise scale (thousands of documents per month), your total API spend stays in the low hundreds of dollars on budget models — or low thousands on flagships. The real cost optimization isn't picking the cheapest model; it's building the right pipeline with caching, batching, and routing.
Our recommendation: Start with Gemini 2.0 Flash or DeepSeek V3.2 for general summarization. Route complex or high-stakes documents to GPT-5.4 or Claude Sonnet 4.6. Enable prompt caching on day one. Add batch processing for any workflow that doesn't need real-time results.
Use the AI Cost Calculator to model your exact summarization workload, or check our cost optimization strategies guide for more ways to cut your API spend.
