
AI Legal Discovery Costs in 2026: Cost Per Document, Per 100,000 Files, and the Cheapest Models for Litigation Teams

Break down AI legal discovery costs per document, per 100,000 files, and by routing stack for litigation teams in 2026.

Tags: legal-discovery, litigation, cost-analysis, document-review, 2026

AI legal discovery is one of the cleanest use cases for model routing because most documents do not deserve premium-model attention. A litigation team may need to triage 100,000 emails, PDFs, chat exports, contracts, spreadsheets, and attachments, but only a fraction will become privilege calls, chronology entries, deposition exhibits, or partner-level escalation packets.

The cost mistake is simple: sending every document to a premium reasoning model. That makes legal AI feel expensive before the workflow has earned the spend. The correct 2026 setup is a staged pipeline: cheap models for first-pass classification, balanced models for privilege and issue spotting, and premium models only for the small set of documents that affect strategy.

This guide breaks down the API cost of AI legal discovery by cost per document, cost per 100,000 files, and monthly litigation workload. The numbers use real model prices from AI Cost Check’s model database, including GPT-5 nano, GPT-5 mini, Claude Sonnet 4.6, Gemini 2.0 Flash-Lite, and DeepSeek V4 Flash.

⚠️ Warning: These are API token costs, not total e-discovery platform costs. OCR, hosting, review software, storage, security review, human attorney review, and vendor margins can cost more than the model calls. But API cost still matters because poor routing can multiply the cost of the AI layer by 10x to 90x.


The baseline: what one legal discovery document costs

For discovery triage, use this baseline document profile:

  • 2,500 input tokens per document
  • 150 output tokens for classification, issue tags, short rationale, and next action
  • One model call per document
  • No OCR cost included
  • No embeddings included

That is enough for first-pass review of emails, ordinary PDFs, short contracts, Slack exports, support tickets, and extracted text from scanned exhibits. Longer contracts and expert reports need chunking, but most large discovery sets contain enough short communications that 2,500 input / 150 output tokens is a practical planning average.
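To check whether the 2,500-token planning average fits a given corpus, a rough sketch like the one below can help. It assumes fixed-size chunks with no overlap and ignores the prompt text repeated on each call. The sample token counts are hypothetical, not data from any real matter.

```python
import math

CHUNK_TOKENS = 2_500  # the guide's baseline input size per model call


def calls_for_document(doc_tokens, chunk_tokens=CHUNK_TOKENS):
    """Model calls needed when a document is split into fixed-size chunks."""
    return max(1, math.ceil(doc_tokens / chunk_tokens))


def planning_profile(doc_token_counts, chunk_tokens=CHUNK_TOKENS):
    """Return (total calls, average input tokens per call) for a corpus sample."""
    total_calls = sum(calls_for_document(t, chunk_tokens) for t in doc_token_counts)
    return total_calls, sum(doc_token_counts) / total_calls


# Hypothetical sample: three short emails, one contract, one expert report.
sample = [600, 1_200, 2_400, 9_000, 30_000]
calls, avg_tokens = planning_profile(sample)
print(f"{calls} calls, about {avg_tokens:.0f} input tokens per call")
```

If the average per call lands well above or below 2,500, scale the per-document figures in this guide accordingly.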

| Model | Input price / 1M tokens | Output price / 1M tokens | Cost per document | Cost per 100,000 docs | Best role |
|---|---|---|---|---|---|
| GPT-5 nano | $0.05 | $0.40 | $0.000185 | $18.50 | Cheapest first-pass triage |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | $0.000233 | $23.25 | Fast cheap triage |
| Mistral Small 3.2 | $0.10 | $0.30 | $0.000295 | $29.50 | Low-cost classification |
| DeepSeek V4 Flash | $0.14 | $0.28 | $0.000392 | $39.20 | Cheap bulk review |
| GPT-5 mini | $0.25 | $2.00 | $0.000925 | $92.50 | Balanced second pass |
| Gemini 2.5 Flash | $0.30 | $2.50 | $0.001125 | $112.50 | Balanced fast review |
| GPT-5 | $1.25 | $10.00 | $0.004625 | $462.50 | High-quality analysis |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.003250 | $325.00 | Careful mid-tier review |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.009750 | $975.00 | Premium legal reasoning |
| GPT-5.5 | $5.00 | $30.00 | $0.017000 | $1,700.00 | Highest-value escalation |

The cheapest acceptable first-pass option is GPT-5 nano at $18.50 per 100,000 documents for this triage profile. If the matter needs a larger context window, Gemini 2.0 Flash-Lite costs $23.25 per 100,000 documents and supports a 1,000,000-token context window.
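The per-document figures follow from one formula: input tokens times the input price plus output tokens times the output price, divided by one million. A minimal sketch, using the prices and token profile quoted in this guide (the function name is illustrative):

```python
def cost_per_document(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """API cost in dollars for one model call, given per-1M-token prices."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# (input price, output price) per 1M tokens, as listed in this guide.
PRICES = {
    "GPT-5 nano": (0.05, 0.40),
    "GPT-5.5": (5.00, 30.00),
}

for model, (input_price, output_price) in PRICES.items():
    per_doc = cost_per_document(2_500, 150, input_price, output_price)
    print(f"{model}: ${per_doc:.6f} per doc, ${per_doc * 100_000:,.2f} per 100,000 docs")
# GPT-5 nano: $0.000185 per doc, $18.50 per 100,000 docs
# GPT-5.5: $0.017000 per doc, $1,700.00 per 100,000 docs
```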

[stat] $18.50: the API cost to triage 100,000 discovery documents with GPT-5 nano at 2,500 input tokens and 150 output tokens per file

The important number is not the cost of one document. It is the multiplier across review volume. At 1,000,000 documents, GPT-5 nano triage costs $185, while GPT-5.5 triage costs $17,000 using the same token profile. That is a gap of roughly 92x, too large to ignore.


The recommended legal discovery routing stack

A litigation team should not use one model for everything. Use four lanes:

  1. Bulk triage
  2. Privilege and issue spotting
  3. Chronology and fact summaries
  4. Deposition and escalation packets

The best default stack for 2026 is:

| Workflow stage | Recommended model | Why | Typical share of corpus |
|---|---|---|---|
| Bulk document triage | GPT-5 nano or Gemini 2.0 Flash-Lite | Cheapest tagging and routing | 100% |
| Privilege spotting | GPT-5 mini | Better judgment at still-low cost | 10-20% |
| Chronology summaries | Claude Sonnet 4.6 | Strong narrative reasoning | 1-5% |
| Deposition prep | Claude Sonnet 4.6 or GPT-5.5 | High-value synthesis | Witness-level packets |
| Final escalation | GPT-5.5 | Use only for decisive strategy memos | Less than 1% |

💡 Key Takeaway: Use the cheapest reliable model to say “this document matters” and reserve premium models for “why this document changes the case.”

For most litigation teams, GPT-5 mini is the balanced workhorse. It costs $0.25 per 1M input tokens and $2 per 1M output tokens, which keeps second-pass review affordable. Premium models like Claude Sonnet 4.6 and GPT-5.5 should be routed only to documents already flagged as important.

$18.50 (GPT-5 nano triage per 100,000 docs) vs $1,700 (GPT-5.5 triage per 100,000 docs)

The correct recommendation is direct: do not run premium models across the whole corpus. Use premium models after filtering.
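One way to picture the staged pipeline is as a function that decides which lanes a document passes through after triage. The sketch below is illustrative only: the `TriageResult` fields are hypothetical stand-ins for whatever labels a triage prompt returns, and the model names simply mirror the stack recommended above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TriageResult:
    # Hypothetical triage labels; a real prompt may return different fields.
    possible_privilege: bool = False
    hot: bool = False
    strategy_critical: bool = False


def route(result: TriageResult) -> List[str]:
    """Return the models a document should pass through, cheapest first."""
    lanes = ["GPT-5 nano"]  # every document gets bulk triage
    if result.possible_privilege:
        lanes.append("GPT-5 mini")  # privilege and issue spotting
    if result.hot:
        lanes.append("Claude Sonnet 4.6")  # chronology and fact summaries
    if result.strategy_critical:
        lanes.append("GPT-5.5")  # final escalation only
    return lanes


print(route(TriageResult()))
print(route(TriageResult(possible_privilege=True, hot=True)))
```

Because most documents carry none of these flags, most of the corpus stops after the first lane, which is where the savings come from.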


Cost per legal discovery task

Legal discovery is not one task. A document can pass through several stages. Each stage has different token intensity.

1. Document triage

Triage asks: what is this document, who is involved, what issues appear, and should a reviewer look at it?

Assumption:

  • 2,500 input tokens
  • 150 output tokens

| Model | Cost per doc | Cost per 100,000 docs |
|---|---|---|
| GPT-5 nano | $0.000185 | $18.50 |
| Gemini 2.0 Flash-Lite | $0.000233 | $23.25 |
| DeepSeek V4 Flash | $0.000392 | $39.20 |
| GPT-5 mini | $0.000925 | $92.50 |
| Claude Sonnet 4.6 | $0.009750 | $975.00 |
| GPT-5.5 | $0.017000 | $1,700.00 |

For pure triage, use GPT-5 nano first. If context length matters more than absolute minimum cost, use Gemini 2.0 Flash-Lite.

2. Privilege spotting

Privilege spotting needs more careful output because the model must identify attorney-client language, work-product indicators, legal advice requests, participants, and reasons for escalation.

Assumption:

  • 3,200 input tokens
  • 250 output tokens

| Model | Cost per reviewed doc | Cost per 100,000 reviewed docs |
|---|---|---|
| GPT-5 mini | $0.001300 | $130.00 |
| Gemini 2.5 Flash | $0.001585 | $158.50 |
| Claude Haiku 4.5 | $0.004450 | $445.00 |
| Claude Sonnet 4.6 | $0.013350 | $1,335.00 |

Use GPT-5 mini for first privilege screening. Escalate uncertain calls to Claude Sonnet 4.6. Do not ask a cheap model to make final privilege decisions without human review.

⚠️ Warning: AI privilege spotting should produce review queues, not final waiver decisions. The cost of one mistaken production can exceed the entire API budget for the matter.
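The "review queues, not final decisions" rule can be sketched as a small mapping from a screening result to a queue. Everything here is an assumption for illustration: the `label` values, the self-reported `confidence` score, and the 0.8 threshold are hypothetical, and model-reported confidence is not a calibrated probability.

```python
def privilege_queue(label, confidence, escalate_below=0.8):
    """Map a first-pass screening result to a review queue.

    The function never returns a produce/withhold decision; every
    outcome is a queue that a human reviewer still has to work.
    `label`, `confidence`, and the 0.8 default are hypothetical.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence < escalate_below:
        # Uncertain calls get a second look from the premium model.
        return "escalate_to_claude_sonnet_4_6"
    if label == "likely_privileged":
        return "attorney_privilege_review"
    return "standard_review_queue"


print(privilege_queue("likely_privileged", 0.95))
print(privilege_queue("not_privileged", 0.55))
```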

3. Chronology summaries

Chronology summaries need longer output because the model must extract dates, actors, events, source citations, contradictions, and downstream relevance.

Assumption:

  • 4,500 input tokens
  • 800 output tokens

| Model | Cost per summary | Cost for 10,000 summaries |
|---|---|---|
| GPT-5 mini | $0.002725 | $27.25 |
| Claude Haiku 4.5 | $0.008500 | $85.00 |
| Claude Sonnet 4.6 | $0.025500 | $255.00 |
| GPT-5.5 | $0.046500 | $465.00 |

Use Claude Sonnet 4.6 for chronology summaries when the result goes into a case memo, deposition outline, or partner review. Use GPT-5 mini when the summary is just a temporary review aid.

4. Deposition prep packets

Deposition prep is not document-by-document. It is witness-level synthesis. A packet may include email chains, contracts, chronology notes, hot documents, prior testimony, admissions, and contradictions.

Assumption per witness:

  • 250,000 input tokens
  • 8,000 output tokens

| Model | Cost per witness packet | Cost for 50 witnesses |
|---|---|---|
| GPT-5 mini | $0.079 | $3.93 |
| GPT-5 | $0.393 | $19.63 |
| Claude Sonnet 4.6 | $0.870 | $43.50 |
| GPT-5.5 | $1.490 | $74.50 |

For deposition prep, the API cost is usually tiny compared with attorney time. Use Claude Sonnet 4.6 or GPT-5.5 for witness packets. The extra $40-$70 over GPT-5 mini across 50 witnesses is not where a litigation team should economize.
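As a worked check on the witness-packet arithmetic, the sketch below compares Claude Sonnet 4.6 with GPT-5 mini at the 250,000 / 8,000 token profile, using the per-1M prices listed earlier in this guide. The function name is illustrative.

```python
def packet_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """API cost in dollars for one witness packet."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


WITNESSES = 50

sonnet = packet_cost(250_000, 8_000, 3.00, 15.00)  # Claude Sonnet 4.6 prices
mini = packet_cost(250_000, 8_000, 0.25, 2.00)     # GPT-5 mini prices

print(f"Claude Sonnet 4.6: ${sonnet:.3f} per packet, ${sonnet * WITNESSES:.2f} for {WITNESSES} witnesses")
print(f"GPT-5 mini: ${mini:.3f} per packet, ${mini * WITNESSES:.2f} for {WITNESSES} witnesses")
print(f"Premium uplift: ${(sonnet - mini) * WITNESSES:.2f}")
```

The uplift for choosing Sonnet over GPT-5 mini works out to roughly $40 across all 50 witnesses.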


Practical scenario 1: small firm, 25,000 documents

A small commercial dispute has 25,000 documents, five key custodians, and a tight review budget. The goal is first-pass triage, privilege queue creation, and a short chronology for mediation.

Recommended stack:

  • 25,000 docs through GPT-5 nano triage
  • 2,500 flagged docs through GPT-5 mini privilege spotting
  • 500 hot docs summarized with Claude Sonnet 4.6
  • 5 witness packets with Claude Sonnet 4.6
  • Add 25% overhead for retries, long docs, and reruns

Cost:

| Stage | Volume | Model | Estimated cost |
|---|---|---|---|
| Triage | 25,000 docs | GPT-5 nano | $4.63 |
| Privilege spotting | 2,500 docs | GPT-5 mini | $3.25 |
| Chronology summaries | 500 docs | Claude Sonnet 4.6 | $12.75 |
| Deposition prep | 5 witnesses | Claude Sonnet 4.6 | $4.35 |
| Subtotal | | | $24.98 |
| 25% overhead | | | $6.25 |
| Total | | | $31.23 |

This is the best default setup for a small firm: cheap routing first, premium model only for the useful documents.

📊 Quick Math: A 25,000-document matter can run a practical AI discovery workflow for about $31 in API calls when premium models are limited to summaries and witness packets.
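The scenario totals can be reproduced with a short script. This sketch assumes each stage is rounded to the cent before summing, which is how the table figures appear to be built. It uses `Decimal` so the half-cent cases round predictably. The per-unit costs come from the task tables earlier in this guide.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_cents(amount):
    """Round a Decimal dollar amount to the nearest cent, halves up."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def workflow_total(stages, overhead):
    """Return (subtotal, overhead cost, total) for (name, volume, unit cost) stages."""
    subtotal = sum(to_cents(volume * unit_cost) for _, volume, unit_cost in stages)
    overhead_cost = to_cents(subtotal * overhead)
    return subtotal, overhead_cost, subtotal + overhead_cost


SCENARIO_1 = [
    ("Triage, GPT-5 nano", 25_000, Decimal("0.000185")),
    ("Privilege spotting, GPT-5 mini", 2_500, Decimal("0.0013")),
    ("Chronology summaries, Claude Sonnet 4.6", 500, Decimal("0.0255")),
    ("Deposition prep, Claude Sonnet 4.6", 5, Decimal("0.87")),
]

subtotal, overhead_cost, total = workflow_total(SCENARIO_1, Decimal("0.25"))
print(f"Subtotal ${subtotal}, overhead ${overhead_cost}, total ${total}")
# Subtotal $24.98, overhead $6.25, total $31.23
```

Swapping in different volumes and a different overhead rate reproduces the larger scenarios the same way.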


Practical scenario 2: mid-size litigation, 100,000 documents

A mid-size litigation team has 100,000 documents, a privilege concern, and 20 likely deposition witnesses. This is the workload where routing starts to matter.

Recommended stack:

  • 100,000 docs through GPT-5 nano triage
  • 15,000 flagged docs through GPT-5 mini privilege spotting
  • 2,000 hot documents summarized with Claude Sonnet 4.6
  • 20 witness packets with Claude Sonnet 4.6
  • Add 20% overhead

Cost:

| Stage | Volume | Model | Estimated cost |
|---|---|---|---|
| Triage | 100,000 docs | GPT-5 nano | $18.50 |
| Privilege spotting | 15,000 docs | GPT-5 mini | $19.50 |
| Chronology summaries | 2,000 docs | Claude Sonnet 4.6 | $51.00 |
| Deposition prep | 20 witnesses | Claude Sonnet 4.6 | $17.40 |
| Subtotal | | | $106.40 |
| 20% overhead | | | $21.28 |
| Total | | | $127.68 |

A single-model premium approach would cost far more. Running the same 100,000-document triage directly through GPT-5.5 costs $1,700 before privilege review, summaries, or deposition prep. Routing keeps the core workflow near $128.

Compare this routing setup with premium-only planning using AI Cost Check, or review the model-level spread in GPT-5 vs DeepSeek V3.2.


Practical scenario 3: enterprise review, 1,000,000 documents

An enterprise litigation or regulatory investigation has 1,000,000 documents, 30 custodians, multiple privilege tracks, and 100 potential witnesses. The mistake here is letting every document hit a premium model. The right move is aggressive filtering.

Recommended stack:

  • 1,000,000 docs through GPT-5 nano triage
  • 150,000 flagged docs through GPT-5 mini privilege spotting
  • 20,000 hot documents summarized with Claude Sonnet 4.6
  • 100 witness packets with Claude Sonnet 4.6
  • Add 30% overhead

Cost:

| Stage | Volume | Model | Estimated cost |
|---|---|---|---|
| Triage | 1,000,000 docs | GPT-5 nano | $185.00 |
| Privilege spotting | 150,000 docs | GPT-5 mini | $195.00 |
| Chronology summaries | 20,000 docs | Claude Sonnet 4.6 | $510.00 |
| Deposition prep | 100 witnesses | Claude Sonnet 4.6 | $87.00 |
| Subtotal | | | $977.00 |
| 30% overhead | | | $293.10 |
| Total | | | $1,270.10 |

Now compare that with premium all-doc triage:

| Strategy | Approximate API cost |
|---|---|
| Routed enterprise workflow | $1,270 |
| Claude Sonnet 4.6 triage on all 1M docs only | $9,750 |
| GPT-5.5 triage on all 1M docs only | $17,000 |

The routed workflow includes triage, privilege spotting, summaries, and deposition prep. The premium-only lines cover just triage. That is why routing is not a minor optimization; it is the cost model.
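To put the comparison in ratio terms, a small sketch using the totals stated in this scenario (the helper name is illustrative):

```python
def cost_multiple(premium_cost, routed_cost):
    """How many times more the premium-only pass costs than the routed workflow."""
    return premium_cost / routed_cost


ROUTED_TOTAL = 1_270.10  # full routed workflow, including 30% overhead

PREMIUM_TRIAGE_ONLY = {
    "Claude Sonnet 4.6": 9_750.00,
    "GPT-5.5": 17_000.00,
}

for model, cost in PREMIUM_TRIAGE_ONLY.items():
    print(f"{model} triage alone: {cost_multiple(cost, ROUTED_TOTAL):.1f}x the routed workflow")
# Claude Sonnet 4.6 triage alone: 7.7x the routed workflow
# GPT-5.5 triage alone: 13.4x the routed workflow
```

These multiples understate the gap, since the premium figures cover triage only and exclude overhead.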

✅ TL;DR: For 1,000,000 documents, route first and escalate later. The practical routed workflow is about $1,270, while premium-model triage alone can hit $9,750-$17,000.


When to use cheap, balanced, and premium models

Use cheap models for documents where the output is a label, not a legal conclusion. GPT-5 nano, Gemini 2.0 Flash-Lite, Mistral Small 3.2, and DeepSeek V4 Flash are strong fits for bulk classification, duplicate group tagging, custodian clustering, “hot/not hot” routing, issue labels, and low-risk summary snippets.

Use balanced models when the model must explain itself. GPT-5 mini and Gemini 2.5 Flash are good for privilege queues, issue spotting, timeline extraction, and “why this matters” rationales. Balanced models should handle most second-pass discovery work because their cost stays low while output quality improves.

Use premium models when a lawyer will rely on the output to make a tactical decision. Claude Sonnet 4.6, GPT-5, and GPT-5.5 are the right layer for deposition prep, adverse fact summaries, contradiction analysis, mediation packets, and senior-attorney escalation memos.

The clean recommendation:

  • Bulk triage: GPT-5 nano
  • Cheap long-context alternative: Gemini 2.0 Flash-Lite
  • Privilege queue: GPT-5 mini
  • Chronology summaries: Claude Sonnet 4.6
  • Deposition prep: Claude Sonnet 4.6 or GPT-5.5
  • Final strategy memo: GPT-5.5 only after filtering

Use AI Cost Check to test your own document count, average document length, and routing percentages before committing to a vendor workflow.


Frequently asked questions

How much does AI legal discovery cost per document?

For first-pass triage, AI legal discovery can cost as little as $0.000185 per document with GPT-5 nano using a 2,500-input-token and 150-output-token profile. A premium GPT-5.5 pass on the same document costs about $0.017, which is roughly 92x higher.

How much does it cost to review 100,000 documents with AI?

A routed 100,000-document workflow costs about $128 in API calls when using GPT-5 nano for triage, GPT-5 mini for privilege spotting, and Claude Sonnet 4.6 for summaries and deposition prep. A premium-only GPT-5.5 triage pass costs $1,700 before any second-pass work.

Which AI model is cheapest for legal discovery?

GPT-5 nano is the cheapest first-pass model in this guide at $18.50 per 100,000 documents for the baseline triage profile. Gemini 2.0 Flash-Lite is the best cheap alternative when a larger context window is useful.

Should litigation teams use premium models for every document?

No. Premium models should be reserved for hot documents, privilege escalation, chronology summaries, deposition prep, and strategic memos. Running Claude Sonnet 4.6 or GPT-5.5 over an entire corpus wastes budget because most documents only need classification and routing.

What is the best AI model stack for legal discovery in 2026?

Use GPT-5 nano for bulk triage, GPT-5 mini for privilege and issue spotting, Claude Sonnet 4.6 for chronology summaries and deposition prep, and GPT-5.5 for final escalation memos. This stack keeps API cost low while preserving premium reasoning for documents that matter.


Calculate your own legal discovery AI costs

The safest budget method is to model three numbers: documents, average input tokens per document, and percentage escalated to each review stage. Once those are clear, model choice becomes straightforward.

Start with these defaults:

  • 2,500 input / 150 output tokens for triage
  • 3,200 input / 250 output tokens for privilege spotting
  • 4,500 input / 800 output tokens for chronology summaries
  • 250,000 input / 8,000 output tokens for deposition prep packets
  • 20-30% overhead for retries, long documents, and reruns
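These defaults can be turned into a rough budget estimator. The sketch below is one possible shape, not the AI Cost Check calculator: it hard-codes the guide's recommended stack (GPT-5 nano, GPT-5 mini, Claude Sonnet 4.6) and its listed prices, and takes the escalation shares as inputs. Function and parameter names are illustrative.

```python
def stage_cost(units, input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one stage: units times the per-call token cost."""
    per_unit = (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000
    return units * per_unit


def estimate_budget(documents, privilege_share, summary_share, witnesses, overhead):
    """Estimate total API cost for the routed workflow described in this guide."""
    triage = stage_cost(documents, 2_500, 150, 0.05, 0.40)                       # GPT-5 nano
    privilege = stage_cost(documents * privilege_share, 3_200, 250, 0.25, 2.00)  # GPT-5 mini
    summaries = stage_cost(documents * summary_share, 4_500, 800, 3.00, 15.00)   # Claude Sonnet 4.6
    packets = stage_cost(witnesses, 250_000, 8_000, 3.00, 15.00)                 # Claude Sonnet 4.6
    return (triage + privilege + summaries + packets) * (1 + overhead)


# The mid-size scenario: 100,000 docs, 15% privilege queue, 2% summarized,
# 20 witness packets, 20% overhead.
total = estimate_budget(100_000, 0.15, 0.02, 20, 0.20)
print(f"Estimated API budget: ${total:,.2f}")
```

Changing the shares shows quickly which stage dominates. In the scenarios above, that is usually the chronology summaries rather than bulk triage.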

Run the numbers in AI Cost Check, compare model pages like GPT-5 mini and Claude Sonnet 4.6, then pressure-test the routing spread with GPT-5 vs Claude Sonnet 4.5 or Claude Opus 4.6 vs DeepSeek V3.2.

For litigation teams, the winning 2026 pattern is clear: cheap model first, balanced model second, premium model last.