AI Legal Discovery Costs in 2026: Cost Per Document, Per 100,000 Files, and the Cheapest Models for Litigation Teams

AI legal discovery is one of the cleanest use cases for model routing because most documents do not deserve premium-model attention. A litigation team may need to triage 100,000 emails, PDFs, chat exports, contracts, spreadsheets, and attachments, but only a fraction will become privilege calls, chronology entries, deposition exhibits, or partner-level escalation packets.

The cost mistake is simple: sending every document to a premium reasoning model. That makes legal AI feel expensive before the workflow has earned the spend. The correct 2026 setup is a staged pipeline: cheap models for first-pass classification, balanced models for privilege and issue spotting, and premium models only for the small set of documents that affect strategy.

This guide breaks down the API cost of AI legal discovery by cost per document, cost per 100,000 files, and monthly litigation workload. The numbers use real model prices from AI Cost Check’s model database, including GPT-5 nano, GPT-5 mini, Claude Sonnet 4.6, Gemini 2.0 Flash-Lite, and DeepSeek V4 Flash.

⚠️ Warning: These are API token costs, not total e-discovery platform costs. OCR, hosting, review software, storage, security review, human attorney review, and vendor margins can cost more than the model calls. But API cost still matters because poor routing can multiply the cost of the AI layer by 10x to 90x.

The baseline: what one legal discovery document costs

For discovery triage, use this baseline document profile:

2,500 input tokens per document
150 output tokens for classification, issue tags, short rationale, and next action
One model call per document
No OCR cost included
No embeddings included

That is enough for first-pass review of emails, ordinary PDFs, short contracts, Slack exports, support tickets, and extracted text from scanned exhibits. Longer contracts and expert reports need chunking, but most large discovery sets contain enough short communications that 2,500 input / 150 output tokens is a practical planning average.

Model	Input price (per 1M tokens)	Output price (per 1M tokens)	Cost per document	Cost per 100,000 docs	Best role
GPT-5 nano	$0.05	$0.40	$0.000185	$18.50	Cheapest first-pass triage
Gemini 2.0 Flash-Lite	$0.075	$0.30	$0.000233	$23.25	Fast cheap triage
Mistral Small 3.2	$0.10	$0.30	$0.000295	$29.50	Low-cost classification
DeepSeek V4 Flash	$0.14	$0.28	$0.000392	$39.20	Cheap bulk review
GPT-5 mini	$0.25	$2.00	$0.000925	$92.50	Balanced second pass
Gemini 2.5 Flash	$0.30	$2.50	$0.001125	$112.50	Balanced fast review
Claude Haiku 4.5	$1.00	$5.00	$0.003250	$325.00	Careful mid-tier review
GPT-5	$1.25	$10.00	$0.004625	$462.50	High-quality analysis
Claude Sonnet 4.6	$3.00	$15.00	$0.009750	$975.00	Premium legal reasoning
GPT-5.5	$5.00	$30.00	$0.017000	$1,700.00	Highest-value escalation
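The per-document figures in the table follow directly from the listed prices and the baseline token profile. A minimal Python sketch of that arithmetic is shown here; `PRICES` and `cost_per_document` are illustrative names rather than part of any vendor API, and the prices are copied from the table above for a subset of the models.

```python
# Prices in USD per 1M tokens, copied from the table above: (input, output).
PRICES = {
    "GPT-5 nano": (0.05, 0.40),
    "Gemini 2.0 Flash-Lite": (0.075, 0.30),
    "GPT-5 mini": (0.25, 2.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5.5": (5.00, 30.00),
}


def cost_per_document(model: str, input_tokens: int = 2_500, output_tokens: int = 150) -> float:
    """Return the API cost in USD for one model call with the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Print the per-document and per-100,000-document cost for each model.
for model in PRICES:
    per_doc = cost_per_document(model)
    print(f"{model}: ${per_doc:.6f} per doc, ${per_doc * 100_000:,.2f} per 100,000 docs")
```

Changing `input_tokens` and `output_tokens` is enough to reuse the same function for the other task profiles later in this guide.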

The cheapest acceptable first-pass option is GPT-5 nano at $18.50 per 100,000 documents for this triage profile. If the matter needs a larger context window, Gemini 2.0 Flash-Lite costs $23.25 per 100,000 documents and supports a 1,000,000-token context window.

📊 Quick Math: $18.50 is the API cost to triage 100,000 discovery documents with GPT-5 nano at 2,500 input tokens and 150 output tokens per file.

The important number is not the cost of one document. It is the multiplier across review volume. At 1,000,000 documents, GPT-5 nano triage costs $185, while GPT-5.5 triage costs $17,000 using the same token profile. That is a roughly 92x difference for the same first-pass task.

The recommended legal discovery routing stack

A litigation team should not use one model for everything. Use four lanes:

Bulk triage
Privilege and issue spotting
Chronology and fact summaries
Deposition and escalation packets

The best default stack for 2026 is:

Workflow stage	Recommended model	Why	Typical share of corpus
Bulk document triage	GPT-5 nano or Gemini 2.0 Flash-Lite	Cheapest tagging and routing	100%
Privilege spotting	GPT-5 mini	Better judgment at still-low cost	10-20%
Chronology summaries	Claude Sonnet 4.6	Strong narrative reasoning	1-5%
Deposition prep	Claude Sonnet 4.6 or GPT-5.5	High-value synthesis	Witness-level packets
Final escalation	GPT-5.5	Use only for decisive strategy memos	Less than 1%

💡 Key Takeaway: Use the cheapest reliable model to say “this document matters” and reserve premium models for “why this document changes the case.”

For most litigation teams, GPT-5 mini is the balanced workhorse. It costs $0.25 per 1M input tokens and $2 per 1M output tokens, which keeps second-pass review affordable. Premium models like Claude Sonnet 4.6 and GPT-5.5 should be routed only to documents already flagged as important.

$18.50: GPT-5 nano triage per 100,000 docs
$1,700: GPT-5.5 triage per 100,000 docs

The correct recommendation is direct: do not run premium models across the whole corpus. Use premium models after filtering.
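One way to express this filtering in code is a small routing function that runs after triage. This is a sketch under assumptions: the `TriageResult` fields, the lane names, and `next_lanes` are hypothetical, and the model names are plain labels taken from the stack above, not API identifiers.

```python
from dataclasses import dataclass

# Lane -> model label, following the default stack described above.
LANE_MODEL = {
    "privilege_spotting": "GPT-5 mini",
    "chronology_summary": "Claude Sonnet 4.6",
    "final_escalation": "GPT-5.5",
}


@dataclass
class TriageResult:
    """Hypothetical flags produced by the cheap first-pass triage step."""
    possible_privilege: bool   # attorney-client or work-product indicators
    is_hot: bool               # likely to matter for chronology or depositions
    needs_strategy_memo: bool  # assumed to be set by a reviewer, not the model


def next_lanes(result: TriageResult) -> list[tuple[str, str]]:
    """Return the (lane, model) pairs a document should visit after triage."""
    lanes = []
    if result.possible_privilege:
        lanes.append("privilege_spotting")
    if result.is_hot:
        lanes.append("chronology_summary")
    if result.needs_strategy_memo:
        lanes.append("final_escalation")
    return [(lane, LANE_MODEL[lane]) for lane in lanes]
```

A document with no flags returns an empty list, which is the intended outcome for most of the corpus: it is tagged once by the cheap model and never reaches a premium one.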

Cost per legal discovery task

Legal discovery is not one task. A document can pass through several stages. Each stage has different token intensity.

1. Document triage

Triage asks: what is this document, who is involved, what issues appear, and should a reviewer look at it?

Assumption:

2,500 input tokens
150 output tokens

Model	Cost per doc	Cost per 100,000 docs
GPT-5 nano	$0.000185	$18.50
Gemini 2.0 Flash-Lite	$0.000233	$23.25
DeepSeek V4 Flash	$0.000392	$39.20
GPT-5 mini	$0.000925	$92.50
Claude Sonnet 4.6	$0.009750	$975.00
GPT-5.5	$0.017000	$1,700.00

For pure triage, use GPT-5 nano first. If context length matters more than absolute minimum cost, use Gemini 2.0 Flash-Lite.
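The four triage questions map naturally onto a small structured record. The sketch below assumes the triage model is prompted to return JSON with the hypothetical field names `doc_type`, `participants`, `issues`, and `needs_review`; it is not a vendor schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TriageRecord:
    doc_type: str                                           # what is this document
    participants: list[str] = field(default_factory=list)   # who is involved
    issues: list[str] = field(default_factory=list)         # what issues appear
    needs_review: bool = True                               # should a reviewer look at it


def parse_triage(raw: str) -> TriageRecord:
    """Parse model output; fall back to 'needs review' if the JSON is unusable."""
    try:
        data = json.loads(raw)
        return TriageRecord(
            doc_type=str(data["doc_type"]),
            participants=list(data.get("participants", [])),
            issues=list(data.get("issues", [])),
            needs_review=bool(data["needs_review"]),
        )
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        # Malformed output stays in front of a human instead of being dropped.
        return TriageRecord(doc_type="unknown")
```

Defaulting to `needs_review=True` on a parse failure is a deliberate design choice in this sketch: a retry or a human look costs far less than silently skipping a document.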

2. Privilege spotting

Privilege spotting needs more careful output because the model must identify attorney-client language, work-product indicators, legal advice requests, participants, and reasons for escalation.

Assumption:

3,200 input tokens
250 output tokens

Model	Cost per reviewed doc	Cost per 100,000 reviewed docs
GPT-5 mini	$0.001300	$130.00
Gemini 2.5 Flash	$0.001585	$158.50
Claude Haiku 4.5	$0.004450	$445.00
Claude Sonnet 4.6	$0.013350	$1,335.00

Use GPT-5 mini for first privilege screening. Escalate uncertain calls to Claude Sonnet 4.6. Do not let any model, cheap or premium, make final privilege decisions without human review.

⚠️ Warning: AI privilege spotting should produce review queues, not final waiver decisions. The cost of one mistaken production can exceed the entire API budget for the matter.
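The screen-then-escalate pattern can be sketched as a queue assignment. The confidence score and the 0.8 threshold are assumptions for illustration (this guide does not specify how uncertainty is measured), and `PrivilegeCall` and `assign_queue` are hypothetical names. Every branch ends in a queue for people, never in an automatic decision.

```python
from dataclasses import dataclass


@dataclass
class PrivilegeCall:
    doc_id: str
    likely_privileged: bool
    confidence: float  # 0.0 to 1.0, assumed to come from the screening step


def assign_queue(call: PrivilegeCall, threshold: float = 0.8) -> str:
    """Map a screening result to a review queue; no branch finalizes privilege."""
    if call.confidence < threshold:
        # Uncertain call: send to the premium model, then to an attorney.
        return "escalate_to_premium_model"
    if call.likely_privileged:
        return "attorney_privilege_review"
    return "standard_human_review"
```

The threshold is the main cost lever here: lowering it sends fewer documents to the premium model, raising it sends more.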

3. Chronology summaries

Chronology summaries need longer output because the model must extract dates, actors, events, source citations, contradictions, and downstream relevance.

Assumption:

4,500 input tokens
800 output tokens

Model	Cost per summary	Cost for 10,000 summaries
GPT-5 mini	$0.002725	$27.25
Claude Haiku 4.5	$0.008500	$85.00
Claude Sonnet 4.6	$0.025500	$255.00
GPT-5.5	$0.046500	$465.00

Use Claude Sonnet 4.6 for chronology summaries when the result goes into a case memo, deposition outline, or partner review. Use GPT-5 mini when the summary is just a temporary review aid.
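A chronology summary is easier to reuse in a memo or outline when it is stored as structured entries rather than free text. A minimal sketch, assuming hypothetical field names; the example entries are invented placeholders, not data from any matter.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ChronologyEntry:
    when: date                # date of the event
    actors: tuple[str, ...]   # people or entities involved
    event: str                # one-sentence description
    source_doc: str           # citation back to the source document ID


def build_chronology(entries: list[ChronologyEntry]) -> list[ChronologyEntry]:
    """Order entries by date, then by source document for a stable listing."""
    return sorted(entries, key=lambda e: (e.when, e.source_doc))


# Placeholder entries for illustration only.
entries = [
    ChronologyEntry(date(2024, 3, 9), ("Custodian B",), "Replies to pricing email", "DOC-0002"),
    ChronologyEntry(date(2024, 3, 2), ("Custodian A",), "Sends pricing email", "DOC-0001"),
]
timeline = build_chronology(entries)
```

Keeping `source_doc` on every entry preserves the source citation that a case memo or deposition outline needs.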

4. Deposition prep packets

Deposition prep is not document-by-document. It is witness-level synthesis. A packet may include email chains, contracts, chronology notes, hot documents, prior testimony, admissions, and contradictions.

Assumption per witness:

250,000 input tokens
8,000 output tokens

Model	Cost per witness packet	Cost for 50 witnesses
GPT-5 mini	$0.079	$3.93
GPT-5	$0.393	$19.63
Claude Sonnet 4.6	$0.870	$43.50
GPT-5.5	$1.490	$74.50

For deposition prep, the API cost is usually tiny compared with attorney time. Use Claude Sonnet 4.6 or GPT-5.5 for witness packets. The extra $40-$70 across 50 witnesses, relative to GPT-5 mini, is not where a litigation team should economize.
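The per-witness figures can be checked from the listed per-1M-token prices and the 250,000 input / 8,000 output assumption. A short sketch, where `packet_cost` is an illustrative name and the prices are the ones from the baseline table:

```python
# USD per 1M tokens, from the baseline table: (input, output).
PACKET_PRICES = {
    "GPT-5 mini": (0.25, 2.00),
    "GPT-5": (1.25, 10.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5.5": (5.00, 30.00),
}


def packet_cost(model: str, input_tokens: int = 250_000, output_tokens: int = 8_000) -> float:
    """Return the API cost in USD for one witness packet."""
    input_price, output_price = PACKET_PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Print the per-packet cost and the cost for 50 witnesses.
for model in PACKET_PRICES:
    per_packet = packet_cost(model)
    print(f"{model}: ${per_packet:.3f} per packet, ${per_packet * 50:.2f} for 50 witnesses")
```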

Practical scenario 1: small firm, 25,000 documents

A small commercial dispute has 25,000 documents, five key custodians, and a tight review budget. The goal is first-pass triage, privilege queue creation, and a short chronology for mediation.

Recommended stack:

25,000 docs through GPT-5 nano triage
2,500 flagged docs through GPT-5 mini privilege spotting
500 hot docs summarized with Claude Sonnet 4.6
5 witness packets with Claude Sonnet 4.6
Add 25% overhead for retries, long docs, and reruns

Cost:

Stage	Volume	Model	Estimated cost
Triage	25,000 docs	GPT-5 nano	$4.63
Privilege spotting	2,500 docs	GPT-5 mini	$3.25
Chronology summaries	500 docs	Claude Sonnet 4.6	$12.75
Deposition prep	5 witnesses	Claude Sonnet 4.6	$4.35
Subtotal			$24.98
25% overhead			$6.25
Total			$31.23

This is the best default setup for a small firm: cheap routing first, premium model only for the useful documents.

📊 Quick Math: A 25,000-document matter can run a practical AI discovery workflow for about $31 in API calls when premium models are limited to summaries and witness packets.

Practical scenario 2: mid-size litigation, 100,000 documents

A mid-size litigation team has 100,000 documents, a privilege concern, and 20 likely deposition witnesses. This is the workload where routing starts to matter.

Recommended stack:

100,000 docs through GPT-5 nano triage
15,000 flagged docs through GPT-5 mini privilege spotting
2,000 hot documents summarized with Claude Sonnet 4.6
20 witness packets with Claude Sonnet 4.6
Add 20% overhead

Cost:

Stage	Volume	Model	Estimated cost
Triage	100,000 docs	GPT-5 nano	$18.50
Privilege spotting	15,000 docs	GPT-5 mini	$19.50
Chronology summaries	2,000 docs	Claude Sonnet 4.6	$51.00
Deposition prep	20 witnesses	Claude Sonnet 4.6	$17.40
Subtotal			$106.40
20% overhead			$21.28
Total			$127.68

A single-model premium approach would cost far more. Running the same 100,000-document triage directly through GPT-5.5 costs $1,700 before privilege review, summaries, or deposition prep. Routing keeps the core workflow near $128.
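The stage table for this scenario can be reproduced with a short calculation. This sketch hard-codes the per-unit costs derived earlier in the guide; `STAGES` and `workflow_cost` are illustrative names.

```python
# (stage, volume, cost per unit in USD) for the 100,000-document scenario.
STAGES = [
    ("Triage, GPT-5 nano", 100_000, 0.000185),
    ("Privilege spotting, GPT-5 mini", 15_000, 0.0013),
    ("Chronology summaries, Claude Sonnet 4.6", 2_000, 0.0255),
    ("Deposition prep, Claude Sonnet 4.6", 20, 0.87),
]


def workflow_cost(stages, overhead=0.20):
    """Return (subtotal, overhead amount, total) in USD for a staged workflow."""
    subtotal = sum(volume * unit_cost for _, volume, unit_cost in stages)
    return subtotal, subtotal * overhead, subtotal * (1 + overhead)


subtotal, overhead_amount, total = workflow_cost(STAGES)
print(f"Subtotal ${subtotal:.2f}, overhead ${overhead_amount:.2f}, total ${total:.2f}")
# prints: Subtotal $106.40, overhead $21.28, total $127.68
```

Swapping the volumes and the overhead rate gives the small-firm and enterprise scenarios with the same function.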

Compare this routing setup with premium-only planning using AI Cost Check, or review the model-level spread in GPT-5 vs DeepSeek V3.2.

Practical scenario 3: enterprise review, 1,000,000 documents

An enterprise litigation or regulatory investigation has 1,000,000 documents, 30 custodians, multiple privilege tracks, and 100 potential witnesses. The mistake here is letting every document hit a premium model. The right move is aggressive filtering.

Recommended stack:

1,000,000 docs through GPT-5 nano triage
150,000 flagged docs through GPT-5 mini privilege spotting
20,000 hot documents summarized with Claude Sonnet 4.6
100 witness packets with Claude Sonnet 4.6
Add 30% overhead

Cost:

Stage	Volume	Model	Estimated cost
Triage	1,000,000 docs	GPT-5 nano	$185.00
Privilege spotting	150,000 docs	GPT-5 mini	$195.00
Chronology summaries	20,000 docs	Claude Sonnet 4.6	$510.00
Deposition prep	100 witnesses	Claude Sonnet 4.6	$87.00
Subtotal			$977.00
30% overhead			$293.10
Total			$1,270.10

Now compare that with premium all-doc triage:

Strategy	Approximate API cost
Routed enterprise workflow	$1,270
Claude Sonnet 4.6 triage on all 1M docs only	$9,750
GPT-5.5 triage on all 1M docs only	$17,000

The routed workflow includes triage, privilege spotting, summaries, and deposition prep. The premium-only lines cover just triage. That is why routing is not a minor optimization; it is the cost model.

✅ TL;DR: For 1,000,000 documents, route first and escalate later. The practical routed workflow is about $1,270, while premium-model triage alone can hit $9,750-$17,000.

When to use cheap, balanced, and premium models

Use cheap models for documents where the output is a label, not a legal conclusion. GPT-5 nano, Gemini 2.0 Flash-Lite, Mistral Small 3.2, and DeepSeek V4 Flash are strong fits for bulk classification, duplicate group tagging, custodian clustering, “hot/not hot” routing, issue labels, and low-risk summary snippets.

Use balanced models when the model must explain itself. GPT-5 mini and Gemini 2.5 Flash are good for privilege queues, issue spotting, timeline extraction, and “why this matters” rationales. Balanced models should handle most second-pass discovery work because their cost stays low while output quality improves.

Use premium models when a lawyer will rely on the output to make a tactical decision. Claude Sonnet 4.6, GPT-5, and GPT-5.5 are the right layer for deposition prep, adverse fact summaries, contradiction analysis, mediation packets, and senior-attorney escalation memos.

The clean recommendation:

Bulk triage: GPT-5 nano
Cheap long-context alternative: Gemini 2.0 Flash-Lite
Privilege queue: GPT-5 mini
Chronology summaries: Claude Sonnet 4.6
Deposition prep: Claude Sonnet 4.6 or GPT-5.5
Final strategy memo: GPT-5.5 only after filtering

Use AI Cost Check to test your own document count, average document length, and routing percentages before committing to a vendor workflow.

Frequently asked questions

How much does AI legal discovery cost per document?

For first-pass triage, AI legal discovery can cost as little as $0.000185 per document with GPT-5 nano using a 2,500-input-token and 150-output-token profile. A premium GPT-5.5 pass on the same document costs about $0.017, which is roughly 92x higher.

How much does it cost to review 100,000 documents with AI?

A routed 100,000-document workflow costs about $128 in API calls when using GPT-5 nano for triage, GPT-5 mini for privilege spotting, and Claude Sonnet 4.6 for summaries and deposition prep. A premium-only GPT-5.5 triage pass costs $1,700 before any second-pass work.

Which AI model is cheapest for legal discovery?

GPT-5 nano is the cheapest first-pass model in this guide at $18.50 per 100,000 documents for the baseline triage profile. Gemini 2.0 Flash-Lite is the best cheap alternative when a larger context window is useful.

Should litigation teams use premium models for every document?

No. Premium models should be reserved for hot documents, privilege escalation, chronology summaries, deposition prep, and strategic memos. Running Claude Sonnet 4.6 or GPT-5.5 over an entire corpus wastes budget because most documents only need classification and routing.

What is the best AI model stack for legal discovery in 2026?

Use GPT-5 nano for bulk triage, GPT-5 mini for privilege and issue spotting, Claude Sonnet 4.6 for chronology summaries and deposition prep, and GPT-5.5 for final escalation memos. This stack keeps API cost low while preserving premium reasoning for documents that matter.

Calculate your own legal discovery AI costs

The safest budget method is to model three numbers: documents, average input tokens per document, and percentage escalated to each review stage. Once those are clear, model choice becomes straightforward.

Start with these defaults:

2,500 input / 150 output tokens for triage
3,200 input / 250 output tokens for privilege spotting
4,500 input / 800 output tokens for chronology summaries
250,000 input / 8,000 output tokens for deposition prep packets
20-30% overhead for retries, long documents, and reruns
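Those defaults can be turned into a rough planning function driven by escalation percentages. The sketch below is an estimate under stated assumptions, not a quote: `call_cost` and `estimate_budget` are hypothetical names, escalation shares are passed as fractions of the corpus, and the stack is fixed to GPT-5 nano, GPT-5 mini, and Claude Sonnet 4.6 at the prices listed earlier.

```python
def call_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD of one call, with prices given per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def estimate_budget(documents, privilege_share, summary_share, witnesses, overhead=0.25):
    """Estimate total API cost in USD for a routed discovery workflow."""
    triage = documents * call_cost(2_500, 150, 0.05, 0.40)                       # GPT-5 nano
    privilege = documents * privilege_share * call_cost(3_200, 250, 0.25, 2.00)  # GPT-5 mini
    summaries = documents * summary_share * call_cost(4_500, 800, 3.00, 15.00)   # Claude Sonnet 4.6
    packets = witnesses * call_cost(250_000, 8_000, 3.00, 15.00)                 # Claude Sonnet 4.6
    return (triage + privilege + summaries + packets) * (1 + overhead)


# Enterprise scenario: 1,000,000 docs, 15% privilege queue, 2% summarized, 100 witnesses.
enterprise_total = estimate_budget(1_000_000, 0.15, 0.02, 100, overhead=0.30)
print(f"Estimated total: ${enterprise_total:,.2f}")
```

With the enterprise inputs this lands at about $1,270, matching the scenario table; raising `summary_share` shows how quickly the premium lane comes to dominate the bill.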

Run the numbers in AI Cost Check, compare model pages like GPT-5 mini and Claude Sonnet 4.6, then pressure-test the routing spread with GPT-5 vs Claude Sonnet 4.5 or Claude Opus 4.6 vs DeepSeek V3.2.

For litigation teams, the winning 2026 pattern is clear: cheap model first, balanced model second, premium model last.
