DeepSeek finally cleaned up its lineup.
Before V4, DeepSeek’s value case was simple but slightly awkward. DeepSeek V3.2 was cheap, but the 128K context ceiling meant you still had to think twice before making it your default for long RAG prompts, big coding diffs, or large document workflows. DeepSeek V4 Flash and DeepSeek V4 Pro fix that problem fast: both move to 1M context, and Flash is not just more capable on paper — it is also materially cheaper than V3.2.
That changes the buying decision. The real question is no longer “is DeepSeek the bargain option?” It is “which DeepSeek lane should own which workload?” Flash is now the obvious high-volume default. Pro is the interesting middle tier for teams that want more headroom without paying Claude Sonnet 4.6 money. And V3.2 suddenly looks like the model you keep only when you already tuned around it.
This guide breaks down the real 2026 token math using current AI Cost Check pricing. I will show what V4 Flash and Pro cost, how they compare with V3.2, GPT-5 mini, Gemini 2.5 Flash, and Sonnet, and the routing stack I would actually ship.
💡 Key Takeaway: DeepSeek V4 Flash is the new default value pick. DeepSeek V4 Pro only makes sense as an escalation lane, not as your blanket default.
DeepSeek V4 pricing at a glance
Here is the part that matters.
| Model | Input per 1M tokens | Output per 1M tokens | Context window | What it really is |
|---|---|---|---|---|
| DeepSeek V4 Flash | $0.14 | $0.28 | 1,000,000 | High-volume default |
| DeepSeek V4 Pro | $0.435 | $0.87 | 1,000,000 | Higher-quality middle tier |
| DeepSeek V3.2 | $0.28 | $0.42 | 128,000 | Last-gen budget tier |
| DeepSeek R1 V3.2 | $0.28 | $0.42 | 128,000 | Last-gen reasoning-flavored tier |
The biggest surprise is Flash. It is 50% cheaper on input than V3.2, 33% cheaper on output, and gives you almost 8x the context window. That is not a gentle refresh. That is a cleaner, cheaper replacement for a huge chunk of V3.2 use.
Pro is different. It is not cheaper than V3.2. It is the “pay a bit more, but not premium-model more” option. On raw price, V4 Pro’s input is 55% higher than V3.2’s, and its output is a little more than double. That sounds expensive until you compare it with mid-tier and premium competition.
Output pricing is also why Flash is so attractive for verbose workloads. If the model tends to answer in long blocks, write code, or synthesize large outputs, the output rate becomes the whole game, and Flash’s $0.28 per 1M output tokens undercuts every mid-tier rival in this comparison.
📊 Quick Math: A workflow that spends 100M input tokens and 20M output tokens in a month costs $19.60 on V4 Flash, $60.90 on V4 Pro, $36.40 on V3.2, and $65.00 on GPT-5 mini.
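That quick math is just rate times volume. A minimal sketch in Python, with the per-1M-token prices hardcoded from the table above (the model keys are labels for this example, not API identifiers):

```python
# Per-1M-token prices from the pricing table above: (input rate, output rate).
PRICES = {
    "deepseek-v4-flash": (0.14, 0.28),
    "deepseek-v4-pro": (0.435, 0.87),
    "deepseek-v3.2": (0.28, 0.42),
    "gpt-5-mini": (0.25, 2.00),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost for a month of traffic, given total token counts."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# 100M input + 20M output per month:
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 100_000_000, 20_000_000):.2f}")
```

Swap in your own token counts; the shape of the math does not change.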
What actually changed from V3.2 to V4
DeepSeek V4 is not just “newer DeepSeek.” The economics changed in specific, useful ways.
Flash is the real replacement for V3.2
If you were using V3.2 as a cheap general-purpose default, V4 Flash is the obvious successor. The math is better almost everywhere.
- Input is cut in half: $0.28 → $0.14
- Output is cut by a third: $0.42 → $0.28
- Context jumps from 128K to 1M
That last one matters more than people admit. A model can be cheap and still annoying if it forces you to chunk every large doc, trim long retrieved context, or split large coding tasks into awkward reruns. V4 Flash reduces that friction without asking you to pay more for it.
Pro is the “serious but still rational” lane
V4 Pro is not trying to be the cheapest thing on the page. It is trying to be the tier you pick when Flash is a little too thin but Sonnet is obviously overkill.
At $0.435 input and $0.87 output, Pro is still dramatically cheaper than premium defaults. It is also much more interesting on output-heavy work than GPT-5 mini, because GPT-5 mini’s $2.00 output price is more than double Pro’s.
DeepSeek now has a cleaner split
This is the part I like most.
- Flash = default volume lane
- Pro = better synthesis, coding, and harder escalations
- V3.2 / R1 V3.2 = legacy lane unless you already have reason to stay
That is a much saner pricing story than the old “everything is cheap, but context is tighter and the quality ladder is blurry” setup.
⚠️ Warning: Do not keep V3.2 as your default just because it is familiar. If your workflows ever push past 128K context or produce long outputs, V4 Flash is the cleaner economic choice.
What DeepSeek V4 costs on real workloads
Token prices are nice. Real task costs are better. I am using three practical workloads here:
- Support / RAG answer — 20,000 input tokens and 2,000 output tokens
- Coding agent task — 25,000 input tokens and 8,000 output tokens
- Long-context document pass — 150,000 input tokens and 3,000 output tokens
These are not edge-case numbers. They are normal once you include system prompts, retrieved context, tool instructions, or large files.
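Per-task cost is the same rate arithmetic at a smaller scale. A sketch that computes these three workloads for any model (prices from the pricing table above; model names are labels, not API identifiers):

```python
# Per-1M-token prices: (input rate, output rate), from the pricing table above.
PRICES = {
    "deepseek-v4-flash": (0.14, 0.28),
    "deepseek-v4-pro": (0.435, 0.87),
    "deepseek-v3.2": (0.28, 0.42),
}

# (input tokens, output tokens) per task, matching the three workloads above.
WORKLOADS = {
    "support_rag": (20_000, 2_000),
    "coding_agent": (25_000, 8_000),
    "long_doc_pass": (150_000, 3_000),
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1e6

for name, (inp, out) in WORKLOADS.items():
    print(f"{name}: ${task_cost('deepseek-v4-flash', inp, out):.5f} on Flash")
```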
1) Support and RAG answer costs
This is the everyday knowledge-bot lane.
| Model | Cost per answer | Cost per 100,000 answers |
|---|---|---|
| DeepSeek V4 Flash | $0.00336 | $336 |
| DeepSeek V4 Pro | $0.01044 | $1,044 |
| DeepSeek V3.2 | $0.00644 | $644 |
| GPT-5 mini | $0.00900 | $900 |
| Gemini 2.5 Flash | $0.01100 | $1,100 |
| Claude Sonnet 4.6 | $0.09000 | $9,000 |
This table is why Flash matters. It is not just cheaper than GPT-5 mini or Gemini 2.5 Flash. It is cheaper than DeepSeek’s own previous generation while also being much easier to use for big retrieved prompts.
📊 Stat: Moving 100,000 monthly RAG answers from Claude Sonnet 4.6 to DeepSeek V4 Flash saves $8,664 per month.
That is not a rounding error. That is a budget line.
2) Coding agent task costs
Now let’s look at a more output-heavy workload. Think code explanation, patch drafting, or structured agent output.
| Model | Cost per task | Cost per 10,000 tasks |
|---|---|---|
| DeepSeek V4 Flash | $0.00574 | $57.40 |
| DeepSeek V4 Pro | $0.01784 | $178.35 |
| DeepSeek V3.2 | $0.01036 | $103.60 |
| GPT-5 mini | $0.02225 | $222.50 |
| Gemini 2.5 Flash | $0.02750 | $275.00 |
| Mistral Medium 3 | $0.02600 | $260.00 |
| Claude Sonnet 4.6 | $0.19500 | $1,950.00 |
This is the interesting split.
- Flash is the blunt economic winner.
- Pro is still cheaper than GPT-5 mini and Mistral Medium 3 on the same task.
- Sonnet is in a totally different price class.
If your coding workflow is output-heavy, Pro starts to make more sense than its raw input price suggests. Yes, GPT-5 mini has cheaper input. But output is where a lot of coding-agent spend lives, and Pro’s $0.87 output rate is less than half of GPT-5 mini’s $2.00.
✅ TL;DR: For coding agents, Flash is the volume play. Pro is the “I want better output without jumping to Sonnet” play. GPT-5 mini is still good, but it is no longer the obvious value king.
3) Long-context document pass costs
This is where the 1M context jump stops being theoretical. A 150K-token prompt does not even fit cleanly on V3.2 or R1 V3.2.
| Model | Context window | Raw token cost | Practical note |
|---|---|---|---|
| DeepSeek V4 Flash | 1,000,000 | $0.02184 | Fits cleanly |
| DeepSeek V4 Pro | 1,000,000 | $0.06786 | Fits cleanly |
| DeepSeek V3.2 | 128,000 | $0.04326 | Exceeds the window; requires chunking |
| GPT-5 mini | 500,000 | $0.04350 | Fits cleanly |
| Gemini 2.5 Flash | 1,000,000 | $0.05250 | Fits cleanly |
| Claude Sonnet 4.6 | 1,000,000 | $0.49500 | Fits cleanly |
This is my favorite V4 Flash table because it exposes the old V3.2 compromise.
On raw token math, V3.2 and GPT-5 mini look close. In practice, V3.2 does not cleanly fit the job. That means chunking, multi-pass workflows, or truncation. Suddenly the “cheap” option is not actually simpler, and often not actually cheaper once you account for retries and orchestration.
If you care about big document prompts, pair this with Large Context Window Costs in 2026. The short version is simple: context window is part of the price.
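The “does it fit” question can be made mechanical. A hedged sketch: the context windows come from the table above, but the output reserve is an illustrative assumption, since you always need headroom in the window for the reply.

```python
import math

# Context windows (tokens) from the table above.
WINDOWS = {
    "deepseek-v4-flash": 1_000_000,
    "deepseek-v3.2": 128_000,
    "gpt-5-mini": 500_000,
}

def passes_needed(model: str, prompt_tokens: int, output_reserve: int = 4_000) -> int:
    """Number of chunked passes a prompt needs, leaving room for the reply.

    1 means the prompt fits cleanly; anything higher means chunking,
    re-stitching, and the orchestration overhead that comes with it.
    """
    usable = WINDOWS[model] - output_reserve
    return math.ceil(prompt_tokens / usable)

print(passes_needed("deepseek-v4-flash", 150_000))  # 1: fits cleanly
print(passes_needed("deepseek-v3.2", 150_000))      # 2: must be chunked
```

Every pass above 1 is extra orchestration cost the raw token price never shows.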
Flash vs Pro: which one should own the workload?
This is the decision point most teams actually need.
Use Flash as the default when volume wins
Flash should own:
- support automation
- grounded RAG answers
- routing and classification
- PR summaries and lightweight code tasks
- bulk document passes where the model mostly extracts or organizes
Why? Because it is cheap enough to run constantly, the 1M context window removes most old DeepSeek awkwardness, and the output price is low enough that verbose tasks do not quietly explode your bill.
In plain English: Flash is the model you use when you want to stop thinking about cost every time the product scales.
Use Pro when output quality matters more than raw volume
Pro should own:
- harder coding tasks
- multi-step synthesis
- better final-form writing
- research or analysis where the answer quality matters more than the input bill
- escalations from Flash when the cheap lane looks shaky
The key here is that Pro is not a premium flagship. It is a middle tier. That makes it much more defensible as an escalation lane than Sonnet or Opus.
Do not use Pro as the lazy default
This is the trap.
V4 Pro is economically attractive only when you use it selectively. If you start sending all routine traffic to Pro, you give up most of the reason Flash exists. The best DeepSeek setup is not picking one winner. It is using both lanes properly.
💡 Key Takeaway: Flash should do the boring work by default. Pro should only touch the cases where better synthesis or stronger coding output actually matters.
How DeepSeek V4 compares with GPT-5 mini, Gemini 2.5 Flash, and Sonnet
DeepSeek V4 is strongest when you compare it to the actual models teams default to.
| Model | Input per 1M | Output per 1M | Context | Best use |
|---|---|---|---|---|
| DeepSeek V4 Flash | $0.14 | $0.28 | 1M | High-volume default |
| DeepSeek V4 Pro | $0.435 | $0.87 | 1M | Mid-tier escalation |
| GPT-5 mini | $0.25 | $2.00 | 500K | General balanced default |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M | Long-context flash tier |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M | Premium quality default |
A few blunt conclusions:
Versus GPT-5 mini
GPT-5 mini still has a real place. It is popular because it is balanced, reliable, and much cheaper than premium OpenAI tiers. But on raw token economics, DeepSeek V4 changed the value equation.
- Flash is cheaper on both input and output.
- Pro is more expensive on input, but dramatically cheaper on output.
- Flash and Pro both beat GPT-5 mini on context window.
If your workload is output-heavy, DeepSeek overtakes GPT-5 mini on cost sooner than many teams expect.
Versus Gemini 2.5 Flash
Gemini 2.5 Flash has the same 1M-context comfort, but the price gap is ugly.
Flash is cheaper by about 53% on input and nearly 89% on output. That makes DeepSeek V4 Flash the much stronger budget choice for long-context commodity work.
Versus Claude Sonnet 4.6
This is the easiest comparison. Sonnet is still the expensive adult in the room.
- Flash is about 95% cheaper on input and 98% cheaper on output.
- Pro is about 85% cheaper on input and 94% cheaper on output.
Sonnet can still win on quality if your evals say it does. But economically, it has to win by a lot.
The routing stack I would actually ship
If I were building around DeepSeek today, I would not pick one model and call it done.
I would ship this:
Lane 1: Flash for intake and volume
Use DeepSeek V4 Flash for:
- retrieval answers
- summarization
- triage
- first-pass coding tasks
- structured extraction
- large doc passes that mostly organize or identify
Lane 2: Pro for real work that writes back
Use DeepSeek V4 Pro for:
- longer code generation
- better final synthesis
- richer analyst-style answers
- escalation from Flash when the first pass is shaky
Lane 3: premium only when the stakes justify it
Only bring in Claude Sonnet 4.6 or another premium model when the task is obviously high stakes: legal review, critical customer responses, major architecture reasoning, or cases where your evaluation set keeps showing real failures on the cheaper lanes.
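The three lanes can be sketched as a tiny router. Everything here is illustrative: the task categories, the escalation flag, and the model labels are assumptions for this example, not a real API.

```python
# Illustrative three-lane router. Model names are labels, not real API identifiers.
FLASH, PRO, PREMIUM = "deepseek-v4-flash", "deepseek-v4-pro", "claude-sonnet-4.6"

# Task buckets are assumptions; in practice these come from your own taxonomy.
VOLUME_TASKS = {"rag_answer", "summarize", "triage", "extract", "doc_pass"}
HIGH_STAKES = {"legal_review", "critical_customer_reply", "architecture_review"}

def route(task_type: str, escalated: bool = False) -> str:
    """Flash by default, Pro for harder work or Flash escalations,
    premium only when the task is explicitly high stakes."""
    if task_type in HIGH_STAKES:
        return PREMIUM
    if escalated or task_type not in VOLUME_TASKS:
        return PRO
    return FLASH

print(route("rag_answer"))                  # volume lane: Flash
print(route("rag_answer", escalated=True))  # shaky first pass: promoted to Pro
print(route("legal_review"))                # high stakes: premium lane
```

The important design choice is that escalation is an explicit signal, not a default: Pro only sees traffic that earned it.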
Here is a realistic mixed monthly workload:
- 80,000 support / RAG answers
- 10,000 coding agent tasks
- 2,000 long-context document passes
If you route support entirely to Flash, route 70% of coding tasks to Flash and 30% to Pro, and split long-doc passes 50/50 between Flash and Pro, the monthly spend lands around $452.
If you run that whole workload on Pro, it is about $1,149. If you run it all on Sonnet, it is about $10,140.
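The roughly $452 figure is just the three per-task costs weighted by volume and routing split. A sketch that reproduces the totals, using the same prices and token counts as the tables above:

```python
# Per-1M-token prices and per-task token counts from the tables above.
PRICES = {"flash": (0.14, 0.28), "pro": (0.435, 0.87), "sonnet": (3.00, 15.00)}
TASKS = {"support": (20_000, 2_000), "coding": (25_000, 8_000), "long_doc": (150_000, 3_000)}

def cost(model, task):
    in_rate, out_rate = PRICES[model]
    inp, out = TASKS[task]
    return (inp * in_rate + out * out_rate) / 1e6

# 80K support on Flash; coding split 70/30 Flash/Pro; long docs split 50/50.
mixed = (
    80_000 * cost("flash", "support")
    + 7_000 * cost("flash", "coding") + 3_000 * cost("pro", "coding")
    + 1_000 * cost("flash", "long_doc") + 1_000 * cost("pro", "long_doc")
)
all_pro = sum(n * cost("pro", t) for t, n in
              [("support", 80_000), ("coding", 10_000), ("long_doc", 2_000)])
all_sonnet = sum(n * cost("sonnet", t) for t, n in
                 [("support", 80_000), ("coding", 10_000), ("long_doc", 2_000)])

print(f"mixed routing: ${mixed:,.0f}")     # ≈ $452
print(f"all on Pro:    ${all_pro:,.0f}")   # ≈ $1,149
print(f"all on Sonnet: ${all_sonnet:,.0f}")  # ≈ $10,140
```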
That is the real DeepSeek V4 story. Not “it is cheap.” Plenty of things are cheap. The interesting part is that the lineup is now structured well enough that you can route by task difficulty without leaving the DeepSeek family too early.
Frequently asked questions
What is DeepSeek V4 pricing in 2026?
DeepSeek V4 Flash is $0.14 input and $0.28 output per 1 million tokens. DeepSeek V4 Pro is $0.435 input and $0.87 output per 1 million tokens. Both offer a 1M token context window in the current AI Cost Check data.
Is DeepSeek V4 Flash cheaper than GPT-5 mini?
Yes. Flash is cheaper on both token types: $0.14/$0.28 versus GPT-5 mini at $0.25/$2.00. Flash is especially strong on output-heavy work because the output price gap is so large.
When should I use DeepSeek V4 Pro instead of Flash?
Use Pro when the task needs better synthesis, better long-form output, or stronger coding quality than the Flash lane gives you. Do not use Pro for routine routing, support triage, or boring extraction. That is exactly what Flash is for.
Is DeepSeek V3.2 still worth using?
Only sometimes. If you already tuned a workflow around V3.2 and it never needs more than 128K context, it can still work. But on raw economics, DeepSeek V4 Flash is cheaper and gives you a much larger context window, so it is the better default for new builds.
How much does DeepSeek V4 cost for long-context RAG or document analysis?
A 150,000 input / 3,000 output document pass costs about $0.02184 on Flash and $0.06786 on Pro. That is where V4 becomes especially attractive, because V3.2’s 128K context does not fit the same prompt cleanly.
Calculate your own DeepSeek V4 costs
If you are about to ship on DeepSeek, run your own token counts through AI Cost Check instead of trusting vibe-based pricing takes. Then compare the result against Cheapest AI APIs in 2026, DeepSeek vs GPT-5 mini, and How AI Model Routing Cuts Costs.
My recommendation is simple: start with Flash, promote the hard jobs to Pro, and make premium models prove they deserve the bill.
