A dollar buys you 20 million input tokens on one model and fewer than 48,000 on another. That's a 420x difference in raw throughput for the same budget. If you're not checking the tokens-per-dollar math before picking a model, you're probably overspending relative to today's cheapest AI APIs.
This guide calculates exactly how many tokens each major AI model gives you for $1 — both input and output — using current 2026 pricing from our broader AI API pricing guide. We'll cover the budget champions, the premium flagships, and the sweet-spot models that balance cost with capability.
✅ TL;DR: GPT-5 Nano gives you 20M input tokens per dollar. GPT-5.2 Pro gives you 47,619. The cheapest model isn't always the best value — capability per dollar matters more than raw token count.
How we calculated tokens per dollar
The math is straightforward: divide 1,000,000 by the price per million tokens. That gives you tokens per dollar.
For a model charging $0.10 per million input tokens:
- 1,000,000 ÷ 0.10 = 10,000,000 tokens per dollar
For output tokens at $0.40 per million:
- 1,000,000 ÷ 0.40 = 2,500,000 tokens per dollar
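The two divisions above are easy to wrap in a helper if you want to run the math yourself (a minimal sketch using the example prices from this section):

```python
def tokens_per_dollar(price_per_million: float) -> float:
    """Convert a per-1M-token price into the number of tokens $1 buys."""
    return 1_000_000 / price_per_million

# The two worked examples above:
print(f"{tokens_per_dollar(0.10):,.0f}")  # 10,000,000 input tokens per $1
print(f"{tokens_per_dollar(0.40):,.0f}")  # 2,500,000 output tokens per $1
```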
Every number in this article uses live pricing from our AI cost calculator. We update pricing weekly as providers change rates.
The budget tier: 10M+ input tokens per dollar
These models give you the most raw throughput. They're ideal for high-volume tasks where you need to process massive amounts of text cheaply — classification, extraction, summarization, embeddings preprocessing.
[stat] 20,000,000 Input tokens per $1 with GPT-5 Nano — the most tokens-per-dollar of any major model
GPT-5 Nano — $0.05 input / $0.40 output per 1M tokens
- Input: 20,000,000 tokens per $1
- Output: 2,500,000 tokens per $1
- OpenAI's smallest model. Good for classification, routing, and simple extraction. Not suitable for complex reasoning.
Mistral Small 3.2 — $0.06 input / $0.18 output per 1M tokens
- Input: 16,666,667 tokens per $1
- Output: 5,555,556 tokens per $1
- Best output-tokens-per-dollar in this tier. If your workload is output-heavy (generation, rewriting), Mistral Small punches above its weight.
Gemini 2.0 Flash Lite — $0.075 input / $0.30 output per 1M tokens
- Input: 13,333,333 tokens per $1
- Output: 3,333,333 tokens per $1
- Google's lightweight option with a massive 1M context window. Ideal for processing long documents on a budget.
GPT-4.1 Nano — $0.10 input / $0.40 output per 1M tokens
- Input: 10,000,000 tokens per $1
- Output: 2,500,000 tokens per $1
- Previous-gen nano model. Still solid for simple tasks but GPT-5 Nano is cheaper and newer.
💡 Key Takeaway: Budget models give you 10-20 million input tokens per dollar. The tradeoff is capability — these models struggle with nuanced reasoning, complex code generation, and multi-step planning. Use them for the 80% of tasks that don't need a premium brain.
The mid-range tier: 1M–10M input tokens per dollar
This is where most production workloads live. These models balance cost with genuine capability — they can handle customer support, content generation, code assistance, and structured data extraction without breaking the bank.
Gemini 2.5 Flash — $0.15 input / $0.60 output per 1M tokens
- Input: 6,666,667 tokens per $1
- Output: 1,666,667 tokens per $1
- Google's workhorse. 1M context window and strong reasoning for the price. Hard to beat for document-heavy workloads.
GPT-4o Mini — $0.15 input / $0.60 output per 1M tokens
- Input: 6,666,667 tokens per $1
- Output: 1,666,667 tokens per $1
- Same tokens-per-dollar as Gemini 2.5 Flash but with a 128K context limit. Excellent for chat and short-form tasks.
GPT-5 Mini — $0.25 input / $2.00 output per 1M tokens
- Input: 4,000,000 tokens per $1
- Output: 500,000 tokens per $1
- Notice the gap: input is cheap but output is 8x more expensive per token. If your use case generates long responses, GPT-5 Mini gets expensive fast.
[vs] 6.67M tokens | Gemini 2.5 Flash (input) || 500K tokens | GPT-5 Mini (output) Same dollar, wildly different value depending on whether you're reading or writing
DeepSeek V3.2 — $0.28 input / $0.42 output per 1M tokens
- Input: 3,571,429 tokens per $1
- Output: 2,380,952 tokens per $1
- DeepSeek's balanced pricing means the input/output gap is small. Great for conversational workloads where you're both reading and generating roughly equally.
Llama 4 Maverick — $0.27 input / $0.85 output per 1M tokens
- Input: 3,703,704 tokens per $1
- Output: 1,176,471 tokens per $1
- Meta's open model via Together AI. Self-hosting drops costs further, but you take on infrastructure complexity.
GPT-4.1 Mini — $0.40 input / $1.60 output per 1M tokens
- Input: 2,500,000 tokens per $1
- Output: 625,000 tokens per $1
- Solid all-rounder with 200K context. The 4:1 input/output price ratio is typical for OpenAI models.
Claude Haiku 4.5 — $1.00 input / $5.00 output per 1M tokens
- Input: 1,000,000 tokens per $1
- Output: 200,000 tokens per $1
- Anthropic's cheapest current model. One million tokens per dollar sounds reasonable until you compare it to the budget tier above. You're paying for Anthropic's safety training and instruction-following quality.
💡 Key Takeaway: Mid-range models vary 6x in tokens-per-dollar (6.67M down to 1M for input). The input/output price ratio matters more than the headline price — a model that's cheap on input but expensive on output will surprise you on generation-heavy workloads, which is why we track both views in our cost-per-million ranking.
The premium tier: 1M input tokens per dollar or fewer
These are the flagship models. You're not buying tokens — you're buying intelligence. Complex reasoning, nuanced writing, advanced code generation, and multi-step problem solving. Use them strategically on tasks that actually need the capability.
Claude Sonnet 4.6 — $3.00 input / $15.00 output per 1M tokens
- Input: 333,333 tokens per $1
- Output: 66,667 tokens per $1
- Anthropic's balanced flagship. 1M context window. The go-to for production applications that need quality without Opus pricing.
GPT-5 — $1.00 input / $8.00 output per 1M tokens
- Input: 1,000,000 tokens per $1
- Output: 125,000 tokens per $1
- OpenAI's flagship is actually competitive on input pricing. The output cost is where it gets expensive. Compare carefully with Claude Sonnet 4.6.
Claude Opus 4.6 — $5.00 input / $25.00 output per 1M tokens
- Input: 200,000 tokens per $1
- Output: 40,000 tokens per $1
- The premium thinking model. Best-in-class for complex analysis and creative work. Roughly 1.7x more expensive than Sonnet per token.
Grok 4 — $3.00 input / $15.00 output per 1M tokens
- Input: 333,333 tokens per $1
- Output: 66,667 tokens per $1
- xAI's flagship matches Claude Sonnet on pricing. The differentiator is real-time data access and a different personality profile.
[stat] 40,000 Output tokens per $1 with Claude Opus 4.6 — that's roughly 30,000 words of generated text for a dollar
The reasoning tier: the most expensive tokens
Reasoning models (o-series, DeepSeek R1) generate internal "thinking" tokens that you pay for. These models solve harder problems but the token economics are different — a single complex query can consume 50,000+ thinking tokens.
o4-mini — $1.10 input / $4.40 output per 1M tokens
- Input: 909,091 tokens per $1
- Output: 227,273 tokens per $1
- The budget reasoning option. Thinking tokens are billed at the output rate, so a reasoning-heavy query might generate 20K thinking tokens + 2K visible output = 22K tokens billed at $4.40/M.
o3 — $2.00 input / $8.00 output per 1M tokens
- Input: 500,000 tokens per $1
- Output: 125,000 tokens per $1
- Full reasoning model. Heavy thinking tasks (math proofs, complex code debugging) can cost $0.10-0.50 per query.
o3-pro — $20.00 input / $80.00 output per 1M tokens
- Input: 50,000 tokens per $1
- Output: 12,500 tokens per $1
- The most expensive mainstream model. A single complex reasoning query can cost $1-5. Reserve this for tasks where correctness is worth any price — medical analysis, legal reasoning, complex financial modeling.
GPT-5.2 Pro — $21.00 input / $168.00 output per 1M tokens
- Input: 47,619 tokens per $1
- Output: 5,952 tokens per $1
- The priciest model in our database. Output tokens cost 11x more than Claude Sonnet's. This is for mission-critical reasoning where the margin of error must be near zero.
⚠️ Warning: Reasoning model pricing is deceptive. The per-token rate looks manageable until you realize a single query can generate 10-100K thinking tokens. Always estimate total tokens (input + thinking + output) before committing to a reasoning model in production.
Input vs output: why the ratio matters
Most developers focus on the input price. That's a mistake. The input/output price ratio varies wildly across models and determines which workloads are actually cheap.
Output-heavy workloads (content generation, code writing, long-form answers):
- Best value: Mistral Small 3.2 — $0.18/M output, a modest 3:1 output-to-input ratio
- Worst value: GPT-5.2 Pro — $168/M output, an 8:1 output-to-input ratio
Input-heavy workloads (classification, extraction, summarization of long docs):
- Best value: GPT-5 Nano — 20M tokens per dollar
- Context window matters: Gemini 2.0 Flash Lite handles 1M tokens natively
Balanced workloads (chatbots, Q&A, customer support):
- Best value: DeepSeek V3.2 — only a 1.5x gap between input and output rates
- Runner-up: Llama 3.1 8B — flat $0.18 both ways
📊 Quick Math: A chatbot processing 1,000 conversations/day averaging 2,000 input + 500 output tokens each:
- DeepSeek V3.2: (2M × $0.28 + 0.5M × $0.42) / 1M = $0.77/day ($23/month)
- Claude Sonnet 4.6: (2M × $3.00 + 0.5M × $15.00) / 1M = $13.50/day ($405/month)
- Same workload, 17x cost difference.
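The quick math above is reproducible in a few lines (prices as listed in this article; the 2,000-input / 500-output split is the scenario's assumption):

```python
def daily_cost(conversations, in_tokens, out_tokens, in_price, out_price):
    """Daily spend in dollars; prices are per 1M tokens."""
    return conversations * (in_tokens * in_price + out_tokens * out_price) / 1_000_000

deepseek = daily_cost(1_000, 2_000, 500, 0.28, 0.42)   # $0.77/day
sonnet = daily_cost(1_000, 2_000, 500, 3.00, 15.00)    # $13.50/day
print(round(sonnet / deepseek, 1))                     # ~17.5x for the same workload
```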
Real-world scenarios: what $100/month buys you
Let's make this concrete. Here's what $100 per month gets you across different models and use cases.
Customer support chatbot
Average conversation: 3,000 input tokens, 800 output tokens
| Model | Conversations per $100 | Cost per conversation |
|---|---|---|
| GPT-5 Nano | ~212,000 | $0.00047 |
| Gemini 2.5 Flash | ~107,000 | $0.00093 |
| GPT-5 Mini | ~42,500 | $0.00235 |
| Claude Sonnet 4.6 | ~4,800 | $0.021 |
| Claude Opus 4.6 | ~2,850 | $0.035 |
Blog post generation
Average post: 500 input tokens (prompt), 4,000 output tokens
| Model | Posts per $100 | Cost per post |
|---|---|---|
| Mistral Small 3.2 | ~133,000 | $0.00075 |
| GPT-5 Mini | ~12,300 | $0.0081 |
| Claude Sonnet 4.6 | ~1,630 | $0.062 |
| GPT-5.2 Pro | ~146 | $0.68 |
Code review agent
Average task: 15,000 input tokens (code + context), 3,000 output tokens
| Model | Reviews per $100 | Cost per review |
|---|---|---|
| DeepSeek V3.2 | ~18,300 | $0.0055 |
| GPT-4.1 | ~2,900 | $0.034 |
| Claude Opus 4.6 | ~667 | $0.15 |
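All three scenario tables follow the same formula, sketched here (the workload sizes are the scenario assumptions above; rounding explains the ~ in the tables):

```python
def tasks_per_budget(budget, in_tokens, out_tokens, in_price, out_price):
    """Return (task count, cost per task) for a given dollar budget."""
    cost = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return budget / cost, cost

# DeepSeek V3.2 on the code-review workload (15K input, 3K output)
n, per = tasks_per_budget(100, 15_000, 3_000, 0.28, 0.42)
print(int(n), round(per, 4))  # 18315 0.0055
```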
💡 Key Takeaway: The gap between cheapest and most expensive is 100-1,000x for the same task. Start with the cheapest model that produces acceptable quality, then upgrade only the tasks that need it; our best budget AI models list is a good starting point. Most production systems should use 2-3 models at different tiers.
The strategy: tiered model routing
The smartest teams don't pick one model. They route requests to the cheapest model that can handle each task.
Tier 1 — Bulk processing (GPT-5 Nano, Mistral Small 3.2)
- Classification, tagging, simple extraction
- 10-20M tokens per dollar
- Handle 80% of requests
Tier 2 — Standard tasks (Gemini 2.5 Flash, GPT-5 Mini, DeepSeek V3.2)
- Customer support, content generation, code assistance
- 1-6M tokens per dollar
- Handle 15% of requests
Tier 3 — Complex reasoning (Claude Sonnet 4.6, GPT-5, o3)
- Multi-step analysis, creative writing, hard debugging
- 100K-500K tokens per dollar
- Handle 5% of requests
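A first-cut router can be a plain lookup from task category to the cheapest capable tier (model names and prices are from this article; the task taxonomy and the safe default are illustrative assumptions, not a provider API):

```python
# Illustrative tier map: task category -> (model, $/M input, $/M output).
TIERS = {
    "classify": ("gpt-5-nano", 0.05, 0.40),        # Tier 1: bulk processing
    "support": ("deepseek-v3.2", 0.28, 0.42),      # Tier 2: standard tasks
    "reason": ("claude-sonnet-4.6", 3.00, 15.00),  # Tier 3: complex reasoning
}

def route(task_type: str):
    """Pick the tier for a task; unknown tasks default to the capable tier."""
    return TIERS.get(task_type, TIERS["reason"])

print(route("classify")[0])  # gpt-5-nano
print(route("unknown")[0])   # claude-sonnet-4.6 (safe default)
```

Real routers usually add a confidence check — if the cheap model's answer fails validation, retry on the tier above.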
A well-designed routing system cuts costs 60-80% compared to running everything through a flagship model. Use our multi-model comparison tool to find the right mix for your workload.
Frequently asked questions
How many tokens is a typical word? One token is roughly 0.75 words in English. So 1,000 tokens ≈ 750 words. A 2,000-word blog post is about 2,700 tokens. A full novel (~80,000 words) is roughly 107,000 tokens.
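The conversions in this answer all use the same 0.75 words-per-token ratio (an English-text rule of thumb; actual tokenizer counts vary by model and content):

```python
WORDS_PER_TOKEN = 0.75  # rough rule of thumb for English text

def words_to_tokens(words: int) -> int:
    return round(words / WORDS_PER_TOKEN)

print(words_to_tokens(2_000))   # 2667 tokens for a 2,000-word post
print(words_to_tokens(80_000))  # 106667 tokens for a full novel
```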
Do cached/prompt-cached tokens change the math? Yes, significantly. OpenAI and Anthropic offer 50-90% discounts on cached input tokens. If you're sending the same system prompt repeatedly (common in production), your effective input cost drops dramatically. DeepSeek offers cache hits at $0.07/M — that's 14.3M cached tokens per dollar.
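The effect of caching on your effective input price is a simple blend (the DeepSeek rates are from this FAQ; the 80% hit rate is an assumption about a workload that resends a large system prompt):

```python
def effective_input_price(full_price, cached_price, hit_rate):
    """Blend full and cached per-1M input prices by the cache-hit fraction."""
    return hit_rate * cached_price + (1 - hit_rate) * full_price

# DeepSeek V3.2: $0.28/M full, $0.07/M on cache hits, assumed 80% hit rate
blended = effective_input_price(0.28, 0.07, 0.8)
print(round(blended, 3))           # 0.112 -> effective $/M input
print(round(1_000_000 / blended))  # ~8.93M tokens per $1
```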
Are output tokens always more expensive than input? Almost always. The exception is Llama 3.1 models via Together AI, which charge flat rates for both. For most providers, output costs 3-8x more than input because generation requires sequential computation while input processing can be parallelized.
Should I switch models to save money? Only if quality stays acceptable. Run an eval first: take 100 representative queries, run them through both models, and compare output quality. A 10x cheaper model that produces 20% worse results might cost you more in user churn than you save on API bills.
What's the cheapest way to process millions of documents? Use OpenAI's Batch API for 50% off, or self-host Llama 4 Maverick. For extraction tasks, GPT-5 Nano at $0.05/M input tokens processes 1 million 1,000-token documents for $50.
Pricing data current as of February 2026. Check our AI Cost Calculator for real-time pricing and run your own cost estimates. Prices change frequently — we update weekly.
