If you've looked at any AI API pricing page, you've seen something like "$2.50 per million input tokens." But what does that actually mean for your project? How many tokens is a typical request? And why do output tokens cost more than input?
This guide breaks down token-based pricing from the ground up, with real numbers from every major provider, practical cost calculations, and the optimization tactics that will save you money from day one.
📊 Stat: Per-million-token pricing across major AI APIs spans $0.06 to $168.00 as of February 2026 — a 2,800× spread.
What is a token?
A token is a chunk of text that AI models process as a single unit. In English, a token is roughly 4 characters or ¾ of a word. The sentence "Hello, how are you doing today?" is about 8 tokens. A full page of text is roughly 500–700 tokens.
Different languages tokenize differently. Chinese, Japanese, and Korean text often produces more tokens per character because the tokenizer breaks down complex characters into sub-units. Code tends to tokenize efficiently — variable names and common programming patterns are often single tokens.
Quick rules of thumb for English:
- 1 word ≈ 1.3 tokens
- 1 page ≈ 600 tokens
- 1,000 words ≈ 1,300 tokens
- A typical chat message ≈ 50–150 tokens
- A full blog post ≈ 2,600–3,900 tokens
- A book chapter ≈ 5,000–10,000 tokens
These are approximations. The exact count depends on the specific tokenizer each model uses. OpenAI, Anthropic, and Google all use slightly different tokenizers. Use our token counter tool to get exact counts for your text with any model's tokenizer.
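For quick budgeting, the rules of thumb above can be turned into a rough estimator. This is a heuristic sketch, not a real tokenizer — it averages the character-based and word-based approximations and will drift from true counts, especially for code and non-English text.

```python
def estimate_tokens(text: str) -> int:
    """Rough English token estimate: ~4 characters or ~1.3 tokens per word.

    Heuristic only -- use each provider's actual tokenizer for exact counts.
    """
    by_chars = len(text) / 4
    by_words = len(text.split()) * 1.3
    # Average the two estimates to smooth out short-word / long-word bias.
    return round((by_chars + by_words) / 2)

print(estimate_tokens("Hello, how are you doing today?"))  # prints 8
```

For the example sentence this lands on 8, matching the real count mentioned above, but treat anything it returns as ±20%.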
💡 Key Takeaway: Don't estimate token counts from word counts alone. Paste your actual prompts into a tokenizer to get precise numbers. A 500-word prompt might be 600 tokens on one model and 700 on another.
Why input and output tokens have different prices
Every AI API charges separately for two types of tokens:
- Input tokens (also called "prompt tokens"): Everything you send to the model — your system prompt, user message, conversation history, retrieved documents, function definitions, and any other context.
- Output tokens (also called "completion tokens"): Everything the model generates in response — the answer, generated code, analysis, or any other content.
Output tokens almost always cost more — typically 2–8× the input price. Here's why:
Input processing is parallel. The model reads all input tokens simultaneously using matrix operations optimized for modern GPUs. This is computationally efficient.
Output generation is sequential. The model must generate output tokens one at a time, with each new token depending on all previous tokens. This autoregressive process is inherently slower and more compute-intensive. Each output token requires a full forward pass through the model.
The hardware economics are simple: generating 1 output token uses roughly 2–5× more GPU compute than processing 1 input token. Providers pass this cost through in their pricing.
Current pricing across major providers (February 2026)
Here's what every major model charges per million tokens, organized by tier:
Flagship models
| Model | Provider | Input $/M | Output $/M | Output Multiplier | Context |
|---|---|---|---|---|---|
| GPT-5.2 | OpenAI | $1.75 | $14.00 | 8.0× | 1M |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | 5.0× | 200K |
| Gemini 3 Pro | Google | $2.00 | $12.00 | 6.0× | 2M |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 5.0× | 1M |
| GPT-5 | OpenAI | $1.25 | $10.00 | 8.0× | 1M |
| Mistral Large 3 | Mistral | $0.50 | $1.50 | 3.0× | 256K |
Efficient models
| Model | Provider | Input $/M | Output $/M | Output Multiplier | Context |
|---|---|---|---|---|---|
| GPT-5 nano | OpenAI | $0.05 | $0.40 | 8.0× | 128K |
| Mistral Small 3.2 | Mistral | $0.06 | $0.18 | 3.0× | 128K |
| Gemini 2.5 Flash | Google | $0.15 | $0.60 | 4.0× | 1M |
| GPT-5 mini | OpenAI | $0.25 | $2.00 | 8.0× | 500K |
| DeepSeek V3.2 | DeepSeek | $0.28 | $0.42 | 1.5× | 128K |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 5.0× | 200K |
Reasoning models
| Model | Provider | Input $/M | Output $/M | Output Multiplier | Context |
|---|---|---|---|---|---|
| o3-pro | OpenAI | $20.00 | $80.00 | 4.0× | 1M |
| o3 | OpenAI | $2.00 | $8.00 | 4.0× | 1M |
| o4-mini | OpenAI | $1.10 | $4.40 | 4.0× | 2M |
| DeepSeek R1 V3.2 | DeepSeek | $0.28 | $0.42 | 1.5× | 128K |
See the full ranking of all 47+ models on our cost per million tokens page.
Notice the output multiplier column. DeepSeek is unusual — its output costs only 1.5× its input, compared to 5–8× for most providers. This makes DeepSeek dramatically cheaper for output-heavy workloads like chatbots and content generation.
How to estimate your costs: the formula
The cost of a single API request is:
Request cost = (input_tokens × input_price / 1,000,000)
+ (output_tokens × output_price / 1,000,000)
Monthly cost is:
Monthly cost = request_cost × requests_per_day × 30
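The formula translates directly into a few lines of code. This minimal sketch plugs in the chatbot numbers from Example 1 below (500 input / 300 output tokens per request, GPT-5 at $1.25/$10.00 per million):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost of one request; prices are USD per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

def monthly_cost(cost_per_request: float, requests_per_day: int) -> float:
    """Project a monthly bill from per-request cost and daily volume."""
    return cost_per_request * requests_per_day * 30

per_request = request_cost(500, 300, 1.25, 10.00)
print(round(monthly_cost(per_request, 1_000), 2))  # prints 108.75
```

Swapping in other prices from the tables above reproduces every scenario in the examples that follow.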
Example 1: Customer support chatbot
You're building a chatbot that handles 1,000 conversations per day. Each conversation averages 500 input tokens (system prompt + user message + history) and 300 output tokens (response).
With GPT-5 ($1.25/$10.00 per M):
- Daily input: 1,000 × 500 = 500,000 tokens → $0.625
- Daily output: 1,000 × 300 = 300,000 tokens → $3.00
- Daily total: $3.63 → ~$109/month
With GPT-5 mini ($0.25/$2.00 per M):
- Daily input: $0.125
- Daily output: $0.60
- Daily total: $0.725 → ~$21.75/month
With DeepSeek V3.2 ($0.28/$0.42 per M):
- Daily input: $0.14
- Daily output: $0.126
- Daily total: $0.266 → ~$7.98/month
📊 Quick Math: The same chatbot workload costs $109/month on GPT-5, $21.75 on GPT-5 mini, or $7.98 on DeepSeek V3.2. That's a 14× difference for the same volume of conversations. Use our cost calculator to run your own numbers.
Example 2: Code generation pipeline
A development team running 5,000 code reviews per day, with 2,000 input tokens (code context) and 1,000 output tokens (review + suggestions).
| Model | Daily Cost | Monthly Cost |
|---|---|---|
| Claude Opus 4.6 ($5/$25) | $175.00 | $5,250 |
| GPT-5.2 ($1.75/$14) | $87.50 | $2,625 |
| GPT-5 ($1.25/$10) | $62.50 | $1,875 |
| Codestral ($0.30/$0.90) | $7.50 | $225 |
| DeepSeek V3.2 ($0.28/$0.42) | $4.90 | $147 |
Codestral (Mistral's purpose-built coding model) costs roughly 23× less than Claude Opus 4.6 for the same volume: $7.50 versus $175.00 per day. For a dedicated code review pipeline, that's over $5,000/month in savings. Is Opus better at code review? Probably, for edge cases. But a smart team would use Codestral for routine reviews and escalate only the complex ones to Opus.
Example 3: High-volume classification
Processing 100,000 customer messages per day for sentiment classification. Input: 100 tokens per message. Output: 10 tokens (just a label).
| Model | Daily Cost | Monthly Cost |
|---|---|---|
| Mistral Small 3.2 ($0.06/$0.18) | $0.78 | $23 |
| GPT-5 nano ($0.05/$0.40) | $0.90 | $27 |
| Llama 3.1 8B ($0.18/$0.18) | $1.98 | $59 |
| GPT-4o mini ($0.15/$0.60) | $2.10 | $63 |
At this scale and simplicity, the model choice barely matters financially — they're all under $100/month. Pick whichever gives the best classification accuracy for your domain.
The five hidden costs that inflate your token bill
1. System prompts count as input tokens
Your system prompt is sent with every single request. A 500-token system prompt across 10,000 daily requests = 5 million extra input tokens per day. On GPT-5, that's $6.25/day or $187.50/month just for the system prompt.
Fix: Keep system prompts concise. Use prompt caching (available from OpenAI and Anthropic) to get 50–90% discounts on repeated prefixes.
2. Conversation history compounds with every turn
In chat applications, you resend the full conversation each turn, so per-request input grows linearly and cumulative cost grows quadratically with conversation length. By turn 5, you're sending the system prompt plus 4 previous exchanges. By turn 10, your input might exceed 3,000 tokens per request, most of it repeated context you've already paid for.
Fix: Implement conversation summarization. After 5 turns, summarize the history into 200–300 tokens instead of sending the full transcript. This alone can cut chat-application input costs by 60%.
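A rolling-summary history manager can be sketched in a few lines. The `summarize` function here is a placeholder — in production it would call a cheap model and ask for a 200–300 token summary; the message format mirrors the common role/content chat convention.

```python
MAX_TURNS = 5  # summarize once history exceeds this many turns

def summarize(turns: list[dict]) -> str:
    # Placeholder: in production, send `turns` to a cheap model and
    # request a 200-300 token summary of the conversation so far.
    return "Summary of earlier conversation: ..."

def build_messages(system_prompt: str, history: list[dict],
                   user_msg: str) -> list[dict]:
    """Assemble the request, compressing old history into a summary."""
    if len(history) > MAX_TURNS * 2:  # each turn = user + assistant message
        summary = summarize(history[:-2])
        # Keep the summary plus only the most recent exchange verbatim.
        history = [{"role": "assistant", "content": summary}] + history[-2:]
    return ([{"role": "system", "content": system_prompt}]
            + history
            + [{"role": "user", "content": user_msg}])
```

Instead of resending thousands of tokens of transcript, each request now carries the system prompt, a short summary, the last exchange, and the new message.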
3. Retries and error handling
Failed requests still consume tokens. The input tokens are billed even if the model returns an error or times out mid-generation. If your error rate is 3% and you retry twice per failure, you're paying 6% overhead.
Fix: Log retry costs separately. Implement circuit breakers. Consider falling back to a cheaper model on retry instead of repeating the same expensive call.
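The retry-then-downgrade pattern can be sketched as below. `call_model` stands in for your real API client, and the model names are illustrative — the point is the structure: retry with backoff, then fall through to a cheaper model rather than burning more tokens on the expensive one.

```python
import time

def call_with_fallback(call_model, prompt: str,
                       models=("gpt-5", "gpt-5-mini"),
                       retries_per_model: int = 2):
    """Retry each model with backoff, then fall back to the next (cheaper) one.

    `call_model` is a placeholder for a real API client call; order `models`
    from preferred to cheapest.
    """
    last_error = None
    for model in models:
        for attempt in range(retries_per_model):
            try:
                return call_model(model, prompt)
            except Exception as exc:  # narrow to API/timeout errors in real code
                last_error = exc
                if attempt + 1 < retries_per_model:
                    time.sleep(2 ** attempt)  # exponential backoff
    raise last_error
```

Pair this with per-model cost logging so retry overhead shows up as its own line item instead of vanishing into the total.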
4. Thinking tokens on reasoning models
Reasoning models (o3, o4-mini, DeepSeek R1) generate hidden "thinking" tokens before producing your visible answer. These thinking tokens are billed as output tokens. A request that returns 500 visible tokens might actually consume 5,000+ output tokens including thinking.
Fix: Use the reasoning_effort parameter (low/medium/high) to control thinking depth. Only use reasoning models for tasks that genuinely require step-by-step logic. Read our reasoning model pricing guide for strategies.
5. Prompt caching discounts you might be missing
Both OpenAI and Anthropic offer significant discounts when you reuse the same prompt prefix across requests. Anthropic charges 90% less for cached input on Claude Sonnet models. If your system prompt stays the same across requests (which it usually does), you're leaving money on the table by not enabling caching.
Fix: Structure your prompts with a stable prefix (system prompt + tool definitions) and a variable suffix (user message). Enable caching on the prefix.
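As a sketch, a cache-friendly request body looks like this. The field names follow Anthropic's prompt-caching convention (a `cache_control` marker on the stable system block); the model name is illustrative — verify both against the current API docs before relying on them.

```python
def build_request(system_prompt: str, tools: list, user_msg: str) -> dict:
    """Stable, cacheable prefix first; the variable user message last."""
    return {
        "model": "claude-sonnet-4.5",  # illustrative model name
        "max_tokens": 1024,
        "system": [{
            "type": "text",
            "text": system_prompt,
            # Marks the stable prefix as cacheable: on cache hits this
            # input is billed at the discounted cached rate.
            "cache_control": {"type": "ephemeral"},
        }],
        "tools": tools,
        "messages": [{"role": "user", "content": user_msg}],
    }
```

Because only the trailing user message changes between requests, the expensive prefix (system prompt plus tool definitions) is paid for at full price once and at the cached rate thereafter.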
⚠️ Warning: Most developers underestimate hidden costs by 30–50%. A team budgeting $5,000/month based on raw token math typically spends $7,000–$8,000 once retries, context growth, and system prompt overhead are included. Read our full hidden costs breakdown before setting budgets.
Choosing the right pricing tier for your use case
| Your Use Case | Recommended Tier | Example Models | Why |
|---|---|---|---|
| Internal tools, prototypes | Efficient | GPT-5 nano, Mistral Small 3.2, DeepSeek V3.2 | Cost barely matters at low volume; quality is "good enough" |
| Customer-facing chatbots | Balanced/Efficient | GPT-5 mini, Gemini 2.5 Flash, Mistral Large 3 | Need quality + reasonable costs at scale |
| Complex analysis, coding | Flagship | GPT-5.2, Claude Sonnet 4.6, Gemini 3 Pro | Quality matters more than cost |
| High-volume classification | Ultra-budget | Mistral Small 3.2, GPT-5 nano, Llama 3.1 8B | Pennies per thousand requests |
| Reasoning-heavy tasks | Reasoning | o4-mini, DeepSeek R1, o3 | Worth the premium for accuracy |
| Long document processing | Large context | Gemini 2.5 Pro (2M), Grok 4.1 Fast (2M), o4-mini (2M) | Process everything in one call |
Browse models by category on our category page or compare specific models on our comparison pages.
Five quick wins to lower your token bill
1. Start with the cheapest model that works
Most developers overestimate the model tier they need. GPT-5 mini and DeepSeek V3.2 handle 80% of use cases at 5–25× less than flagships. Run a quality test on 50 representative prompts before committing to a premium model.
2. Cache aggressively
If your system prompt is the same across requests, enable prompt caching. Anthropic's cached input rate for Claude Sonnet models is 90% cheaper than fresh input. This alone can cut input costs in half for applications with long system prompts or tool definitions.
3. Set max_tokens
Don't let the model ramble. If you need 100-word answers, set max_tokens: 200. You won't pay for tokens it doesn't generate. This is the single easiest optimization — one line of code, immediate savings.
4. Compress your prompts
Replace 5 few-shot examples (1,000+ tokens) with a clear zero-shot instruction (100 tokens). Summarize retrieved documents before including them as context. Strip HTML, formatting, and boilerplate from any text you include in the prompt.
5. Implement model routing
Route simple tasks (classification, extraction, yes/no questions) to budget models ($0.06–$0.28/M input) and reserve expensive models for complex tasks. Our guide on cost optimization strategies walks through implementing a tiered routing system.
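A tiered router can start as nothing more than a lookup table. The model names and task types here are illustrative, drawn from the tables above — the real work is classifying incoming tasks, which can itself be a cheap-model call.

```python
# Simple task types go to budget models; everything else escalates.
ROUTES = {
    "classification": "mistral-small-3.2",
    "extraction": "gpt-5-nano",
    "yes_no": "deepseek-v3.2",
}
FALLBACK = "gpt-5"  # flagship for complex or unrecognized tasks

def pick_model(task_type: str) -> str:
    """Return the cheapest model suitable for this task type."""
    return ROUTES.get(task_type, FALLBACK)

print(pick_model("classification"))  # prints mistral-small-3.2
print(pick_model("code_review"))     # prints gpt-5
```

Even this naive version captures most of the savings: the high-volume simple tasks land on models priced in pennies, and only the long tail reaches the flagship.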
✅ TL;DR: Token pricing is simple in theory — input tokens + output tokens × price per million. In practice, output costs dominate (2–8× input), hidden costs add 30–50%, and model choice creates a 100×+ cost difference for the same workload. Start cheap, measure everything, and upgrade only when quality demands it.
Calculate your costs now
Don't guess — calculate. Our free AI cost calculator lets you compare any model, adjust usage patterns, and see monthly projections instantly. Plug in your input tokens, output tokens, and request volume to get exact numbers for your specific workload.
Or browse the full pricing table to find the cheapest model for your needs, and check our complete pricing guide for provider-by-provider breakdowns.
Frequently asked questions
How many tokens is 1,000 words?
Approximately 1,300 tokens in English. The exact count varies by tokenizer — OpenAI's tokenizer produces slightly different counts than Anthropic's or Google's. Code, technical text, and non-English languages can vary more. Use our token counter for precise measurements.
Why do AI APIs charge per token instead of per request?
Per-token pricing reflects the actual computational cost. A 50-token request uses far less GPU compute than a 50,000-token request. Per-request pricing would either overcharge simple requests or undercharge complex ones. Tokens provide a fair, granular billing unit.
What's the cheapest way to use AI APIs?
Use the cheapest model that meets your quality threshold — often Mistral Small 3.2 ($0.06/$0.18) or DeepSeek V3.2 ($0.28/$0.42). Combine this with prompt caching, output length limits, and model routing for maximum savings. A well-optimized setup can cut costs by 50–80% compared to a naive implementation.
Do I pay for tokens in failed API requests?
Yes. Input tokens and any partial output generated before the failure are billed at full price. With a 5% error rate, you're wasting 5% of your budget on requests that produced no usable result. Track error rates and implement circuit breakers to minimize this waste.
How much does it cost to build a ChatGPT-like app?
At 10,000 conversations per day with an average of 5 turns each (2,000 input + 200 output tokens per turn), you're processing 100M input and 10M output tokens daily. Monthly costs range from roughly $970 on DeepSeek V3.2 to $6,750 on GPT-5 to $22,500 on Claude Opus 4.6. The model tier you choose is by far the biggest cost lever. Start with our cost estimation guide for a complete budgeting framework.
