If you've looked at AI API pricing pages, you've seen "per million tokens" everywhere. But what exactly is a token, and why does it matter for your budget?
This guide breaks down tokenization, shows you real token counts for common prompts, and explains how to compare pricing across providers without getting misled. By the end, you'll know how to estimate costs for any AI workload with confidence.
Quick stat: ~0.75 words is the average length of one token in English text, so 1,000 tokens ≈ 750 words.
What is a token?
A token is the smallest unit of text that an AI model processes. It's not quite a word, and it's not quite a character. Think of it as a chunk of text that the model reads and generates.
In English, one token is roughly 0.75 words. So 100 tokens is about 75 words, and 1,000 tokens is around 750 words. A typical page of text (single-spaced) is about 500–600 words, or roughly 700–800 tokens.
The exact token count depends on the complexity of the text. Common words like "the" or "is" are usually one token. Longer or uncommon words might be split into multiple tokens. Numbers, punctuation, and special characters also affect the count.
Some useful benchmarks:
- A tweet (280 characters): ~50–70 tokens
- A short email: ~100–200 tokens
- A full page of text: ~700–800 tokens
- A 2,000-word blog post: ~2,700 tokens
- A novel (80,000 words): ~107,000 tokens
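The benchmarks above can be turned into a quick back-of-the-envelope estimator. This sketch applies this guide's ~0.75 words-per-token rule for prose (and ~0.5 for technical text); exact counts require the provider's own tokenizer, such as OpenAI's tiktoken library.

```python
def estimate_tokens(word_count: int, technical: bool = False) -> int:
    """Rough token estimate from a word count.

    Uses the ~0.75 words-per-token rule of thumb for English prose;
    technical text (code, JSON, URLs) runs closer to ~0.5 words per token.
    """
    words_per_token = 0.5 if technical else 0.75
    return round(word_count / words_per_token)

print(estimate_tokens(2_000))                  # ~2,667 tokens for a 2,000-word post
print(estimate_tokens(80_000))                 # ~106,667 tokens for a novel
print(estimate_tokens(500, technical=True))    # ~1,000 tokens for 500 words of code
```

For real billing estimates, always prefer the token counts reported in API responses over any word-count heuristic.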
How tokenization works
AI models don't read text the way humans do. Before processing your prompt, the model breaks it into tokens using a tokenizer — a specific algorithm that maps text to numerical IDs. Each token is converted into a number that the model can understand.
For example, the sentence "AI pricing is confusing" might tokenize as:
- "AI" → 1 token
- " pricing" → 1 token (note the space is included)
- " is" → 1 token
- " confusing" → 1 token
Total: 4 tokens for 4 words. Clean and efficient.
But a more complex sentence like "The DeepSeek-V3 model costs $0.28/1M tokens" might tokenize as:
- "The" → 1 token
- " Deep" → 1 token
- "Se" → 1 token
- "ek" → 1 token
- "-V" → 1 token
- "3" → 1 token
- " model" → 1 token
- " costs" → 1 token
- " $" → 1 token
- "0" → 1 token
- "." → 1 token
- "28" → 1 token
- "/" → 1 token
- "1" → 1 token
- "M" → 1 token
- " tokens" → 1 token
Total: 16 tokens for 6 words. That's because the tokenizer splits technical terms, numbers, and symbols into smaller pieces. This is why code and data-heavy prompts use more tokens than plain English text — and cost more to process.
💡 Key Takeaway: Technical content (code, JSON, URLs, numbers) uses significantly more tokens per word than plain English. A 500-word code snippet may consume 1,000+ tokens, while 500 words of prose uses only ~670 tokens. Factor this into your cost estimates.
Why pricing is per token
Tokens determine the computational cost. Every token requires processing power — both for reading your input (prompt) and generating output (completion).
Providers charge separately for input tokens and output tokens because generating output is more expensive. The model has to predict each token one at a time (sequentially), while input tokens are processed in parallel. This fundamental asymmetry is why output always costs more.
For example, GPT-5 Mini costs $0.25 per million input tokens and $2.00 per million output tokens. Output costs 8× more than input.
This split matters enormously when you estimate costs:
- A chatbot that generates long responses will spend most of its budget on output tokens
- A summarization tool that reads long documents and outputs short summaries will spend more on input
- A classification pipeline with long inputs and one-word outputs is almost entirely an input cost
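A minimal sketch makes the split concrete, pricing one request for each of the three workload shapes above. The per-request token counts are illustrative assumptions; the prices are GPT-5 Mini's from this guide.

```python
def request_cost(input_tokens, output_tokens, in_price, out_price):
    """Cost of one request in dollars, given $-per-1M-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# GPT-5 Mini: $0.25 input / $2.00 output per 1M tokens
chatbot = request_cost(200, 800, 0.25, 2.00)        # output-heavy
summarizer = request_cost(8_000, 300, 0.25, 2.00)   # input-heavy
classifier = request_cost(4_000, 5, 0.25, 2.00)     # almost all input

# For the chatbot, output is ~97% of the per-request cost
print(chatbot, summarizer, classifier)
```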
Understanding this split lets you pick the right model. For output-heavy workloads, prioritize low output prices. For input-heavy workloads (like RAG applications), focus on input costs. Read our guide on RAG application costs for more on this.
Real token counts for common prompts
Here are typical examples with approximate token counts to help you calibrate your estimates:
Short prompt (~15 tokens): "Write a product description for a coffee mug."
Medium prompt (~75 tokens): "You are a helpful customer service agent. A customer is asking about our return policy. Our policy allows returns within 30 days with a receipt. Respond professionally and explain the policy clearly."
Long prompt (~300 tokens): A detailed blog outline with context, instructions, tone guidelines, and example structure. This is common in content generation workflows.
System prompt (~500–2,000 tokens): Many production applications use detailed system prompts that include persona descriptions, tool definitions, response formatting rules, and few-shot examples. These tokens are sent on every single request and can dominate your input costs.
Short output (~50 tokens): A brief answer, a single paragraph, or a short list.
Medium output (~200 tokens): A few paragraphs, a code snippet with explanation, or a detailed response.
Long output (~1,000 tokens): A full article section, a long-form answer, or multiple code examples.
⚠️ Warning: System prompts are the most commonly overlooked cost. A 1,500-token system prompt sent on every request adds up fast: at 100K requests/month on Claude Sonnet 4.6 ($3/M input), that system prompt alone costs $450/month. Use prompt caching (available from OpenAI and Anthropic) to slash this by 50–90%.
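The warning's arithmetic, plus the effect of caching, fits in a few lines. The 90% discount is an assumption at the optimistic end of the 50–90% range quoted above; actual cache pricing varies by provider.

```python
requests_per_month = 100_000
system_prompt_tokens = 1_500
input_price = 3.00  # Claude Sonnet 4.6, $ per 1M input tokens

# Uncached: the full system prompt is billed on every request
base = requests_per_month * system_prompt_tokens / 1e6 * input_price

# With prompt caching, assuming a 90% discount on cached tokens
cached = base * (1 - 0.90)

print(f"${base:.2f}/month uncached, ${cached:.2f}/month cached")
# $450.00/month uncached, $45.00/month cached
```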
Comparing token prices across providers
Token pricing varies widely. Here's a snapshot of popular models using verified data from our calculator:
Budget tier (efficient models):
| Model | Input $/1M | Output $/1M | Provider |
|---|---|---|---|
| GPT-5 Nano | $0.05 | $0.40 | OpenAI |
| Mistral Small 3.2 | $0.06 | $0.18 | Mistral |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | Google |
| GPT-4o mini | $0.15 | $0.60 | OpenAI |
| DeepSeek V3.2 | $0.28 | $0.42 | DeepSeek |
Mid tier (balanced models):
| Model | Input $/1M | Output $/1M | Provider |
|---|---|---|---|
| Gemini 2.5 Flash | $0.15 | $0.60 | Google |
| GPT-5 Mini | $0.25 | $2.00 | OpenAI |
| Gemini 3 Flash | $0.50 | $3.00 | Google |
| Claude Haiku 4.5 | $1.00 | $5.00 | Anthropic |
Premium tier (flagship models):
| Model | Input $/1M | Output $/1M | Provider |
|---|---|---|---|
| GPT-5.2 | $1.75 | $14.00 | OpenAI |
| Gemini 3 Pro | $2.00 | $12.00 | Google |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Anthropic |
| Claude Opus 4.6 | $5.00 | $25.00 | Anthropic |
📊 Quick Math: The cheapest model (GPT-5 Nano at $0.05/$0.40) is 100× cheaper on input and 62× cheaper on output than the most expensive standard model (Claude Opus 4.6 at $5/$25). That's the difference between a $20/month chatbot and a $2,000/month chatbot doing the same volume.
Notice that output pricing is always higher — usually 3–8× the input price. When comparing models, focus on output costs if your use case generates long responses. For tasks like document summarization where input is large and output is small, prioritize input pricing. Our per-token pricing explainer goes deeper on this.
How to estimate your costs
Follow this four-step process:
Step 1: Measure a sample request
Run a typical prompt through your chosen model and check the token count. Most API responses include prompt_tokens and completion_tokens in the usage object. Run at least 10–20 sample requests to get a reliable average.
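A small helper for Step 1, assuming responses shaped like OpenAI's usage object (prompt_tokens / completion_tokens). Other providers use different field names, so adjust the keys accordingly.

```python
import statistics

def average_usage(responses: list[dict]) -> tuple[float, float]:
    """Average prompt/completion token counts over sample API responses.

    Assumes each response carries an OpenAI-style `usage` object;
    adjust the key names for other providers.
    """
    prompts = [r["usage"]["prompt_tokens"] for r in responses]
    completions = [r["usage"]["completion_tokens"] for r in responses]
    return statistics.mean(prompts), statistics.mean(completions)

# Illustrative sample responses
samples = [
    {"usage": {"prompt_tokens": 480, "completion_tokens": 310}},
    {"usage": {"prompt_tokens": 520, "completion_tokens": 290}},
]
avg_in, avg_out = average_usage(samples)
print(avg_in, avg_out)
```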
Step 2: Estimate your volume
How many requests will you make per day or month? Be realistic — and account for growth. If you're launching a chatbot, estimate conversations per user per day, then multiply by your user base.
Step 3: Calculate total tokens
Multiply average tokens per request by request volume. Keep input and output separate since they have different prices.
Step 4: Apply pricing
Multiply input tokens by input price, output tokens by output price, and sum them.
Worked example: Let's say you're building a chatbot with these assumptions:
- 1,000 conversations per day
- 500 input tokens per conversation (system prompt + user message)
- 300 output tokens per conversation
- Using GPT-5 Mini ($0.25 input / $2.00 output per 1M tokens)
Monthly cost:
- Input: 1,000 × 500 × 30 = 15M tokens → $3.75
- Output: 1,000 × 300 × 30 = 9M tokens → $18.00
- Total: $21.75/month
If you switch to DeepSeek V3.2 ($0.28 input / $0.42 output):
- Input: 15M tokens → $4.20
- Output: 9M tokens → $3.78
- Total: $7.98/month
That's 63% cheaper for the same workload — because DeepSeek's output pricing ($0.42/M) massively undercuts GPT-5 Mini ($2.00/M).
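The whole four-step estimate fits in one function. This sketch reproduces the worked example above, using the prices quoted in this guide.

```python
def monthly_cost(req_per_day, in_tokens, out_tokens, in_price, out_price, days=30):
    """Monthly cost in dollars from per-request token counts and $/1M prices."""
    input_millions = req_per_day * in_tokens * days / 1e6
    output_millions = req_per_day * out_tokens * days / 1e6
    return input_millions * in_price + output_millions * out_price

# 1,000 conversations/day, 500 input + 300 output tokens each
gpt5_mini = monthly_cost(1_000, 500, 300, 0.25, 2.00)   # $21.75
deepseek = monthly_cost(1_000, 500, 300, 0.28, 0.42)    # $7.98
savings = 1 - deepseek / gpt5_mini                      # ~63% cheaper
```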
💡 Key Takeaway: Output cost usually dominates your bill. In the chatbot example above, output is 83% of the total cost on GPT-5 Mini. A model with cheap input but expensive output will cost more than one with balanced pricing. Always calculate both sides.
Common token pitfalls
Pitfall 1: Forgetting system prompt tokens
Your system prompt is sent on every request. A 1,000-token system prompt across 100,000 requests/month means 100 million extra input tokens, which costs $300/month on Claude Sonnet 4.6 ($3/M input) or $5/month on GPT-5 Nano. Use prompt caching to reduce this.
Pitfall 2: Ignoring conversation history
Chat applications accumulate context. By the 10th message in a conversation, you might be sending 5,000+ tokens of history with each request. Either summarize old context or implement a sliding window to keep costs predictable.
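One way to implement the sliding window: keep only the most recent messages that fit a token budget. The message shape and the `count_tokens` callable below are illustrative assumptions; plug in your provider's tokenizer.

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep the most recent messages that fit under a token budget."""
    kept, total = [], 0
    for msg in reversed(messages):   # walk newest -> oldest
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))      # restore chronological order

# Illustrative messages carrying precomputed token counts
history = [{"tokens": 400}, {"tokens": 300}, {"tokens": 200}, {"tokens": 100}]
recent = trim_history(history, 350, lambda m: m["tokens"])
# keeps the last two messages (200 + 100 = 300 tokens, under the 350 budget)
```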
Pitfall 3: Not setting max_tokens
Without an output limit, models can generate thousands of tokens when you only needed 100. Set max_tokens on every request to prevent runaway costs. A model generating 2,000 tokens when you needed 200 wastes 10× your output budget.
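Setting the cap is a one-line change in the request payload. The field names here follow the common OpenAI-style chat API convention, where `max_tokens` caps billable output; check your provider's documentation for the exact parameter name.

```python
# Minimal request payload sketch (OpenAI-style convention, not a full client)
payload = {
    "model": "gpt-5-mini",  # model name as quoted in this guide
    "messages": [{"role": "user", "content": "Summarize this support ticket."}],
    "max_tokens": 200,      # hard cap on output tokens, and thus on output cost
}
```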
Pitfall 4: Underestimating code and JSON tokens
Code and structured data tokenize inefficiently. A JSON response with nested objects uses 2–3× more tokens than the same information in plain English. If your application generates structured output, measure actual token counts rather than estimating from word count.
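When structured output is unavoidable, trimming whitespace helps at the margin: fewer characters generally means fewer tokens, and `json.dumps` can emit a compact form.

```python
import json

record = {"name": "Ada", "plan": "pro", "active": True}

pretty = json.dumps(record, indent=2)                 # readable, token-heavy
compact = json.dumps(record, separators=(",", ":"))   # no extra whitespace

print(len(pretty), len(compact))  # compact is always the shorter of the two
```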
For more cost traps, read our hidden costs of AI APIs guide.
Final thoughts
Tokens are the fundamental unit of AI API pricing. Understanding how tokenization works and how to estimate token counts lets you compare providers accurately and predict costs before you commit.
Output tokens cost more than input, so always factor in response length. Budget models like DeepSeek V3.2 ($0.28/$0.42) and GPT-5 Nano ($0.05/$0.40) can deliver massive savings if your use case doesn't require frontier intelligence.
✅ TL;DR: One token ≈ 0.75 words. Output tokens cost 3–8× more than input. System prompts are hidden cost multipliers — cache them. Always calculate input and output costs separately, and use our calculator to compare before committing.
If you want a quick comparison across models and providers, try the AI Cost Check calculator. Plug in your estimated input and output tokens, and see exactly how much each provider will charge per request and per month.
Ready to go deeper? Our guide on estimating AI API costs before building walks you through budgeting for a real app. And check the cheapest AI APIs in 2026 for a full price ranking.
Frequently asked questions
How many words is 1 million tokens?
Approximately 750,000 words in English, based on the average ratio of 0.75 words per token. That's roughly equivalent to 10 full-length novels or about 375 posts the size of a 2,000-word blog post. In practice, the ratio varies: plain English is closer to 0.75 words/token, while code and technical content may be 0.5 words/token or less.
Why do output tokens cost more than input tokens?
Output tokens require sequential generation — the model predicts one token at a time, each depending on the previous ones. Input tokens are processed in parallel, which is computationally cheaper. This fundamental difference means output is always 3–8× more expensive. On Claude Opus 4.6, output ($25/M) costs 5× more than input ($5/M).
How can I reduce my token costs?
Three high-impact strategies: 1) Use prompt caching (50–90% discount on repeated system prompts). 2) Set max_tokens on every request to prevent runaway generation. 3) Use a cheaper model for simple tasks and route only complex requests to expensive ones. Together, these can cut costs by 50–70%.
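Strategy 3 can start as a simple routing function. The model identifiers and the length threshold below are illustrative placeholders rather than official API model IDs, and a production router would use a more robust complexity signal.

```python
def pick_model(prompt: str, needs_reasoning: bool) -> str:
    """Route cheap tasks to a budget model, hard ones to a flagship."""
    if needs_reasoning or len(prompt) > 4_000:
        return "claude-sonnet-4.6"   # premium tier (placeholder ID)
    return "gpt-5-nano"              # budget tier (placeholder ID)

print(pick_model("Classify this ticket as bug or feature.", False))
```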
Do images and audio count as tokens?
Yes, but differently. Images are converted to tokens based on resolution; a typical image might be 500–1,500 tokens. Audio is tokenized by duration, at rates that vary by provider. Note that some providers bill image and audio tokens at different rates than text, so check the provider's documentation for exact multimodal tokenization and pricing.
What's a good way to estimate tokens before building?
Use our AI Cost Check calculator to model your expected workload. Input your average prompt length, expected output length, and daily request volume. The calculator shows per-request and monthly costs across all major models. For a more detailed walkthrough, read our cost estimation guide.
