If you've looked at AI API pricing pages, you've seen "per million tokens" everywhere. But what exactly is a token, and why does it matter for your budget?
This guide breaks down tokenization, shows you real token counts for common prompts, and explains how to compare pricing across providers without getting misled. By the end, you'll know how to estimate costs for any AI workload with confidence.
Quick stat: ~0.75 words is the average length of one token in English text, so 1,000 tokens ≈ 750 words.
What is a token?
A token is the smallest unit of text that an AI model processes. It's not quite a word, and it's not quite a character. Think of it as a chunk of text that the model reads and generates.
In English, one token is roughly 0.75 words. So 100 tokens is about 75 words, and 1,000 tokens is around 750 words. A typical page of text (single-spaced) is about 500–600 words, or roughly 700–800 tokens.
The exact token count depends on the complexity of the text. Common words like "the" or "is" are usually one token. Longer or uncommon words might be split into multiple tokens. Numbers, punctuation, and special characters also affect the count.
Some useful benchmarks:
- A tweet (280 characters): ~50–70 tokens
- A short email: ~100–200 tokens
- A full page of text: ~700–800 tokens
- A 2,000-word blog post: ~2,700 tokens
- A novel (80,000 words): ~107,000 tokens
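The benchmarks above can be turned into a quick back-of-the-envelope estimator. This sketch applies this guide's ~0.75 words-per-token rule for prose (and ~0.5 for technical text); exact counts require the provider's own tokenizer, such as OpenAI's tiktoken library.

```python
def estimate_tokens(word_count: int, technical: bool = False) -> int:
    """Rough token estimate from a word count.

    Uses the ~0.75 words-per-token rule of thumb for English prose;
    technical text (code, JSON, URLs) runs closer to ~0.5 words per token.
    """
    words_per_token = 0.5 if technical else 0.75
    return round(word_count / words_per_token)

print(estimate_tokens(2_000))                  # ~2,667 tokens for a 2,000-word post
print(estimate_tokens(80_000))                 # ~106,667 tokens for a novel
print(estimate_tokens(500, technical=True))    # ~1,000 tokens for 500 words of code
```

For real billing estimates, always prefer the token counts reported in API responses over any word-count heuristic.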
How tokenization works
AI models don't read text the way humans do. Before processing your prompt, the model breaks it into tokens using a tokenizer — a specific algorithm that maps text to numerical IDs. Each token is converted into a number that the model can understand.
For example, the sentence "AI pricing is confusing" might tokenize as:
- "AI" → 1 token
- " pricing" → 1 token (note the space is included)
- " is" → 1 token
- " confusing" → 1 token
Total: 4 tokens for 4 words. Clean and efficient.
But a more complex sentence like "The DeepSeek-V3 model costs $0.28/1M tokens" might tokenize as:
- "The" → 1 token
- " Deep" → 1 token
- "Se" → 1 token
- "ek" → 1 token
- "-V" → 1 token
- "3" → 1 token
- " model" → 1 token
- " costs" → 1 token
- " $" → 1 token
- "0" → 1 token
- "." → 1 token
- "28" → 1 token
- "/" → 1 token
- "1" → 1 token
- "M" → 1 token
- " tokens" → 1 token
Total: 16 tokens for 6 words. That's because the tokenizer splits technical terms, numbers, and symbols into smaller pieces. This is why code and data-heavy prompts use more tokens than plain English text — and cost more to process.
💡 Key Takeaway: Technical content (code, JSON, URLs, numbers) uses significantly more tokens per word than plain English. A 500-word code snippet may consume 1,000+ tokens, while 500 words of prose uses only ~670 tokens. Factor this into your cost estimates.
Why pricing is per token
Tokens determine the computational cost. Every token requires processing power — both for reading your input (prompt) and generating output (completion).
Providers charge separately for input tokens and output tokens because generating output is more expensive. The model has to predict each token one at a time (sequentially), while input tokens are processed in parallel. This fundamental asymmetry is why output always costs more.
For example, GPT-5 Mini costs $0.25 per million input tokens and $2.00 per million output tokens. Output costs 8× more than input.
This split matters enormously when you estimate costs:
- A chatbot that generates long responses will spend most of its budget on output tokens
- A summarization tool that reads long documents and outputs short summaries will spend more on input
- A classification pipeline with long inputs and one-word outputs is almost entirely an input cost
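A minimal sketch makes the split concrete, pricing one request for each of the three workload shapes above. The per-request token counts are illustrative assumptions; the prices are GPT-5 Mini's from this guide.

```python
def request_cost(input_tokens, output_tokens, in_price, out_price):
    """Cost of one request in dollars, given $-per-1M-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# GPT-5 Mini: $0.25 input / $2.00 output per 1M tokens
chatbot = request_cost(200, 800, 0.25, 2.00)        # output-heavy
summarizer = request_cost(8_000, 300, 0.25, 2.00)   # input-heavy
classifier = request_cost(4_000, 5, 0.25, 2.00)     # almost all input

# For the chatbot, output is ~97% of the per-request cost
print(chatbot, summarizer, classifier)
```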
Understanding this split lets you pick the right model. For output-heavy workloads, prioritize low output prices. For input-heavy workloads (like RAG applications), focus on input costs. Read our guide on RAG application costs for more on this.
Real token counts for common prompts
Here are typical examples with approximate token counts to help you calibrate your estimates:
Short prompt (~15 tokens): "Write a product description for a coffee mug."
Medium prompt (~75 tokens): "You are a helpful customer service agent. A customer is asking about our return policy. Our policy allows returns within 30 days with a receipt. Respond professionally and explain the policy clearly."
Long prompt (~300 tokens): A detailed blog outline with context, instructions, tone guidelines, and example structure. This is common in content generation workflows.
System prompt (~500–2,000 tokens): Many production applications use detailed system prompts that include persona descriptions, tool definitions, response formatting rules, and few-shot examples. These tokens are sent on every single request and can dominate your input costs.
Short output (~50 tokens): A brief answer, a single paragraph, or a short list.
Medium output (~200 tokens): A few paragraphs, a code snippet with explanation, or a detailed response.
Long output (~1,000 tokens): A full article section, a long-form answer, or multiple code examples.
⚠️ Warning: System prompts are the most commonly overlooked cost. A 1,500-token system prompt sent on every request adds up fast: at 100K requests/month on Claude Sonnet 4.6 ($3/M input), that system prompt alone costs $450/month. Use prompt caching (available from OpenAI and Anthropic) to slash this by 50–90%.
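The warning's arithmetic, plus the effect of caching, fits in a few lines. The 90% discount is an assumption at the optimistic end of the 50–90% range quoted above; actual cache pricing varies by provider.

```python
requests_per_month = 100_000
system_prompt_tokens = 1_500
input_price = 3.00  # Claude Sonnet 4.6, $ per 1M input tokens

# Uncached: the full system prompt is billed on every request
base = requests_per_month * system_prompt_tokens / 1e6 * input_price

# With prompt caching, assuming a 90% discount on cached tokens
cached = base * (1 - 0.90)

print(f"${base:.2f}/month uncached, ${cached:.2f}/month cached")
# $450.00/month uncached, $45.00/month cached
```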
Comparing token prices across providers
Token pricing varies widely. Here's a snapshot of popular models using verified data from our calculator:
Budget tier (efficient models):
| Model | Input $/1M | Output $/1M | Provider |
|---|---|---|---|
| GPT-5 Nano | $0.05 | $0.40 | OpenAI |
| Mistral Small 3.2 | $0.06 | $0.18 | Mistral |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | Google |
| GPT-4o mini | $0.15 | $0.60 | OpenAI |
| DeepSeek V3.2 | $0.28 | $0.42 | DeepSeek |
Mid tier (balanced models):
| Model | Input $/1M | Output $/1M | Provider |
|---|---|---|---|
| Gemini 2.5 Flash | $0.15 | $0.60 | Google |
| GPT-5 Mini | $0.25 | $2.00 | OpenAI |
| Gemini 3 Flash | $0.50 | $3.00 | Google |
| Claude Haiku 4.5 | $1.00 | $5.00 | Anthropic |
Premium tier (flagship models):
| Model | Input $/1M | Output $/1M | Provider |
|---|---|---|---|
| GPT-5.2 | $1.75 | $14.00 | OpenAI |
| Gemini 3 Pro | $2.00 | $12.00 | Google |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Anthropic |
| Claude Opus 4.6 | $5.00 | $25.00 | Anthropic |
📊 Quick Math: The cheapest model (GPT-5 Nano at $0.05/$0.40) is 100× cheaper on input and 62× cheaper on output than the most expensive standard model (Claude Opus 4.6 at $5/$25). That's the difference between a $20/month chatbot and a $2,000/month chatbot doing the same volume.
Notice that output pricing is always higher — usually 3–8× the input price. When comparing models, focus on output costs if your use case generates long responses. For tasks like document summarization where input is large and output is small, prioritize input pricing. Our per-token pricing explainer goes deeper on this.
How to estimate your costs
Follow this four-step process:
Step 1: Measure a sample request
Run a typical prompt through your chosen model and check the token count. Most API responses include prompt_tokens and completion_tokens in the usage object. Run at least 10–20 sample requests to get a reliable average.
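A small helper for Step 1, assuming responses shaped like OpenAI's usage object (prompt_tokens / completion_tokens). Other providers use different field names, so adjust the keys accordingly.

```python
import statistics

def average_usage(responses: list[dict]) -> tuple[float, float]:
    """Average prompt/completion token counts over sample API responses.

    Assumes each response carries an OpenAI-style `usage` object;
    adjust the key names for other providers.
    """
    prompts = [r["usage"]["prompt_tokens"] for r in responses]
    completions = [r["usage"]["completion_tokens"] for r in responses]
    return statistics.mean(prompts), statistics.mean(completions)

# Illustrative sample responses
samples = [
    {"usage": {"prompt_tokens": 480, "completion_tokens": 310}},
    {"usage": {"prompt_tokens": 520, "completion_tokens": 290}},
]
avg_in, avg_out = average_usage(samples)
print(avg_in, avg_out)
```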
Step 2: Estimate your volume
How many requests will you make per day or month? Be realistic — and account for growth. If you're launching a chatbot, estimate conversations per user per day, then multiply by your user base.
Step 3: Calculate total tokens
Multiply average tokens per request by request volume. Keep input and output separate since they have different prices.
Step 4: Apply pricing
Multiply input tokens by input price, output tokens by output price, and sum them.
Worked example: Let's say you're building a chatbot with these assumptions:
- 1,000 conversations per day
- 500 input tokens per conversation (system prompt + user message)
- 300 output tokens per conversation
- Using GPT-5 Mini ($0.25 input / $2.00 output per 1M tokens)
Monthly cost:
- Input: 1,000 × 500 × 30 = 15M tokens → $3.75
- Output: 1,000 × 300 × 30 = 9M tokens → $18.00
- Total: $21.75/month
If you switch to DeepSeek V3.2 ($0.28 input / $0.42 output):
- Input: 15M tokens → $4.20
- Output: 9M tokens → $3.78
- Total: $7.98/month
That's 63% cheaper for the same workload — because DeepSeek's output pricing ($0.42/M) massively undercuts GPT-5 Mini ($2.00/M).
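The whole four-step estimate fits in one function. This sketch reproduces the worked example above, using the prices quoted in this guide.

```python
def monthly_cost(req_per_day, in_tokens, out_tokens, in_price, out_price, days=30):
    """Monthly cost in dollars from per-request token counts and $/1M prices."""
    input_millions = req_per_day * in_tokens * days / 1e6
    output_millions = req_per_day * out_tokens * days / 1e6
    return input_millions * in_price + output_millions * out_price

# 1,000 conversations/day, 500 input + 300 output tokens each
gpt5_mini = monthly_cost(1_000, 500, 300, 0.25, 2.00)   # $21.75
deepseek = monthly_cost(1_000, 500, 300, 0.28, 0.42)    # $7.98
savings = 1 - deepseek / gpt5_mini                      # ~63% cheaper
```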
💡 Key Takeaway: Output cost usually dominates your bill. In the chatbot example above, output is 83% of the total cost on GPT-5 Mini. A model with cheap input but expensive output will cost more than one with balanced pricing. Always calculate both sides.
Common token pitfalls
Pitfall 1: Forgetting system prompt tokens
Your system prompt is sent on every request. A 1,000-token system prompt across 100,000 requests/month means 100 million extra input tokens, which costs $300/month on Claude Sonnet 4.6 ($3/M input) or $5/month on GPT-5 Nano. Use prompt caching to reduce this.
Pitfall 2: Ignoring conversation history
Chat applications accumulate context. By the 10th message in a conversation, you might be sending 5,000+ tokens of history with each request. Either summarize old context or implement a sliding window to keep costs predictable.
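One way to implement the sliding window: keep only the most recent messages that fit a token budget. The message shape and the `count_tokens` callable below are illustrative assumptions; plug in your provider's tokenizer.

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep the most recent messages that fit under a token budget."""
    kept, total = [], 0
    for msg in reversed(messages):   # walk newest -> oldest
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))      # restore chronological order

# Illustrative messages carrying precomputed token counts
history = [{"tokens": 400}, {"tokens": 300}, {"tokens": 200}, {"tokens": 100}]
recent = trim_history(history, 350, lambda m: m["tokens"])
# keeps the last two messages (200 + 100 = 300 tokens, under the 350 budget)
```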
Pitfall 3: Not setting max_tokens
Without an output limit, models can generate thousands of tokens when you only needed 100. Set max_tokens on every request to prevent runaway costs. A model generating 2,000 tokens when you needed 200 wastes 10× your output budget.
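Setting the cap is a one-line change in the request payload. The field names here follow the common OpenAI-style chat API convention, where `max_tokens` caps billable output; check your provider's documentation for the exact parameter name.

```python
# Minimal request payload sketch (OpenAI-style convention, not a full client)
payload = {
    "model": "gpt-5-mini",  # model name as quoted in this guide
    "messages": [{"role": "user", "content": "Summarize this support ticket."}],
    "max_tokens": 200,      # hard cap on output tokens, and thus on output cost
}
```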
Pitfall 4: Underestimating code and JSON tokens
Code and structured data tokenize inefficiently. A JSON response with nested objects uses 2–3× more tokens than the same information in plain English. If your application generates structured output, measure actual token counts rather than estimating from word count.
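When structured output is unavoidable, trimming whitespace helps at the margin: fewer characters generally means fewer tokens, and `json.dumps` can emit a compact form.

```python
import json

record = {"name": "Ada", "plan": "pro", "active": True}

pretty = json.dumps(record, indent=2)                 # readable, token-heavy
compact = json.dumps(record, separators=(",", ":"))   # no extra whitespace

print(len(pretty), len(compact))  # compact is always the shorter of the two
```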
For more cost traps, read our hidden costs of AI APIs guide.
Final thoughts
Tokens are the fundamental unit of AI API pricing. Understanding how tokenization works and how to estimate token counts lets you compare providers accurately and predict costs before you commit.
Output tokens cost more than input, so always factor in response length. Budget models like DeepSeek V3.2 ($0.28/$0.42) and GPT-5 Nano ($0.05/$0.40) can deliver massive savings if your use case doesn't require frontier intelligence.
✅ TL;DR: One token ≈ 0.75 words. Output tokens cost 3–8× more than input. System prompts are hidden cost multipliers — cache them. Always calculate input and output costs separately, and use our calculator to compare before committing.
If you want a quick comparison across models and providers, try the AI Cost Check calculator. Plug in your estimated input and output tokens, and see exactly how much each provider will charge per request and per month.
Ready to go deeper? Our guide on estimating AI API costs before building walks you through budgeting for a real app. And check the cheapest AI APIs in 2026 for a full price ranking.
Frequently asked questions
How many words is 1 million tokens?
Approximately 750,000 words in English, based on the average ratio of 0.75 words per token. That's roughly equivalent to 10 full-length novels or about 375 posts the size of a 2,000-word blog post. In practice, the ratio varies: plain English is closer to 0.75 words/token, while code and technical content may be 0.5 words/token or less.
Why do output tokens cost more than input tokens?
Output tokens require sequential generation — the model predicts one token at a time, each depending on the previous ones. Input tokens are processed in parallel, which is computationally cheaper. This fundamental difference means output is always 3–8× more expensive. On Claude Opus 4.6, output ($25/M) costs 5× more than input ($5/M).
How can I reduce my token costs?
Three high-impact strategies: 1) Use prompt caching (50–90% discount on repeated system prompts). 2) Set max_tokens on every request to prevent runaway generation. 3) Use a cheaper model for simple tasks and route only complex requests to expensive ones. Together, these can cut costs by 50–70%.
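Strategy 3 can start as a simple routing function. The model identifiers and the length threshold below are illustrative placeholders rather than official API model IDs, and a production router would use a more robust complexity signal.

```python
def pick_model(prompt: str, needs_reasoning: bool) -> str:
    """Route cheap tasks to a budget model, hard ones to a flagship."""
    if needs_reasoning or len(prompt) > 4_000:
        return "claude-sonnet-4.6"   # premium tier (placeholder ID)
    return "gpt-5-nano"              # budget tier (placeholder ID)

print(pick_model("Classify this ticket as bug or feature.", False))
```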
Do images and audio count as tokens?
Yes, but differently. Images are converted to tokens based on resolution; a typical image might be 500–1,500 tokens. Audio is tokenized by duration, at rates that vary by provider. Note that some providers bill image and audio tokens at different rates than text, so check the provider's documentation for exact multimodal tokenization and pricing.
What's a good way to estimate tokens before building?
Use our AI Cost Check calculator to model your expected workload. Input your average prompt length, expected output length, and daily request volume. The calculator shows per-request and monthly costs across all major models. For a more detailed walkthrough, read our cost estimation guide.
