February 21, 2026

AI API Pricing Per Token Explained: What You're Actually Paying For

What does 1 million tokens actually cost? From $0.07 (DeepSeek) to $75 (Claude Opus) — learn how token pricing works with real examples and a cost estimator.

Tags: pricing, tokens, beginners, cost-optimization

If you've looked at any AI API pricing page, you've seen something like "$2.50 per million input tokens." But what does that actually mean for your project? How many tokens is a typical request? And why do output tokens cost more than input?

This guide breaks down token-based pricing from the ground up, with real numbers from every major provider, practical cost calculations, and the optimization tactics that will save you money from day one.

📊 Stat: Per-million-token pricing across major AI APIs spans $0.06 to $168.00 as of February 2026, a 2,800× spread.

What is a token?

A token is a chunk of text that AI models process as a single unit. In English, a token is roughly 4 characters or ¾ of a word. The sentence "Hello, how are you doing today?" is about 8 tokens. A full page of text is roughly 500–700 tokens.

Different languages tokenize differently. Chinese, Japanese, and Korean text often produces more tokens per character because the tokenizer breaks down complex characters into sub-units. Code tends to tokenize efficiently — variable names and common programming patterns are often single tokens.

Quick rules of thumb for English:

  • 1 word ≈ 1.3 tokens
  • 1 page ≈ 600 tokens
  • 1,000 words ≈ 1,300 tokens
  • A typical chat message ≈ 50–150 tokens
  • A full blog post ≈ 2,600–3,900 tokens
  • A book chapter ≈ 5,000–10,000 tokens

These are approximations. The exact count depends on the specific tokenizer each model uses. OpenAI, Anthropic, and Google all use slightly different tokenizers. Use our token counter tool to get exact counts for your text with any model's tokenizer.

💡 Key Takeaway: Don't estimate token counts from word counts alone. Paste your actual prompts into a tokenizer to get precise numbers. A 500-word prompt might be 600 tokens on one model and 700 on another.
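For quick budgeting before you reach for a real tokenizer, the rules of thumb above can be turned into a rough estimator. This is a heuristic sketch for English text only, averaging the ~4-characters-per-token and ~1.3-tokens-per-word approximations; it is no substitute for counting with the actual model's tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # English-only heuristic: ~4 characters per token, ~1.3 tokens per word.
    # Averaging the two estimates smooths out short-word/long-word bias.
    by_chars = len(text) / 4
    by_words = len(text.split()) * 1.3
    return round((by_chars + by_words) / 2)

print(estimate_tokens("Hello, how are you doing today?"))  # 8, matching the example above
```

Expect the real count to differ by 10–20% per model, which is exactly why precise budgeting needs a tokenizer rather than this heuristic.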


Why input and output tokens have different prices

Every AI API charges separately for two types of tokens:

  • Input tokens (also called "prompt tokens"): Everything you send to the model — your system prompt, user message, conversation history, retrieved documents, function definitions, and any other context.
  • Output tokens (also called "completion tokens"): Everything the model generates in response — the answer, generated code, analysis, or any other content.

Output tokens almost always cost more — typically 2–8× the input price. Here's why:

Input processing is parallel. The model reads all input tokens simultaneously using matrix operations optimized for modern GPUs. This is computationally efficient.

Output generation is sequential. The model must generate output tokens one at a time, with each new token depending on all previous tokens. This autoregressive process is inherently slower and more compute-intensive. Each output token requires a full forward pass through the model.

The hardware economics are simple: generating 1 output token uses roughly 2–5× more GPU compute than processing 1 input token. Providers pass this cost through in their pricing.

Current pricing across major providers (February 2026)

Here's what every major model charges per million tokens, organized by tier:

Flagship models

| Model | Provider | Input $/M | Output $/M | Output Multiplier | Context |
|---|---|---|---|---|---|
| GPT-5.2 | OpenAI | $1.75 | $14.00 | 8.0× | 1M |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | 5.0× | 200K |
| Gemini 3 Pro | Google | $2.00 | $12.00 | 6.0× | 2M |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 5.0× | 1M |
| GPT-5 | OpenAI | $1.25 | $10.00 | 8.0× | 1M |
| Mistral Large 3 | Mistral | $0.50 | $1.50 | 3.0× | 256K |

Efficient models

| Model | Provider | Input $/M | Output $/M | Output Multiplier | Context |
|---|---|---|---|---|---|
| GPT-5 nano | OpenAI | $0.05 | $0.40 | 8.0× | 128K |
| Mistral Small 3.2 | Mistral | $0.06 | $0.18 | 3.0× | 128K |
| Gemini 2.5 Flash | Google | $0.15 | $0.60 | 4.0× | 1M |
| GPT-5 mini | OpenAI | $0.25 | $2.00 | 8.0× | 500K |
| DeepSeek V3.2 | DeepSeek | $0.28 | $0.42 | 1.5× | 128K |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | 5.0× | 200K |

Reasoning models

| Model | Provider | Input $/M | Output $/M | Output Multiplier | Context |
|---|---|---|---|---|---|
| o3-pro | OpenAI | $20.00 | $80.00 | 4.0× | 1M |
| o3 | OpenAI | $2.00 | $8.00 | 4.0× | 1M |
| o4-mini | OpenAI | $1.10 | $4.40 | 4.0× | 2M |
| DeepSeek R1 V3.2 | DeepSeek | $0.28 | $0.42 | 1.5× | 128K |

See the full ranking of all 47+ models on our cost per million tokens page.

📊 $0.42 (DeepSeek V3.2) vs $25.00 (Claude Opus 4.6) per 1M output tokens

Notice the output multiplier column. DeepSeek is unusual — its output costs only 1.5× its input, compared to 5–8× for most providers. This makes DeepSeek dramatically cheaper for output-heavy workloads like chatbots and content generation.


How to estimate your costs: the formula

The cost of a single API request is:

Request cost = (input_tokens × input_price / 1,000,000)
             + (output_tokens × output_price / 1,000,000)

Monthly cost is:

Monthly cost = request_cost × requests_per_day × 30
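The two formulas above translate directly into code. A minimal sketch, using the GPT-5 chatbot numbers worked through in Example 1 below (500 input and 300 output tokens per request, 1,000 requests per day):

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    # Prices are dollars per 1 million tokens, as quoted on pricing pages.
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

def monthly_cost(cost_per_request, requests_per_day, days=30):
    return cost_per_request * requests_per_day * days

# GPT-5: $1.25 input / $10.00 output per million tokens
per_request = request_cost(500, 300, 1.25, 10.00)
print(round(monthly_cost(per_request, 1_000), 2))  # ~108.75 dollars/month
```

Swapping in another model's two prices is all it takes to compare providers on your own workload.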

Example 1: Customer support chatbot

You're building a chatbot that handles 1,000 conversations per day. Each conversation averages 500 input tokens (system prompt + user message + history) and 300 output tokens (response).

With GPT-5 ($1.25/$10.00 per M):

  • Daily input: 1,000 × 500 = 500,000 tokens → $0.625
  • Daily output: 1,000 × 300 = 300,000 tokens → $3.00
  • Daily total: $3.63 → ~$109/month

With GPT-5 mini ($0.25/$2.00 per M):

  • Daily input: $0.125
  • Daily output: $0.60
  • Daily total: $0.725 → ~$21.75/month

With DeepSeek V3.2 ($0.28/$0.42 per M):

  • Daily input: $0.14
  • Daily output: $0.126
  • Daily total: $0.266 → ~$7.98/month

📊 Quick Math: The same chatbot workload costs $109/month on GPT-5, $21.75 on GPT-5 mini, or $7.98 on DeepSeek V3.2. That's a 14× difference for the same volume of conversations. Use our cost calculator to run your own numbers.

Example 2: Code generation pipeline

A development team running 5,000 code reviews per day, with 2,000 input tokens (code context) and 1,000 output tokens (review + suggestions).

| Model | Daily Cost | Monthly Cost |
|---|---|---|
| Claude Opus 4.6 ($5/$25) | $175.00 | $5,250 |
| GPT-5.2 ($1.75/$14) | $87.50 | $2,625 |
| GPT-5 ($1.25/$10) | $62.50 | $1,875 |
| Codestral ($0.30/$0.90) | $7.50 | $225 |
| DeepSeek V3.2 ($0.28/$0.42) | $4.90 | $147 |

Codestral — Mistral's purpose-built coding model — costs roughly 23× less than Claude Opus 4.6 for the same volume. For a dedicated code review pipeline, that's about $5,025/month in savings. Is Opus better at code review? Probably, for edge cases. But a smart team would use Codestral for routine reviews and escalate only the complex ones to Opus.
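A comparison like this takes only a short loop. The sketch below plugs the listed prices into the daily-cost arithmetic for the 5,000-reviews-per-day workload; the dictionary is just the prices quoted above, not an API of any kind:

```python
MODELS = {  # name: ($/M input, $/M output), prices as listed above
    "Claude Opus 4.6": (5.00, 25.00),
    "GPT-5.2": (1.75, 14.00),
    "GPT-5": (1.25, 10.00),
    "Codestral": (0.30, 0.90),
    "DeepSeek V3.2": (0.28, 0.42),
}

def daily_cost(requests, input_tokens, output_tokens, in_price, out_price):
    # Dollars for one day of identical requests at per-million-token prices.
    return requests * (input_tokens * in_price + output_tokens * out_price) / 1_000_000

for name, (in_p, out_p) in MODELS.items():
    cost = daily_cost(5_000, 2_000, 1_000, in_p, out_p)
    print(f"{name}: ${cost:.2f}/day, ${cost * 30:,.0f}/month")
```

Running it confirms the spread: Claude Opus 4.6 comes out at $175.00/day while DeepSeek V3.2 is $4.90/day for the identical workload.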

Example 3: High-volume classification

Processing 100,000 customer messages per day for sentiment classification. Input: 100 tokens per message. Output: 10 tokens (just a label).

| Model | Daily Cost | Monthly Cost |
|---|---|---|
| Mistral Small 3.2 ($0.06/$0.18) | $0.78 | $23 |
| GPT-5 nano ($0.05/$0.40) | $0.90 | $27 |
| Llama 3.1 8B ($0.18/$0.18) | $1.98 | $59 |
| GPT-4o mini ($0.15/$0.60) | $2.10 | $63 |

At this scale and simplicity, the model choice barely matters financially — they're all under $100/month. Pick whichever gives the best classification accuracy for your domain.


The five hidden costs that inflate your token bill

1. System prompts count as input tokens

Your system prompt is sent with every single request. A 500-token system prompt across 10,000 daily requests = 5 million extra input tokens per day. On GPT-5, that's $6.25/day or $187.50/month just for the system prompt.

Fix: Keep system prompts concise. Use prompt caching (available from OpenAI and Anthropic) to get 50–90% discounts on repeated prefixes.

2. Conversation history grows quadratically

In chat applications, you resend the full conversation each turn. By turn 5, you're sending the system prompt plus 4 previous exchanges. By turn 10, your input might exceed 3,000 tokens per request — most of it repeated context you've already paid for.

Fix: Implement conversation summarization. After 5 turns, summarize the history into 200–300 tokens instead of sending the full transcript. This alone can cut chat-application input costs by 60%.
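The effect is easy to quantify. A minimal sketch, where the per-turn sizes (500-token system prompt, 100-token user messages, 150-token replies, 250-token summary) are illustrative assumptions, not measurements:

```python
def naive_input_tokens(turns, system=500, user=100, assistant=150):
    # Resend the system prompt plus the full transcript on every turn.
    total, history = 0, 0
    for _ in range(turns):
        total += system + history + user
        history += user + assistant  # this exchange joins the transcript
    return total

def summarized_input_tokens(turns, system=500, user=100, assistant=150,
                            summary=250, keep=5):
    # After `keep` turns, replace the transcript with a fixed-size summary.
    total, history = 0, 0
    for t in range(1, turns + 1):
        total += system + history + user
        history = summary if t >= keep else history + user + assistant
    return total

print(naive_input_tokens(20), summarized_input_tokens(20))  # 59500 vs 18250
```

With these assumed sizes, a 20-turn conversation pays for 59,500 input tokens naively but only 18,250 with summarization, about 69% less, and the gap keeps widening as conversations lengthen.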

3. Retries and error handling

Failed requests still consume tokens. The input tokens are billed even if the model returns an error or times out mid-generation. If your error rate is 3% and you retry twice per failure, you're paying 6% overhead.

Fix: Log retry costs separately. Implement circuit breakers. Consider falling back to a cheaper model on retry instead of repeating the same expensive call.

4. Thinking tokens on reasoning models

Reasoning models (o3, o4-mini, DeepSeek R1) generate hidden "thinking" tokens before producing your visible answer. These thinking tokens are billed as output tokens. A request that returns 500 visible tokens might actually consume 5,000+ output tokens including thinking.

Fix: Use the reasoning_effort parameter (low/medium/high) to control thinking depth. Only use reasoning models for tasks that genuinely require step-by-step logic. Read our reasoning model pricing guide for strategies.

5. Prompt caching discounts you might be missing

Both OpenAI and Anthropic offer significant discounts when you reuse the same prompt prefix across requests. Anthropic charges 90% less for cached input on Claude Sonnet models. If your system prompt stays the same across requests (which it usually does), you're leaving money on the table by not enabling caching.

Fix: Structure your prompts with a stable prefix (system prompt + tool definitions) and a variable suffix (user message). Enable caching on the prefix.
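The savings are worth a back-of-the-envelope check. This sketch uses Claude Sonnet's $3.00/M input price and the 90% cached-read discount mentioned above; the 1.25× cache-write surcharge and the assumption that the prefix stays warm between requests are simplifications made to keep the arithmetic concrete.

```python
BASE_INPUT = 3.00               # Claude Sonnet input, $/M tokens
CACHED_READ = BASE_INPUT * 0.10  # 90% discount on cache hits
CACHE_WRITE = BASE_INPUT * 1.25  # assumed surcharge for writing the prefix

def daily_prefix_cost(prefix_tokens, requests_per_day, cached):
    if not cached:
        return prefix_tokens * requests_per_day * BASE_INPUT / 1_000_000
    # One cache write, then reads, assuming the prefix never goes cold.
    reads = requests_per_day - 1
    return prefix_tokens * (CACHE_WRITE + reads * CACHED_READ) / 1_000_000

# A 500-token stable prefix across 10,000 daily requests:
print(daily_prefix_cost(500, 10_000, cached=False))          # 15.0 dollars/day
print(round(daily_prefix_cost(500, 10_000, cached=True), 2))  # ~1.5 dollars/day
```

Under these assumptions the prefix drops from $15.00/day to about $1.50/day, roughly the 90% saving the discount promises once the one-time write cost amortizes.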

⚠️ Warning: Most developers underestimate hidden costs by 30–50%. A team budgeting $5,000/month based on raw token math typically spends $7,000–$8,000 once retries, context growth, and system prompt overhead are included. Read our full hidden costs breakdown before setting budgets.


Choosing the right pricing tier for your use case

| Your Use Case | Recommended Tier | Example Models | Why |
|---|---|---|---|
| Internal tools, prototypes | Efficient | GPT-5 nano, Mistral Small 3.2, DeepSeek V3.2 | Cost barely matters at low volume; quality is "good enough" |
| Customer-facing chatbots | Balanced/Efficient | GPT-5 mini, Gemini 2.5 Flash, Mistral Large 3 | Need quality + reasonable costs at scale |
| Complex analysis, coding | Flagship | GPT-5.2, Claude Sonnet 4.6, Gemini 3 Pro | Quality matters more than cost |
| High-volume classification | Ultra-budget | Mistral Small 3.2, GPT-5 nano, Llama 3.1 8B | Pennies per thousand requests |
| Reasoning-heavy tasks | Reasoning | o4-mini, DeepSeek R1, o3 | Worth the premium for accuracy |
| Long document processing | Large context | Gemini 2.5 Pro (2M), Grok 4.1 Fast (2M), o4-mini (2M) | Process everything in one call |

Browse models by category on our category page or compare specific models on our comparison pages.


Five quick wins to lower your token bill

1. Start with the cheapest model that works

Most developers overestimate the model tier they need. GPT-5 mini and DeepSeek V3.2 handle 80% of use cases at 5–25× less than flagships. Run a quality test on 50 representative prompts before committing to a premium model.

2. Cache aggressively

If your system prompt is the same across requests, enable prompt caching. Anthropic's cached input rate for Claude Sonnet models is 90% cheaper than fresh input. This alone can cut input costs in half for applications with long system prompts or tool definitions.

3. Set max_tokens

Don't let the model ramble. If you need 100-word answers, set max_tokens: 200. You won't pay for tokens it doesn't generate. This is the single easiest optimization — one line of code, immediate savings.

4. Compress your prompts

Replace 5 few-shot examples (1,000+ tokens) with a clear zero-shot instruction (100 tokens). Summarize retrieved documents before including them as context. Strip HTML, formatting, and boilerplate from any text you include in the prompt.

5. Implement model routing

Route simple tasks (classification, extraction, yes/no questions) to budget models ($0.06–$0.28/M input) and reserve expensive models for complex tasks. Our guide on cost optimization strategies walks through implementing a tiered routing system.
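A tiered router can start as nothing more than a lookup table. A minimal sketch, where the task types, model names, and prices are illustrative placeholders rather than any provider's API:

```python
# Hypothetical routing table: (model name, $/M input, $/M output)
CHEAP = ("mistral-small-3.2", 0.06, 0.18)
FLAGSHIP = ("gpt-5", 1.25, 10.00)

ROUTES = {
    "classify": CHEAP,
    "extract": CHEAP,
    "yes_no": CHEAP,
    "code_review": FLAGSHIP,
    "analysis": FLAGSHIP,
}

def route(task_type: str):
    # Unknown task types fall back to the flagship to protect quality.
    return ROUTES.get(task_type, FLAGSHIP)

model, in_price, out_price = route("classify")
print(model)  # mistral-small-3.2
```

In production you would typically add a confidence check so the cheap model can escalate its own hard cases, but even this static table captures most of the savings.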

✅ TL;DR: Token pricing is simple in theory: input tokens × input price plus output tokens × output price, each per million. In practice, output costs dominate (2–8× input), hidden costs add 30–50%, and model choice creates a 100×+ cost difference for the same workload. Start cheap, measure everything, and upgrade only when quality demands it.


Calculate your costs now

Don't guess — calculate. Our free AI cost calculator lets you compare any model, adjust usage patterns, and see monthly projections instantly. Plug in your input tokens, output tokens, and request volume to get exact numbers for your specific workload.

Or browse the full pricing table to find the cheapest model for your needs, and check our complete pricing guide for provider-by-provider breakdowns.


Frequently asked questions

How many tokens is 1,000 words?

Approximately 1,300 tokens in English. The exact count varies by tokenizer — OpenAI's tokenizer produces slightly different counts than Anthropic's or Google's. Code, technical text, and non-English languages can vary more. Use our token counter for precise measurements.

Why do AI APIs charge per token instead of per request?

Per-token pricing reflects the actual computational cost. A 50-token request uses far less GPU compute than a 50,000-token request. Per-request pricing would either overcharge simple requests or undercharge complex ones. Tokens provide a fair, granular billing unit.

What's the cheapest way to use AI APIs?

Use the cheapest model that meets your quality threshold — often Mistral Small 3.2 ($0.06/$0.18) or DeepSeek V3.2 ($0.28/$0.42). Combine this with prompt caching, output length limits, and model routing for maximum savings. A well-optimized setup can cut costs by 50–80% compared to a naive implementation.

Do I pay for tokens in failed API requests?

Yes. Input tokens and any partial output generated before the failure are billed at full price. With a 5% error rate, you're wasting 5% of your budget on requests that produced no usable result. Track error rates and implement circuit breakers to minimize this waste.

How much does it cost to build a ChatGPT-like app?

At 10,000 conversations per day with an average of 5 turns each (2,000 input + 200 output tokens per turn), monthly costs range from $80 on DeepSeek V3.2 to $3,600 on GPT-5 to $15,000 on Claude Opus 4.6. The model tier you choose is by far the biggest cost lever. Start with our cost estimation guide for a complete budgeting framework.