AI API pricing in 2026 is both competitive and confusing. Eight major providers offer 47+ models across four distinct pricing tiers, with per-token rates ranging from $0.05 to $168 per million tokens. Every provider lists per-token rates, but the real cost depends on output volume, context size, model tier, and how much you can optimize prompts.
This guide summarizes pricing across every major provider using real data from our calculator, explains how to compare them accurately, and gives you a framework for picking the right model without overpaying.
📊 By the numbers: 8 providers, 47+ models. The 2026 AI API market offers more choice, and more pricing complexity, than ever before.
The pricing basics
Most providers charge per million tokens, split into input (prompt) and output (completion). Output is almost always more expensive — typically 2–8× the input price. Total cost is driven by three things:
- How many tokens you send and receive per request
- How many requests you make per day/month
- Which model tier you choose
That sounds simple, but a small change in output length can dwarf the input cost. A model that generates 500-token responses costs 2.5× as much in output as one generating 200-token responses, regardless of input pricing. Always estimate total output tokens when comparing models.
💡 Key Takeaway: Output tokens are the dominant cost driver for most applications. A model with cheap input but expensive output (like GPT-5 at $1.25/$10.00) can cost more than a model with higher input but lower output pricing, depending on your workload.
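The per-token math above can be sketched in a few lines. This is a minimal illustration, not any provider's billing code; the rates are the GPT-5 figures from the tables below.

```python
# Minimal sketch of per-request cost arithmetic. Rates are per million
# tokens; plug in your own model's numbers.

def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one request at per-million-token rates."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000

# A 1,000-token prompt with a 500-token completion at GPT-5's $1.25/$10.00:
cost = request_cost(1_000, 500, 1.25, 10.00)
print(f"${cost:.5f}")  # $0.00625 -- $0.00125 of input, $0.00500 of output
```

Note that output contributes four times as much as input here even though the completion is half the prompt's length, which is the takeaway above in miniature.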
Provider-by-provider breakdown
OpenAI
OpenAI offers the broadest lineup, from ultra-cheap nano models to premium reasoning engines. Their pricing spans a 3,360× range, from GPT-5 nano's $0.05/M input to GPT-5.2 pro's $168/M output.
| Model | Input/1M | Output/1M | Context | Category |
|---|---|---|---|---|
| GPT-5.2 pro | $21.00 | $168.00 | 1M | Reasoning |
| o3-pro | $20.00 | $80.00 | 1M | Reasoning |
| o1 | $15.00 | $60.00 | 200K | Reasoning |
| GPT-4o | $2.50 | $10.00 | 128K | Flagship |
| GPT-5.2 | $1.75 | $14.00 | 1M | Flagship |
| o3 | $2.00 | $8.00 | 1M | Reasoning |
| GPT-5 / 5.1 | $1.25 | $10.00 | 1M | Flagship |
| GPT-4.1 | $2.00 | $8.00 | 200K | Flagship |
| o4-mini | $1.10 | $4.40 | 2M | Reasoning |
| GPT-4.1 mini | $0.40 | $1.60 | 200K | Efficient |
| GPT-5 mini | $0.25 | $2.00 | 500K | Efficient |
| GPT-4o mini | $0.15 | $0.60 | 128K | Efficient |
| GPT-4.1 nano | $0.10 | $0.40 | 128K | Efficient |
| GPT-5 nano | $0.05 | $0.40 | 128K | Efficient |
Best value: GPT-5 at $1.25/$10.00 is the sweet spot for production workloads. GPT-5 nano at $0.05/$0.40 is unbeatable for simple classification and extraction tasks.
Watch out for: Reasoning models (o3, o3-pro) generate hidden thinking tokens billed as output, which can inflate costs 5–14× beyond what the sticker price suggests.
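To see how thinking tokens distort sticker-price math, here is a rough sketch. The 5,000-token thinking figure is an assumption for illustration; measure your own from the usage data your provider returns.

```python
# Sketch: effective cost of a reasoning-model request when hidden
# thinking tokens are billed as output. thinking_tokens is an assumed
# figure for illustration -- real counts vary widely per request.

def reasoning_request_cost(input_tokens: int, visible_output: int,
                           thinking_tokens: int,
                           input_per_m: float, output_per_m: float) -> float:
    billed_output = visible_output + thinking_tokens  # thinking billed as output
    return (input_tokens * input_per_m + billed_output * output_per_m) / 1_000_000

# o3 at $2.00/$8.00: a 500-token visible answer, assuming 5,000 thinking tokens
naive = reasoning_request_cost(1_000, 500, 0, 2.00, 8.00)       # sticker-price math
actual = reasoning_request_cost(1_000, 500, 5_000, 2.00, 8.00)  # what you're billed
print(f"naive ${naive:.4f} vs actual ${actual:.4f} ({actual/naive:.1f}x)")  # ~7.7x
```

With heavier reasoning traces the multiplier climbs toward the top of the 5–14× range.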
Anthropic
Anthropic's Claude family has a clean three-tier structure: Opus for maximum intelligence, Sonnet for balanced performance, and Haiku for speed and efficiency.
| Model | Input/1M | Output/1M | Context | Category |
|---|---|---|---|---|
| Claude 3 Opus | $15.00 | $75.00 | 200K | Legacy |
| Claude Opus 4.6 | $5.00 | $25.00 | 200K | Flagship |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M | Balanced |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200K | Balanced |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200K | Balanced |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | Efficient |
| Claude 3.5 Haiku | $0.80 | $4.00 | 200K | Efficient |
Best value: Claude Sonnet 4.6 at $3.00/$15.00 with a 1M context window and computer-use capability is Anthropic's strongest all-rounder. Claude 3.5 Haiku at $0.80/$4.00 is the budget pick.
Note: Anthropic's output multiplier is a consistent 5× across the entire lineup, making it particularly expensive for output-heavy workloads compared to providers like DeepSeek (1.5× multiplier).
Google
Google's Gemini models stand out for massive context windows (up to 2M tokens) and competitive pricing, especially in the Flash tier.
| Model | Input/1M | Output/1M | Context | Category |
|---|---|---|---|---|
| Gemini 3 Pro | $2.00 | $12.00 | 2M | Flagship |
| Gemini 2.5 Pro | $1.25 | $10.00 | 2M | Flagship |
| Gemini 3 Flash | $0.50 | $3.00 | 1M | Efficient |
| Gemini 2.5 Flash | $0.15 | $0.60 | 1M | Efficient |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M | Efficient |
Best value: Gemini 2.5 Flash at $0.15/$0.60 is a standout — multimodal (text, vision, audio, code) with a 1M context window at budget pricing. For long-document workloads, Gemini 2.5 Pro offers 2M context at $1.25/$10.00.
📊 Quick Math: Processing a 500-page document (~375K tokens) in a single Gemini 2.5 Pro call costs $0.47 in input tokens. The same document would require multiple chunked calls on models with 128K context limits, increasing total cost and complexity.
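The chunking penalty can be estimated directly. This sketch assumes a 30% overlap overhead, the midpoint of the 20–40% range discussed later in this guide, and compares one long-context call against chunked calls on a hypothetical $2.00/M model.

```python
# Sketch: long-context vs chunked input cost for a ~375K-token document.
# The 30% overlap overhead is an assumption (midpoint of 20-40%).

DOC_TOKENS = 375_000

def single_call_cost(input_per_m: float) -> float:
    """One call on a model whose context window fits the whole document."""
    return DOC_TOKENS * input_per_m / 1_000_000

def chunked_cost(input_per_m: float, overlap: float = 0.30) -> float:
    """Chunked calls re-send overlapping context between chunks."""
    return DOC_TOKENS * (1 + overlap) * input_per_m / 1_000_000

gemini = single_call_cost(1.25)   # Gemini 2.5 Pro, one 2M-context call: ~$0.47
chunked = chunked_cost(2.00)      # a hypothetical $2.00/M model limited to 128K
print(f"single call ${gemini:.2f} vs chunked ${chunked:.2f}")
```

And that is before counting the engineering cost of the chunking pipeline itself.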
Mistral AI
Mistral is aggressively priced across the board, with particularly strong value at the budget end.
| Model | Input/1M | Output/1M | Context | Category |
|---|---|---|---|---|
| Magistral Medium | $2.00 | $5.00 | 128K | Reasoning |
| Mistral Large 3 | $0.50 | $1.50 | 256K | Flagship |
| Magistral Small | $0.50 | $1.50 | 128K | Reasoning |
| Mistral Medium 3 | $0.40 | $2.00 | 128K | Balanced |
| Devstral 2 | $0.40 | $2.00 | 256K | Code |
| Codestral | $0.30 | $0.90 | 128K | Code |
| Mistral Small 3.2 | $0.06 | $0.18 | 128K | Efficient |
Best value: Mistral Large 3 at $0.50/$1.50 delivers flagship-level reasoning at what most providers charge for efficient-tier models. Mistral Small 3.2 at $0.06/$0.18 is the cheapest model from any major provider. Codestral at $0.30/$0.90 is excellent for dedicated code generation.
DeepSeek
DeepSeek's two models are identically priced and offer remarkable value for code and reasoning workloads.
| Model | Input/1M | Output/1M | Context | Category |
|---|---|---|---|---|
| DeepSeek V3.2 | $0.28 | $0.42 | 128K | Efficient |
| DeepSeek R1 V3.2 | $0.28 | $0.42 | 128K | Reasoning |
Best value: Both models are exceptional. DeepSeek V3.2 rivals mid-tier flagships on coding and reasoning at budget prices. DeepSeek R1 V3.2 is a reasoning model priced like a budget model — it's the cheapest way to get chain-of-thought reasoning. See our DeepSeek vs GPT-5 Mini comparison for a head-to-head analysis.
Meta (via Together AI)
Meta's open-source Llama models are available through inference providers like Together AI. The key advantage: symmetric pricing where input and output cost the same.
| Model | Input/1M | Output/1M | Context | Category |
|---|---|---|---|---|
| Llama 3.1 405B | $3.50 | $3.50 | 128K | Flagship |
| Llama 3.1 70B | $0.88 | $0.88 | 128K | Balanced |
| Llama 4 Maverick | $0.27 | $0.85 | 1M | Flagship |
| Llama 3.1 8B | $0.18 | $0.18 | 128K | Efficient |
Best value: Llama 4 Maverick at $0.27/$0.85 offers flagship multimodal capabilities with a 1M context window at efficient pricing. Llama 3.1 8B at $0.18/$0.18 is ideal for high-volume simple tasks, and the symmetric pricing makes cost estimation trivial.
xAI
xAI's Grok models span premium reasoning to ultra-efficient, with the standout being Grok 4.1 Fast's combination of low cost and massive context.
| Model | Input/1M | Output/1M | Context | Category |
|---|---|---|---|---|
| Grok 4 | $3.00 | $15.00 | 256K | Reasoning |
| Grok 3 | $3.00 | $15.00 | 131K | Flagship |
| Grok 3 Mini | $0.30 | $0.50 | 128K | Efficient |
| Grok 4.1 Fast | $0.20 | $0.50 | 2M | Efficient |
Best value: Grok 4.1 Fast at $0.20/$0.50 with a 2M context window is one of the best deals in the market for long-context reasoning tasks. It rivals DeepSeek on cost while offering 15× the context window.
Cohere
Cohere focuses on enterprise use cases, particularly RAG (retrieval-augmented generation) and tool use.
| Model | Input/1M | Output/1M | Context | Category |
|---|---|---|---|---|
| Command R+ | $2.50 | $10.00 | 128K | Flagship |
| Command R | $0.15 | $0.60 | 128K | Efficient |
Best value: Command R at $0.15/$0.60 is a strong choice for RAG pipelines where retrieval quality matters more than creative generation.
How to compare providers without getting misled
Raw price per million tokens is not enough. Here are the five factors that actually determine your real cost:
1. Compare output pricing first
Output tokens drive the bill for most applications. A chatbot, code generator, or content tool generates far more output than input. GPT-5's input looks cheap at $1.25/M, but its $10.00/M output rate is what you'll feel. Compare your projected output volume against output pricing before looking at input rates.
2. Check context window limits
If your prompt is larger than the model's context window, you need chunking strategies that increase total token consumption. Gemini models with 1–2M context can process entire codebases or document collections in a single call. A 128K model requires multiple calls with overlap, increasing cost by 20–40%.
3. Track real output length
A model that outputs longer responses can cost more even if the per-token rate is lower. Some models are naturally verbose — they'll generate 400 tokens where a more concise model gives you 200. Measure actual output lengths in testing, not just per-token rates.
4. Match tier to task
Use efficient models for routine tasks and route hard cases to premium models. This tiered routing strategy can cut costs by 60–80% compared to using a single model for everything.
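A tiered router can be as simple as a heuristic gate in front of your model call. This is a deliberately naive sketch: the length/keyword heuristic and the model identifiers are illustrative stand-ins, and production routers often use a small classifier model instead.

```python
# Sketch of tiered routing: routine requests go to a cheap model,
# hard ones escalate to a premium model. The heuristic below is a
# stand-in for illustration, not a production-grade classifier.

CHEAP_MODEL = "deepseek-v3.2"      # $0.28/$0.42
PREMIUM_MODEL = "claude-opus-4.6"  # $5.00/$25.00

HARD_SIGNALS = ("prove", "debug", "architect", "multi-step")

def route(prompt: str) -> str:
    """Pick a model tier from crude difficulty signals."""
    looks_hard = len(prompt) > 2_000 or any(s in prompt.lower() for s in HARD_SIGNALS)
    return PREMIUM_MODEL if looks_hard else CHEAP_MODEL

print(route("Summarize this paragraph in one sentence."))         # deepseek-v3.2
print(route("Debug this race condition in my async scheduler."))  # claude-opus-4.6
```

Even if only 10–20% of traffic escalates, the blended cost sits far closer to the cheap model's rate than the premium one's.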
5. Account for hidden costs
Per-token pricing doesn't capture retries, failed requests, context waste, or thinking token overhead. Budget an extra 30–50% above your raw calculation. Read our hidden costs guide for the full breakdown.
⚠️ Warning: Don't compare models solely on input pricing. A model with $0.25/M input but $2.00/M output (GPT-5 mini) costs more for output-heavy workloads than a model with $0.28/M input and $0.42/M output (DeepSeek V3.2). Always calculate total cost for your specific input/output ratio.
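This comparison can be made exact by solving for the output/input ratio at which two models cost the same. Here is a small sketch using the two models from the warning above.

```python
# Sketch: the output/input token ratio at which two models cost the same.
# Below the break-even ratio, model A is cheaper; above it, model B.

def breakeven_output_ratio(in_a: float, out_a: float,
                           in_b: float, out_b: float) -> float:
    """Solve in_a + r*out_a == in_b + r*out_b for r = output/input tokens."""
    return (in_b - in_a) / (out_a - out_b)

# GPT-5 mini ($0.25/$2.00) vs DeepSeek V3.2 ($0.28/$0.42):
r = breakeven_output_ratio(0.25, 2.00, 0.28, 0.42)
print(f"GPT-5 mini is cheaper only when output < {r:.1%} of input")  # ~1.9%
```

Almost no real workload generates under 2% as much output as input, so DeepSeek wins this matchup nearly everywhere.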
A practical pricing workflow
If you're choosing a provider for production, follow this process:
If you want a faster starting point before provider-by-provider testing, check our best value AI model rankings across budget, mid-range, and premium tiers.
Step 1: Profile your workload. Estimate average input tokens, output tokens, and daily request volume for each AI feature in your app. Use our token estimation guide for rules of thumb.
Step 2: Pick three candidate models. Choose one from each tier — budget, mid, and premium. For example: DeepSeek V3.2, GPT-5, and Claude Opus 4.6.
Step 3: Calculate monthly costs. Use the AI Cost Calculator to plug in your real numbers. Don't estimate — calculate.
Step 4: Run a quality evaluation. Send 50–100 representative prompts to each candidate. Score the outputs on accuracy, relevance, and format. The cheapest model that meets your quality threshold wins.
Step 5: Plan for growth. Multiply your current volume by 5× and 10×. Does the model still fit your budget at scale? If not, identify the tier where you'd need to switch.
📊 Quick Math: A SaaS with 5,000 daily users making 3 AI requests each (1,000 input + 500 output tokens per request) runs about 450,000 requests per month and spends roughly $2,800/month on GPT-5, $560/month on GPT-5 mini, or $220/month on DeepSeek V3.2. At 50,000 users, those become roughly $28,000, $5,600, and $2,200 respectively. Model choice at scale is the difference between a rounding error and a significant line item.
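Workload estimates like this are worth scripting so you can re-run them as your numbers change. A minimal sketch, assuming a 30-day month:

```python
# Sketch: monthly cost for a per-user, per-request workload profile.
# Assumes a 30-day month; adjust to your own billing cycle.

def monthly_cost(users: int, reqs_per_user: int,
                 in_tok: int, out_tok: int,
                 in_per_m: float, out_per_m: float, days: int = 30) -> float:
    monthly_reqs = users * reqs_per_user * days
    per_req = (in_tok * in_per_m + out_tok * out_per_m) / 1_000_000
    return monthly_reqs * per_req

RATES = {"GPT-5": (1.25, 10.00),
         "GPT-5 mini": (0.25, 2.00),
         "DeepSeek V3.2": (0.28, 0.42)}

for name, (inp, outp) in RATES.items():
    cost = monthly_cost(5_000, 3, 1_000, 500, inp, outp)
    print(f"{name}: ${cost:,.2f}/month")
```

Re-running with `users=50_000` scales every line by 10×, which is exactly the growth check in Step 5.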
Provider comparison by use case
Different providers excel at different tasks. Here's a quick-reference guide:
| Use Case | Best Budget Option | Best Mid-Tier | Best Premium |
|---|---|---|---|
| Chatbot | DeepSeek V3.2 ($0.28/$0.42) | GPT-5 mini ($0.25/$2.00) | Claude Sonnet 4.6 ($3/$15) |
| Code generation | Codestral ($0.30/$0.90) | GPT-5 ($1.25/$10.00) | Claude Opus 4.6 ($5/$25) |
| Long documents | Grok 4.1 Fast ($0.20/$0.50, 2M ctx) | Gemini 2.5 Pro ($1.25/$10, 2M ctx) | Gemini 3 Pro ($2/$12, 2M ctx) |
| RAG pipelines | Command R ($0.15/$0.60) | Mistral Large 3 ($0.50/$1.50) | GPT-5.2 ($1.75/$14) |
| Classification | Mistral Small 3.2 ($0.06/$0.18) | GPT-4.1 nano ($0.10/$0.40) | N/A (overkill) |
| Reasoning | DeepSeek R1 V3.2 ($0.28/$0.42) | o4-mini ($1.10/$4.40) | o3-pro ($20/$80) |
The 2026 pricing landscape
The market has matured significantly. Key trends:
Prices are falling fast. Models that cost $15/M output in 2024 have been replaced by equivalents at $2–5/M, and today's budget models rival 2024's flagships.
Context windows are expanding. 1M–2M context is now common in mid-tier models. This reduces the need for expensive chunking strategies.
Reasoning models are a new tier. The o-series and DeepSeek R1 add a layer of complexity with thinking tokens. They're powerful but require careful cost management.
Provider diversity is real. OpenAI and Anthropic are no longer the only serious options. Mistral, DeepSeek, Google, and xAI offer competitive or superior value for specific workloads.
If you want a quick, concrete comparison, use the AI Cost Check calculator. You can plug in your real usage and instantly see how each provider's pricing tier impacts your budget.
Frequently asked questions
Which AI API provider is cheapest in 2026?
For pure per-token cost, Mistral (Small 3.2 at $0.06/$0.18) and DeepSeek (V3.2 at $0.28/$0.42) are the cheapest. But "cheapest" depends on your workload. For long-context tasks, Google's Gemini models offer the best value per context-window dollar. For reasoning, DeepSeek R1 at $0.28/$0.42 dramatically undercuts OpenAI's o3 at $2.00/$8.00.
How much does it cost to run a chatbot on AI APIs?
A chatbot handling 50,000 conversations/month with 800 input and 400 output tokens each costs approximately $20/month on DeepSeek V3.2, $50 on GPT-5 mini, $250 on GPT-5, or $420 on Claude Sonnet 4.5. Use our calculator for your exact numbers.
Should I use one AI provider or multiple?
Multiple providers is the recommended approach for production applications. Different providers excel at different tasks, and multi-provider setups protect you from rate limits, outages, and pricing changes. Abstract your AI calls behind a common interface so you can switch providers with minimal code changes.
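One way to keep that abstraction thin is a small structural interface that every provider adapter satisfies. The class and method names below are illustrative, not any particular vendor SDK's API; real adapters would wrap each provider's client library.

```python
# Sketch of a provider-agnostic interface so switching providers is a
# config change, not a rewrite. Names here are illustrative, not any
# real SDK's API.

from dataclasses import dataclass
from typing import Protocol

@dataclass
class Completion:
    text: str
    input_tokens: int
    output_tokens: int

class ChatProvider(Protocol):
    def complete(self, prompt: str, max_tokens: int) -> Completion: ...

class FakeProvider:
    """Stand-in adapter; real ones would wrap each vendor's client."""
    def complete(self, prompt: str, max_tokens: int) -> Completion:
        return Completion("ok", input_tokens=len(prompt) // 4, output_tokens=1)

def answer(provider: ChatProvider, question: str) -> str:
    # Application code depends only on the ChatProvider interface.
    return provider.complete(question, max_tokens=256).text

print(answer(FakeProvider(), "Hello?"))  # ok
```

Because `Protocol` uses structural typing, adapters don't need to inherit from anything; any class with a matching `complete` method satisfies the interface.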
How do I estimate AI API costs before building?
Follow our step-by-step estimation framework: define your use cases, estimate tokens per request, project request volume at launch/growth/scale, then calculate monthly cost across 2–3 candidate models. Add 30–50% for hidden costs like retries and prompt engineering iterations.
What are thinking tokens and how do they affect pricing?
Thinking tokens are internal chain-of-thought tokens generated by reasoning models (o3, o4-mini, DeepSeek R1). They're billed as output tokens but don't appear in the response. A single request can generate 2,000–20,000 thinking tokens, multiplying your effective cost by 5–14×. See our reasoning model pricing guide for detailed analysis.
