February 19, 2026

AI Cost Per Million Tokens: Every Model Ranked (February 2026)

A complete ranking of every major AI model by cost per million tokens — input and output prices compared side by side for budget planning.

Tags: pricing, ranking, cost-optimization

Token pricing is the single most important factor in AI API budgeting. Whether you're building a chatbot, a code generation tool, or a document analysis pipeline, the cost per million tokens determines whether your project is profitable or hemorrhaging money. Here's every major model ranked from cheapest to most expensive, updated for February 2026, with analysis on where each tier makes sense.

[stat] 350× The price difference between the cheapest model overall (Mistral Small 3.2 at $0.06/M input) and the most expensive (GPT-5.2 pro at $21/M input)

Input token pricing: complete ranking

Input tokens are what you send to the model — your prompt, system instructions, conversation history, and any retrieved context. For RAG applications and long-context workloads, input pricing dominates your bill.
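Before the rankings, it helps to be able to turn these rates into dollars for your own prompts. Here's a minimal Python sketch that estimates input cost locally, assuming tiktoken's cl100k_base encoding as a rough stand-in (each provider tokenizes differently, so treat the result as an estimate):

```python
import tiktoken

def estimate_input_cost(prompt: str, price_per_million: float) -> float:
    """Estimate the input cost of a single prompt in dollars."""
    encoding = tiktoken.get_encoding("cl100k_base")  # approximation; providers vary
    n_tokens = len(encoding.encode(prompt))
    return n_tokens / 1_000_000 * price_per_million

prompt = "Summarize the attached quarterly report in three bullet points."
print(f"~${estimate_input_cost(prompt, 0.28):.8f} on DeepSeek V3.2 ($0.28/M input)")
```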

| Rank | Model | Provider | Input / 1M tokens |
|------|-------|----------|-------------------|
| 1 | GPT-5 nano | OpenAI | $0.05 |
| 2 | Mistral Small 3.2 | Mistral | $0.06 |
| 3 | Gemini 2.5 Flash-Lite | Google | $0.10 |
| 4 | GPT-4.1 nano | OpenAI | $0.10 |
| 5 | Gemini 2.5 Flash | Google | $0.15 |
| 6 | GPT-4o mini | OpenAI | $0.15 |
| 7 | Command R | Cohere | $0.15 |
| 8 | Llama 3.1 8B | Meta/Together | $0.18 |
| 9 | Grok 4.1 Fast | xAI | $0.20 |
| 10 | GPT-5 mini | OpenAI | $0.25 |
| 11 | Llama 4 Maverick | Meta/Together | $0.27 |
| 12 | DeepSeek V3.2 | DeepSeek | $0.28 |
| 13 | DeepSeek R1 V3.2 | DeepSeek | $0.28 |
| 14 | Codestral | Mistral | $0.30 |
| 15 | Grok 3 Mini | xAI | $0.30 |
| 16 | GPT-4.1 mini | OpenAI | $0.40 |
| 17 | Mistral Medium 3 | Mistral | $0.40 |
| 18 | Devstral 2 | Mistral | $0.40 |
| 19 | Gemini 3 Flash | Google | $0.50 |
| 20 | Mistral Large 3 | Mistral | $0.50 |
| 21 | Magistral Small | Mistral | $0.50 |
| 22 | Claude 3.5 Haiku | Anthropic | $0.80 |
| 23 | Llama 3.1 70B | Meta/Together | $0.88 |
| 24 | Claude Haiku 4.5 | Anthropic | $1.00 |
| 25 | o4-mini | OpenAI | $1.10 |
| 26 | o3-mini | OpenAI | $1.10 |
| 27 | GPT-5 | OpenAI | $1.25 |
| 28 | GPT-5.1 | OpenAI | $1.25 |
| 29 | Gemini 2.5 Pro | Google | $1.25 |
| 30 | GPT-5.2 | OpenAI | $1.75 |
| 31 | GPT-4.1 | OpenAI | $2.00 |
| 32 | o3 | OpenAI | $2.00 |
| 33 | Gemini 3 Pro | Google | $2.00 |
| 34 | Magistral Medium | Mistral | $2.00 |
| 35 | GPT-4o | OpenAI | $2.50 |
| 36 | Command R+ | Cohere | $2.50 |
| 37 | Claude Sonnet 4.6 | Anthropic | $3.00 |
| 38 | Claude Sonnet 4.5 | Anthropic | $3.00 |
| 39 | Claude 3.5 Sonnet | Anthropic | $3.00 |
| 40 | Grok 4 | xAI | $3.00 |
| 41 | Grok 3 | xAI | $3.00 |
| 42 | Llama 3.1 405B | Meta/Together | $3.50 |
| 43 | Claude Opus 4.6 | Anthropic | $5.00 |
| 44 | o1 | OpenAI | $15.00 |
| 45 | Claude 3 Opus | Anthropic | $15.00 |
| 46 | o3-pro | OpenAI | $20.00 |
| 47 | GPT-5.2 pro | OpenAI | $21.00 |

💡 Key Takeaway: The cheapest input pricing comes from nano/lite models under $0.15/M. But input cost is only half the story — output pricing is where most budgets break.


Output token pricing: complete ranking

Output tokens are what the model generates — your responses, completions, and generated code. Output tokens cost 2–8× more than input across most providers, and they're the dominant cost factor for any application that generates substantial text.

| Rank | Model | Provider | Output / 1M tokens |
|------|-------|----------|--------------------|
| 1 | Mistral Small 3.2 | Mistral | $0.18 |
| 2 | Llama 3.1 8B | Meta/Together | $0.18 |
| 3 | Gemini 2.5 Flash-Lite | Google | $0.40 |
| 4 | GPT-5 nano | OpenAI | $0.40 |
| 5 | GPT-4.1 nano | OpenAI | $0.40 |
| 6 | DeepSeek V3.2 | DeepSeek | $0.42 |
| 7 | DeepSeek R1 V3.2 | DeepSeek | $0.42 |
| 8 | Grok 4.1 Fast | xAI | $0.50 |
| 9 | Grok 3 Mini | xAI | $0.50 |
| 10 | Gemini 2.5 Flash | Google | $0.60 |
| 11 | GPT-4o mini | OpenAI | $0.60 |
| 12 | Command R | Cohere | $0.60 |
| 13 | Llama 4 Maverick | Meta/Together | $0.85 |
| 14 | Llama 3.1 70B | Meta/Together | $0.88 |
| 15 | Codestral | Mistral | $0.90 |
| 16 | Mistral Large 3 | Mistral | $1.50 |
| 17 | Magistral Small | Mistral | $1.50 |
| 18 | GPT-4.1 mini | OpenAI | $1.60 |
| 19 | GPT-5 mini | OpenAI | $2.00 |
| 20 | Mistral Medium 3 | Mistral | $2.00 |
| 21 | Devstral 2 | Mistral | $2.00 |
| 22 | Gemini 3 Flash | Google | $3.00 |
| 23 | Llama 3.1 405B | Meta/Together | $3.50 |
| 24 | Claude 3.5 Haiku | Anthropic | $4.00 |
| 25 | o4-mini | OpenAI | $4.40 |
| 26 | o3-mini | OpenAI | $4.40 |
| 27 | Claude Haiku 4.5 | Anthropic | $5.00 |
| 28 | Magistral Medium | Mistral | $5.00 |
| 29 | GPT-4.1 | OpenAI | $8.00 |
| 30 | o3 | OpenAI | $8.00 |
| 31 | GPT-5 | OpenAI | $10.00 |
| 32 | GPT-5.1 | OpenAI | $10.00 |
| 33 | GPT-4o | OpenAI | $10.00 |
| 34 | Gemini 2.5 Pro | Google | $10.00 |
| 35 | Command R+ | Cohere | $10.00 |
| 36 | Gemini 3 Pro | Google | $12.00 |
| 37 | GPT-5.2 | OpenAI | $14.00 |
| 38 | Claude Sonnet 4.6 | Anthropic | $15.00 |
| 39 | Claude Sonnet 4.5 | Anthropic | $15.00 |
| 40 | Claude 3.5 Sonnet | Anthropic | $15.00 |
| 41 | Grok 4 | xAI | $15.00 |
| 42 | Grok 3 | xAI | $15.00 |
| 43 | Claude Opus 4.6 | Anthropic | $25.00 |
| 44 | o1 | OpenAI | $60.00 |
| 45 | Claude 3 Opus | Anthropic | $75.00 |
| 46 | o3-pro | OpenAI | $80.00 |
| 47 | GPT-5.2 pro | OpenAI | $168.00 |
[stat] $0.18 Mistral Small 3.2 output per 1M vs $168.00 GPT-5.2 pro output per 1M

The output pricing gap is staggering. GPT-5.2 pro's output costs 933× more than Mistral Small 3.2. Even among mainstream models, the spread is enormous — DeepSeek V3.2 at $0.42 versus Claude Opus 4.6 at $25.00 is a 60× difference.


The ultra-budget tier (under $0.50/M output)

These models cost pennies per thousand requests. They're ideal for high-volume, lower-complexity tasks: classification, extraction, simple Q&A, content moderation, and data processing.

Best picks in this tier:

  • Mistral Small 3.2 at $0.06/$0.18 — the cheapest option overall. Strong for text classification and simple generation tasks. 128K context window.
  • GPT-5 nano at $0.05/$0.40 — OpenAI's cheapest, text-only. Good for simple extraction and formatting. 128K context.
  • DeepSeek V3.2 at $0.28/$0.42 — punches well above its weight. Supports code and reasoning at budget prices. A standout value.
  • Grok 4.1 Fast at $0.20/$0.50 — xAI's efficient model with a massive 2M context window. Strong for long-document processing at low cost.
  • Llama 3.1 8B at $0.18/$0.18 — symmetric pricing and open-source. Perfect for self-hosting scenarios.

📊 Quick Math: Processing 1 million requests with 500 input + 300 output tokens each costs just $266 on DeepSeek V3.2, compared to $10,000 on Claude Opus 4.6. That's a roughly 38× difference for the same volume.
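The arithmetic behind that callout is simple enough to script. A minimal sketch, with prices hardcoded from the rankings above:

```python
def cost_per_request(input_tokens, output_tokens, input_price, output_price):
    """Cost in dollars for one request, given per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

requests = 1_000_000
deepseek = cost_per_request(500, 300, 0.28, 0.42) * requests   # $266
opus = cost_per_request(500, 300, 5.00, 25.00) * requests      # $10,000
print(f"DeepSeek V3.2: ${deepseek:,.0f} vs Claude Opus 4.6: ${opus:,.0f} ({opus / deepseek:.0f}x)")
```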

For most teams, the ultra-budget tier handles 60–80% of production workloads. Start here and only escalate to pricier models when quality measurably suffers. Read our guide on model routing strategies for how to implement this.

The efficient tier ($0.50–$2.00/M output)

These models offer a strong quality-to-cost ratio. They handle summarization, code generation, multi-step reasoning, and customer-facing chat without breaking the bank.

Best picks:

  • Gemini 2.5 Flash at $0.15/$0.60 — excellent for multimodal tasks (text, vision, audio) at efficient pricing. 1M context window.
  • GPT-4o mini at $0.15/$0.60 — still a solid workhorse for vision and text tasks.
  • Llama 4 Maverick at $0.27/$0.85 — Meta's latest open-source flagship. Strong multimodal capabilities with 1M context.
  • Codestral at $0.30/$0.90 — Mistral's purpose-built coding model. If you need code generation on a budget, this is the pick.
  • Mistral Large 3 at $0.50/$1.50 — flagship-quality reasoning at efficient-tier pricing. One of the best values in the market.
  • GPT-4.1 mini at $0.40/$1.60 — fine-tunable with vision support. Good for custom workflows.
  • GPT-5 mini at $0.25/$2.00 — OpenAI's budget workhorse with a huge 500K context window.

⚠️ Warning: Don't assume "mini" models are low quality. GPT-5 mini and Gemini 2.5 Flash regularly outperform previous-generation flagship models on standard benchmarks. Always test on your actual workload before paying for premium.


The mid-tier ($2.00–$15.00/M output)

Best for complex tasks requiring strong reasoning: code generation, analysis, creative writing, and customer-facing applications where quality directly affects user retention.

Key models:

  • Claude 3.5 Haiku at $0.80/$4.00 — fast and affordable Anthropic option with vision support.
  • Claude Haiku 4.5 at $1.00/$5.00 — near-frontier intelligence at efficient pricing.
  • o4-mini at $1.10/$4.40 — OpenAI's efficient reasoning model with a 2M context window. Strong for tasks requiring structured thinking.
  • GPT-5 / GPT-5.1 at $1.25/$10.00 — the sweet spot for most production workloads. 1M context, strong across all task types.
  • GPT-5.2 at $1.75/$14.00 — OpenAI's latest flagship with vision, audio, and reasoning. The go-to for agentic applications.
  • Gemini 3 Pro at $2.00/$12.00 — 2M context window, multimodal including video. Best for processing very long documents.
  • Claude Sonnet 4.6 at $3.00/$15.00 — 1M context, computer use capability. Top-tier coding and reasoning.
  • Grok 4 at $3.00/$15.00 — xAI's premium reasoning model with vision support.

For most production applications, GPT-5 at $1.25/$10.00 or Gemini 2.5 Pro at $1.25/$10.00 offer the best balance of capability and cost. On a typical request they're roughly 3× cheaper than Claude Opus 4.6 (see the combined table below) while handling the vast majority of tasks competently.

The premium tier ($15.00+/M output)

For tasks where quality is paramount and cost is secondary: legal analysis, medical reasoning, research synthesis, complex code architecture, and high-stakes decision support.

  • Claude Opus 4.6 at $5.00/$25.00 — Anthropic's most intelligent model. Best for building agents and complex coding tasks.
  • o1 at $15.00/$60.00 — original reasoning model, still strong for complex problem-solving.
  • Claude 3 Opus at $15.00/$75.00 — legacy but still available; there's no reason to choose it over Opus 4.6.
  • o3-pro at $20.00/$80.00 — premium reasoning for the most demanding tasks. Use sparingly.
  • GPT-5.2 pro at $21.00/$168.00 — OpenAI's most capable and most expensive model. Reserve for cases where nothing else will do.

⚠️ Warning: Premium models cost 50–100× more than budget alternatives. Before committing, run a quality comparison on your specific task. Many teams discover that a $0.50/M model performs within 5% of a $25/M model for their use case. Use our calculator to see the cost impact before deciding.
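One practical way to run that comparison is a small harness that sends the same prompts to both candidates and records the answers side by side for grading. A sketch under stated assumptions: `call_model` is a hypothetical stand-in for your provider's actual client, and the model ids in the commented example are placeholders:

```python
import csv

def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in: replace with your provider's real client call."""
    raise NotImplementedError

def compare_models(prompts: list[str], cheap: str, premium: str,
                   out_path: str = "comparison.csv") -> None:
    """Collect paired outputs so a human (or an LLM judge) can grade them."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["prompt", cheap, premium])
        for p in prompts:
            writer.writerow([p, call_model(cheap, p), call_model(premium, p)])

# compare_models(sample_prompts, cheap="deepseek-v3.2", premium="claude-opus-4.6")
```

If the cheap column wins or ties on most rows, the premium spend isn't buying you anything.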


Combined cost ranking: the metric that matters

Raw input or output pricing alone is misleading. What matters is the combined cost per request for your specific workload. Here's a comparison assuming a typical request of 500 input tokens and 300 output tokens:

| Model | Input Cost | Output Cost | Total per Request |
|-------|------------|-------------|-------------------|
| Mistral Small 3.2 | $0.000030 | $0.000054 | $0.000084 |
| GPT-5 nano | $0.000025 | $0.000120 | $0.000145 |
| Grok 4.1 Fast | $0.000100 | $0.000150 | $0.000250 |
| DeepSeek V3.2 | $0.000140 | $0.000126 | $0.000266 |
| GPT-5 mini | $0.000125 | $0.000600 | $0.000725 |
| Gemini 3 Flash | $0.000250 | $0.000900 | $0.001150 |
| GPT-5 | $0.000625 | $0.003000 | $0.003625 |
| Claude Sonnet 4.6 | $0.001500 | $0.004500 | $0.006000 |
| Claude Opus 4.6 | $0.002500 | $0.007500 | $0.010000 |
| o3-pro | $0.010000 | $0.024000 | $0.034000 |
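To reproduce this table, or re-rank it for your own token mix, here's a minimal sketch using a subset of the prices above:

```python
# Prices per million tokens (input, output), taken from the rankings above.
PRICES = {
    "Mistral Small 3.2": (0.06, 0.18),
    "GPT-5 nano": (0.05, 0.40),
    "Grok 4.1 Fast": (0.20, 0.50),
    "DeepSeek V3.2": (0.28, 0.42),
    "GPT-5 mini": (0.25, 2.00),
    "GPT-5": (1.25, 10.00),
    "Claude Opus 4.6": (5.00, 25.00),
}

def rank(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Rank models by combined cost per request for a given token mix."""
    costs = {
        model: (input_tokens * inp + output_tokens * out) / 1_000_000
        for model, (inp, out) in PRICES.items()
    }
    return sorted(costs.items(), key=lambda kv: kv[1])

for model, cost in rank(500, 300):  # the standard workload above
    print(f"{model:<20} ${cost:.6f}")
```

Call `rank(200, 1_000)` instead and Grok 4.1 Fast and DeepSeek V3.2 trade places: the token mix, not the headline price, decides the winner.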

✅ TL;DR: For standard workloads, DeepSeek V3.2, Mistral Small 3.2, and GPT-5 nano are the cost leaders. GPT-5 and Gemini 2.5 Pro offer the best value in the mid-tier. Premium models cost 40–400× more per request — use them only when measurably better quality justifies the spend.


How output-heavy workloads shift the rankings

The rankings above assume balanced input/output. But many real applications skew heavily toward output — chatbots, content generation, and code assistants generate far more tokens than they receive. For output-heavy workloads (say 200 input, 1,000 output tokens), the rankings shift:

| Model | Cost per Request (output-heavy) |
|-------|---------------------------------|
| Mistral Small 3.2 | $0.000192 |
| Llama 3.1 8B | $0.000216 |
| DeepSeek V3.2 | $0.000476 |
| Grok 4.1 Fast | $0.000540 |
| GPT-5 mini | $0.002050 |
| GPT-5 | $0.010250 |
| Claude Opus 4.6 | $0.026000 |

DeepSeek V3.2 remains dominant for output-heavy work because its output pricing ($0.42/M) is among the lowest in the market — while still delivering strong reasoning and coding capabilities. Compare that to GPT-5 mini's $2.00/M output — DeepSeek is nearly 5× cheaper on output. See our DeepSeek vs GPT-5 Mini deep dive for a full comparison.

How to use this data

  1. Start with the cheapest model that meets your quality threshold. Test with 50–100 representative prompts before committing.
  2. Focus on output pricing if your application generates substantial text. Output costs dominate most bills.
  3. Use our calculator to estimate monthly costs for your exact usage pattern — input tokens, output tokens, and request volume.
  4. Consider the batch calculator to compare multiple models simultaneously across different workloads.
  5. Implement model routing — send simple tasks to budget models and reserve expensive models for complex work (see the sketch after this list). This alone can cut your bill by 60–80%.
  6. Account for hidden costs like retries, context waste, and thinking tokens. Read our hidden costs guide to budget accurately.
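For step 5, a minimal routing sketch, assuming you have some way to score task complexity (the heuristic and model ids below are placeholders, not recommendations):

```python
CODE_MARKERS = ("def ", "class ", "SELECT ", "import ")

def classify_complexity(prompt: str) -> str:
    """Placeholder heuristic; in production you might use a cheap classifier model."""
    if len(prompt) > 2000 or any(m in prompt for m in CODE_MARKERS):
        return "complex"
    return "simple"

ROUTES = {
    "simple": "deepseek-v3.2",       # hypothetical model id (~$0.28/$0.42 per M)
    "complex": "claude-sonnet-4.6",  # hypothetical model id (~$3.00/$15.00 per M)
}

def route(prompt: str) -> str:
    """Pick the cheapest model that should handle this prompt."""
    return ROUTES[classify_complexity(prompt)]
```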

Prices change frequently. We update our data when providers change rates — check the pricing table for the latest numbers, or use the calculator to run your own comparison.


Frequently asked questions

What is the cheapest AI model per million tokens in 2026?

For input tokens, GPT-5 nano at $0.05/M is the cheapest. For output tokens, Mistral Small 3.2 at $0.18/M and Llama 3.1 8B at $0.18/M are tied for the lowest. For the best combined value on a standard workload, Mistral Small 3.2 at $0.06/$0.18 edges out the competition.

Why do output tokens cost more than input tokens?

Output tokens require sequential generation — the model must predict each token one at a time, which is more computationally expensive than processing input tokens in parallel. Most providers charge 2–8× more for output, though DeepSeek bucks this trend with only a 1.5× multiplier ($0.28 input vs $0.42 output).

How much does it cost to process 1 million API requests?

It depends entirely on your token counts and model choice. For a typical request (500 input, 300 output tokens), 1 million requests would cost $266 on DeepSeek V3.2, $725 on GPT-5 mini, or $10,000 on Claude Opus 4.6. Use our calculator to get exact numbers for your workload.

Which AI provider has the best pricing overall?

No single provider wins across all tiers. DeepSeek offers the best value for code and reasoning at budget prices. Mistral has the cheapest lightweight models. Google excels at long-context workloads with competitive Flash pricing. OpenAI has the broadest range from $0.05 to $168 per million tokens. The best provider depends on your specific use case — compare them side by side on our pricing page.

How often do AI API prices change?

Major providers adjust pricing every 2–4 months, typically downward. New model releases often reset pricing tiers. We track changes and update our data regularly. The trend is clear: prices drop 30–50% year over year for equivalent capability, making it worth re-evaluating your model choice quarterly.