February 19, 2026

AI Cost Per Million Tokens: Every Model Ranked (February 2026)

A complete ranking of every major AI model by cost per million tokens — input and output prices compared side by side for budget planning.

Tags: pricing, ranking, cost-optimization

Token pricing is the single most important factor in AI API budgeting. Whether you're building a chatbot, a code generation tool, or a document analysis pipeline, the cost per million tokens determines whether your project is profitable or hemorrhaging money. Here's every major model ranked from cheapest to most expensive, updated for February 2026, with analysis on where each tier makes sense.

[stat] 350× The price difference between the cheapest model overall (Mistral Small 3.2 at $0.06/M input) and the most expensive (GPT-5.2 pro at $21/M input)

Input token pricing: complete ranking

Input tokens are what you send to the model — your prompt, system instructions, conversation history, and any retrieved context. For RAG applications and long-context workloads, input pricing dominates your bill.
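Before the rankings, it helps to be able to turn these rates into dollars for your own prompts. Here's a minimal Python sketch that estimates input cost locally, assuming tiktoken's cl100k_base encoding as a rough stand-in (each provider tokenizes differently, so treat the result as an estimate):

```python
import tiktoken

def estimate_input_cost(prompt: str, price_per_million: float) -> float:
    """Estimate the input cost of a single prompt in dollars."""
    encoding = tiktoken.get_encoding("cl100k_base")  # approximation; providers vary
    n_tokens = len(encoding.encode(prompt))
    return n_tokens / 1_000_000 * price_per_million

prompt = "Summarize the attached quarterly report in three bullet points."
print(f"~${estimate_input_cost(prompt, 0.28):.8f} on DeepSeek V3.2 ($0.28/M input)")
```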

| Rank | Model | Provider | Input / 1M tokens |
|------|-------|----------|-------------------|
| 1 | GPT-5 nano | OpenAI | $0.05 |
| 2 | Mistral Small 3.2 | Mistral | $0.06 |
| 3 | Gemini 2.5 Flash-Lite | Google | $0.10 |
| 4 | GPT-4.1 nano | OpenAI | $0.10 |
| 5 | Gemini 2.5 Flash | Google | $0.15 |
| 6 | GPT-4o mini | OpenAI | $0.15 |
| 7 | Command R | Cohere | $0.15 |
| 8 | Llama 3.1 8B | Meta/Together | $0.18 |
| 9 | Grok 4.1 Fast | xAI | $0.20 |
| 10 | GPT-5 mini | OpenAI | $0.25 |
| 11 | Llama 4 Maverick | Meta/Together | $0.27 |
| 12 | DeepSeek V3.2 | DeepSeek | $0.28 |
| 13 | DeepSeek R1 V3.2 | DeepSeek | $0.28 |
| 14 | Codestral | Mistral | $0.30 |
| 15 | Grok 3 Mini | xAI | $0.30 |
| 16 | GPT-4.1 mini | OpenAI | $0.40 |
| 17 | Mistral Medium 3 | Mistral | $0.40 |
| 18 | Devstral 2 | Mistral | $0.40 |
| 19 | Gemini 3 Flash | Google | $0.50 |
| 20 | Mistral Large 3 | Mistral | $0.50 |
| 21 | Magistral Small | Mistral | $0.50 |
| 22 | Claude 3.5 Haiku | Anthropic | $0.80 |
| 23 | Llama 3.1 70B | Meta/Together | $0.88 |
| 24 | Claude Haiku 4.5 | Anthropic | $1.00 |
| 25 | o4-mini | OpenAI | $1.10 |
| 26 | o3-mini | OpenAI | $1.10 |
| 27 | GPT-5 | OpenAI | $1.25 |
| 28 | GPT-5.1 | OpenAI | $1.25 |
| 29 | Gemini 2.5 Pro | Google | $1.25 |
| 30 | GPT-5.2 | OpenAI | $1.75 |
| 31 | GPT-4.1 | OpenAI | $2.00 |
| 32 | o3 | OpenAI | $2.00 |
| 33 | Gemini 3 Pro | Google | $2.00 |
| 34 | Magistral Medium | Mistral | $2.00 |
| 35 | GPT-4o | OpenAI | $2.50 |
| 36 | Command R+ | Cohere | $2.50 |
| 37 | Claude Sonnet 4.6 | Anthropic | $3.00 |
| 38 | Claude Sonnet 4.5 | Anthropic | $3.00 |
| 39 | Claude 3.5 Sonnet | Anthropic | $3.00 |
| 40 | Grok 4 | xAI | $3.00 |
| 41 | Grok 3 | xAI | $3.00 |
| 42 | Llama 3.1 405B | Meta/Together | $3.50 |
| 43 | Claude Opus 4.6 | Anthropic | $5.00 |
| 44 | o1 | OpenAI | $15.00 |
| 45 | Claude 3 Opus | Anthropic | $15.00 |
| 46 | o3-pro | OpenAI | $20.00 |
| 47 | GPT-5.2 pro | OpenAI | $21.00 |

💡 Key Takeaway: The cheapest input pricing comes from nano/lite models under $0.15/M. But input cost is only half the story — output pricing is where most budgets break.


Output token pricing: complete ranking

Output tokens are what the model generates — your responses, completions, and generated code. Output tokens cost 2–8× more than input across most providers, and they're the dominant cost factor for any application that generates substantial text.

| Rank | Model | Provider | Output / 1M tokens |
|------|-------|----------|--------------------|
| 1 | Mistral Small 3.2 | Mistral | $0.18 |
| 2 | Llama 3.1 8B | Meta/Together | $0.18 |
| 3 | Gemini 2.5 Flash-Lite | Google | $0.40 |
| 4 | GPT-5 nano | OpenAI | $0.40 |
| 5 | GPT-4.1 nano | OpenAI | $0.40 |
| 6 | DeepSeek V3.2 | DeepSeek | $0.42 |
| 7 | DeepSeek R1 V3.2 | DeepSeek | $0.42 |
| 8 | Grok 4.1 Fast | xAI | $0.50 |
| 9 | Grok 3 Mini | xAI | $0.50 |
| 10 | Gemini 2.5 Flash | Google | $0.60 |
| 11 | GPT-4o mini | OpenAI | $0.60 |
| 12 | Command R | Cohere | $0.60 |
| 13 | Llama 4 Maverick | Meta/Together | $0.85 |
| 14 | Llama 3.1 70B | Meta/Together | $0.88 |
| 15 | Codestral | Mistral | $0.90 |
| 16 | Mistral Large 3 | Mistral | $1.50 |
| 17 | Magistral Small | Mistral | $1.50 |
| 18 | GPT-4.1 mini | OpenAI | $1.60 |
| 19 | GPT-5 mini | OpenAI | $2.00 |
| 20 | Mistral Medium 3 | Mistral | $2.00 |
| 21 | Devstral 2 | Mistral | $2.00 |
| 22 | Gemini 3 Flash | Google | $3.00 |
| 23 | Llama 3.1 405B | Meta/Together | $3.50 |
| 24 | Claude 3.5 Haiku | Anthropic | $4.00 |
| 25 | o4-mini | OpenAI | $4.40 |
| 26 | o3-mini | OpenAI | $4.40 |
| 27 | Claude Haiku 4.5 | Anthropic | $5.00 |
| 28 | Magistral Medium | Mistral | $5.00 |
| 29 | GPT-4.1 | OpenAI | $8.00 |
| 30 | o3 | OpenAI | $8.00 |
| 31 | GPT-5 | OpenAI | $10.00 |
| 32 | GPT-5.1 | OpenAI | $10.00 |
| 33 | GPT-4o | OpenAI | $10.00 |
| 34 | Gemini 2.5 Pro | Google | $10.00 |
| 35 | Command R+ | Cohere | $10.00 |
| 36 | Gemini 3 Pro | Google | $12.00 |
| 37 | GPT-5.2 | OpenAI | $14.00 |
| 38 | Claude Sonnet 4.6 | Anthropic | $15.00 |
| 39 | Claude Sonnet 4.5 | Anthropic | $15.00 |
| 40 | Claude 3.5 Sonnet | Anthropic | $15.00 |
| 41 | Grok 4 | xAI | $15.00 |
| 42 | Grok 3 | xAI | $15.00 |
| 43 | Claude Opus 4.6 | Anthropic | $25.00 |
| 44 | o1 | OpenAI | $60.00 |
| 45 | Claude 3 Opus | Anthropic | $75.00 |
| 46 | o3-pro | OpenAI | $80.00 |
| 47 | GPT-5.2 pro | OpenAI | $168.00 |
[stat] $0.18 Mistral Small 3.2 output per 1M vs $168.00 GPT-5.2 pro output per 1M

The output pricing gap is staggering. GPT-5.2 pro's output costs 933× more than Mistral Small 3.2. Even among mainstream models, the spread is enormous — DeepSeek V3.2 at $0.42 versus Claude Opus 4.6 at $25.00 is a 60× difference.


The ultra-budget tier (under $0.50/M output)

These models cost pennies per thousand requests. They're ideal for high-volume, lower-complexity tasks: classification, extraction, simple Q&A, content moderation, and data processing.

Best picks in this tier:

  • Mistral Small 3.2 at $0.06/$0.18 — the cheapest option overall. Strong for text classification and simple generation tasks. 128K context window.
  • GPT-5 nano at $0.05/$0.40 — OpenAI's cheapest, text-only. Good for simple extraction and formatting. 128K context.
  • DeepSeek V3.2 at $0.28/$0.42 — punches well above its weight. Supports code and reasoning at budget prices. A standout value.
  • Grok 4.1 Fast at $0.20/$0.50 — xAI's efficient model with a massive 2M context window. Strong for long-document processing at low cost.
  • Llama 3.1 8B at $0.18/$0.18 — symmetric pricing and open-source. Perfect for self-hosting scenarios.

📊 Quick Math: Processing 1 million requests with 500 input + 300 output tokens each costs just $266 on DeepSeek V3.2, compared to $10,000 on Claude Opus 4.6. That's a roughly 38× difference for the same volume.
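The arithmetic behind that callout is simple enough to script. A minimal sketch, with prices hardcoded from the rankings above:

```python
def cost_per_request(input_tokens, output_tokens, input_price, output_price):
    """Cost in dollars for one request, given per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

requests = 1_000_000
deepseek = cost_per_request(500, 300, 0.28, 0.42) * requests   # $266
opus = cost_per_request(500, 300, 5.00, 25.00) * requests      # $10,000
print(f"DeepSeek V3.2: ${deepseek:,.0f} vs Claude Opus 4.6: ${opus:,.0f} ({opus / deepseek:.0f}x)")
```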

For most teams, the ultra-budget tier handles 60–80% of production workloads. Start here and only escalate to pricier models when quality measurably suffers. Read our guide on model routing strategies for how to implement this.

The efficient tier ($0.50–$2.00/M output)

These models offer a strong quality-to-cost ratio. They handle summarization, code generation, multi-step reasoning, and customer-facing chat without breaking the bank.

Best picks:

  • Gemini 2.5 Flash at $0.15/$0.60 — excellent for multimodal tasks (text, vision, audio) at efficient pricing. 1M context window.
  • GPT-4o mini at $0.15/$0.60 — still a solid workhorse for vision and text tasks.
  • Llama 4 Maverick at $0.27/$0.85 — Meta's latest open-source flagship. Strong multimodal capabilities with 1M context.
  • Codestral at $0.30/$0.90 — Mistral's purpose-built coding model. If you need code generation on a budget, this is the pick.
  • Mistral Large 3 at $0.50/$1.50 — flagship-quality reasoning at efficient-tier pricing. One of the best values in the market.
  • GPT-4.1 mini at $0.40/$1.60 — fine-tunable with vision support. Good for custom workflows.
  • GPT-5 mini at $0.25/$2.00 — OpenAI's budget workhorse with a huge 500K context window.

⚠️ Warning: Don't assume "mini" models are low quality. GPT-5 mini and Gemini 2.5 Flash regularly outperform previous-generation flagship models on standard benchmarks. Always test on your actual workload before paying for premium.


The mid-tier ($2.00–$15.00/M output)

Best for complex tasks requiring strong reasoning: code generation, analysis, creative writing, and customer-facing applications where quality directly affects user retention.

Key models:

  • Claude 3.5 Haiku at $0.80/$4.00 — fast and affordable Anthropic option with vision support.
  • Claude Haiku 4.5 at $1.00/$5.00 — near-frontier intelligence at efficient pricing.
  • o4-mini at $1.10/$4.40 — OpenAI's efficient reasoning model with a 2M context window. Strong for tasks requiring structured thinking.
  • GPT-5 / GPT-5.1 at $1.25/$10.00 — the sweet spot for most production workloads. 1M context, strong across all task types.
  • GPT-5.2 at $1.75/$14.00 — OpenAI's latest flagship with vision, audio, and reasoning. The go-to for agentic applications.
  • Gemini 3 Pro at $2.00/$12.00 — 2M context window, multimodal including video. Best for processing very long documents.
  • Claude Sonnet 4.6 at $3.00/$15.00 — 1M context, computer use capability. Top-tier coding and reasoning.
  • Grok 4 at $3.00/$15.00 — xAI's premium reasoning model with vision support.

For most production applications, GPT-5 at $1.25/$10.00 or Gemini 2.5 Pro at $1.25/$10.00 offer the best balance of capability and cost. On a typical request they're roughly 3× cheaper than Claude Opus 4.6 (see the combined table below) while handling the vast majority of tasks competently.

The premium tier ($15.00+/M output)

For tasks where quality is paramount and cost is secondary: legal analysis, medical reasoning, research synthesis, complex code architecture, and high-stakes decision support.

  • Claude Opus 4.6 at $5.00/$25.00 — Anthropic's most intelligent model. Best for building agents and complex coding tasks.
  • o1 at $15.00/$60.00 — original reasoning model, still strong for complex problem-solving.
  • Claude 3 Opus at $15.00/$75.00 — legacy but still available; there's no reason to choose it over Opus 4.6.
  • o3-pro at $20.00/$80.00 — premium reasoning for the most demanding tasks. Use sparingly.
  • GPT-5.2 pro at $21.00/$168.00 — OpenAI's most capable and most expensive model. Reserve for cases where nothing else will do.

⚠️ Warning: Premium models cost 50–100× more than budget alternatives. Before committing, run a quality comparison on your specific task. Many teams discover that a $0.50/M model performs within 5% of a $25/M model for their use case. Use our calculator to see the cost impact before deciding.
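One practical way to run that comparison is a small harness that sends the same prompts to both candidates and records the answers side by side for grading. A sketch under stated assumptions: `call_model` is a hypothetical stand-in for your provider's actual client, and the model ids in the commented example are placeholders:

```python
import csv

def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in: replace with your provider's real client call."""
    raise NotImplementedError

def compare_models(prompts: list[str], cheap: str, premium: str,
                   out_path: str = "comparison.csv") -> None:
    """Collect paired outputs so a human (or an LLM judge) can grade them."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["prompt", cheap, premium])
        for p in prompts:
            writer.writerow([p, call_model(cheap, p), call_model(premium, p)])

# compare_models(sample_prompts, cheap="deepseek-v3.2", premium="claude-opus-4.6")
```

If the cheap column wins or ties on most rows, the premium spend isn't buying you anything.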


Combined cost ranking: the metric that matters

Raw input or output pricing alone is misleading. What matters is the combined cost per request for your specific workload. Here's a comparison assuming a typical request of 500 input tokens and 300 output tokens:

| Model | Input Cost | Output Cost | Total per Request |
|-------|------------|-------------|-------------------|
| Mistral Small 3.2 | $0.000030 | $0.000054 | $0.000084 |
| GPT-5 nano | $0.000025 | $0.000120 | $0.000145 |
| Grok 4.1 Fast | $0.000100 | $0.000150 | $0.000250 |
| DeepSeek V3.2 | $0.000140 | $0.000126 | $0.000266 |
| GPT-5 mini | $0.000125 | $0.000600 | $0.000725 |
| Gemini 3 Flash | $0.000250 | $0.000900 | $0.001150 |
| GPT-5 | $0.000625 | $0.003000 | $0.003625 |
| Claude Sonnet 4.6 | $0.001500 | $0.004500 | $0.006000 |
| Claude Opus 4.6 | $0.002500 | $0.007500 | $0.010000 |
| o3-pro | $0.010000 | $0.024000 | $0.034000 |
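To reproduce this table, or re-rank it for your own token mix, here's a minimal sketch using a subset of the prices above:

```python
# Prices per million tokens (input, output), taken from the rankings above.
PRICES = {
    "Mistral Small 3.2": (0.06, 0.18),
    "GPT-5 nano": (0.05, 0.40),
    "Grok 4.1 Fast": (0.20, 0.50),
    "DeepSeek V3.2": (0.28, 0.42),
    "GPT-5 mini": (0.25, 2.00),
    "GPT-5": (1.25, 10.00),
    "Claude Opus 4.6": (5.00, 25.00),
}

def rank(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Rank models by combined cost per request for a given token mix."""
    costs = {
        model: (input_tokens * inp + output_tokens * out) / 1_000_000
        for model, (inp, out) in PRICES.items()
    }
    return sorted(costs.items(), key=lambda kv: kv[1])

for model, cost in rank(500, 300):  # the standard workload above
    print(f"{model:<20} ${cost:.6f}")
```

Call `rank(200, 1_000)` instead and Grok 4.1 Fast and DeepSeek V3.2 trade places: the token mix, not the headline price, decides the winner.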

✅ TL;DR: For standard workloads, DeepSeek V3.2, Mistral Small 3.2, and GPT-5 nano are the cost leaders. GPT-5 and Gemini 2.5 Pro offer the best value in the mid-tier. Premium models cost 40–400× more per request — use them only when measurably better quality justifies the spend.


How output-heavy workloads shift the rankings

The rankings above assume balanced input/output. But many real applications skew heavily toward output — chatbots, content generation, and code assistants generate far more tokens than they receive. For output-heavy workloads (say 200 input, 1,000 output tokens), the rankings shift:

| Model | Cost per Request (output-heavy) |
|-------|---------------------------------|
| Mistral Small 3.2 | $0.000192 |
| Llama 3.1 8B | $0.000216 |
| DeepSeek V3.2 | $0.000476 |
| Grok 4.1 Fast | $0.000540 |
| GPT-5 mini | $0.002050 |
| GPT-5 | $0.010250 |
| Claude Opus 4.6 | $0.026000 |

DeepSeek V3.2 remains dominant for output-heavy work because its output pricing ($0.42/M) is among the lowest in the market — while still delivering strong reasoning and coding capabilities. Compare that to GPT-5 mini's $2.00/M output — DeepSeek is nearly 5× cheaper on output. See our DeepSeek vs GPT-5 Mini deep dive for a full comparison.

How to use this data

  1. Start with the cheapest model that meets your quality threshold. Test with 50–100 representative prompts before committing.
  2. Focus on output pricing if your application generates substantial text. Output costs dominate most bills.
  3. Use our calculator to estimate monthly costs for your exact usage pattern — input tokens, output tokens, and request volume.
  4. Consider the batch calculator to compare multiple models simultaneously across different workloads.
  5. Implement model routing — send simple tasks to budget models and reserve expensive models for complex work (see the sketch after this list). This alone can cut your bill by 60–80%.
  6. Account for hidden costs like retries, context waste, and thinking tokens. Read our hidden costs guide to budget accurately.
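For step 5, a minimal routing sketch, assuming you have some way to score task complexity (the heuristic and model ids below are placeholders, not recommendations):

```python
CODE_MARKERS = ("def ", "class ", "SELECT ", "import ")

def classify_complexity(prompt: str) -> str:
    """Placeholder heuristic; in production you might use a cheap classifier model."""
    if len(prompt) > 2000 or any(m in prompt for m in CODE_MARKERS):
        return "complex"
    return "simple"

ROUTES = {
    "simple": "deepseek-v3.2",       # hypothetical model id (~$0.28/$0.42 per M)
    "complex": "claude-sonnet-4.6",  # hypothetical model id (~$3.00/$15.00 per M)
}

def route(prompt: str) -> str:
    """Pick the cheapest model that should handle this prompt."""
    return ROUTES[classify_complexity(prompt)]
```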

Prices change frequently. We update our data when providers change rates — check the pricing table for the latest numbers, or use the calculator to run your own comparison.


Frequently asked questions

What is the cheapest AI model per million tokens in 2026?

For input tokens, GPT-5 nano at $0.05/M is the cheapest. For output tokens, Mistral Small 3.2 at $0.18/M and Llama 3.1 8B at $0.18/M are tied for the lowest. For the best combined value on a standard workload, Mistral Small 3.2 at $0.06/$0.18 edges out the competition.

Why do output tokens cost more than input tokens?

Output tokens require sequential generation — the model must predict each token one at a time, which is more computationally expensive than processing input tokens in parallel. Most providers charge 2–8× more for output, though DeepSeek bucks this trend with only a 1.5× multiplier ($0.28 input vs $0.42 output).

How much does it cost to process 1 million API requests?

It depends entirely on your token counts and model choice. For a typical request (500 input, 300 output tokens), 1 million requests would cost $266 on DeepSeek V3.2, $725 on GPT-5 mini, or $10,000 on Claude Opus 4.6. Use our calculator to get exact numbers for your workload.

Which AI provider has the best pricing overall?

No single provider wins across all tiers. DeepSeek offers the best value for code and reasoning at budget prices. Mistral has the cheapest lightweight models. Google excels at long-context workloads with competitive Flash pricing. OpenAI has the broadest range from $0.05 to $168 per million tokens. The best provider depends on your specific use case — compare them side by side on our pricing page.

How often do AI API prices change?

Major providers adjust pricing every 2–4 months, typically downward. New model releases often reset pricing tiers. We track changes and update our data regularly. The trend is clear: prices drop 30–50% year over year for equivalent capability, making it worth re-evaluating your model choice quarterly.