Skip to main content

Cheapest AI APIs in 2026: 85 Models Ranked by Price

Updated May 2026 with 85 models across 8 providers. GPT-5 nano is the cheapest AI API by input price, Ministral 3 3B is the cheapest balanced option, and Gemini 2.0 Flash-Lite plus Llama 4 Scout lead the cheap long-context shortlist.

pricingcost comparisonbudgetapi
Cheapest AI APIs in 2026: 85 Models Ranked by Price

If you searched for the cheapest AI API in 2026, here's the direct answer: GPT-5 nano is the cheapest by input price, but Ministral 3 3B is the cheapest balanced model once output matters. If you need cheap long-context, Gemini 2.0 Flash-Lite and Llama 4 Scout are the budget picks worth watching.

We refreshed this guide against the latest site pricing dataset for May 2026: 85 models across 8 providers. The cheapest input price we track is still $0.05 per million tokens. The most expensive output price in the dataset is $600 per million. That's a wild 12,000× spread.

[stat] 12,000× The gap between GPT-5 nano at $0.05/M input and o1 Pro at $600/M output in the current pricing dataset


Fast Answer: Which Cheap Model Should You Start With?

If you don't want the full breakdown, start here:

  • Cheapest raw input: GPT-5 nano — $0.05 input / $0.40 output
  • Cheapest balanced workload: Ministral 3 3B — $0.10 input / $0.10 output
  • Cheapest long-context default: Gemini 2.0 Flash-Lite — $0.075 input / $0.30 output with 1M context
  • Cheapest huge-context bargain: Llama 4 Scout — $0.08 input / $0.30 output with 10M context
  • Best cheap step-up: DeepSeek V4 Flash — $0.14 input / $0.28 output

That is the actual cheap-model shortlist right now. Everything else is nuance.


The 10 Cheapest AI APIs Right Now

Here are the lowest-cost models in the current dataset, ranked by input token price:

Rank Model Provider Input $/1M Output $/1M Context
1 GPT-5 nano OpenAI $0.05 $0.40 128K
2 Gemini 2.0 Flash-Lite Google $0.075 $0.30 1M
3 Llama 4 Scout Meta (via Together AI) $0.08 $0.30 10M
4 Ministral 3 3B Mistral AI $0.10 $0.10 256K
5 Mistral Small 3.2 Mistral AI $0.10 $0.30 128K
6 GPT-4.1 nano OpenAI $0.10 $0.40 128K
7 Gemini 2.0 Flash Google $0.10 $0.40 1M
8 Gemini 2.5 Flash-Lite Google $0.10 $0.40 1M
9 DeepSeek V4 Flash DeepSeek $0.14 $0.28 1M
10 Ministral 3 8B Mistral AI $0.15 $0.15 256K

The headline is simple: GPT-5 nano still wins the sticker-price race, but Ministral 3 3B now owns the cheapest total cost on a mixed input/output workload. Gemini 2.0 Flash-Lite and Llama 4 Scout are the most interesting cheap long-context options because they stay cheap while giving you 1M to 10M context.

💡 Key Takeaway: The cheapest input model and the cheapest real workload model are not always the same thing. If your app generates a lot of text, output pricing matters more than the headline input number.


Best Value by Category

Price isn't everything. Here are the strongest budget picks depending on what actually drives your bill.

Cheapest Input Floor: GPT-5 nano ($0.05/$0.40)

If your workload is prompt-heavy and output-light — routing, classification, short extraction, lightweight tool decisions — GPT-5 nano is still the easiest way to minimize input spend. Nothing else in the table beats $0.05/M input.

The catch is output. At $0.40/M output, GPT-5 nano is not the cheapest once responses get longer. It wins when your app sends lots of tokens in and asks for short answers back.

Cheapest Balanced Budget Model: Ministral 3 3B ($0.10/$0.10)

For pure price efficiency, Ministral 3 3B is the cleanest answer on the page. It pairs $0.10 input with $0.10 output, which makes it the cheapest model in the dataset on a simple combined basis.

That matters a lot for output-heavy workloads like templated generation, low-cost chat, summarization, and structured extraction. It is not the model you'd pick for high-stakes reasoning, but if your first question is "what is the cheapest model I can actually ship with?", this is the strongest answer right now.

Cheapest Long-Context Default: Gemini 2.0 Flash-Lite ($0.075/$0.30)

Gemini 2.0 Flash-Lite is the practical budget default for teams that need room. At $0.075 input / $0.30 output with a 1M token context window, it stays near the absolute price floor without feeling cramped.

If you're processing long documents, large threads, or heavy RAG prompts, this is the model I would look at before anything else in the budget tier.

Cheapest Huge-Context Bargain: Llama 4 Scout ($0.08/$0.30)

Llama 4 Scout is weird in a good way. At $0.08/$0.30 with a 10M token context window, it gives you budget pricing with absurd context headroom.

That makes it attractive for document-heavy or codebase-heavy workloads where context length usually forces you into more expensive tiers.

Best Cheap Step-Up Model: DeepSeek V4 Flash ($0.14/$0.28)

DeepSeek V4 Flash is what you choose when nano-tier prices are great but the weakest models feel too cramped. At $0.14 input and $0.28 output, it stays cheap while giving you a stronger cost-quality step up for coding, reasoning, and general-purpose API work.

It is not the absolute cheapest model on the page. It is the model that starts to feel like a serious default once you care about quality but still hate paying flagship rates.

$0.12
Ministral 3 3B total for 1M input + 200K output
vs
$54.60
GPT-5.2 pro total for the same workload

The Real Cost: Input vs Output

Don't obsess over input price alone. Most production apps generate output, and output is where many providers sneak the real bill in.

Example: Processing 1M input tokens and generating 200K output tokens:

Model Input Cost Output Cost Total
Ministral 3 3B $0.10 $0.02 $0.12
GPT-5 nano $0.05 $0.08 $0.13
Gemini 2.0 Flash-Lite $0.075 $0.06 $0.135
Llama 4 Scout $0.08 $0.06 $0.14
Mistral Small 3.2 $0.10 $0.06 $0.16
DeepSeek V4 Flash $0.14 $0.056 $0.196
Claude 3.5 Haiku $0.80 $0.80 $1.60
GPT-5.2 pro $21.00 $33.60 $54.60

This is why the cheap-input headline can fool you. GPT-5 nano has the lowest input rate, but Ministral 3 3B is cheaper on this mixed workload because its output price is so low.

📊 Quick Math: On this workload, $0.12 vs $54.60 is a 455× gap. That's the difference between a cheap background automation and a model quietly chewing through your margins.


Cheapest by Provider

Most major providers still have a real budget option. xAI is the awkward exception after its May 2026 pricing reset.

Provider Cheapest Model Input $/1M Output $/1M
OpenAI GPT-5 nano $0.05 $0.40
Google Gemini 2.0 Flash-Lite $0.075 $0.30
Meta (via Together AI) Llama 4 Scout $0.08 $0.30
Mistral AI Ministral 3 3B $0.10 $0.10
DeepSeek DeepSeek V4 Flash $0.14 $0.28
Cohere Command R $0.15 $0.60
xAI Grok 4.3 / 4.20 (live baseline) $1.25 $2.50
Anthropic Claude 3.5 Haiku $0.80 $4.00

The xAI row is the big freshness correction here. The old Grok 4.1 Fast bargain numbers were a real story for a while, but they are not the live planning baseline anymore. If you are comparing the cheapest live AI APIs, xAI is no longer in the same tier as OpenAI nano, Gemini Flash-Lite, Llama 4 Scout, or Ministral 3 3B. For the full breakdown, read the live xAI Grok pricing guide.

Anthropic is still the expensive outlier on the budget end. Their cheapest option is far above the price floor set by OpenAI, Google, Meta, and Mistral. That does not automatically make Claude bad — it just means Claude is a quality-first buy, not a bargain-bin buy.

⚠️ Warning: Cheap sticker pricing does not include hidden costs like retries, slow inference, oversized prompts, or reasoning-token overhead. Read our hidden costs guide before locking yourself into the cheapest row in a table.


Real-World Cost at Scale

Abstract token math is useful, but monthly workloads make the differences clearer.

High-Volume Chatbot (50K conversations/day)

Assume 800 input tokens and 400 output tokens per conversation. Monthly total: 1.2B input and 600M output tokens.

Model Monthly Cost
Ministral 3 3B $180
Gemini 2.0 Flash-Lite $270
Llama 4 Scout $276
GPT-5 nano $300
DeepSeek V4 Flash $336
Claude 3.5 Haiku $3,360

The cheap-model race is no longer just "OpenAI nano vs everyone else." On this kind of workload, Ministral 3 3B is the cheapest model in the current dataset, while Gemini 2.0 Flash-Lite, Llama 4 Scout, and GPT-5 nano stay very viable depending on whether input or output dominates.

Document Processing Pipeline (10K documents/day)

Assume 4,000 input tokens and 500 output tokens per document. Monthly total: 1.2B input and 150M output tokens.

Model Monthly Cost
GPT-5 nano $120
Gemini 2.0 Flash-Lite $135
Ministral 3 3B $135
Llama 4 Scout $141
DeepSeek V4 Flash $210
Claude 3.5 Haiku $1,560

For input-heavy work, GPT-5 nano still has a very strong case. For generation-heavy or balanced workloads, Ministral 3 3B gets more interesting fast.

[stat] $180/month Rough cost to run a 50K-conversations/day chatbot on Ministral 3 3B


How to Save Even More

Already on the cheapest model? Good. You can still cut the bill.

1. Prompt caching

OpenAI, Anthropic, and Google all offer cached input discounts. If you repeat system prompts or shared context, caching is the easiest free win.

2. Batch APIs

If the workload is not real-time, batch it. OpenAI's batch pricing still crushes normal synchronous pricing for background jobs. We broke that down in our Batch API savings guide.

3. Shorter prompts

A bloated prompt is just you paying to hear yourself talk. Compress system instructions, strip repeated context, and stop sending novels when a paragraph will do.

4. Model routing

Use the cheapest model that can actually do the job. Route easy tasks to nano-tier models, step up to something like DeepSeek V4 Flash or Mistral Small only when the task genuinely needs it. That beats using one premium model for everything.

5. Output limits

Set max_tokens. Runaway generation is how cheap models stop being cheap.

For a broader playbook, see our AI API cost optimization strategies guide.


The Bottom Line

The cheapest end of the AI API market got more competitive, not less.

✅ TL;DR: GPT-5 nano still wins on input price. Ministral 3 3B is the current balanced cost floor. Gemini 2.0 Flash-Lite and Llama 4 Scout are the most interesting cheap long-context plays. DeepSeek V4 Flash is the best low-cost upgrade when nano-tier quality feels too thin.

Use our cost calculator to estimate your exact monthly spend, or open the full pricing table to sort every model in the database by input, output, or estimated request cost.


Frequently asked questions

What is the cheapest AI API available in 2026?

By input price, GPT-5 nano at $0.05 per million tokens is still the cheapest model in the database. If you care about combined low pricing instead of input alone, Ministral 3 3B at $0.10 input / $0.10 output is the most aggressive budget option right now.

Are the cheapest AI models good enough for production?

Yes — for the right tasks. Routing, classification, extraction, summarization, and lightweight chat can run very well on budget models. The trick is not pretending a bargain model should do flagship reasoning work. Use cheap models for bulk traffic and route hard cases upward.

How much does it cost to run an AI chatbot in 2026?

A high-volume chatbot can still be surprisingly cheap. In the example above, 50,000 conversations/day lands around $180/month on Ministral 3 3B, $270/month on Gemini 2.0 Flash-Lite, and $300/month on GPT-5 nano. Premium models can blow past those numbers very quickly.

Why is Anthropic so much more expensive than other providers?

Because Anthropic is not trying to win the pure price war. Their cheapest model, Claude 3.5 Haiku at $0.80/$4.00, sits far above the budget floor. You pay for the Claude quality profile, not for bargain-basement token prices. If cost is the main goal, read our OpenAI vs Anthropic comparison before defaulting to Claude.

What hidden costs should I watch for with cheap AI APIs?

Four big ones: 1) retries, 2) oversized prompts, 3) long outputs you did not cap, and 4) reasoning or tool overhead that turns a cheap model into an expensive workflow. Budget the whole system, not just the advertised per-token rate.


Prices verified against the latest pricing dataset updated May 2026. We refresh model pricing weekly from official provider sources. Compare all models →

Related Comparisons