If you're building with AI in 2026, cost matters. Whether you're prototyping a chatbot, scaling a production app, or just experimenting — the price difference between models is staggering. The cheapest option costs $0.05 per million input tokens. The most expensive? $168. That's a 3,360× difference.
We pulled pricing data from all 8 major providers and ranked every model. No affiliate bias, no sponsored picks — just the numbers from official pricing pages, verified February 2026.
[stat] 3,360× The price gap between GPT-5 Nano ($0.05/M input) and GPT-5.2 pro ($168/M output) — the cheapest and most expensive AI APIs available today
The 10 Cheapest AI APIs Right Now
Here are the most affordable models available via API, ranked by input token price:
| Rank | Model | Provider | Input $/1M | Output $/1M | Context |
|---|---|---|---|---|---|
| 1 | GPT-5 Nano | OpenAI | $0.05 | $0.40 | 128K |
| 2 | Mistral Small 3.2 | Mistral | $0.06 | $0.18 | 128K |
| 3 | Gemini 2.0 Flash-Lite | Google | $0.07 | $0.30 | 1M |
| 4 | GPT-4.1 nano | OpenAI | $0.10 | $0.40 | 128K |
| 5 | Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | 1M |
| 6 | Gemini 2.5 Flash | Google | $0.15 | $0.60 | 1M |
| 7 | GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K |
| 8 | Command R | Cohere | $0.15 | $0.60 | 128K |
| 9 | Llama 3.1 8B | Meta | $0.18 | $0.18 | 128K |
| 10 | Grok 4.1 Fast | xAI | $0.20 | $0.50 | 2M |
The standout? Mistral Small 3.2: at just $0.06 input and $0.18 output per million tokens, it has the lowest combined cost of any general-purpose model on the market. For flat pricing regardless of direction, Llama 3.1 8B at $0.18/$0.18 is the simplest to budget for.
💡 Key Takeaway: Don't just compare input prices. Mistral Small 3.2's $0.18/M output rate (tied with Llama 3.1 8B for the lowest) is 3.3× cheaper than GPT-4o mini's $0.60/M, which matters enormously for generation-heavy workloads.
Best Value by Category
Price isn't everything. Here's the best deal in each model tier, balancing cost against capability.
Best Budget Model: Mistral Small 3.2 ($0.06/$0.18)
Mistral's compact model offers the lowest combined cost of any general-purpose model. At $0.06 input / $0.18 output per million tokens, it's absurdly cheap for classification, extraction, and simple generation. The 128K context window handles most production workloads. The trade-off: it's less capable than GPT-4o mini on complex reasoning, but for structured tasks, the quality gap is minimal.
Best All-Rounder Under $1: DeepSeek V3.2 ($0.28/$0.42)
DeepSeek V3.2 continues to undercut everyone in its quality class. At $0.28 input / $0.42 output, you get reasoning and coding capabilities that rival models costing 5–10× more, again with a 128K context window. Availability and latency can vary by provider, but for batch processing or latency-tolerant applications, it's hard to beat.
Best Flagship Under $2: GPT-5.2 ($1.75/$14.00)
OpenAI's latest flagship is surprisingly competitive at the top end. With a 1M token context window, vision, audio, and code capabilities — GPT-5.2 is the most capable model you can get for under $2/M input tokens. Compare it to Claude Opus 4.6 at $5/$25 — GPT-5.2 is 2.9× cheaper on input and 1.8× cheaper on output.
Best Reasoning Model: o4-mini ($1.10/$4.40)
If you need chain-of-thought reasoning without the o3-pro price tag ($20/$80), o4-mini delivers. It shares the same price as o3-mini but with a massive 2M token context window — the largest of any reasoning model. Be aware that reasoning models generate hidden thinking tokens that increase your actual cost beyond the sticker price.
Best Context Window: Gemini 2.5 Pro ($1.25/$10.00 — 2M tokens)
At 2 million tokens, Gemini 2.5 Pro can process entire codebases, books, or document collections in a single call, and at $1.25/M input it's cheaper than most flagships. If your workload involves very long documents, a context window this large can eliminate the need for chunking and RAG pipelines entirely.
The Real Cost: Input vs Output
Don't just look at input prices. Output tokens are typically 3–8× more expensive than input tokens, so even when a workload generates fewer output tokens than it reads, output can dominate the bill. A model that's cheap on input but expensive on output can cost more overall.
Example: Processing 1M input tokens and generating 200K output tokens:
| Model | Input Cost | Output Cost | Total |
|---|---|---|---|
| Mistral Small 3.2 | $0.06 | $0.036 | $0.096 |
| GPT-5 Nano | $0.05 | $0.08 | $0.13 |
| Gemini 2.5 Flash-Lite | $0.10 | $0.08 | $0.18 |
| DeepSeek V3.2 | $0.28 | $0.084 | $0.36 |
| GPT-5.2 | $1.75 | $2.80 | $4.55 |
| Claude Opus 4.6 | $5.00 | $5.00 | $10.00 |
📊 Quick Math: The gap between budget and premium is roughly 100× for the same workload. At moderate usage (1M input + 200K output tokens daily), that's the difference between about $3/month and $300/month.
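The table's math is one multiply-and-add per direction, which is easy to sketch in code. A minimal cost helper (rates hard-coded from the table above; the dictionary keys are illustrative labels, not official API model identifiers):

```python
# Per-model rates in $ per million tokens (input, output), from the table above.
# Keys are illustrative labels, not official API model ids.
PRICES = {
    "mistral-small-3.2": (0.06, 0.18),
    "gpt-5-nano": (0.05, 0.40),
    "deepseek-v3.2": (0.28, 0.42),
    "claude-opus-4.6": (5.00, 25.00),
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total dollar cost for one workload: tokens scaled to millions, times rate."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# The table's workload: 1M input + 200K output tokens.
print(round(workload_cost("mistral-small-3.2", 1_000_000, 200_000), 3))  # 0.096
print(round(workload_cost("claude-opus-4.6", 1_000_000, 200_000), 2))    # 10.0
```

Swap in your own token volumes to reproduce any row of the table.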
Cheapest by Provider
Every provider has a budget option. Here's each one's most affordable model:
| Provider | Cheapest Model | Input $/1M | Output $/1M |
|---|---|---|---|
| OpenAI | GPT-5 Nano | $0.05 | $0.40 |
| Mistral | Mistral Small 3.2 | $0.06 | $0.18 |
| Google | Gemini 2.0 Flash-Lite | $0.07 | $0.30 |
| Cohere | Command R | $0.15 | $0.60 |
| Meta | Llama 3.1 8B | $0.18 | $0.18 |
| xAI | Grok 4.1 Fast | $0.20 | $0.50 |
| DeepSeek | V3.2 | $0.28 | $0.42 |
| Anthropic | Claude 3.5 Haiku | $0.80 | $4.00 |
Anthropic is the priciest at the budget end: their cheapest model (Claude 3.5 Haiku at $0.80/M) costs 16× more on input than the cheapest overall (GPT-5 Nano at $0.05/M). You're paying for the Claude quality floor. Whether that quality premium is worth it depends on your task; read our OpenAI vs Anthropic comparison for a detailed analysis.
⚠️ Warning: Cheap per-token pricing doesn't account for hidden costs like retries, context window waste, and rate limit overhead. A model that's 10× cheaper but fails 20% of the time may cost more in practice. Read our hidden costs guide before committing to the cheapest option.
Real-World Cost at Scale
Abstract pricing means nothing without context. Here's what common workloads actually cost on the cheapest models:
High-Volume Chatbot (50K conversations/day)
800 input tokens, 400 output tokens per conversation. Monthly: 1.2B input, 600M output tokens.
| Model | Monthly Cost |
|---|---|
| Mistral Small 3.2 | $72 + $108 = $180 |
| GPT-5 Nano | $60 + $240 = $300 |
| DeepSeek V3.2 | $336 + $252 = $588 |
| Claude Haiku 4.5 | $1,200 + $3,000 = $4,200 |
That's a 23× cost difference between Mistral Small and Claude Haiku for the same chatbot. At $180/month, Mistral Small 3.2 makes AI chatbots viable even for bootstrapped startups.
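Scaling the same per-token arithmetic to monthly volume is mechanical. A sketch that reproduces the chatbot table above (conversation sizes and day count are this section's assumptions; rates come from the pricing tables):

```python
# Monthly chatbot cost: 50K conversations/day, 800 input + 400 output tokens each,
# over 30 days. Rates are $ per million tokens, from the pricing tables above.
def monthly_chatbot_cost(in_rate: float, out_rate: float,
                         convs_per_day: int = 50_000,
                         in_per_conv: int = 800, out_per_conv: int = 400,
                         days: int = 30) -> float:
    monthly_in = convs_per_day * in_per_conv * days    # 1.2B tokens
    monthly_out = convs_per_day * out_per_conv * days  # 600M tokens
    return (monthly_in / 1e6) * in_rate + (monthly_out / 1e6) * out_rate

print(round(monthly_chatbot_cost(0.06, 0.18), 2))  # Mistral Small 3.2: 180.0
print(round(monthly_chatbot_cost(0.05, 0.40), 2))  # GPT-5 Nano: 300.0
```

Adjust the defaults to match your own conversation shape before trusting any estimate.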
Document Processing Pipeline (10K documents/day)
4,000 input tokens, 500 output tokens per document. Monthly: 1.2B input, 150M output tokens.
| Model | Monthly Cost |
|---|---|
| Mistral Small 3.2 | $72 + $27 = $99 |
| GPT-5 Nano | $60 + $60 = $120 |
| Gemini 2.5 Flash-Lite | $120 + $60 = $180 |
For input-heavy workloads, GPT-5 Nano's $0.05/M input is the cost floor. But Mistral Small 3.2 wins on total cost because its output pricing ($0.18/M) is so low.
[stat] $99/month The total cost to process 10,000 documents per day using Mistral Small 3.2 — less than most SaaS subscriptions
How to Save Even More
Already on the cheapest model? Here are five more ways to cut costs:
1. Prompt caching
OpenAI, Anthropic, and Google all offer cached input pricing at a 50–90% discount. If you're sending similar prompts repeatedly (same system prompt, shared context), this is free money. OpenAI applies it automatically; Anthropic requires you to mark cacheable content explicitly in the request.
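On Anthropic, for example, you opt in by tagging the reusable prefix with a cache_control block in the request body. A sketch of what that looks like, based on the documented Messages API shape (the model id and prompt text are placeholders, and no request is actually sent here):

```python
import json

# A large system prompt shared across every request: the part worth caching.
SYSTEM_PROMPT = "You are a support assistant for ExampleCo. (imagine 2K tokens of policy here)"

def build_cached_request(user_message: str) -> dict:
    """Build a Messages API request body with the system prefix marked cacheable."""
    return {
        "model": "claude-3-5-haiku-latest",  # placeholder model id
        "max_tokens": 512,
        "system": [
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},  # cache everything up to here
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

body = build_cached_request("Where is my order?")
print(json.dumps(body, indent=2)[:80])
```

Requests that repeat the cached prefix are billed at the discounted cached-input rate; check the provider's current docs for minimum cacheable sizes and cache lifetimes, which vary.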
2. Batch API
OpenAI's batch endpoint gives 50% off all token costs. If your workload isn't real-time, batch everything. We wrote a complete guide on saving with the Batch API.
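The batch workflow is: write one JSON request per line to a .jsonl file, upload it, and create a batch job against it. A sketch of building those lines, following the request-line shape in OpenAI's Batch API docs (the model id and prompts are illustrative):

```python
import json

def batch_line(custom_id: str, prompt: str) -> str:
    """One Batch API input line: a self-describing chat completion request."""
    return json.dumps({
        "custom_id": custom_id,            # your id for matching results later
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-5-nano",         # illustrative model id
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 200,
        },
    })

# Build a 3-document batch; in practice you'd write this string to a .jsonl file,
# upload it via the Files API, then create the batch with a 24h completion window.
jsonl = "\n".join(batch_line(f"doc-{i}", f"Summarize document {i}.") for i in range(3))
print(jsonl.splitlines()[0][:60])
```

Results come back asynchronously as another JSONL file keyed by custom_id, so the id is what lets you stitch outputs back to inputs.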
3. Shorter prompts
Every token costs money. Strip system prompts to essentials, use concise instructions, and avoid redundant context. A 2,000-token system prompt that could be 500 tokens wastes 75% of that prompt's input cost on every single request.
4. Model routing
Use a cheap model (GPT-5 Nano) for simple tasks and route complex ones to a flagship. Don't use a $14/M output model to format JSON. A basic router cuts costs 40–60% compared to a single model for everything, and this is even clearer in newer comparisons like GPT-5.4 mini vs nano benchmarks.
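A router doesn't need to be sophisticated to pay for itself. A toy heuristic sketch (the model labels, keyword list, and length cutoff are all illustrative; production routers often use a classifier or a cheap LLM call to make this decision instead):

```python
CHEAP, FLAGSHIP = "gpt-5-nano", "gpt-5.2"  # illustrative model labels

# Structured tasks that budget models handle reliably.
SIMPLE_HINTS = ("classify", "extract", "format", "summarize", "translate")

def pick_model(prompt: str) -> str:
    """Route short, structured tasks to the cheap model; everything else up-tier."""
    p = prompt.lower()
    if len(p) < 2_000 and any(hint in p for hint in SIMPLE_HINTS):
        return CHEAP
    return FLAGSHIP

print(pick_model("Classify this support ticket as bug/feature/question: ..."))  # gpt-5-nano
print(pick_model("Design a migration plan for our monolith, with tradeoffs."))  # gpt-5.2
```

Even a crude rule like this keeps the $14/M output model away from JSON formatting, which is where most of the routing savings come from.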
5. Output limits
Set max_tokens to prevent runaway generation. A model generating 4K tokens when you needed 200 is pure waste. This single parameter can save 50%+ on output costs for applications with predictable response lengths.
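Because billing is per generated token, max_tokens is a hard ceiling on output spend. A quick sketch of the saving in the runaway case described above (the $0.60/M output rate is just an example drawn from the budget table):

```python
def output_cost(tokens: int, out_rate_per_m: float) -> float:
    """Dollar cost of generated tokens at a $/M-token rate."""
    return tokens / 1e6 * out_rate_per_m

RATE = 0.60                            # example output rate, $ per million tokens
uncapped = output_cost(4_000, RATE)    # model free-runs to 4K tokens
capped = output_cost(200, RATE)        # max_tokens=200 for a short answer
print(f"saving per request: {1 - capped / uncapped:.0%}")  # 95%
```

The cap only helps when your response lengths are predictable; set it too low and you pay for truncated answers plus the retry.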
For a complete optimization playbook, see our cost optimization strategies guide.
The Bottom Line
The AI API market in 2026 is a buyer's paradise. Models that would have cost $60/M tokens two years ago now have equivalents at $0.10. The key is matching your workload to the right price tier:
- High volume, simple tasks → Mistral Small 3.2, GPT-5 Nano, Gemini Flash-Lite ($0.05–0.10/M input)
- General purpose, good quality → DeepSeek V3.2, Gemini 2.5 Flash ($0.15–0.28/M input), or the DeepSeek vs Mistral budget showdown
- Production flagship → GPT-5.2, Claude Sonnet 4.6, Gemini 3 Pro ($1.75–3.00/M input)
- Maximum capability → Claude Opus 4.6, GPT-5.2 pro, o3-pro ($5–21/M input)
✅ TL;DR: Mistral Small 3.2 ($0.06/$0.18) is the cheapest general-purpose model. GPT-5 Nano ($0.05/$0.40) has the cheapest input. Llama 3.1 8B ($0.18/$0.18) has flat pricing. The budget-to-premium gap is 100×. Use model routing and prompt caching to squeeze even more savings.
Use our cost calculator to estimate your exact monthly spend, or check model comparisons to see how any two models stack up side by side. For a detailed look at the best budget options, read our budget model roundup.
Frequently asked questions
What is the cheapest AI API available in 2026?
By input price, GPT-5 Nano at $0.05 per million tokens. By output price, Mistral Small 3.2 at $0.18 per million tokens. By combined cost for a balanced workload, Mistral Small 3.2 ($0.06/$0.18) is the cheapest general-purpose option. For flat input/output pricing, Llama 3.1 8B at $0.18/$0.18 through Together AI is the simplest to budget for.
Are the cheapest AI models good enough for production?
Yes, for structured and well-defined tasks. GPT-5 Nano and Mistral Small 3.2 handle classification, extraction, summarization, and simple Q&A reliably. They struggle with complex reasoning, creative writing, and nuanced analysis. The strategy is model routing: use cheap models for 60–70% of requests and route the rest to mid-tier models.
How much does it cost to run an AI chatbot in 2026?
A chatbot handling 1,000 conversations/day costs roughly $6–180/month depending on the model. On GPT-5 Nano: ~$6/month. On DeepSeek V3.2: ~$20/month. On Claude Sonnet 4.6: ~$180/month. The biggest variable is model choice, not volume. Use our chatbot cost breakdown for detailed calculations.
Why is Anthropic so much more expensive than other providers?
Anthropic's cheapest model (Claude 3.5 Haiku at $0.80/$4.00) is 16× more expensive on input than GPT-5 Nano. This reflects Anthropic's focus on quality and safety rather than price competition. Claude models consistently score higher on nuanced tasks, instruction-following, and writing quality. You pay more per token but may need fewer retries and less prompt engineering. See our OpenAI vs Anthropic comparison.
What hidden costs should I watch for with cheap AI APIs?
The four biggest hidden costs: 1) Failed requests that still bill tokens. 2) Retries that multiply your spend (5% error rate with 2 retries = $6K/month waste at scale). 3) Context window waste from oversized prompts. 4) Reasoning model thinking tokens that don't appear in output but get billed. Budget an extra 30–50% beyond your per-token estimate. Read our hidden costs guide for the full breakdown.
Prices verified February 2026. We update pricing weekly via automated scraping of official provider pages. Compare all models →
