Every AI Model Under $1 Per Million Tokens (March 2026)
Two years ago, the cheapest AI API cost $10 per million input tokens. Today, you can access models that rival GPT-4's original performance for under $0.10 per million tokens — a 100x price collapse that has fundamentally rewritten the economics of building with AI.
The sub-dollar segment isn't a graveyard of weak models. It now includes flagship-tier options like Mistral Large 3 at $0.50/$1.50, reasoning models like DeepSeek R1 V3.2 at $0.28/$0.42, and Google's Gemini Flash family that delivers million-token context windows for fractions of a cent per request. If you're still defaulting to GPT-5.4 or Claude Opus for every API call, you're likely overspending by 10-50x on tasks these cheaper models handle just as well.
This guide maps every model priced under $1 per million input tokens, compares them on capability and real-world cost per task, and tells you exactly which one to use for what.
📊 Stat: 27 AI models now priced under $1 per million input tokens — up from just 3 in early 2024
The complete sub-dollar pricing table
Here's every model priced under $1/M input tokens as of March 2026, sorted by input price:
| Model | Provider | Input / 1M | Output / 1M | Context | Category |
|---|---|---|---|---|---|
| GPT-5 nano | OpenAI | $0.05 | $0.40 | 128K | Efficient |
| Mistral Small 3.2 | Mistral | $0.06 | $0.18 | 128K | Efficient |
| Gemini 2.0 Flash-Lite | Google | $0.075 | $0.30 | 1M | Efficient |
| GPT-4.1 nano | OpenAI | $0.10 | $0.40 | 128K | Efficient |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | 1M | Efficient |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M | Efficient |
| Command R | Cohere | $0.15 | $0.60 | 128K | Efficient |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K | Efficient |
| Llama 3.1 8B | Meta | $0.18 | $0.18 | 128K | Efficient |
| Grok 4.1 Fast | xAI | $0.20 | $0.50 | 2M | Efficient |
| GPT-5 mini | OpenAI | $0.25 | $2.00 | 500K | Efficient |
| Llama 4 Maverick | Meta | $0.27 | $0.85 | 1M | Flagship |
| DeepSeek V3.2 | DeepSeek | $0.28 | $0.42 | 128K | Efficient |
| DeepSeek R1 V3.2 | DeepSeek | $0.28 | $0.42 | 128K | Reasoning |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | 1M | Efficient |
| Codestral | Mistral | $0.30 | $0.90 | 128K | Coding |
| Grok 3 Mini | xAI | $0.30 | $0.50 | 128K | Efficient |
| GPT-4.1 mini | OpenAI | $0.40 | $1.60 | 200K | Efficient |
| Devstral 2 | Mistral | $0.40 | $2.00 | 256K | Coding |
| Mistral Medium 3 | Mistral | $0.40 | $2.00 | 128K | Balanced |
| Mistral Medium 3.1 | Mistral | $0.40 | $2.00 | 131K | Balanced |
| Gemini 3 Flash | Google | $0.50 | $3.00 | 1M | Efficient |
| Mistral Large 3 | Mistral | $0.50 | $1.50 | 256K | Flagship |
| Magistral Small | Mistral | $0.50 | $1.50 | 128K | Reasoning |
| Claude 3.5 Haiku | Anthropic | $0.80 | $4.00 | 200K | Efficient |
| Llama 3.3 70B | Meta | $0.88 | $0.88 | 131K | Standard |
| Llama 3.1 70B | Meta | $0.88 | $0.88 | 128K | Balanced |
That's 27 models from 8 providers, all under a dollar per million input tokens. The range spans from $0.05 (GPT-5 nano) to $0.88 (Llama 3.3 70B) — nearly an 18x spread even within the budget tier.
💡 Key Takeaway: Input pricing tells only half the story. DeepSeek V3.2 at $0.28/$0.42 has a 1.5:1 output-to-input price ratio, while GPT-5 mini at $0.25/$2.00 has an 8:1 ratio. For output-heavy tasks like content generation, the cheaper input price can be deceptive.
Real cost per task: what you actually pay
Raw per-million-token pricing is abstract. Here's what common tasks actually cost with each model, assuming typical token counts: a chatbot response (500 input / 300 output tokens), a document summary (2,000 input / 500 output), and a code generation task (1,000 input / 1,500 output).
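If you want to reproduce these numbers for your own traffic profile, the arithmetic is a one-liner. A minimal sketch in Python, with prices hard-coded from the table above:

```python
def cost_per_task(input_price: float, output_price: float,
                  input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request, given per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Chatbot profile: 500 input / 300 output tokens
mistral_small = cost_per_task(0.06, 0.18, 500, 300)   # ≈ $0.000084
deepseek_v32 = cost_per_task(0.28, 0.42, 500, 300)    # ≈ $0.000266
print(f"Mistral Small 3.2: ${mistral_small:.6f}, DeepSeek V3.2: ${deepseek_v32:.6f}")
```

Swap in your own average token counts to see how the rankings shift for your workload.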
Chatbot response (500 in / 300 out)
| Model | Cost per response | Cost per 10K responses |
|---|---|---|
| GPT-5 nano | $0.000145 | $1.45 |
| Mistral Small 3.2 | $0.000084 | $0.84 |
| Gemini 2.0 Flash | $0.000170 | $1.70 |
| DeepSeek V3.2 | $0.000266 | $2.66 |
| GPT-5 mini | $0.000725 | $7.25 |
| Grok 4.1 Fast | $0.000250 | $2.50 |
| Mistral Large 3 | $0.000700 | $7.00 |
| Claude 3.5 Haiku | $0.001600 | $16.00 |
📊 Quick Math: At 10,000 chatbot responses per day, Mistral Small 3.2 costs you $0.84/day ($25/month). Claude 3.5 Haiku costs $16/day ($480/month) for the same volume — a 19x difference.
Code generation (1,000 in / 1,500 out)
| Model | Cost per request | Cost per 10K requests |
|---|---|---|
| GPT-5 nano | $0.000650 | $6.50 |
| Mistral Small 3.2 | $0.000330 | $3.30 |
| Codestral | $0.001650 | $16.50 |
| DeepSeek V3.2 | $0.000910 | $9.10 |
| GPT-5 mini | $0.003250 | $32.50 |
| Grok 4.1 Fast | $0.000950 | $9.50 |
| Devstral 2 | $0.003400 | $34.00 |
| GPT-4.1 mini | $0.002800 | $28.00 |
For code generation, output costs dominate. Mistral Small 3.2 remains the cheapest at $0.00033 per request, but Codestral and Devstral 2 — Mistral's purpose-built coding models — cost 5-10x more. Whether their code quality justifies the premium depends on your task complexity.
Tier breakdown: five price bands
Not all sub-dollar models are equal. They cluster into five distinct performance tiers that map to different use cases.
Tier 1: Ultra-cheap ($0.05-$0.10 input)
Models: GPT-5 nano, Mistral Small 3.2, Gemini 2.0 Flash-Lite, GPT-4.1 nano, Gemini 2.5 Flash-Lite, Gemini 2.0 Flash
These are your high-volume workhorses. They handle classification, extraction, simple Q&A, and routing decisions where you need millions of calls per day without blowing your budget. GPT-5 nano and Mistral Small 3.2 are text-only, while the Gemini Flash variants add vision capability and 1M token context windows — a combination that's frankly absurd at $0.10/M input.
Best for: Intent classification, entity extraction, content moderation, simple summarization, routing layers in multi-model architectures.
Avoid for: Complex reasoning, nuanced creative writing, multi-step coding tasks.
Tier 2: Budget all-rounders ($0.15-$0.20 input)
Models: Command R, GPT-4o mini, Llama 3.1 8B, Grok 4.1 Fast
This tier punches above its weight. GPT-4o mini was the budget king of 2024 and still delivers solid performance. But the standout here is Grok 4.1 Fast at $0.20/$0.50 with a 2M token context window — the largest context available under $1/M. If you need to process entire codebases or long documents cheaply, nothing else comes close.
Best for: RAG applications, customer support bots, document processing, long-context analysis (Grok 4.1 Fast specifically).
Tier 3: Mid-range performers ($0.25-$0.30 input)
Models: GPT-5 mini, Llama 4 Maverick, DeepSeek V3.2, DeepSeek R1 V3.2, Gemini 2.5 Flash, Codestral, Grok 3 Mini
This is where it gets interesting. DeepSeek R1 V3.2 gives you a full reasoning model — the kind that shows its chain-of-thought and solves graduate-level math — for $0.28/$0.42. That's cheaper than GPT-4o mini's output pricing. Meanwhile, Llama 4 Maverick is a flagship-class model with 1M context at $0.27/$0.85, available through Together AI and other providers.
GPT-5 mini is OpenAI's entry here at $0.25 input, but its $2.00 output pricing makes it expensive for generation-heavy workloads. It's best suited for tasks where the input is large but the output is short — think classification on long documents.
Best for: Reasoning tasks (DeepSeek R1), coding (Codestral), general-purpose work where quality matters more than pure cost, batch processing.
⚠️ Warning: DeepSeek V3.2 and R1 V3.2 share identical pricing but differ fundamentally — R1 is a reasoning model that generates thinking tokens. Your actual costs with R1 may be 2-5x higher than V3.2 for the same prompt because of reasoning overhead. Monitor your output token usage carefully.
Tier 4: Premium budget ($0.40-$0.50 input)
Models: GPT-4.1 mini, Devstral 2, Mistral Medium 3/3.1, Gemini 3 Flash, Mistral Large 3, Magistral Small
The premium budget tier delivers near-flagship quality. Mistral Large 3 is the headline act — a full flagship model priced at just $0.50/$1.50, with output pricing below even GPT-4.1 mini's and GPT-5 mini's. It supports 256K context, tool use, and function calling with quality that competes with models 6-10x its price.
Gemini 3 Flash at $0.50/$3.00 brings Google's latest architecture to the budget tier with 1M context. Magistral Small adds reasoning capabilities at the same $0.50 input price.
Best for: Production applications requiring high quality, complex multi-turn conversations, agentic workflows, tasks where you'd normally use a $3-5/M model.
Tier 5: Sub-dollar ceiling ($0.80-$0.88 input)
Models: Claude 3.5 Haiku, Llama 3.3 70B, Llama 3.1 70B
These models sit just under the $1 threshold. Claude 3.5 Haiku at $0.80/$4.00 is Anthropic's cheapest current offering, and it remains one of the most reliable options for structured output and tool use. The Llama 70B variants use flat pricing, with input and output at the same $0.88/M rate — beneficial for output-heavy workloads but less competitive for input-heavy ones.
Best for: When you need Anthropic/Meta ecosystem compatibility, structured outputs, or balanced input/output pricing.
Context window comparison: size vs. cost
One of the biggest differentiators in the budget tier is context window size. The range is staggering:
| Context Size | Models | Cheapest Option |
|---|---|---|
| 2M tokens | Grok 4.1 Fast | $0.20/$0.50 |
| 1M tokens | Gemini Flash family, Llama 4 Maverick, Gemini 3 Flash | $0.075/$0.30 (Gemini 2.0 Flash-Lite) |
| 500K tokens | GPT-5 mini | $0.25/$2.00 |
| 256K tokens | Mistral Large 3, Devstral 2 | $0.40/$2.00 (Devstral 2) |
| 128-200K tokens | Everything else | $0.05/$0.40 (GPT-5 nano) |
💡 Key Takeaway: If your use case requires processing long documents, the choice is clear. Grok 4.1 Fast gives you 2M tokens at $0.20 input — enough to hold a dozen novels at once, and filling the whole window costs just $0.40. Google's Gemini Flash models offer 1M tokens at even lower prices if 2M isn't necessary.
The cost to fill a full context window varies dramatically:
- Grok 4.1 Fast (2M context): $0.40 to fill the input window
- Gemini 2.0 Flash-Lite (1M context): $0.075 to fill the input window
- GPT-5 mini (500K context): $0.125 to fill the input window
- Claude 3.5 Haiku (200K context): $0.16 to fill the input window
Gemini 2.0 Flash-Lite can process a million tokens for less than half of what it costs to fill Claude 3.5 Haiku's 200K window. That's 5x the context at less than half the price.
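The fill costs above are just context size times input price. A quick sketch for checking any model on the table:

```python
def fill_cost(context_tokens: int, input_price_per_m: float) -> float:
    """Dollars to fill a model's entire input window once."""
    return context_tokens / 1_000_000 * input_price_per_m

print(round(fill_cost(2_000_000, 0.20), 3))   # Grok 4.1 Fast → 0.4
print(round(fill_cost(200_000, 0.80), 3))     # Claude 3.5 Haiku → 0.16
print(round(fill_cost(1_000_000, 0.075), 3))  # Gemini 2.0 Flash-Lite → 0.075
```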
Reasoning on a budget
Reasoning models — the kind that show chain-of-thought and solve complex problems — used to be expensive. OpenAI's o1 launched at $15/$60 per million tokens. Today, you have three reasoning options under $1/M input:
| Model | Input / 1M | Output / 1M | Reasoning Quality |
|---|---|---|---|
| DeepSeek R1 V3.2 | $0.28 | $0.42 | Strong (math, code, logic) |
| Magistral Small | $0.50 | $1.50 | Good (general reasoning) |
| Grok 3 Mini | $0.30 | $0.50 | Moderate (fast reasoning) |
DeepSeek R1 V3.2 is the standout. At $0.28/$0.42, it costs roughly 1/50th of o1's launch input price (and proportionally even less per output token) while delivering competitive results on math and coding benchmarks. For startups and developers who need reasoning capabilities without enterprise budgets, R1 V3.2 has been a game-changer.
Magistral Small from Mistral takes a different approach — it's a structured reasoning model optimized for step-by-step problem solving rather than open-ended chain-of-thought. At $0.50/$1.50, it's slightly pricier but more predictable in output length.
A word of caution on reasoning costs: Reasoning models generate thinking tokens that count toward your output bill. A simple question might use 200 output tokens on a standard model but 2,000+ tokens on a reasoning model (most of which are reasoning traces). Your effective cost per task can be 3-10x the naive per-token calculation. Always benchmark your specific prompts before committing to a reasoning model at scale.
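You can model this overhead before committing. A rough sketch — the 10x multiplier below is an assumption for illustration, not a measured figure:

```python
def effective_cost(input_price: float, output_price: float,
                   in_tokens: int, visible_out: int,
                   thinking_multiplier: float = 1.0) -> float:
    """Per-request cost; reasoning models bill hidden thinking tokens as output."""
    billed_out = visible_out * thinking_multiplier
    return (in_tokens * input_price + billed_out * output_price) / 1_000_000

standard = effective_cost(0.28, 0.42, 500, 200)  # DeepSeek V3.2, no thinking tokens
reasoning = effective_cost(0.28, 0.42, 500, 200, thinking_multiplier=10)  # R1, assumed 10x trace
print(f"reasoning costs {reasoning / standard:.1f}x the standard model")  # → 4.4x
```

Despite identical per-token prices, the reasoning variant's bill is several times higher on the same prompt — exactly the effect the warning above describes.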
The best sub-dollar model for every use case
Stop scrolling through pricing tables. Here's what to pick based on what you're building:
High-volume chatbot (>100K messages/day): Mistral Small 3.2 ($0.06/$0.18). Cheapest output pricing in the entire market for a capable model. At 100K daily messages, you're looking at roughly $8/day.
RAG/retrieval application: Gemini 2.0 Flash ($0.10/$0.40) for the 1M context window, or Grok 4.1 Fast ($0.20/$0.50) if you need the full 2M. Both handle long retrieved contexts efficiently.
Code generation: Codestral ($0.30/$0.90) if quality matters, Mistral Small 3.2 ($0.06/$0.18) if cost matters. DeepSeek V3.2 ($0.28/$0.42) is a strong middle ground — competitive code quality at bottom-tier pricing.
Math and reasoning: DeepSeek R1 V3.2 ($0.28/$0.42). Nothing else comes close on price-to-reasoning-quality ratio.
Document summarization: Gemini 2.0 Flash-Lite ($0.075/$0.30) for short-to-medium documents. For book-length content, Grok 4.1 Fast's 2M context avoids chunking entirely.
Content generation (articles, marketing copy): GPT-4.1 mini ($0.40/$1.60) or Mistral Large 3 ($0.50/$1.50). Both produce polished, publication-ready text that avoids the "AI slop" problem of ultra-cheap models.
Production API with reliability SLAs: GPT-5 mini ($0.25/$2.00) or Claude 3.5 Haiku ($0.80/$4.00). OpenAI and Anthropic offer the most robust API infrastructure with enterprise support, uptime guarantees, and compliance certifications.
✅ TL;DR: For most developers, the sweet spot is DeepSeek V3.2 for general tasks, Gemini Flash for long-context work, and Mistral Small 3.2 for high-volume simple tasks. Only reach for the $0.50+ tier when you need flagship quality or enterprise reliability.
Cost optimization strategies for sub-dollar models
Even at these prices, there are ways to cut costs further:
1. Use prompt caching
OpenAI, Anthropic, and Google all offer cached input pricing at 50-90% discounts. GPT-5.4's cached input rate is $0.25/M — cheaper than many budget models' standard rate. If you're sending the same system prompt or context to every request, caching alone can drop your costs below the cheapest models on this list.
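The blended-rate arithmetic is worth sketching out; the prices and hit rate below are hypothetical, not any provider's published numbers:

```python
def blended_input_price(standard: float, cached: float, cache_hit_fraction: float) -> float:
    """Effective per-million input price when a fraction of input tokens hit the cache."""
    return cached * cache_hit_fraction + standard * (1 - cache_hit_fraction)

# Hypothetical: $0.40/M standard rate, 90% cache discount, 80% of input tokens cached
print(round(blended_input_price(0.40, 0.04, 0.80), 3))  # → 0.112
```

In this illustrative case, caching pulls a $0.40/M model down to an effective $0.11/M — ultra-cheap-tier territory.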
2. Route by complexity
Don't send every request to the same model. Use a cheap classifier (GPT-5 nano at $0.05/M) to triage incoming requests, then route simple ones to ultra-cheap models and complex ones to mid-tier options. A well-designed routing layer can cut average costs by 40-60% while maintaining quality on hard tasks.
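A routing layer can be sketched in a few lines. The model ID strings below are illustrative slugs, and the length/keyword heuristic is a stand-in for a real classifier call:

```python
def route(prompt: str) -> str:
    """Pick a model for a request. In production the triage step would itself
    be a call to a cheap classifier (e.g. GPT-5 nano); here a crude length
    and keyword heuristic stands in for it."""
    hard_markers = ("prove", "debug", "refactor", "step by step")
    if len(prompt) > 2000 or any(m in prompt.lower() for m in hard_markers):
        return "deepseek-r1-v3.2"   # reasoning tier for complex requests
    if len(prompt) > 400:
        return "deepseek-v3.2"      # mid-tier general model
    return "gpt-5-nano"             # ultra-cheap tier for simple requests

print(route("What are your opening hours?"))        # → gpt-5-nano
print(route("Please debug this stack trace: ..."))  # → deepseek-r1-v3.2
```

The design choice that matters is that the triage model is an order of magnitude cheaper than the models it routes to, so the routing overhead stays negligible.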
3. Batch API for non-real-time work
OpenAI's Batch API offers 50% off standard pricing. GPT-5 mini through Batch drops to effectively $0.125/$1.00 — making it competitive with the ultra-cheap tier while delivering GPT-5-class quality. Google offers similar batch discounts on Gemini models.
4. Monitor output token bloat
Reasoning models and verbose models can silently inflate your bills. Set max_tokens limits appropriate to your use case. A chatbot response rarely needs more than 500 tokens — don't pay for 4,000-token outputs that get truncated in the UI anyway.
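One way to enforce this is a per-use-case token budget that also gives you a hard upper bound on cost. The caps below are illustrative defaults, not recommendations:

```python
# Illustrative per-use-case output caps; tune these to your own traffic.
MAX_TOKENS = {"chatbot": 500, "summary": 800, "codegen": 2000}

def worst_case_cost(use_case: str, input_tokens: int,
                    input_price: float, output_price: float) -> float:
    """Upper bound on per-request cost once max_tokens is enforced."""
    cap = MAX_TOKENS[use_case]
    return (input_tokens * input_price + cap * output_price) / 1_000_000

# DeepSeek V3.2 chatbot worst case: 500 tokens in, output capped at 500
print(round(worst_case_cost("chatbot", 500, 0.28, 0.42), 6))  # → 0.00035
```

Pass the relevant cap as the `max_tokens` parameter on your API calls and your per-request spend can never exceed this bound.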
5. Consider self-hosting for extreme volume
At very high volumes (millions of requests per day), self-hosting open models like Llama 3.3 70B or Llama 4 Maverick can undercut API pricing. The crossover point depends on your GPU costs, but for many teams, it's somewhere around 500K-1M requests per day. Below that, API pricing is almost always cheaper when you factor in engineering and infrastructure overhead.
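The break-even point is easy to estimate. A rough sketch — the $150/day GPU figure is a placeholder, so plug in your own hardware cost:

```python
def crossover_requests(gpu_daily_cost: float, api_cost_per_request: float) -> float:
    """Requests/day above which a fixed-cost GPU node beats per-request API
    pricing. Ignores engineering and ops overhead, which is substantial."""
    return gpu_daily_cost / api_cost_per_request

# Placeholder: $150/day GPU node vs $0.000266/request (DeepSeek V3.2 chatbot profile)
print(int(crossover_requests(150, 0.000266)))  # ≈ 560K requests/day
```

Under these assumed numbers the crossover lands in the 500K-1M requests/day range cited above; a cheaper API model or pricier GPU pushes it higher still.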
The price floor: how low can it go?
We've seen a 100x price reduction in AI API pricing over two years. The cheapest model today (GPT-5 nano at $0.05/M input) processes roughly 750 pages of text for a nickel. Where does this end?
The answer is probably close to where we are now for cloud APIs. Providers have infrastructure costs — GPU compute, networking, cooling, staff — that create a hard floor. The ultra-cheap models are already running on optimized inference stacks with aggressive batching and quantization. Marginal costs for a single inference call approach fractions of a cent, but they never reach zero.
What will keep dropping is the quality available at each price point. Today's $0.10 model is better than 2024's $10 model. By late 2026, expect $0.10 models that match today's GPT-5.4 on most benchmarks. The race isn't to make AI cheaper — it's to make cheap AI smarter.
📊 Quick Math: Processing 1 million pages of text cost roughly $10,000 with GPT-4 in 2024. With Gemini 2.0 Flash-Lite today, the same job costs about $100. That's a 100x reduction in under two years.
Frequently asked questions
What is the cheapest AI model available via API right now?
GPT-5 nano from OpenAI at $0.05 per million input tokens and $0.40 per million output tokens. It's text-only with a 128K context window, suitable for classification, extraction, and simple completions. If you need vision or a larger context, Gemini 2.0 Flash-Lite at $0.075/$0.30 is the cheapest multimodal option with a 1M token window. Use our calculator to compare costs for your specific workload.
Are cheap AI models good enough for production?
Absolutely — for the right tasks. Models like Mistral Large 3 ($0.50/$1.50) and DeepSeek V3.2 ($0.28/$0.42) deliver quality that would have been flagship-tier just a year ago. The key is matching the model to the task. Don't use GPT-5 nano for complex reasoning, and don't waste Claude Opus on intent classification. A model routing strategy that assigns tasks to appropriate tiers can save 40-60% while maintaining quality.
How do I calculate my actual AI API costs?
Multiply your average input tokens per request by the input price, add your average output tokens by the output price, then multiply by your daily request volume. For example, 10,000 daily requests at 500 input / 300 output tokens on DeepSeek V3.2: (500 × $0.00000028) + (300 × $0.00000042) = $0.000266 per request, or $2.66/day. Try our AI cost calculator for instant estimates across all models.
Which budget AI model has the largest context window?
Grok 4.1 Fast from xAI offers 2 million tokens of context at just $0.20/$0.50 per million tokens — the largest context window available under $1/M input. Google's Gemini models follow with 1M token windows across most of their Flash lineup. For comparison, processing large contexts with premium models can cost 10-50x more.
Is DeepSeek R1 V3.2 really as good as expensive reasoning models?
On math and coding benchmarks, DeepSeek R1 V3.2 performs surprisingly close to models costing 50-200x more. It's particularly strong at AIME-level math, competitive programming, and logical deduction. Where it falls short is in nuanced instruction following, safety guardrails, and consistency — premium models like o3 and Claude Opus still outperform it significantly on tasks requiring judgment and reliability. For pure reasoning tasks with clear right/wrong answers, R1 V3.2 is a steal at $0.28/$0.42. Check our reasoning model comparison for detailed benchmarks.
Bottom line
The sub-dollar AI market in March 2026 is absurdly competitive. You have 27 models from 8 providers, spanning everything from $0.05 text classifiers to $0.50 flagship-quality models with 256K context windows. The old excuse of "AI APIs are too expensive" no longer holds — if your use case is cost-sensitive, there's almost certainly a model under $1/M that handles it well.
The real skill now isn't finding a cheap model — it's building a system that routes each request to the right one. Use our calculator to model your specific workload across these options, and check out our guide on AI model routing to implement intelligent cost optimization.
Your AI budget just got a lot more interesting.
