March 11, 2026

Every AI Model Under $1 Per Million Tokens (March 2026)

There are now 25+ AI models priced under $1 per million input tokens. We compare every sub-dollar API from OpenAI, Google, Anthropic, Mistral, DeepSeek, Meta, and xAI — with real cost-per-task math and recommendations for every use case.

Tags: pricing · comparison · budget · 2026 · guide

Two years ago, the cheapest AI API cost $10 per million input tokens. Today, you can access models that rival GPT-4's original performance for under $0.10 per million tokens — a 100x price collapse that has fundamentally rewritten the economics of building with AI.

The sub-dollar segment isn't a graveyard of weak models. It now includes flagship-tier options like Mistral Large 3 at $0.50/$1.50, reasoning models like DeepSeek R1 V3.2 at $0.28/$0.42, and Google's Gemini Flash family that delivers million-token context windows for fractions of a cent per request. If you're still defaulting to GPT-5.4 or Claude Opus for every API call, you're likely overspending by 10-50x on tasks these cheaper models handle just as well.

This guide maps every model priced under $1 per million input tokens, compares them on capability and real-world cost per task, and tells you exactly which one to use for what.

📊 Stat: 25+ AI models now priced under $1 per million input tokens — up from just 3 in early 2024


The complete sub-dollar pricing table

Here's every model priced under $1/M input tokens as of March 2026, sorted by input price:

| Model | Provider | Input / 1M | Output / 1M | Context | Category |
|---|---|---|---|---|---|
| GPT-5 nano | OpenAI | $0.05 | $0.40 | 128K | Efficient |
| Mistral Small 3.2 | Mistral | $0.06 | $0.18 | 128K | Efficient |
| Gemini 2.0 Flash-Lite | Google | $0.075 | $0.30 | 1M | Efficient |
| GPT-4.1 nano | OpenAI | $0.10 | $0.40 | 128K | Efficient |
| Gemini 2.5 Flash-Lite | Google | $0.10 | $0.40 | 1M | Efficient |
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | 1M | Efficient |
| Command R | Cohere | $0.15 | $0.60 | 128K | Efficient |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | 128K | Efficient |
| Llama 3.1 8B | Meta | $0.18 | $0.18 | 128K | Efficient |
| Grok 4.1 Fast | xAI | $0.20 | $0.50 | 2M | Efficient |
| GPT-5 mini | OpenAI | $0.25 | $2.00 | 500K | Efficient |
| Llama 4 Maverick | Meta | $0.27 | $0.85 | 1M | Flagship |
| DeepSeek V3.2 | DeepSeek | $0.28 | $0.42 | 128K | Efficient |
| DeepSeek R1 V3.2 | DeepSeek | $0.28 | $0.42 | 128K | Reasoning |
| Gemini 2.5 Flash | Google | $0.30 | $2.50 | 1M | Efficient |
| Codestral | Mistral | $0.30 | $0.90 | 128K | Coding |
| Grok 3 Mini | xAI | $0.30 | $0.50 | 128K | Efficient |
| GPT-4.1 mini | OpenAI | $0.40 | $1.60 | 200K | Efficient |
| Devstral 2 | Mistral | $0.40 | $2.00 | 256K | Coding |
| Mistral Medium 3 | Mistral | $0.40 | $2.00 | 128K | Balanced |
| Mistral Medium 3.1 | Mistral | $0.40 | $2.00 | 131K | Balanced |
| Gemini 3 Flash | Google | $0.50 | $3.00 | 1M | Efficient |
| Mistral Large 3 | Mistral | $0.50 | $1.50 | 256K | Flagship |
| Magistral Small | Mistral | $0.50 | $1.50 | 128K | Reasoning |
| Claude 3.5 Haiku | Anthropic | $0.80 | $4.00 | 200K | Efficient |
| Llama 3.3 70B | Meta | $0.88 | $0.88 | 131K | Standard |
| Llama 3.1 70B | Meta | $0.88 | $0.88 | 128K | Balanced |

That's 27 models from 8 providers, all under a dollar per million input tokens. The range spans from $0.05 (GPT-5 nano) to $0.88 (Llama 3.3 70B) — an 18x spread even within the budget tier.

💡 Key Takeaway: Input pricing tells only half the story. DeepSeek V3.2 at $0.28/$0.42 has a 1.5:1 input-to-output ratio, while GPT-5 mini at $0.25/$2.00 has an 8:1 ratio. For output-heavy tasks like content generation, the cheaper input price can be deceptive.


Real cost per task: what you actually pay

Raw per-million-token pricing is abstract. Here's what common tasks actually cost with each model, assuming typical token counts: a chatbot response (500 input / 300 output tokens), a document summary (2,000 input / 500 output), and a code generation task (1,000 input / 1,500 output).

Chatbot response (500 in / 300 out)

| Model | Cost per response | Cost per 10K responses |
|---|---|---|
| GPT-5 nano | $0.000145 | $1.45 |
| Mistral Small 3.2 | $0.000084 | $0.84 |
| Gemini 2.0 Flash | $0.000170 | $1.70 |
| DeepSeek V3.2 | $0.000266 | $2.66 |
| GPT-5 mini | $0.000725 | $7.25 |
| Grok 4.1 Fast | $0.000250 | $2.50 |
| Mistral Large 3 | $0.000700 | $7.00 |
| Claude 3.5 Haiku | $0.001600 | $16.00 |

📊 Quick Math: At 10,000 chatbot responses per day, Mistral Small 3.2 costs you $0.84/day ($25/month). Claude 3.5 Haiku costs $16/day ($480/month) for the same volume — a 19x difference.
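The per-task math above is simple enough to script. Here's a minimal Python helper — the prices are hardcoded from this article's table, so re-check them against provider pricing pages before relying on them:

```python
# Per-task cost helper. Prices ($/M tokens, input then output) are copied from
# the pricing table in this article — verify against provider pages before use.
PRICES = {
    "gpt-5-nano": (0.05, 0.40),
    "mistral-small-3.2": (0.06, 0.18),
    "deepseek-v3.2": (0.28, 0.42),
    "claude-3.5-haiku": (0.80, 4.00),
}

def cost_per_task(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the listed per-million-token rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A 500-in / 300-out chatbot turn on Mistral Small 3.2: ~$0.000084
print(cost_per_task("mistral-small-3.2", 500, 300))
```

Multiply by daily volume to reproduce the table's daily figures (e.g. 10,000 × $0.000084 ≈ $0.84/day).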

Code generation (1,000 in / 1,500 out)

| Model | Cost per request | Cost per 10K requests |
|---|---|---|
| GPT-5 nano | $0.000650 | $6.50 |
| Mistral Small 3.2 | $0.000330 | $3.30 |
| Codestral | $0.001650 | $16.50 |
| DeepSeek V3.2 | $0.000910 | $9.10 |
| GPT-5 mini | $0.003250 | $32.50 |
| Grok 4.1 Fast | $0.000950 | $9.50 |
| Devstral 2 | $0.003400 | $34.00 |
| GPT-4.1 mini | $0.002800 | $28.00 |

For code generation, output costs dominate. Mistral Small 3.2 remains the cheapest at $0.00033 per request, but Codestral and Devstral 2 — Mistral's purpose-built coding models — cost 5-10x more. Whether their code quality justifies the premium depends on your task complexity.

$0.33
Mistral Small 3.2 per 1K code tasks
vs
$3.25
GPT-5 mini per 1K code tasks

Tier breakdown: five price bands

Not all sub-dollar models are equal. They cluster into five distinct performance tiers that map to different use cases.

Tier 1: Ultra-cheap ($0.05-$0.10 input)

Models: GPT-5 nano, Mistral Small 3.2, Gemini 2.0 Flash-Lite, GPT-4.1 nano, Gemini 2.5 Flash-Lite, Gemini 2.0 Flash

These are your high-volume workhorses. They handle classification, extraction, simple Q&A, and routing decisions where you need millions of calls per day without blowing your budget. GPT-5 nano and Mistral Small 3.2 are text-only, while the Gemini Flash variants add vision capability and 1M token context windows — a combination that's frankly absurd at $0.10/M input.

Best for: Intent classification, entity extraction, content moderation, simple summarization, routing layers in multi-model architectures.

Avoid for: Complex reasoning, nuanced creative writing, multi-step coding tasks.

Tier 2: Budget all-rounders ($0.15-$0.20 input)

Models: Command R, GPT-4o mini, Llama 3.1 8B, Grok 4.1 Fast

This tier punches above its weight. GPT-4o mini was the budget king of 2024 and still delivers solid performance. But the standout here is Grok 4.1 Fast at $0.20/$0.50 with a 2M token context window — the largest context available under $1/M. If you need to process entire codebases or long documents cheaply, nothing else comes close.

Best for: RAG applications, customer support bots, document processing, long-context analysis (Grok 4.1 Fast specifically).

Tier 3: Mid-range performers ($0.25-$0.30 input)

Models: GPT-5 mini, Llama 4 Maverick, DeepSeek V3.2, DeepSeek R1 V3.2, Gemini 2.5 Flash, Codestral, Grok 3 Mini

This is where it gets interesting. DeepSeek R1 V3.2 gives you a full reasoning model — the kind that shows its chain-of-thought and solves graduate-level math — for $0.28/$0.42. That's cheaper than GPT-4o mini's output pricing. Meanwhile, Llama 4 Maverick is a flagship-class model with 1M context at $0.27/$0.85, available through Together AI and other providers.

GPT-5 mini is OpenAI's entry here at $0.25 input, but its $2.00 output pricing makes it expensive for generation-heavy workloads. It's best suited for tasks where the input is large but the output is short — think classification on long documents.

Best for: Reasoning tasks (DeepSeek R1), coding (Codestral), general-purpose work where quality matters more than pure cost, batch processing.

⚠️ Warning: DeepSeek V3.2 and R1 V3.2 share identical pricing but differ fundamentally — R1 is a reasoning model that generates thinking tokens. Your actual costs with R1 may be 2-5x higher than V3.2 for the same prompt because of reasoning overhead. Monitor your output token usage carefully.

Tier 4: Premium budget ($0.40-$0.50 input)

Models: GPT-4.1 mini, Devstral 2, Mistral Medium 3/3.1, Gemini 3 Flash, Mistral Large 3, Magistral Small

The premium budget tier delivers near-flagship quality. Mistral Large 3 is the headline act — a full flagship model priced at just $0.50/$1.50, making it cheaper than most providers' "mini" models. It supports 256K context, tool use, and function calling with quality that competes with models 6-10x its price.

Gemini 3 Flash at $0.50/$3.00 brings Google's latest architecture to the budget tier with 1M context. Magistral Small adds reasoning capabilities at the same $0.50 input price.

Best for: Production applications requiring high quality, complex multi-turn conversations, agentic workflows, tasks where you'd normally use a $3-5/M model.

Tier 5: Sub-dollar ceiling ($0.80-$0.88 input)

Models: Claude 3.5 Haiku, Llama 3.3 70B, Llama 3.1 70B

These models sit just under the $1 threshold. Claude 3.5 Haiku at $0.80/$4.00 is Anthropic's cheapest current offering, and it remains one of the most reliable options for structured output and tool use. The Llama 70B variants offer a unique pricing structure where input and output cost the same ($0.88/M each) — beneficial for output-heavy workloads but less competitive for input-heavy ones.

Best for: When you need Anthropic/Meta ecosystem compatibility, structured outputs, or balanced input/output pricing.


Context window comparison: size vs. cost

One of the biggest differentiators in the budget tier is context window size. The range is staggering:

| Context size | Models | Cheapest option |
|---|---|---|
| 2M tokens | Grok 4.1 Fast | $0.20/$0.50 |
| 1M tokens | Gemini Flash family, Llama 4 Maverick, Gemini 3 Flash | $0.075/$0.30 (Gemini 2.0 Flash-Lite) |
| 500K tokens | GPT-5 mini | $0.25/$2.00 |
| 256K tokens | Mistral Large 3, Devstral 2 | $0.40/$2.00 (Devstral 2) |
| 128-200K tokens | Everything else | $0.05/$0.40 (GPT-5 nano) |

💡 Key Takeaway: If your use case requires processing long documents, the choice is clear. Grok 4.1 Fast gives you 2M tokens at $0.20 input — filling the entire window costs just $0.40, and a typical novel (~120K tokens) runs only a few cents. Google's Gemini Flash models offer 1M tokens at even lower prices if 2M isn't necessary.

The cost to fill a full context window varies dramatically:

  • Grok 4.1 Fast (2M context): $0.40 to fill the input window
  • Gemini 2.0 Flash-Lite (1M context): $0.075 to fill the input window
  • GPT-5 mini (500K context): $0.125 to fill the input window
  • Claude 3.5 Haiku (200K context): $0.16 to fill the input window

Gemini 2.0 Flash-Lite can process a million tokens for less than it costs to fill Claude 3.5 Haiku's 200K window. That's 5x the context at less than half the price.
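The fill-cost bullets above come from a single multiplication — context size times input price. A quick sketch:

```python
# Cost in dollars to fill a model's entire input context window.
def fill_cost(context_tokens: int, input_price_per_m: float) -> float:
    return context_tokens * input_price_per_m / 1_000_000

print(fill_cost(2_000_000, 0.20))   # Grok 4.1 Fast: ~$0.40
print(fill_cost(1_000_000, 0.075))  # Gemini 2.0 Flash-Lite: ~$0.075
print(fill_cost(200_000, 0.80))     # Claude 3.5 Haiku: ~$0.16
```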


Reasoning on a budget

Reasoning models — the kind that show chain-of-thought and solve complex problems — used to be expensive. OpenAI's o1 launched at $15/$60 per million tokens. Today, you have three reasoning options under $1/M input:

| Model | Input / 1M | Output / 1M | Reasoning quality |
|---|---|---|---|
| DeepSeek R1 V3.2 | $0.28 | $0.42 | Strong (math, code, logic) |
| Magistral Small | $0.50 | $1.50 | Good (general reasoning) |
| Grok 3 Mini | $0.30 | $0.50 | Moderate (fast reasoning) |

DeepSeek R1 V3.2 is the standout. At $0.28/$0.42, its input pricing is roughly 1/50th of o1's $15/M launch rate — and a far smaller fraction of o1 Pro's — while delivering competitive results on math and coding benchmarks. For startups and developers who need reasoning capabilities without enterprise budgets, R1 V3.2 has been a game-changer.

$0.28
DeepSeek R1 V3.2 input/M
vs
$15.00
OpenAI o1 input/M

Magistral Small from Mistral takes a different approach — it's a structured reasoning model optimized for step-by-step problem solving rather than open-ended chain-of-thought. At $0.50/$1.50, it's slightly pricier but more predictable in output length.

A word of caution on reasoning costs: Reasoning models generate thinking tokens that count toward your output bill. A simple question might use 200 output tokens on a standard model but 2,000+ tokens on a reasoning model (most of which are reasoning traces). Your effective cost per task can be 3-10x the naive per-token calculation. Always benchmark your specific prompts before committing to a reasoning model at scale.
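That caution is easy to quantify. The sketch below treats hidden thinking tokens as a multiplier on visible output — the 10x figure is an illustrative assumption, not a measured number, so benchmark your own prompts:

```python
# Effective reasoning-model cost, assuming thinking tokens are billed as output.
# The thinking_multiplier is an assumption for illustration — measure it per prompt.
def effective_cost(input_tokens: int, visible_output: int,
                   thinking_multiplier: float,
                   in_price: float, out_price: float) -> float:
    billed_output = visible_output * thinking_multiplier
    return (input_tokens * in_price + billed_output * out_price) / 1_000_000

# DeepSeek R1 V3.2 ($0.28/$0.42): 500-token prompt, 200 visible output tokens.
naive = effective_cost(500, 200, 1, 0.28, 0.42)     # ~$0.000224
padded = effective_cost(500, 200, 10, 0.28, 0.42)   # ~$0.00098 — over 4x the naive figure
```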


The best sub-dollar model for every use case

Stop scrolling through pricing tables. Here's what to pick based on what you're building:

High-volume chatbot (>100K messages/day): Mistral Small 3.2 ($0.06/$0.18). Cheapest output pricing in the entire market for a capable model. At 100K daily messages, you're looking at roughly $8/day.

RAG/retrieval application: Gemini 2.0 Flash ($0.10/$0.40) for the 1M context window, or Grok 4.1 Fast ($0.20/$0.50) if you need the full 2M. Both handle long retrieved contexts efficiently.

Code generation: Codestral ($0.30/$0.90) if quality matters, Mistral Small 3.2 ($0.06/$0.18) if cost matters. DeepSeek V3.2 ($0.28/$0.42) is a strong middle ground — competitive code quality at bottom-tier pricing.

Math and reasoning: DeepSeek R1 V3.2 ($0.28/$0.42). Nothing else comes close on price-to-reasoning-quality ratio.

Document summarization: Gemini 2.0 Flash-Lite ($0.075/$0.30) for short-to-medium documents. For book-length content, Grok 4.1 Fast's 2M context avoids chunking entirely.

Content generation (articles, marketing copy): GPT-4.1 mini ($0.40/$1.60) or Mistral Large 3 ($0.50/$1.50). Both produce polished, publication-ready text that avoids the "AI slop" problem of ultra-cheap models.

Production API with reliability SLAs: GPT-5 mini ($0.25/$2.00) or Claude 3.5 Haiku ($0.80/$4.00). OpenAI and Anthropic offer the most robust API infrastructure with enterprise support, uptime guarantees, and compliance certifications.

✅ TL;DR: For most developers, the sweet spot is DeepSeek V3.2 for general tasks, Gemini Flash for long-context work, and Mistral Small 3.2 for high-volume simple tasks. Only reach for the $0.50+ tier when you need flagship quality or enterprise reliability.


Cost optimization strategies for sub-dollar models

Even at these prices, there are ways to cut costs further:

1. Use prompt caching

OpenAI, Anthropic, and Google all offer cached input pricing at 50-90% discounts. GPT-5.4's cached input rate is $0.25/M — cheaper than many budget models' standard rate. If you're sending the same system prompt or context to every request, caching alone can drop your costs below the cheapest models on this list.

2. Route by complexity

Don't send every request to the same model. Use a cheap classifier (GPT-5 nano at $0.05/M) to triage incoming requests, then route simple ones to ultra-cheap models and complex ones to mid-tier options. A well-designed routing layer can cut average costs by 40-60% while maintaining quality on hard tasks.
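A routing layer can be sketched in a few lines. The keyword heuristic below is a placeholder for the real triage step — in production you'd ask a nano-tier model to label each request — and the model names and markers are illustrative:

```python
# Complexity-based router: cheap model for simple requests, mid-tier for hard ones.
CHEAP_MODEL = "gpt-5-nano"    # $0.05/M input — classification, simple Q&A
MID_MODEL = "deepseek-v3.2"   # $0.28/M input — harder, reasoning-adjacent work

def classify(prompt: str) -> str:
    """Placeholder triage; swap in a call to a nano-tier classifier in production."""
    hard_markers = ("prove", "debug", "refactor", "analyze")
    return "complex" if any(m in prompt.lower() for m in hard_markers) else "simple"

def route(prompt: str) -> str:
    return MID_MODEL if classify(prompt) == "complex" else CHEAP_MODEL

print(route("What are your opening hours?"))   # gpt-5-nano
print(route("Debug this failing test suite"))  # deepseek-v3.2
```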

3. Batch API for non-real-time work

OpenAI's Batch API offers 50% off standard pricing. GPT-5 mini through Batch drops to effectively $0.125/$1.00 — making it competitive with the ultra-cheap tier while delivering GPT-5-class quality. Google offers similar batch discounts on Gemini models.

4. Monitor output token bloat

Reasoning models and verbose models can silently inflate your bills. Set max_tokens limits appropriate to your use case. A chatbot response rarely needs more than 500 tokens — don't pay for 4,000-token outputs that get truncated in the UI anyway.

5. Consider self-hosting for extreme volume

At very high volumes (millions of requests per day), self-hosting open models like Llama 3.3 70B or Llama 4 Maverick can undercut API pricing. The crossover point depends on your GPU costs, but for many teams, it's somewhere around 500K-1M requests per day. Below that, API pricing is almost always cheaper when you factor in engineering and infrastructure overhead.
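The crossover point is a one-line calculation once you've estimated your own numbers. Both the $600/day fleet cost and the per-request token counts below are assumptions for illustration:

```python
# Break-even: daily request volume where a fixed-cost GPU fleet beats API pricing.
# The fleet cost and token counts are illustrative assumptions — plug in your own.
def breakeven_requests_per_day(gpu_cost_per_day: float,
                               api_cost_per_request: float) -> float:
    return gpu_cost_per_day / api_cost_per_request

# Llama 3.3 70B via API ($0.88/$0.88), 1,000 input + 500 output tokens/request.
api_cost = (1000 * 0.88 + 500 * 0.88) / 1_000_000  # $0.00132 per request
print(breakeven_requests_per_day(600, api_cost))   # ~455K requests/day
```

With these assumptions the break-even lands near the 500K/day figure cited above; a cheaper fleet or heavier requests pushes it lower.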


The price floor: how low can it go?

We've seen a 100x price reduction in AI API pricing over two years. The cheapest model today (GPT-5 nano at $0.05/M input) processes a full million tokens — on the order of 1,500 pages of text — for a nickel. Where does this end?

The answer is probably close to where we are now for cloud APIs. Providers have infrastructure costs — GPU compute, networking, cooling, staff — that create a hard floor. The ultra-cheap models are already running on optimized inference stacks with aggressive batching and quantization. Marginal costs for a single inference call approach fractions of a cent, but they never reach zero.

What will keep dropping is the quality available at each price point. Today's $0.10 model is better than 2024's $10 model. By late 2026, expect $0.10 models that match today's GPT-5.4 on most benchmarks. The race isn't to make AI cheaper — it's to make cheap AI smarter.

📊 Quick Math: Processing 1 million pages of text cost roughly $10,000 with GPT-4 in 2024. With Gemini 2.0 Flash-Lite today, the same job costs about $100. That's a 100x reduction in under two years.


Frequently asked questions

What is the cheapest AI model available via API right now?

GPT-5 nano from OpenAI at $0.05 per million input tokens and $0.40 per million output tokens. It's text-only with a 128K context window, suitable for classification, extraction, and simple completions. If you need vision or a larger context, Gemini 2.0 Flash-Lite at $0.075/$0.30 is the cheapest multimodal option with a 1M token window. Use our calculator to compare costs for your specific workload.

Are cheap AI models good enough for production?

Absolutely — for the right tasks. Models like Mistral Large 3 ($0.50/$1.50) and DeepSeek V3.2 ($0.28/$0.42) deliver quality that would have been flagship-tier just a year ago. The key is matching the model to the task. Don't use GPT-5 nano for complex reasoning, and don't waste Claude Opus on intent classification. A model routing strategy that assigns tasks to appropriate tiers can save 40-60% while maintaining quality.

How do I calculate my actual AI API costs?

Multiply your average input tokens per request by the input price, add your average output tokens by the output price, then multiply by your daily request volume. For example, 10,000 daily requests at 500 input / 300 output tokens on DeepSeek V3.2: (500 × $0.00000028) + (300 × $0.00000042) = $0.000266 per request, or $2.66/day. Try our AI cost calculator for instant estimates across all models.

Which budget AI model has the largest context window?

Grok 4.1 Fast from xAI offers 2 million tokens of context at just $0.20/$0.50 per million tokens — the largest context window available under $1/M input. Google's Gemini models follow with 1M token windows across most of their Flash lineup. For comparison, processing large contexts with premium models can cost 10-50x more.

Is DeepSeek R1 V3.2 really as good as expensive reasoning models?

On math and coding benchmarks, DeepSeek R1 V3.2 performs surprisingly close to models costing 50-200x more. It's particularly strong at AIME-level math, competitive programming, and logical deduction. Where it falls short is in nuanced instruction following, safety guardrails, and consistency — premium models like o3 and Claude Opus still outperform significantly on tasks requiring judgment and reliability. For pure reasoning tasks with clear right/wrong answers, R1 V3.2 is a steal at $0.28/$0.42. Check our reasoning model comparison for detailed benchmarks.


Bottom line

The sub-dollar AI market in March 2026 is absurdly competitive. You have 27 models from 8 providers, spanning everything from $0.05 text classifiers to $0.50 flagship-quality models with 256K context windows. The old excuse of "AI APIs are too expensive" no longer holds — if your use case is cost-sensitive, there's almost certainly a model under $1/M that handles it well.

The real skill now isn't finding a cheap model — it's building a system that routes each request to the right one. Use our calculator to model your specific workload across these options, and check out our guide on AI model routing to implement intelligent cost optimization.

Your AI budget just got a lot more interesting.