If you searched for Google Gemini API pricing in 2026, here's the official answer first.
Updated April 2026: the biggest point of confusion now is not just Gemini pricing per token, but how Google's free tier, AI Studio, usage-tier rate limits, and usage-based billing fit together.
Official Google Gemini API pricing per token: quick answer (2026)
- Pricing range: Official Google Gemini API pricing runs from $0.075/$0.30 (2.0 Flash-Lite input/output per 1M tokens) up to $2.00/$12.00 on Gemini 3 Pro, with a higher $4.00/$18.00 tier when 3 Pro prompts exceed 200K tokens.
- Free tier: Google's free tier is tied to an active project or free trial and includes Google AI Studio access, free tokens, and limited access to certain Gemini models.
- Monthly fees: There is no flat Gemini API subscription fee. Free usage is free, and paid usage is still billed per token once billing is enabled.
- Paid tiers: Moving to paid unlocks higher rate limits, context caching, Batch API pricing with a 50% cost reduction, and access to Google's most advanced models.
- Rate limits: Google tracks RPM, TPM, and RPD, applies those limits per project rather than per API key, and now tells developers to check AI Studio for their live limits because they vary by model and usage tier.
This guide breaks down each Gemini tier, compares Flash vs Pro costs, and shows workload math so you can pick the cheapest model that still meets your quality target. If you also want cross-provider benchmarks, use our AI API pricing guide.
Are there Gemini API subscription fees or monthly minimums?
No. Google does not charge a mandatory monthly Gemini API subscription fee just to keep the API enabled.
If you add billing, you move from the free tier into paid usage tiers, but billing stays usage-based. In plain English:
- Free tier: no monthly fee, limited access, lower rate limits
- Paid tier: no seat fee or flat subscription, but you pay per token and get higher limits
- Enterprise / Vertex AI: may involve committed spend or enterprise contracts, but that is a different buying motion from the standard Gemini Developer API
That matters because a lot of searches use phrases like "Gemini API subscription cost" or "Gemini API monthly fees". The clean answer is: standard Gemini API pricing is pay-as-you-go, not a monthly subscription.
Gemini free tier vs paid Gemini API: what changed in 2026
Most searchers asking about "Gemini API free tier" are really mixing together three separate things: free access in AI Studio, paid Gemini API access, and the usage tiers that control rate limits.
| Option | What you get | Best for | Important catch |
|---|---|---|---|
| Free | Limited access to certain models, free input/output tokens, Google AI Studio access | Testing, prototypes, tiny internal tools | Lower rate limits, and Google says free-tier content may be used to improve its products |
| Paid | Higher rate limits, context caching, Batch API with 50% cost reduction, access to Google's most advanced models | Production apps | Requires billing |
| Enterprise (Vertex AI) | Dedicated support, security/compliance options, provisioned throughput, volume discounts | Large deployments | Sales process, more setup |
That distinction matters because a lot of ranking pages answer the wrong question. They quote a single static free-tier limit and call it a day. That ages badly.
Gemini API free tier limits and rate limits: the non-confusing version
Here is what Google documents right now:
- Rate limits are per project, not per API key.
- Google measures requests per minute (RPM), tokens per minute (TPM), and requests per day (RPD).
- Exact active limits now depend on model + usage tier, and Google points developers to AI Studio to see the live numbers for their own project.
- Priority inference defaults to 0.3x the standard rate limit for each model and tier.
- Batch API has separate limits, including 100 concurrent batch requests, a 2GB input file limit, and 20GB file storage.
Gemini usage tiers that affect rate limits
| Usage tier | Qualification | Monthly spend ceiling |
|---|---|---|
| Free | Active project or free trial | N/A |
| Tier 1 | Set up and link an active billing account | $250 |
| Tier 2 | Paid $100 + 3 days from first successful payment | $2,000 |
| Tier 3 | Paid $1,000 + 30 days from first successful payment | $20,000 to $100,000+ |
So if you want the exact answer to "What are Gemini API free-tier rate limits?", the honest answer is: check AI Studio for your live project limits, then use the usage tier table above to understand how Google decides them. That's less clickbaity, but it's the answer that won't be wrong next week.
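In practice, hitting any of those limits surfaces as an HTTP 429 response, so production code should retry with backoff rather than assume a fixed quota. Here's a minimal sketch against the public REST endpoint; the model name, backoff values, and retry count are illustrative assumptions, not Google's recommendations.

```python
import os
import time
import requests

# Minimal sketch: call the Gemini REST endpoint and back off on HTTP 429.
# Model name and backoff numbers are placeholders -- confirm against current docs.
API_KEY = os.environ["GEMINI_API_KEY"]
URL = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent"

def generate(prompt: str, max_retries: int = 5) -> dict:
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(URL, params={"key": API_KEY}, json=payload, timeout=60)
        if resp.status_code == 429:  # RPM/TPM/RPD limit hit for this project
            # Honor Retry-After if the response includes it; otherwise back off exponentially.
            wait = float(resp.headers.get("Retry-After", delay))
            time.sleep(wait)
            delay *= 2
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("Still rate-limited after retries; consider a higher usage tier.")

data = generate("One-sentence summary of context caching.")
print(data["candidates"][0]["content"]["parts"][0]["text"])
```

If you're routinely exhausting retries, that's the signal to move up a usage tier rather than keep tuning the backoff.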
Official Google Gemini API pricing table (April 2026)
Google currently offers seven Gemini models through its API, organized into three performance tiers. Here's the full pricing table:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Category |
|---|---|---|---|---|
| Gemini 3 Pro | $2.00* | $12.00* | 1,000,000 | Flagship |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1,000,000 | Flagship |
| Gemini 3 Flash | $0.50 | $3.00 | 1,000,000 | Efficient |
| Gemini 2.5 Flash | $0.15 | $0.60 | 1,000,000 | Efficient |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1,000,000 | Efficient |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1,000,000 | Budget |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | 1,000,000 | Budget |
*Gemini 3 Pro pricing is tiered: $2.00/$12.00 for prompts ≤200K tokens, increasing to $4.00/$18.00 for prompts above 200K tokens.
💡 Key Takeaway: Every single Gemini model supports a 1 million token context window. No other provider offers million-token context across their entire lineup — Anthropic's models top out at 200K. OpenAI's GPT-5.2 matches at 1M, but most of their other models cap at 128K.
The pricing spans a massive range. Gemini 3 Pro's output tokens cost 40x more than Gemini 2.0 Flash-Lite's. That gap represents fundamentally different use cases, and picking the wrong tier can blow your budget or bottleneck your application's quality.
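To make the table concrete, here's a small cost estimator built from the numbers above. The model keys are informal labels rather than exact API model IDs, so treat this as a sketch and plug in the IDs and prices from your own AI Studio project.

```python
# Minimal cost estimator based on the pricing table above (USD per 1M tokens).
# Keys are informal labels, not necessarily the exact API model IDs.
PRICES = {
    "gemini-3-pro":          (2.00, 12.00),
    "gemini-2.5-pro":        (1.25, 10.00),
    "gemini-3-flash":        (0.50, 3.00),
    "gemini-2.5-flash":      (0.15, 0.60),
    "gemini-2.0-flash":      (0.10, 0.40),
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "gemini-2.0-flash-lite": (0.075, 0.30),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at standard-context rates."""
    price_in, price_out = PRICES[model]
    if model == "gemini-3-pro" and input_tokens > 200_000:
        price_in, price_out = 4.00, 18.00  # long-prompt tier per the footnote above
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out

# Example: a 500-token prompt with a 200-token reply on Gemini 2.5 Flash
print(f"${request_cost('gemini-2.5-flash', 500, 200):.6f}")  # -> $0.000195
```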
Tier 1: Gemini 3 Pro and 2.5 Pro — flagship performance
Gemini 3 Pro ($2.00 / $12.00)
Gemini 3 Pro is Google's current best model, competing directly with GPT-5.2 and Claude Opus 4.6. At $2.00 input / $12.00 output per million tokens (for prompts up to 200K tokens), it sits in a competitive spot. Note: for prompts exceeding 200K tokens, pricing jumps to $4.00/$18.00 — important to factor in if you're leveraging the full 1M context window.
How does it stack up against the other flagships?
| Model | Input | Output | Effective cost per task* |
|---|---|---|---|
| Gemini 3 Pro | $2.00 | $12.00 | $0.0034 |
| GPT-5.2 | $1.75 | $14.00 | $0.0037 |
| Claude Opus 4.6 | $5.00 | $25.00 | $0.0075 |
| Grok 4 | $3.00 | $15.00 | $0.0045 |
*Assumes 500 input tokens + 200 output tokens per task.
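The effective-cost column is just the footnote's assumptions run through each model's per-token prices. A quick check, assuming the same 500/200 split:

```python
# Effective cost per task under the table's assumptions (500 input + 200 output tokens).
def per_task(price_in: float, price_out: float, n_in: int = 500, n_out: int = 200) -> float:
    return n_in / 1e6 * price_in + n_out / 1e6 * price_out

print(f"{per_task(2.00, 12.00):.4f}")   # Gemini 3 Pro    -> 0.0034
print(f"{per_task(1.75, 14.00):.4f}")   # GPT-5.2         -> 0.0037
print(f"{per_task(5.00, 25.00):.4f}")   # Claude Opus 4.6 -> 0.0075
```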
Gemini 3 Pro is about 55% cheaper than Claude Opus 4.6 for equivalent tasks and marginally cheaper than GPT-5.2. For teams currently running Opus-class workloads, switching to Gemini 3 Pro can cut costs in half with competitive quality.
Gemini 2.5 Pro ($1.25 / $10.00)
The previous-generation flagship remains available and offers a 37% discount on input tokens compared to Gemini 3 Pro. For workloads where you don't need the absolute latest capabilities, 2.5 Pro is a strong value play at $1.25 input / $10.00 output.
It matches GPT-5 and GPT-5.1 on pricing almost exactly, which makes it a direct alternative for cost-conscious teams evaluating multi-provider strategies.
📊 Quick Math: Switching from Gemini 3 Pro to 2.5 Pro saves $0.75 per million input tokens and $2.00 per million output tokens. On a workload running 100M input and 100M output tokens per month, that's $275/month saved, roughly $3,300/year.
When to use the Pro tier
- Complex reasoning tasks that require state-of-the-art intelligence
- Long-document analysis leveraging the full 1M context window
- Code generation for production-grade applications
- Multimodal tasks combining text, images, and structured data
- RAG pipelines where answer quality directly impacts user experience
Tier 2: Gemini Flash models — the efficiency sweet spot
The Flash tier is where Gemini's pricing story gets genuinely exciting. Google offers three Flash variants, and the newest one — Gemini 3 Flash — delivers remarkable capability at a fraction of flagship pricing.
Gemini 3 Flash ($0.50 / $3.00)
At $0.50 input / $3.00 output, Gemini 3 Flash occupies a unique position in the market. It's priced below most competitors' mid-tier models while delivering performance that many developers find sufficient for production workloads.
Compare it to other efficient-tier models:
| Model | Input | Output | Quality tier |
|---|---|---|---|
| Gemini 3 Flash | $0.50 | $3.00 | High-efficient |
| Claude Haiku 4.5 | $1.00 | $5.00 | Mid-efficient |
| GPT-4.1 mini | $0.40 | $1.60 | Mid-efficient |
| GPT-5 mini | $0.25 | $2.00 | Mid-efficient |
| Mistral Large 3 | $0.50 | $1.50 | Mid-tier |
Gemini 3 Flash's output tokens are pricier than GPT-4.1 mini or GPT-5 mini, but Google's benchmarks place Flash's quality closer to older flagship models. For teams that need better-than-mini quality without paying flagship prices, Flash fills the gap.
⚠️ Warning: Don't compare Flash models purely on price per token. Gemini 3 Flash consistently produces higher-quality outputs than models at similar price points, which means fewer retries and less post-processing. Factor in your actual completion rate, not just token cost.
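One way to make that warning concrete is to compare models on expected cost per successful task rather than per attempt. The completion rates below are made-up placeholders; substitute the rates you actually measure.

```python
# Sketch: compare models on expected cost per *successful* task, not per attempt.
# Completion rates are hypothetical placeholders -- measure your own.
def cost_per_success(price_in, price_out, n_in, n_out, completion_rate):
    cost_per_attempt = n_in / 1e6 * price_in + n_out / 1e6 * price_out
    return cost_per_attempt / completion_rate  # expected attempts = 1 / completion_rate

# 800-input / 400-output task, hypothetical completion rates
print(f"{cost_per_success(0.50, 3.00, 800, 400, 0.95):.5f}")  # Gemini 3 Flash -> 0.00168
print(f"{cost_per_success(0.25, 2.00, 800, 400, 0.85):.5f}")  # GPT-5 mini     -> 0.00118
```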
Gemini 2.5 Flash ($0.15 / $0.60)
This is the budget champion of the mid-tier. At $0.15 input / $0.60 output, Gemini 2.5 Flash matches GPT-4o mini on pricing while offering a 1M token context window (versus GPT-4o mini's 128K).
[stat] $0.60/M Gemini 2.5 Flash output pricing — matching GPT-4o mini while offering 8x the context window
For high-volume applications like classification, extraction, summarization, and routing, Gemini 2.5 Flash is one of the most cost-effective options available from a major provider.
Gemini 2.0 Flash ($0.10 / $0.40)
The oldest Flash model in the current lineup, priced at $0.10 input / $0.40 output. It's slightly cheaper than 2.5 Flash and still capable for straightforward tasks. Unless you specifically need 2.5's improved reasoning, 2.0 Flash saves an incremental 33% on input tokens.
When to use the Flash tier
- Customer-facing chatbots where response quality matters but flagship pricing is overkill
- Summarization pipelines processing hundreds of documents daily
- Data extraction from structured and semi-structured content
- Classification and routing in multi-model architectures
- Prototype development before committing to Pro-tier costs
Tier 3: Flash-Lite — maximum savings
Google's Flash-Lite models target the absolute lowest cost tier, competing with DeepSeek V3.2 and open-source alternatives.
Gemini 2.5 Flash-Lite ($0.10 / $0.40)
At $0.10 input / $0.40 output, Gemini 2.5 Flash-Lite matches Gemini 2.0 Flash on price. The tradeoff is reduced capability — Lite models sacrifice some quality for consistent low latency and minimal cost.
Gemini 2.0 Flash-Lite ($0.075 / $0.30)
The cheapest model in Google's lineup. At $0.075 input / $0.30 output, it competes directly with:
| Model | Input | Output | Provider |
|---|---|---|---|
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | Google |
| GPT-5 nano | $0.05 | $0.40 | OpenAI |
| Mistral Small 3.2 | $0.06 | $0.18 | Mistral |
| DeepSeek V3.2 | $0.28 | $0.42 | DeepSeek |
| Llama 3.1 8B | $0.18 | $0.18 | Meta/Together |
💡 Key Takeaway: Google's cheapest model undercuts DeepSeek V3.2 on input price by over 70%, though DeepSeek offers stronger reasoning at its price point. For simple tasks at massive scale, Flash-Lite wins on cost.
When to use Flash-Lite
- Pre-processing and filtering before sending data to more expensive models (see the cost sketch after this list)
- Batch classification at scale (millions of items)
- Simple Q&A over well-structured knowledge bases
- Logging and monitoring pipelines that need lightweight AI decisions
- Cost-sensitive MVPs testing market fit before investing in quality
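Here's a rough sketch of the pre-filtering economics from the first item above. The 80/20 split, token counts, and volume are illustrative assumptions, not benchmarks.

```python
# Sketch: two-stage cascade where Flash-Lite pre-filters and only hard items go to 3 Pro.
# The 80/20 split, token counts, and volume are illustrative assumptions.
N_ITEMS = 1_000_000
IN_TOK, OUT_TOK = 400, 100

def stage_cost(n, price_in, price_out):
    return n * (IN_TOK / 1e6 * price_in + OUT_TOK / 1e6 * price_out)

lite_all  = stage_cost(N_ITEMS, 0.075, 0.30)                # every item hits Flash-Lite first
pro_escal = stage_cost(int(N_ITEMS * 0.20), 2.00, 12.00)    # 20% escalate to Gemini 3 Pro
pro_only  = stage_cost(N_ITEMS, 2.00, 12.00)                # baseline: everything on 3 Pro

print(f"cascade: ${lite_all + pro_escal:,.0f} vs pro-only: ${pro_only:,.0f}")
```

Under these assumptions the cascade lands below a quarter of the all-Pro baseline, even with a generous escalation rate.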
Real-world cost scenarios
Let's run the numbers for three common use cases to see how Gemini models compare across the lineup.
Scenario 1: Customer support chatbot (10,000 conversations/day)
Assumptions: Average 800 input tokens, 400 output tokens per conversation. 300,000 conversations/month.
| Model | Monthly input cost | Monthly output cost | Total/month |
|---|---|---|---|
| Gemini 3 Pro | $480 | $1,440 | $1,920 |
| Gemini 3 Flash | $120 | $360 | $480 |
| Gemini 2.5 Flash | $36 | $72 | $108 |
| Gemini 2.0 Flash-Lite | $18 | $36 | $54 |
[stat] $1,866/month The cost difference between Gemini 3 Pro and 2.0 Flash-Lite for the same chatbot workload
That's a 35x cost difference between the top and bottom of Google's own lineup. For most customer support applications, Gemini 2.5 Flash at $108/month offers the best quality-to-cost ratio.
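If you want to sanity-check these scenario tables or rerun them with your own traffic, the math is a one-liner. A minimal sketch for Scenario 1; the same formula drives Scenarios 2 and 3 below, only the volumes change.

```python
# Reproduce the Scenario 1 table: 300,000 conversations at 800 input / 400 output tokens each.
MONTHLY_CONVERSATIONS = 300_000
IN_TOK, OUT_TOK = 800, 400

def monthly_cost(price_in: float, price_out: float) -> float:
    total_in = MONTHLY_CONVERSATIONS * IN_TOK / 1e6    # input tokens, in millions
    total_out = MONTHLY_CONVERSATIONS * OUT_TOK / 1e6  # output tokens, in millions
    return total_in * price_in + total_out * price_out

for name, p_in, p_out in [
    ("Gemini 3 Pro", 2.00, 12.00),
    ("Gemini 3 Flash", 0.50, 3.00),
    ("Gemini 2.5 Flash", 0.15, 0.60),
    ("Gemini 2.0 Flash-Lite", 0.075, 0.30),
]:
    print(f"{name}: ${monthly_cost(p_in, p_out):,.0f}/month")
# -> $1,920 / $480 / $108 / $54, matching the table
```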
Scenario 2: Document processing pipeline (50,000 pages/month)
Assumptions: 2,000 input tokens per page (document content), 500 output tokens (extracted data). 50,000 documents/month.
| Model | Monthly cost |
|---|---|
| Gemini 3 Pro | $500 |
| Gemini 3 Flash | $125 |
| Gemini 2.5 Flash | $30 |
| Gemini 2.0 Flash-Lite | $15.00 |
For document extraction where accuracy is critical, Gemini 3 Flash at $125/month is the sweet spot. For simpler extraction tasks, 2.5 Flash at $30/month is remarkably cheap.
Scenario 3: AI-powered search (1M queries/month)
Assumptions: 300 input tokens (query + context), 200 output tokens (answer). 1,000,000 queries/month.
| Model | Monthly cost |
|---|---|
| Gemini 3 Pro | $3,000 |
| Gemini 3 Flash | $750 |
| Gemini 2.5 Flash | $165 |
| GPT-5 mini (comparison) | $475 |
| Claude Haiku 4.5 (comparison) | $1,300 |
📊 Quick Math: At 1M queries/month, Gemini 2.5 Flash costs just $165 — that's $0.000165 per query. Compare that to Claude Haiku 4.5 at $1,300 or GPT-5 mini at $475. For search workloads, Gemini's Flash tier is hard to beat and ranks near the top in cost-per-million token comparisons.
Gemini's real advantage: cheap experimentation, then clean paid scaling
Google's free tier is still genuinely useful, but the better framing is this: Gemini gives you a low-friction path from testing in AI Studio to paid production usage without changing providers.
Use the free tier when you want to:
- Prototype prompts and workflows in AI Studio
- Test whether Gemini quality is good enough before linking billing
- Build low-volume internal tools
- Run small experiments without committing production budget
Move to paid when you need:
- Higher rate limits
- Context caching for repeated prompts or large reference context
- Batch API pricing with a 50% cost reduction on async workloads
- Access to Google's most advanced models
- A cleaner data policy, since Google says paid-tier content is not used to improve its products
✅ TL;DR: Start free if you're validating an idea. Move to paid the second throughput, caching, or batch discounts matter. That's where the real cost advantage shows up.
How to optimize your Gemini API costs
1. Use model routing
Don't send every request to the same model. Build a simple router that sends complex queries to Gemini 3 Pro and straightforward ones to 2.5 Flash or Flash-Lite. A typical distribution might be 10% Pro / 60% Flash / 30% Flash-Lite, cutting your effective cost by 60-70% compared to running everything on Pro.
2. Leverage the context window
Gemini's 1M token context window means you can stuff more relevant context into a single call instead of making multiple calls or running RAG retrieval. Fewer calls mean less repeated context and less duplicated output, which lowers total cost. This is especially powerful for document analysis where you can process an entire document in one pass.
3. Use context caching
Google offers context caching for Gemini models, which stores frequently-used context (like system prompts or reference documents) server-side. Cached tokens are billed at a 75% discount on input pricing. For applications with large, repeated system prompts, this alone can slash your input costs; the same principle is covered in our prompt caching cost-savings guide.
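Before enabling caching, it's worth a back-of-envelope break-even check. A sketch, assuming the 75% discount on cached input reads mentioned above and a placeholder storage rate; check Google's current storage pricing before relying on this.

```python
# Back-of-envelope check for context caching: cached input reads assumed at a 75% discount,
# with cache storage billed per token-hour (rate below is a placeholder assumption).
CACHED_TOKENS = 50_000           # e.g. a large shared system prompt + reference docs
CALLS_PER_HOUR = 120
INPUT_PRICE = 0.15               # Gemini 2.5 Flash, USD per 1M input tokens
STORAGE_PER_M_TOKEN_HOUR = 1.00  # ASSUMPTION -- check Google's current storage rate

without_cache = CALLS_PER_HOUR * CACHED_TOKENS / 1e6 * INPUT_PRICE
with_cache = (CALLS_PER_HOUR * CACHED_TOKENS / 1e6 * INPUT_PRICE * 0.25  # 75% off cached reads
              + CACHED_TOKENS / 1e6 * STORAGE_PER_M_TOKEN_HOUR)          # one hour of storage

print(f"per hour: ${without_cache:.3f} uncached vs ${with_cache:.3f} cached")
```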
4. Batch where possible
For non-real-time workloads, batch your requests. Google's batch API processes requests asynchronously at lower priority with discounted pricing. If your workload can tolerate minutes of latency instead of seconds, batching is free cost reduction.
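The impact is easy to estimate: apply the 50% Batch discount to whatever share of your traffic can tolerate async latency. A sketch, assuming a 60/40 realtime/batch split on the chatbot scenario above:

```python
# Sketch: blended monthly cost when part of a workload moves to the Batch API
# at the 50% discount mentioned above. The 60/40 split is an illustrative assumption.
REALTIME_SHARE, BATCH_SHARE = 0.60, 0.40
MONTHLY_COST_ALL_REALTIME = 480.0   # e.g. the Gemini 3 Flash chatbot scenario above

blended = (MONTHLY_COST_ALL_REALTIME * REALTIME_SHARE
           + MONTHLY_COST_ALL_REALTIME * BATCH_SHARE * 0.5)
print(f"${blended:,.0f}/month")  # -> $384 instead of $480
```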
⚠️ Warning: Context caching has a minimum size requirement and storage costs. For short prompts or highly variable inputs, caching may cost more than it saves. Calculate your actual cache hit rate before committing to a caching strategy.
Gemini vs the competition: where Google wins (and loses)
Where Gemini wins on price
- Mid-tier efficiency: Gemini 2.5 Flash at $0.15/$0.60 is unmatched for quality-per-dollar in the mid-tier
- Context window value: 1M tokens across all models — no upcharge for long context
- Free tier: Unmatched for prototyping and low-volume use
- Flash-Lite floor: $0.075 input is among the cheapest from any major provider
Where Gemini loses
- Ultra-cheap reasoning: DeepSeek V3.2 at $0.28/$0.42 offers stronger reasoning than Flash-Lite in the same budget bracket
- Premium output quality: Claude Opus 4.6 still leads on nuanced writing and analysis, justifying its higher cost for quality-critical applications
- Code generation: GPT-5.2 and Claude Sonnet 4.6 maintain edges in code quality, though the gap is narrowing
- Open-source alternative: Llama 4 Maverick at $0.27/$0.85 runs on multiple providers, avoiding vendor lock-in
✅ TL;DR: Google wins on breadth of options and mid-tier pricing. They lose on the extremes — the very cheapest reasoning (DeepSeek) and the very best quality (Claude Opus). For the 80% of workloads in the middle, Gemini's price-to-performance ratio is exceptional.
Frequently asked questions
How much does the Gemini API cost per token in 2026?
Google Gemini API pricing ranges from $0.075 per million input tokens (Gemini 2.0 Flash-Lite) to $2.00 per million input tokens (Gemini 3 Pro). Output tokens range from $0.30 to $12.00 per million, with Gemini 3 Pro increasing to $18.00 output when prompts exceed 200K tokens. Use our calculator to estimate costs for your workload.
Does Gemini API have a free tier in 2026?
Yes. Google's free Gemini Developer API tier is available through an active project or free trial and includes Google AI Studio access, free tokens, and limited access to certain models. The catch is lower rate limits, and Google says free-tier content may be used to improve its products.
What are Gemini API free-tier rate limits?
Google no longer presents one universal static limit table that stays true forever. Instead, it says Gemini rate limits vary by model and usage tier, are applied per project rather than per API key, and should be checked in AI Studio for your live RPM, TPM, and RPD values. If you need more headroom, moving to paid usage tiers is the fix.
Does Gemini API charge monthly subscription fees?
No. Standard Gemini Developer API access is usage-based, not subscription-based. There is no flat monthly fee or required monthly minimum just to keep the API available. If you enable billing, you pay for the tokens you use and get access to higher usage tiers.
What is the difference between Google AI Studio free access and paid Gemini API?
Free access is for testing. Paid access is for production. Paid adds higher rate limits, context caching, Batch API pricing at 50% lower cost for async jobs, access to Google's most advanced models, and a promise that content is not used to improve Google's products.
Which Gemini model is the best value?
Gemini 2.5 Flash offers the best overall value at $0.15/$0.60 per million tokens. It delivers mid-tier quality with near-budget pricing and includes the full 1M token context window. For most production applications that don't require flagship intelligence, it's the optimal choice. See our best budget AI models guide for more options.
Is Gemini cheaper than ChatGPT (OpenAI)?
It depends on the tier. Gemini 3 Pro ($2.00/$12.00) is slightly cheaper than GPT-5.2 ($1.75/$14.00) on output but pricier on input. In the mid-tier, Gemini 2.5 Flash ($0.15/$0.60) matches GPT-4o mini ($0.15/$0.60) exactly. At the budget tier, Gemini 2.0 Flash-Lite ($0.075/$0.30) is slightly more expensive than GPT-5 nano ($0.05/$0.40) on input but cheaper on output. Check our OpenAI vs Anthropic pricing comparison for the full picture.
How does Gemini's context window compare to competitors?
Every Gemini model supports 1,000,000 tokens of context. This is the largest context window available from any major provider across an entire model lineup. Claude Opus 4.6 supports 200K tokens; GPT-5.2 supports 1M, but older OpenAI models are limited to 128K. For long-document processing, Gemini's consistent 1M context is a major advantage. Learn more about how tokens affect pricing.
Bottom line: which Gemini model should you use?
Here's the decision tree:
- Need the best quality Google offers → Gemini 3 Pro ($2.00/$12.00)
- Want flagship quality at a discount → Gemini 2.5 Pro ($1.25/$10.00)
- Production chatbot or summarization → Gemini 3 Flash ($0.50/$3.00)
- High-volume, cost-sensitive production → Gemini 2.5 Flash ($0.15/$0.60)
- Maximum cost savings, simple tasks → Gemini 2.0 Flash-Lite ($0.075/$0.30)
For most teams, start with Gemini 2.5 Flash and upgrade to 3 Flash or 3 Pro only where quality demands it. This two-tier approach keeps costs low while maintaining quality where it matters.
Ready to calculate your exact costs? Try our AI API cost calculator — plug in your expected token usage and compare Gemini against every major provider instantly.
