If you searched for Google Gemini API pricing in 2026, here's the official answer first.
Updated April 2026: the biggest point of confusion now is not just Gemini pricing per token, but how Google's free tier, AI Studio, usage-tier rate limits, and usage-based billing fit together.
Official Google Gemini API pricing per token: quick answer (2026)
- Pricing range: Official Google Gemini API pricing runs from $0.075/$0.30 (2.0 Flash-Lite input/output per 1M tokens) up to $2.00/$12.00 on Gemini 3 Pro, with a higher $4.00/$18.00 tier when 3 Pro prompts exceed 200K tokens.
- Free tier: Google's free tier is tied to an active project or free trial and includes Google AI Studio access, free tokens, and limited access to certain Gemini models.
- Monthly fees: There is no flat Gemini API subscription fee. Free usage is free, and paid usage is still billed per token once billing is enabled.
- Paid tiers: Moving to paid unlocks higher rate limits, context caching, Batch API pricing with a 50% cost reduction, and access to Google's most advanced models.
- Rate limits: Google tracks RPM, TPM, and RPD, applies those limits per project rather than per API key, and now tells developers to check AI Studio for their live limits because they vary by model and usage tier.
This guide breaks down each Gemini tier, compares Flash vs Pro costs, and shows workload math so you can pick the cheapest model that still meets your quality target. If you also want cross-provider benchmarks, use our AI API pricing guide.
Are there Gemini API subscription fees or monthly minimums?
No. Google does not charge a mandatory monthly Gemini API subscription fee just to keep the API enabled.
If you add billing, you move from the free tier into paid usage tiers, but billing stays usage-based. In plain English:
- Free tier: no monthly fee, limited access, lower rate limits
- Paid tier: no seat fee or flat subscription, but you pay per token and get higher limits
- Enterprise / Vertex AI: may involve committed spend or enterprise contracts, but that is a different buying motion from the standard Gemini Developer API
That matters because a lot of searches use phrases like "Gemini API subscription cost" or "Gemini API monthly fees". The clean answer is: standard Gemini API pricing is pay-as-you-go, not a monthly subscription.
Gemini free tier vs paid Gemini API: what changed in 2026
Most searchers asking about "Gemini API free tier" are really mixing together three separate things: free access in AI Studio, paid Gemini API access, and the usage tiers that control rate limits.
| Option | What you get | Best for | Important catch |
|---|---|---|---|
| Free | Limited access to certain models, free input/output tokens, Google AI Studio access | Testing, prototypes, tiny internal tools | Lower rate limits, and Google says free-tier content may be used to improve its products |
| Paid | Higher rate limits, context caching, Batch API with 50% cost reduction, access to Google's most advanced models | Production apps | Requires billing |
| Enterprise (Vertex AI) | Dedicated support, security/compliance options, provisioned throughput, volume discounts | Large deployments | Sales process, more setup |
That distinction matters because a lot of ranking pages answer the wrong question. They quote a single static free-tier limit and call it a day. That ages badly.
Gemini API free tier limits and rate limits: the non-confusing version
Here is what Google documents right now:
- Rate limits are per project, not per API key.
- Google measures requests per minute (RPM), tokens per minute (TPM), and requests per day (RPD).
- Exact active limits now depend on model + usage tier, and Google points developers to AI Studio to see the live numbers for their own project.
- Priority inference defaults to 0.3x the standard rate limit for each model and tier.
- Batch API has separate limits, including 100 concurrent batch requests, a 2GB input file limit, and 20GB file storage.
Gemini usage tiers that affect rate limits
| Usage tier | Qualification | Monthly spend ceiling |
|---|---|---|
| Free | Active project or free trial | N/A |
| Tier 1 | Set up and link an active billing account | $250 |
| Tier 2 | Paid $100 + 3 days from first successful payment | $2,000 |
| Tier 3 | Paid $1,000 + 30 days from first successful payment | $20,000 to $100,000+ |
So if you want the exact answer to "What are Gemini API free-tier rate limits?", the honest answer is: check AI Studio for your live project limits, then use the usage tier table above to understand how Google decides them. That's less clickbaity, but it's the answer that won't be wrong next week.
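In practice, hitting any of those limits surfaces as an HTTP 429 response, so production code should retry with backoff rather than assume a fixed quota. Here's a minimal sketch against the public REST endpoint; the model name, backoff values, and retry count are illustrative assumptions, not Google's recommendations.

```python
import os
import time
import requests

# Minimal sketch: call the Gemini REST endpoint and back off on HTTP 429.
# Model name and backoff numbers are placeholders -- confirm against current docs.
API_KEY = os.environ["GEMINI_API_KEY"]
URL = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent"

def generate(prompt: str, max_retries: int = 5) -> dict:
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.post(URL, params={"key": API_KEY}, json=payload, timeout=60)
        if resp.status_code == 429:  # RPM/TPM/RPD limit hit for this project
            # Honor Retry-After if the response includes it; otherwise back off exponentially.
            wait = float(resp.headers.get("Retry-After", delay))
            time.sleep(wait)
            delay *= 2
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError("Still rate-limited after retries; consider a higher usage tier.")

data = generate("One-sentence summary of context caching.")
print(data["candidates"][0]["content"]["parts"][0]["text"])
```

If you're routinely exhausting retries, that's the signal to move up a usage tier rather than keep tuning the backoff.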
Official Google Gemini API pricing table (April 2026)
Google currently offers seven Gemini models through its API, organized into three performance tiers. Here's the full pricing table:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Category |
|---|---|---|---|---|
| Gemini 3 Pro | $2.00* | $12.00* | 1,000,000 | Flagship |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1,000,000 | Flagship |
| Gemini 3 Flash | $0.50 | $3.00 | 1,000,000 | Efficient |
| Gemini 2.5 Flash | $0.15 | $0.60 | 1,000,000 | Efficient |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1,000,000 | Efficient |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1,000,000 | Budget |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | 1,000,000 | Budget |
*Gemini 3 Pro pricing is tiered: $2.00/$12.00 for prompts ≤200K tokens, increasing to $4.00/$18.00 for prompts above 200K tokens.
💡 Key Takeaway: Every single Gemini model supports a 1 million token context window. No other provider offers million-token context across their entire lineup — Anthropic's models top out at 200K. OpenAI's GPT-5.2 matches at 1M, but most of their other models cap at 128K.
The pricing spans a massive range. Gemini 3 Pro's output tokens cost 40x more than Gemini 2.0 Flash-Lite's. That gap represents fundamentally different use cases, and picking the wrong tier can blow your budget or bottleneck your application's quality.
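To make the table concrete, here's a small cost estimator built from the numbers above. The model keys are informal labels rather than exact API model IDs, so treat this as a sketch and plug in the IDs and prices from your own AI Studio project.

```python
# Minimal cost estimator based on the pricing table above (USD per 1M tokens).
# Keys are informal labels, not necessarily the exact API model IDs.
PRICES = {
    "gemini-3-pro":          (2.00, 12.00),
    "gemini-2.5-pro":        (1.25, 10.00),
    "gemini-3-flash":        (0.50, 3.00),
    "gemini-2.5-flash":      (0.15, 0.60),
    "gemini-2.0-flash":      (0.10, 0.40),
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "gemini-2.0-flash-lite": (0.075, 0.30),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at standard-context rates."""
    price_in, price_out = PRICES[model]
    if model == "gemini-3-pro" and input_tokens > 200_000:
        price_in, price_out = 4.00, 18.00  # long-prompt tier per the footnote above
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out

# Example: a 500-token prompt with a 200-token reply on Gemini 2.5 Flash
print(f"${request_cost('gemini-2.5-flash', 500, 200):.6f}")  # -> $0.000195
```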
Tier 1: Gemini 3 Pro and 2.5 Pro — flagship performance
Gemini 3 Pro ($2.00 / $12.00)
Gemini 3 Pro is Google's current best model, competing directly with GPT-5.2 and Claude Opus 4.6. At $2.00 input / $12.00 output per million tokens (for prompts up to 200K tokens), it sits in a competitive spot. Note: for prompts exceeding 200K tokens, pricing jumps to $4.00/$18.00 — important to factor in if you're leveraging the full 1M context window.
How does it stack up against the other flagships?
| Model | Input | Output | Effective cost per task* |
|---|---|---|---|
| Gemini 3 Pro | $2.00 | $12.00 | $0.0034 |
| GPT-5.2 | $1.75 | $14.00 | $0.0037 |
| Claude Opus 4.6 | $5.00 | $25.00 | $0.0075 |
| Grok 4 | $3.00 | $15.00 | $0.0045 |
*Assumes 500 input tokens + 200 output tokens per task.
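The effective-cost column is just the footnote's assumptions run through each model's per-token prices. A quick check, assuming the same 500/200 split:

```python
# Effective cost per task under the table's assumptions (500 input + 200 output tokens).
def per_task(price_in: float, price_out: float, n_in: int = 500, n_out: int = 200) -> float:
    return n_in / 1e6 * price_in + n_out / 1e6 * price_out

print(f"{per_task(2.00, 12.00):.4f}")   # Gemini 3 Pro    -> 0.0034
print(f"{per_task(1.75, 14.00):.4f}")   # GPT-5.2         -> 0.0037
print(f"{per_task(5.00, 25.00):.4f}")   # Claude Opus 4.6 -> 0.0075
```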
Gemini 3 Pro is about 55% cheaper than Claude Opus 4.6 for equivalent tasks and marginally cheaper than GPT-5.2. For teams currently running Opus-class workloads, switching to Gemini 3 Pro can cut costs in half with competitive quality.
Gemini 2.5 Pro ($1.25 / $10.00)
The previous-generation flagship remains available and offers a 37% discount on input tokens compared to Gemini 3 Pro. For workloads where you don't need the absolute latest capabilities, 2.5 Pro is a strong value play at $1.25 input / $10.00 output.
It matches GPT-5 and GPT-5.1 on pricing almost exactly, which makes it a direct alternative for cost-conscious teams evaluating multi-provider strategies.
📊 Quick Math: Switching from Gemini 3 Pro to 2.5 Pro saves $0.75 per million input tokens and $2.00 per million output tokens. On a workload running 100M input and 100M output tokens per month, that's $275/month saved, roughly $3,300/year.
When to use the Pro tier
- Complex reasoning tasks that require state-of-the-art intelligence
- Long-document analysis leveraging the full 1M context window
- Code generation for production-grade applications
- Multimodal tasks combining text, images, and structured data
- RAG pipelines where answer quality directly impacts user experience
Tier 2: Gemini Flash models — the efficiency sweet spot
The Flash tier is where Gemini's pricing story gets genuinely exciting. Google offers three Flash variants, and the newest one — Gemini 3 Flash — delivers remarkable capability at a fraction of flagship pricing.
Gemini 3 Flash ($0.50 / $3.00)
At $0.50 input / $3.00 output, Gemini 3 Flash occupies a unique position in the market. It's priced below most competitors' mid-tier models while delivering performance that many developers find sufficient for production workloads.
Compare it to other efficient-tier models:
| Model | Input | Output | Quality tier |
|---|---|---|---|
| Gemini 3 Flash | $0.50 | $3.00 | High-efficient |
| Claude Haiku 4.5 | $1.00 | $5.00 | Mid-efficient |
| GPT-4.1 mini | $0.40 | $1.60 | Mid-efficient |
| GPT-5 mini | $0.25 | $2.00 | Mid-efficient |
| Mistral Large 3 | $0.50 | $1.50 | Mid-tier |
Gemini 3 Flash's output tokens are pricier than GPT-4.1 mini or GPT-5 mini, but Google's benchmarks place Flash's quality closer to older flagship models. For teams that need better-than-mini quality without paying flagship prices, Flash fills the gap.
⚠️ Warning: Don't compare Flash models purely on price per token. Gemini 3 Flash consistently produces higher-quality outputs than models at similar price points, which means fewer retries and less post-processing. Factor in your actual completion rate, not just token cost.
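One way to make that warning concrete is to compare models on expected cost per successful task rather than per attempt. The completion rates below are made-up placeholders; substitute the rates you actually measure.

```python
# Sketch: compare models on expected cost per *successful* task, not per attempt.
# Completion rates are hypothetical placeholders -- measure your own.
def cost_per_success(price_in, price_out, n_in, n_out, completion_rate):
    cost_per_attempt = n_in / 1e6 * price_in + n_out / 1e6 * price_out
    return cost_per_attempt / completion_rate  # expected attempts = 1 / completion_rate

# 800-input / 400-output task, hypothetical completion rates
print(f"{cost_per_success(0.50, 3.00, 800, 400, 0.95):.5f}")  # Gemini 3 Flash -> 0.00168
print(f"{cost_per_success(0.25, 2.00, 800, 400, 0.85):.5f}")  # GPT-5 mini     -> 0.00118
```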
Gemini 2.5 Flash ($0.15 / $0.60)
This is the budget champion of the mid-tier. At $0.15 input / $0.60 output, Gemini 2.5 Flash matches GPT-4o mini on pricing while offering a 1M token context window (versus GPT-4o mini's 128K).
[stat] $0.60/M Gemini 2.5 Flash output pricing — matching GPT-4o mini while offering 8x the context window
For high-volume applications like classification, extraction, summarization, and routing, Gemini 2.5 Flash is one of the most cost-effective options available from a major provider.
Gemini 2.0 Flash ($0.10 / $0.40)
The oldest Flash model in the current lineup, priced at $0.10 input / $0.40 output. It's slightly cheaper than 2.5 Flash and still capable for straightforward tasks. Unless you specifically need 2.5's improved reasoning, 2.0 Flash saves an incremental 33% on input tokens.
When to use the Flash tier
- Customer-facing chatbots where response quality matters but flagship pricing is overkill
- Summarization pipelines processing hundreds of documents daily
- Data extraction from structured and semi-structured content
- Classification and routing in multi-model architectures
- Prototype development before committing to Pro-tier costs
Tier 3: Flash-Lite — maximum savings
Google's Flash-Lite models target the absolute lowest cost tier, competing with DeepSeek V3.2 and open-source alternatives.
Gemini 2.5 Flash-Lite ($0.10 / $0.40)
At $0.10 input / $0.40 output, Gemini 2.5 Flash-Lite matches Gemini 2.0 Flash on price. The tradeoff is reduced capability — Lite models sacrifice some quality for consistent low latency and minimal cost.
Gemini 2.0 Flash-Lite ($0.075 / $0.30)
The cheapest model in Google's lineup. At $0.075 input / $0.30 output, it competes directly with:
| Model | Input | Output | Provider |
|---|---|---|---|
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | Google |
| GPT-5 nano | $0.05 | $0.40 | OpenAI |
| Mistral Small 3.2 | $0.06 | $0.18 | Mistral |
| DeepSeek V3.2 | $0.28 | $0.42 | DeepSeek |
| Llama 3.1 8B | $0.18 | $0.18 | Meta/Together |
💡 Key Takeaway: Google's cheapest model undercuts DeepSeek V3.2 on input price by over 70%, though DeepSeek offers stronger reasoning at its price point. For simple tasks at massive scale, Flash-Lite wins on cost.
When to use Flash-Lite
- Pre-processing and filtering before sending data to more expensive models (see the cost sketch after this list)
- Batch classification at scale (millions of items)
- Simple Q&A over well-structured knowledge bases
- Logging and monitoring pipelines that need lightweight AI decisions
- Cost-sensitive MVPs testing market fit before investing in quality
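Here's a rough sketch of the pre-filtering economics from the first item above. The 80/20 split, token counts, and volume are illustrative assumptions, not benchmarks.

```python
# Sketch: two-stage cascade where Flash-Lite pre-filters and only hard items go to 3 Pro.
# The 80/20 split, token counts, and volume are illustrative assumptions.
N_ITEMS = 1_000_000
IN_TOK, OUT_TOK = 400, 100

def stage_cost(n, price_in, price_out):
    return n * (IN_TOK / 1e6 * price_in + OUT_TOK / 1e6 * price_out)

lite_all  = stage_cost(N_ITEMS, 0.075, 0.30)                # every item hits Flash-Lite first
pro_escal = stage_cost(int(N_ITEMS * 0.20), 2.00, 12.00)    # 20% escalate to Gemini 3 Pro
pro_only  = stage_cost(N_ITEMS, 2.00, 12.00)                # baseline: everything on 3 Pro

print(f"cascade: ${lite_all + pro_escal:,.0f} vs pro-only: ${pro_only:,.0f}")
```

Under these assumptions the cascade lands below a quarter of the all-Pro baseline, even with a generous escalation rate.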
Real-world cost scenarios
Let's run the numbers for three common use cases to see how Gemini models compare across the lineup.
Scenario 1: Customer support chatbot (10,000 conversations/day)
Assumptions: Average 800 input tokens, 400 output tokens per conversation. 300,000 conversations/month.
| Model | Monthly input cost | Monthly output cost | Total/month |
|---|---|---|---|
| Gemini 3 Pro | $480 | $1,440 | $1,920 |
| Gemini 3 Flash | $120 | $360 | $480 |
| Gemini 2.5 Flash | $36 | $72 | $108 |
| Gemini 2.0 Flash-Lite | $18 | $36 | $54 |
[stat] $1,866/month The cost difference between Gemini 3 Pro and 2.0 Flash-Lite for the same chatbot workload
That's a 35x cost difference between the top and bottom of Google's own lineup. For most customer support applications, Gemini 2.5 Flash at $108/month offers the best quality-to-cost ratio.
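If you want to sanity-check these scenario tables or rerun them with your own traffic, the math is a one-liner. A minimal sketch for Scenario 1; the same formula drives Scenarios 2 and 3 below, only the volumes change.

```python
# Reproduce the Scenario 1 table: 300,000 conversations at 800 input / 400 output tokens each.
MONTHLY_CONVERSATIONS = 300_000
IN_TOK, OUT_TOK = 800, 400

def monthly_cost(price_in: float, price_out: float) -> float:
    total_in = MONTHLY_CONVERSATIONS * IN_TOK / 1e6    # input tokens, in millions
    total_out = MONTHLY_CONVERSATIONS * OUT_TOK / 1e6  # output tokens, in millions
    return total_in * price_in + total_out * price_out

for name, p_in, p_out in [
    ("Gemini 3 Pro", 2.00, 12.00),
    ("Gemini 3 Flash", 0.50, 3.00),
    ("Gemini 2.5 Flash", 0.15, 0.60),
    ("Gemini 2.0 Flash-Lite", 0.075, 0.30),
]:
    print(f"{name}: ${monthly_cost(p_in, p_out):,.0f}/month")
# -> $1,920 / $480 / $108 / $54, matching the table
```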
Scenario 2: Document processing pipeline (50,000 pages/month)
Assumptions: 2,000 input tokens per page (document content), 500 output tokens (extracted data). 50,000 documents/month.
| Model | Monthly cost |
|---|---|
| Gemini 3 Pro | $500 |
| Gemini 3 Flash | $125 |
| Gemini 2.5 Flash | $30 |
| Gemini 2.0 Flash-Lite | $15.00 |
For document extraction where accuracy is critical, Gemini 3 Flash at $125/month is the sweet spot. For simpler extraction tasks, 2.5 Flash at $30/month is remarkably cheap.
Scenario 3: AI-powered search (1M queries/month)
Assumptions: 300 input tokens (query + context), 200 output tokens (answer). 1,000,000 queries/month.
| Model | Monthly cost |
|---|---|
| Gemini 3 Pro | $3,000 |
| Gemini 3 Flash | $750 |
| Gemini 2.5 Flash | $165 |
| GPT-5 mini (comparison) | $475 |
| Claude Haiku 4.5 (comparison) | $1,300 |
📊 Quick Math: At 1M queries/month, Gemini 2.5 Flash costs just $165 — that's $0.000165 per query. Compare that to Claude Haiku 4.5 at $1,300 or GPT-5 mini at $475. For search workloads, Gemini's Flash tier is hard to beat and ranks near the top in cost-per-million token comparisons.
Gemini's real advantage: cheap experimentation, then clean paid scaling
Google's free tier is still genuinely useful, but the better framing is this: Gemini gives you a low-friction path from testing in AI Studio to paid production usage without changing providers.
Use the free tier when you want to:
- Prototype prompts and workflows in AI Studio
- Test whether Gemini quality is good enough before linking billing
- Build low-volume internal tools
- Run small experiments without committing production budget
Move to paid when you need:
- Higher rate limits
- Context caching for repeated prompts or large reference context
- Batch API pricing with a 50% cost reduction on async workloads
- Access to Google's most advanced models
- A cleaner data policy, since Google says paid-tier content is not used to improve its products
✅ TL;DR: Start free if you're validating an idea. Move to paid the second throughput, caching, or batch discounts matter. That's where the real cost advantage shows up.
How to optimize your Gemini API costs
1. Use model routing
Don't send every request to the same model. Build a simple router that sends complex queries to Gemini 3 Pro and straightforward ones to 2.5 Flash or Flash-Lite. A typical distribution might be 10% Pro / 60% Flash / 30% Flash-Lite, cutting your effective cost by 60-70% compared to running everything on Pro.
2. Leverage the context window
Gemini's 1M token context window means you can stuff more relevant context into a single call instead of making multiple calls or running RAG retrieval. Fewer calls mean less repeated context and less duplicated output, which lowers total cost. This is especially powerful for document analysis where you can process an entire document in one pass.
3. Use context caching
Google offers context caching for Gemini models, which stores frequently-used context (like system prompts or reference documents) server-side. Cached tokens are billed at a 75% discount on input pricing. For applications with large, repeated system prompts, this alone can slash your input costs; the same principle is covered in our prompt caching cost-savings guide.
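Before enabling caching, it's worth a back-of-envelope break-even check. A sketch, assuming the 75% discount on cached input reads mentioned above and a placeholder storage rate; check Google's current storage pricing before relying on this.

```python
# Back-of-envelope check for context caching: cached input reads assumed at a 75% discount,
# with cache storage billed per token-hour (rate below is a placeholder assumption).
CACHED_TOKENS = 50_000           # e.g. a large shared system prompt + reference docs
CALLS_PER_HOUR = 120
INPUT_PRICE = 0.15               # Gemini 2.5 Flash, USD per 1M input tokens
STORAGE_PER_M_TOKEN_HOUR = 1.00  # ASSUMPTION -- check Google's current storage rate

without_cache = CALLS_PER_HOUR * CACHED_TOKENS / 1e6 * INPUT_PRICE
with_cache = (CALLS_PER_HOUR * CACHED_TOKENS / 1e6 * INPUT_PRICE * 0.25  # 75% off cached reads
              + CACHED_TOKENS / 1e6 * STORAGE_PER_M_TOKEN_HOUR)          # one hour of storage

print(f"per hour: ${without_cache:.3f} uncached vs ${with_cache:.3f} cached")
```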
4. Batch where possible
For non-real-time workloads, batch your requests. Google's batch API processes requests asynchronously at lower priority with discounted pricing. If your workload can tolerate minutes of latency instead of seconds, batching is free cost reduction.
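The impact is easy to estimate: apply the 50% Batch discount to whatever share of your traffic can tolerate async latency. A sketch, assuming a 60/40 realtime/batch split on the chatbot scenario above:

```python
# Sketch: blended monthly cost when part of a workload moves to the Batch API
# at the 50% discount mentioned above. The 60/40 split is an illustrative assumption.
REALTIME_SHARE, BATCH_SHARE = 0.60, 0.40
MONTHLY_COST_ALL_REALTIME = 480.0   # e.g. the Gemini 3 Flash chatbot scenario above

blended = (MONTHLY_COST_ALL_REALTIME * REALTIME_SHARE
           + MONTHLY_COST_ALL_REALTIME * BATCH_SHARE * 0.5)
print(f"${blended:,.0f}/month")  # -> $384 instead of $480
```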
⚠️ Warning: Context caching has a minimum size requirement and storage costs. For short prompts or highly variable inputs, caching may cost more than it saves. Calculate your actual cache hit rate before committing to a caching strategy.
Gemini vs the competition: where Google wins (and loses)
Where Gemini wins on price
- Mid-tier efficiency: Gemini 2.5 Flash at $0.15/$0.60 is unmatched for quality-per-dollar in the mid-tier
- Context window value: 1M tokens across all models — no upcharge for long context
- Free tier: Unmatched for prototyping and low-volume use
- Flash-Lite floor: $0.075 input is among the cheapest from any major provider
Where Gemini loses
- Ultra-cheap reasoning: DeepSeek V3.2 at $0.28/$0.42 offers stronger reasoning than Flash-Lite in the same budget bracket
- Premium output quality: Claude Opus 4.6 still leads on nuanced writing and analysis, justifying its higher cost for quality-critical applications
- Code generation: GPT-5.2 and Claude Sonnet 4.6 maintain edges in code quality, though the gap is narrowing
- Open-source alternative: Llama 4 Maverick at $0.27/$0.85 runs on multiple providers, avoiding vendor lock-in
✅ TL;DR: Google wins on breadth of options and mid-tier pricing. They lose on the extremes — the very cheapest reasoning (DeepSeek) and the very best quality (Claude Opus). For the 80% of workloads in the middle, Gemini's price-to-performance ratio is exceptional.
Frequently asked questions
How much does the Gemini API cost per token in 2026?
Google Gemini API pricing ranges from $0.075 per million input tokens (Gemini 2.0 Flash-Lite) to $2.00 per million input tokens (Gemini 3 Pro). Output tokens range from $0.30 to $12.00 per million, with Gemini 3 Pro increasing to $18.00 output when prompts exceed 200K tokens. Use our calculator to estimate costs for your workload.
Does Gemini API have a free tier in 2026?
Yes. Google's free Gemini Developer API tier is available through an active project or free trial and includes Google AI Studio access, free tokens, and limited access to certain models. The catch is lower rate limits, and Google says free-tier content may be used to improve its products.
What are Gemini API free-tier rate limits?
Google no longer presents one universal static limit table that stays true forever. Instead, it says Gemini rate limits vary by model and usage tier, are applied per project rather than per API key, and should be checked in AI Studio for your live RPM, TPM, and RPD values. If you need more headroom, moving to paid usage tiers is the fix.
Does Gemini API charge monthly subscription fees?
No. Standard Gemini Developer API access is usage-based, not subscription-based. There is no flat monthly fee or required monthly minimum just to keep the API available. If you enable billing, you pay for the tokens you use and get access to higher usage tiers.
What is the difference between Google AI Studio free access and paid Gemini API?
Free access is for testing. Paid access is for production. Paid adds higher rate limits, context caching, Batch API pricing at 50% lower cost for async jobs, access to Google's most advanced models, and a promise that content is not used to improve Google's products.
Which Gemini model is the best value?
Gemini 2.5 Flash offers the best overall value at $0.15/$0.60 per million tokens. It delivers mid-tier quality with near-budget pricing and includes the full 1M token context window. For most production applications that don't require flagship intelligence, it's the optimal choice. See our best budget AI models guide for more options.
Is Gemini cheaper than ChatGPT (OpenAI)?
It depends on the tier. Gemini 3 Pro ($2.00/$12.00) is slightly cheaper than GPT-5.2 ($1.75/$14.00) on output but pricier on input. In the mid-tier, Gemini 2.5 Flash ($0.15/$0.60) matches GPT-4o mini ($0.15/$0.60) exactly. At the budget tier, Gemini 2.0 Flash-Lite ($0.075/$0.30) is slightly more expensive than GPT-5 nano ($0.05/$0.40) on input but cheaper on output. Check our OpenAI vs Anthropic pricing comparison for the full picture.
How does Gemini's context window compare to competitors?
Every Gemini model supports 1,000,000 tokens of context. This is the largest context window available from any major provider across an entire model lineup. Claude Opus 4.6 supports 200K tokens; GPT-5.2 supports 1M, but older OpenAI models are limited to 128K. For long-document processing, Gemini's consistent 1M context is a major advantage. Learn more about how tokens affect pricing.
Bottom line: which Gemini model should you use?
Here's the decision tree:
- Need the best quality Google offers → Gemini 3 Pro ($2.00/$12.00)
- Want flagship quality at a discount → Gemini 2.5 Pro ($1.25/$10.00)
- Production chatbot or summarization → Gemini 3 Flash ($0.50/$3.00)
- High-volume, cost-sensitive production → Gemini 2.5 Flash ($0.15/$0.60)
- Maximum cost savings, simple tasks → Gemini 2.0 Flash-Lite ($0.075/$0.30)
For most teams, start with Gemini 2.5 Flash and upgrade to 3 Flash or 3 Pro only where quality demands it. This two-tier approach keeps costs low while maintaining quality where it matters.
Ready to calculate your exact costs? Try our AI API cost calculator — plug in your expected token usage and compare Gemini against every major provider instantly.
