"How much will this cost?" is the first question every developer asks before integrating an AI API. The answer seems simple — multiply tokens by price — but the real-world cost depends on your request size, output length, and which model you pick. The difference between choosing wisely and choosing blindly can be 20x or more per request.
This guide calculates the exact cost per request for every major model across three common workload sizes. No hand-waving, no "it depends" — just hard numbers you can plug into your budget spreadsheet.
We'll also show you how costs compound at scale, because a $0.04 request looks cheap until you multiply it by 50,000.
[stat] 115× The cost gap between the cheapest and most expensive model for the same chatbot request
The formula
Every AI API charges based on tokens processed. The formula is straightforward:
Cost per request = (input tokens × input price per token) + (output tokens × output price per token)
Prices are quoted per million tokens, so you divide by 1,000,000. For example, a request sending 1,000 input tokens to GPT-5 at $1.25/million:
- Input cost: 1,000 ÷ 1,000,000 × $1.25 = $0.00125
- Add the output cost using the same method
- Sum them for the total per-request cost
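The steps above can be wrapped in a small helper as a sanity check. The $1.25/$10.00 GPT-5 rates are the ones this guide quotes; always verify against your provider's current price sheet.

```python
# Cost of a single request, from prices quoted per million tokens.
# The $1.25/$10.00 GPT-5 rates below are the ones this guide quotes;
# verify against your provider's current price sheet before budgeting.

def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the dollar cost of one request."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost

# 2,000 input / 1,000 output tokens against GPT-5 ($1.25 in, $10.00 out per 1M)
print(f"${cost_per_request(2_000, 1_000, 1.25, 10.00):.4f}")  # $0.0125
```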
Simple in theory. The challenge is that different models have wildly different pricing, and output tokens typically cost 2–8× more than input tokens. Let's see how this plays out across real scenarios.
Small request: chatbot reply (800 in / 300 out)
This is your typical conversational exchange — a user message with some conversation history as context, and a short reply back. This is the bread-and-butter workload for customer support bots, FAQ assistants, and chat interfaces.
| Model | Input cost | Output cost | Cost per request | 100K req/mo |
|---|---|---|---|---|
| Mistral Small 3.2 | $0.00005 | $0.00005 | $0.00010 | $10 |
| GPT-5 nano | $0.00004 | $0.00012 | $0.00016 | $16 |
| Gemini 2.5 Flash-Lite | $0.00008 | $0.00012 | $0.00020 | $20 |
| DeepSeek V3.2 | $0.00022 | $0.00013 | $0.00035 | $35 |
| GPT-5 mini | $0.00020 | $0.00060 | $0.00080 | $80 |
| Gemini 3 Flash | $0.00040 | $0.00090 | $0.00130 | $130 |
| GPT-5 | $0.00100 | $0.00300 | $0.00400 | $400 |
| Claude Sonnet 4.6 | $0.00240 | $0.00450 | $0.00690 | $690 |
| Grok 4 | $0.00240 | $0.00450 | $0.00690 | $690 |
| Claude Opus 4.6 | $0.00400 | $0.00750 | $0.01150 | $1,150 |
💡 Key Takeaway: For simple chatbot replies, budget models like Mistral Small 3.2 and GPT-5 nano cost under $0.001 per request. Premium models cost 25-115× more for the same task. Match model capability to task complexity — don't use a $0.01 model for a $0.0001 job.
The gap is staggering. Mistral Small 3.2 at $10/month versus Claude Opus 4.6 at $1,150/month for 100K identical chatbot requests. That's a 115× cost difference. For simple conversational tasks, the budget models deliver perfectly adequate quality.
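The monthly figures in the table come straight from the formula. A quick sketch for three of the models, using the per-million rates this guide's tables imply; note that the table rounds per-request costs to five decimals, so totals can differ by a few percent.

```python
# Reproduce the 100K-requests/month column for the 800-in / 300-out workload.
# Per-million rates are the ones this guide's tables imply; the table rounds
# per-request costs, so totals can differ slightly from these exact figures.

PRICES_PER_M = {
    "Mistral Small 3.2": (0.06, 0.18),
    "GPT-5 nano": (0.05, 0.40),
    "Claude Opus 4.6": (5.00, 25.00),
}

def monthly_cost(in_tok, out_tok, in_price, out_price, requests_per_month):
    per_request = in_tok / 1e6 * in_price + out_tok / 1e6 * out_price
    return per_request * requests_per_month

for model, (p_in, p_out) in PRICES_PER_M.items():
    print(f"{model}: ${monthly_cost(800, 300, p_in, p_out, 100_000):,.2f}/mo")
```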
Medium request: summarization (3,000 in / 1,000 out)
Summarizing articles, emails, reports, or documents. More input context and longer outputs make the model choice more impactful.
| Model | Input cost | Output cost | Cost per request | 50K req/mo |
|---|---|---|---|---|
| Mistral Small 3.2 | $0.00018 | $0.00018 | $0.00036 | $18 |
| DeepSeek V3.2 | $0.00084 | $0.00042 | $0.00126 | $63 |
| Llama 4 Maverick | $0.00081 | $0.00085 | $0.00166 | $83 |
| GPT-5 mini | $0.00075 | $0.00200 | $0.00275 | $138 |
| Mistral Large 3 | $0.00150 | $0.00150 | $0.00300 | $150 |
| GPT-5 | $0.00375 | $0.01000 | $0.01375 | $688 |
| Gemini 3 Pro | $0.00600 | $0.01200 | $0.01800 | $900 |
| GPT-5.2 | $0.00525 | $0.01400 | $0.01925 | $963 |
| Claude Sonnet 4.6 | $0.00900 | $0.01500 | $0.02400 | $1,200 |
| Claude Opus 4.6 | $0.01500 | $0.02500 | $0.04000 | $2,000 |
📊 Quick Math: Summarizing 50K documents per month costs $18 with Mistral Small 3.2 versus $2,000 with Claude Opus 4.6. That's a $23,784/year difference. Make sure you actually need a flagship model before defaulting to one.
Notice how output costs start dominating at this size. Claude Sonnet 4.6's output cost ($0.015) is 1.67× its input cost ($0.009), even though there are 3× more input tokens. This is because output tokens cost 5× more per token on Claude models.
Large request: code generation (5,000 in / 3,000 out)
Generating functions, refactoring code, multi-step reasoning, or complex analysis. These output-heavy requests are where pricing differences hit hardest.
| Model | Input cost | Output cost | Cost per request | 10K req/mo |
|---|---|---|---|---|
| Mistral Small 3.2 | $0.00030 | $0.00054 | $0.00084 | $8 |
| DeepSeek V3.2 | $0.00140 | $0.00126 | $0.00266 | $27 |
| Llama 4 Maverick | $0.00135 | $0.00255 | $0.00390 | $39 |
| Mistral Large 3 | $0.00250 | $0.00450 | $0.00700 | $70 |
| GPT-5 mini | $0.00125 | $0.00600 | $0.00725 | $73 |
| GPT-5 | $0.00625 | $0.03000 | $0.03625 | $363 |
| GPT-5.2 | $0.00875 | $0.04200 | $0.05075 | $508 |
| Claude Sonnet 4.6 | $0.01500 | $0.04500 | $0.06000 | $600 |
| Grok 4 | $0.01500 | $0.04500 | $0.06000 | $600 |
| Claude Opus 4.6 | $0.02500 | $0.07500 | $0.10000 | $1,000 |
For code generation, the output side absolutely dominates. Claude Opus 4.6's output cost ($0.075) is 3× its input cost ($0.025) even though the request has fewer output tokens than input tokens (3,000 vs 5,000). This is the output multiplier effect in action.
⚠️ Warning: Output-heavy workloads like code generation amplify the pricing gap. Claude Opus 4.6 costs $0.10 per request — 37× more than DeepSeek V3.2 at $0.0027. Unless you've verified the quality difference justifies this premium for your specific codebase, you're overspending.
The output multiplier effect
Most providers charge 2–8× more for output tokens than input tokens. This matters because it means the ratio of input to output in your workload dramatically affects your total cost.
| Model | Input / 1M | Output / 1M | Output multiplier |
|---|---|---|---|
| DeepSeek V3.2 | $0.28 | $0.42 | 1.5× |
| Mistral Large 3 | $0.50 | $1.50 | 3× |
| Claude Opus 4.6 | $5.00 | $25.00 | 5× |
| GPT-5 mini | $0.25 | $2.00 | 8× |
| Gemini 2.5 Pro | $1.25 | $10.00 | 8× |
| GPT-5.2 | $1.75 | $14.00 | 8× |
Models with lower output multipliers (DeepSeek at 1.5×, Mistral Large at 3×) are disproportionately cheaper for output-heavy workloads. If your application generates long responses — detailed explanations, full code files, long-form content — prioritize models with low output multipliers.
💡 Key Takeaway: For output-heavy workloads, the output multiplier matters more than the base input price. DeepSeek V3.2's 1.5× multiplier makes it dramatically cheaper than models with 8× multipliers, even if their input price is similar.
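One way to see the multiplier effect is to compute a blended per-million-token price as the output share of a workload grows. A sketch comparing a low-multiplier model (DeepSeek V3.2, 1.5×) with a high-multiplier one (GPT-5 mini, 8×), using this guide's rates:

```python
# Blended $/1M tokens as the output share of a workload grows. Rates are this
# guide's figures for DeepSeek V3.2 (1.5x multiplier) and GPT-5 mini (8x).

def blended_price_per_m(output_share: float, in_price: float, out_price: float) -> float:
    """Effective $/1M tokens when `output_share` of all tokens are output."""
    return (1 - output_share) * in_price + output_share * out_price

for share in (0.1, 0.5, 0.9):
    deepseek = blended_price_per_m(share, 0.28, 0.42)
    gpt5_mini = blended_price_per_m(share, 0.25, 2.00)
    print(f"{share:.0%} output: DeepSeek ${deepseek:.2f}/M vs GPT-5 mini ${gpt5_mini:.2f}/M")
```

The two models start within a few cents of each other on input price, yet the gap widens sharply as output dominates, which is exactly why output-heavy workloads should weigh the multiplier more than the base rate.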
Scaling: how costs compound at volume
A single request cost looks trivial. The monthly bill doesn't. Here's how a $0.02 per-request cost scales:
| Daily requests | Monthly requests | Cost at $0.02/req | Cost at $0.002/req |
|---|---|---|---|
| 1,000 | 30,000 | $600 | $60 |
| 5,000 | 150,000 | $3,000 | $300 |
| 10,000 | 300,000 | $6,000 | $600 |
| 50,000 | 1,500,000 | $30,000 | $3,000 |
[stat] $324,000/year The annual difference between $0.02 and $0.002 per request at 50K requests/day
That 10× per-request savings turns into $27,000/month at 50K daily requests. This is why model selection is the single highest-leverage cost decision you'll make.
And these are just the base token costs. Add hidden costs like retries, failed requests, and context waste, and the real difference grows further.
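Those compounding effects are easy to fold into a projection. A minimal sketch that applies a flat overhead factor for retries and failed requests; the 40% default is an assumption, sitting in the middle of the 30-50% buffer discussed later in this guide.

```python
# Project a monthly bill from per-request cost and daily volume. The 40%
# overhead factor for retries and failed requests is an assumption (the
# midpoint of the 30-50% buffer many teams add for accurate budgeting).

def monthly_bill(cost_per_request: float, requests_per_day: int,
                 overhead: float = 0.40, days: int = 30) -> float:
    return cost_per_request * requests_per_day * days * (1 + overhead)

# 50K requests/day at $0.02/request
print(round(monthly_bill(0.02, 50_000), 2))  # 42000.0
```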
How to pick the right model for your budget
Follow this decision framework:
Step 1: Determine your request profile. Measure actual input and output token counts from your application. Don't guess — instrument your code to log token usage for a representative sample.
Step 2: Calculate cost per request for 3-5 candidate models. Use the formula above or our calculator to get exact numbers.
Step 3: Estimate monthly volume. Include growth projections. A workload that starts at 10K requests/day often hits 50K within months.
Step 4: Test quality on your actual data. Run 100-500 production-like requests through each candidate model. The cheapest model that meets your quality bar wins.
Step 5: Consider a tiered approach. Route simple requests to budget models and complex ones to premium models. A classifier that costs $0.0001 per request can save thousands by keeping 80% of traffic on cheap models.
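The tiered approach in Step 5 can be sketched as a router in front of your API calls. Everything here is illustrative: the model identifiers and the `classify_complexity` heuristic are hypothetical stand-ins for your own cheap classifier.

```python
# Tiered routing sketch (Step 5): a cheap check decides which model serves
# each request. The model identifiers and the classify_complexity heuristic
# are hypothetical stand-ins for your own classifier.

BUDGET_MODEL = "mistral-small-3.2"   # assumed identifier
PREMIUM_MODEL = "claude-opus-4.6"    # assumed identifier

def classify_complexity(prompt: str) -> str:
    # Placeholder heuristic: long or code-related prompts go to the premium tier.
    if len(prompt) > 2_000 or "refactor" in prompt.lower():
        return "complex"
    return "simple"

def route(prompt: str) -> str:
    return PREMIUM_MODEL if classify_complexity(prompt) == "complex" else BUDGET_MODEL

print(route("What are your opening hours?"))  # mistral-small-3.2
```

In production the classifier would itself be a tiny model call, but even this crude heuristic illustrates the economics: if it keeps 80% of traffic on the budget tier, the premium model's price only applies to the 20% that needs it.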
For a detailed guide on optimization strategies, read our guide to reducing AI API costs.
Per-request costs for reasoning models
Reasoning models (o3, o4-mini, DeepSeek R1) work differently — they generate internal "thinking" tokens that inflate output counts. A request that produces 500 visible output tokens might consume 3,000-10,000 total output tokens including reasoning.
| Model | Input / 1M | Output / 1M | Billed output | Effective cost (5K in / 500 visible out) |
|---|---|---|---|---|
| o3 | $2.00 | $8.00 | visible + hidden reasoning | $0.05-$0.15 |
| o4-mini | $1.10 | $4.40 | visible + hidden reasoning | $0.02-$0.08 |
| DeepSeek R1 V3.2 | $0.28 | $0.42 | visible + hidden reasoning | $0.003-$0.01 |
⚠️ Warning: Reasoning model costs are unpredictable because thinking token counts vary by problem complexity. Budget 3-5× more than the visible output suggests. Monitor actual token usage closely during your first week of production deployment.
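To budget for that variance, estimate a cost range rather than a point value. A sketch using the guide's o4-mini rates and its 3,000-10,000 total-output-token range (visible reply plus hidden reasoning):

```python
# Cost range for a reasoning model where billed output tokens (visible reply
# plus hidden thinking) land anywhere in the 3,000-10,000 range above.
# Rates are this guide's o4-mini figures ($1.10 in / $4.40 out per 1M).

def reasoning_cost_range(in_tok, in_price, out_price,
                         total_out_min=3_000, total_out_max=10_000):
    def cost(total_out_tokens):
        return in_tok / 1e6 * in_price + total_out_tokens / 1e6 * out_price
    return cost(total_out_min), cost(total_out_max)

# 5K input tokens, 500 visible output tokens (included in the billed total)
low, high = reasoning_cost_range(5_000, 1.10, 4.40)
print(f"${low:.3f} - ${high:.3f}")
```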
Frequently asked questions
How do I calculate the cost of a single AI API request?
Use the formula: (input tokens ÷ 1,000,000 × input price per million) + (output tokens ÷ 1,000,000 × output price per million). For example, a GPT-5 request with 2,000 input and 1,000 output tokens costs (2,000 × $1.25 ÷ 1M) + (1,000 × $10.00 ÷ 1M) = $0.0025 + $0.01 = $0.0125. Use our calculator to automate this for any model.
What is the cheapest AI model per request?
For simple tasks, Mistral Small 3.2 at $0.06/$0.18 per million tokens and GPT-5 nano at $0.05/$0.40 are the cheapest options, costing under $0.001 per typical request. For tasks needing more capability, DeepSeek V3.2 at $0.28/$0.42 offers strong performance at budget pricing. See our full budget model comparison.
Why is my actual API bill higher than my per-request estimate?
Several hidden costs inflate your real spend: failed requests that still bill for input tokens, automatic retries that multiply costs, context window waste from oversized prompts, and tool-calling overhead. Most teams should add 30-50% to their per-request estimates for accurate budgeting.
How much does GPT-5 cost per request compared to Claude?
For a typical medium request (3,000 in / 1,000 out): GPT-5 costs $0.014, Claude Sonnet 4.6 costs $0.024, and Claude Opus 4.6 costs $0.040. GPT-5 is 42% cheaper than Claude Sonnet and 65% cheaper than Claude Opus per request. However, Claude Sonnet 4.6 offers a 1M context window and computer-use capabilities that GPT-5 lacks.
Should I use a cheap model or an expensive model?
Match the model to the task complexity. Use budget models (GPT-5 nano, Mistral Small, DeepSeek V3.2) for classification, simple chat, and routine summarization. Use mid-tier models (GPT-5, Llama 4 Maverick, Mistral Large 3) for standard workloads. Reserve premium models (Claude Opus 4.6, GPT-5.2 pro) for tasks where quality directly impacts revenue. A tiered routing strategy cuts costs by 40-60%.
