"How much will this cost?" is the first question every developer asks before integrating an AI API. The answer seems simple — multiply tokens by price — but the real-world cost depends on your request size, output length, and which model you pick. The difference between choosing wisely and choosing blindly can be 20x or more per request.
This guide calculates the exact cost per request for every major model across three common workload sizes. No hand-waving, no "it depends" — just hard numbers you can plug into your budget spreadsheet.
We'll also show you how costs compound at scale, because a $0.04 request looks cheap until you multiply it by 50,000.
[stat] 115× The cost gap between the cheapest and most expensive model for the same chatbot request
The formula
Every AI API charges based on tokens processed. The formula is straightforward:
Cost per request = (input tokens × input price per token) + (output tokens × output price per token)
Prices are quoted per million tokens, so you divide by 1,000,000. For example, a request sending 1,000 input tokens to GPT-5 at $1.25/million:
- Input cost: 1,000 ÷ 1,000,000 × $1.25 = $0.00125
- Add the output cost using the same method
- Sum them for the total per-request cost
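The steps above can be wrapped in a small helper as a sanity check. The $1.25/$10.00 GPT-5 rates are the ones this guide quotes; always verify against your provider's current price sheet.

```python
# Cost of a single request, from prices quoted per million tokens.
# The $1.25/$10.00 GPT-5 rates below are the ones this guide quotes;
# verify against your provider's current price sheet before budgeting.

def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the dollar cost of one request."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost

# 2,000 input / 1,000 output tokens against GPT-5 ($1.25 in, $10.00 out per 1M)
print(f"${cost_per_request(2_000, 1_000, 1.25, 10.00):.4f}")  # $0.0125
```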
Simple in theory. The challenge is that different models have wildly different pricing, and output tokens typically cost 2–8× more than input tokens. Let's see how this plays out across real scenarios.
Small request: chatbot reply (800 in / 300 out)
This is your typical conversational exchange — a user message with some conversation history as context, and a short reply back. This is the bread-and-butter workload for customer support bots, FAQ assistants, and chat interfaces.
| Model | Input cost | Output cost | Cost per request | 100K req/mo |
|---|---|---|---|---|
| Mistral Small 3.2 | $0.00005 | $0.00005 | $0.00010 | $10 |
| GPT-5 nano | $0.00004 | $0.00012 | $0.00016 | $16 |
| Gemini 2.5 Flash-Lite | $0.00008 | $0.00012 | $0.00020 | $20 |
| DeepSeek V3.2 | $0.00022 | $0.00013 | $0.00035 | $35 |
| GPT-5 mini | $0.00020 | $0.00060 | $0.00080 | $80 |
| Gemini 3 Flash | $0.00040 | $0.00090 | $0.00130 | $130 |
| GPT-5 | $0.00100 | $0.00300 | $0.00400 | $400 |
| Claude Sonnet 4.6 | $0.00240 | $0.00450 | $0.00690 | $690 |
| Grok 4 | $0.00240 | $0.00450 | $0.00690 | $690 |
| Claude Opus 4.6 | $0.00400 | $0.00750 | $0.01150 | $1,150 |
💡 Key Takeaway: For simple chatbot replies, budget models like Mistral Small 3.2 and GPT-5 nano cost under $0.001 per request. Premium models cost 25-115× more for the same task. Match model capability to task complexity — don't use a $0.01 model for a $0.0001 job.
The gap is staggering. Mistral Small 3.2 at $10/month versus Claude Opus 4.6 at $1,150/month for 100K identical chatbot requests. That's a 115× cost difference. For simple conversational tasks, the budget models deliver perfectly adequate quality.
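The monthly figures in the table come straight from the formula. A quick sketch for three of the models, using the per-million rates this guide's tables imply; note that the table rounds per-request costs to five decimals, so totals can differ by a few percent.

```python
# Reproduce the 100K-requests/month column for the 800-in / 300-out workload.
# Per-million rates are the ones this guide's tables imply; the table rounds
# per-request costs, so totals can differ slightly from these exact figures.

PRICES_PER_M = {
    "Mistral Small 3.2": (0.06, 0.18),
    "GPT-5 nano": (0.05, 0.40),
    "Claude Opus 4.6": (5.00, 25.00),
}

def monthly_cost(in_tok, out_tok, in_price, out_price, requests_per_month):
    per_request = in_tok / 1e6 * in_price + out_tok / 1e6 * out_price
    return per_request * requests_per_month

for model, (p_in, p_out) in PRICES_PER_M.items():
    print(f"{model}: ${monthly_cost(800, 300, p_in, p_out, 100_000):,.2f}/mo")
```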
Medium request: summarization (3,000 in / 1,000 out)
Summarizing articles, emails, reports, or documents. More input context and longer outputs make the model choice more impactful.
| Model | Input cost | Output cost | Cost per request | 50K req/mo |
|---|---|---|---|---|
| Mistral Small 3.2 | $0.00018 | $0.00018 | $0.00036 | $18 |
| DeepSeek V3.2 | $0.00084 | $0.00042 | $0.00126 | $63 |
| Llama 4 Maverick | $0.00081 | $0.00085 | $0.00166 | $83 |
| GPT-5 mini | $0.00075 | $0.00200 | $0.00275 | $138 |
| Mistral Large 3 | $0.00150 | $0.00150 | $0.00300 | $150 |
| GPT-5 | $0.00375 | $0.01000 | $0.01375 | $688 |
| Gemini 3 Pro | $0.00600 | $0.01200 | $0.01800 | $900 |
| GPT-5.2 | $0.00525 | $0.01400 | $0.01925 | $963 |
| Claude Sonnet 4.6 | $0.00900 | $0.01500 | $0.02400 | $1,200 |
| Claude Opus 4.6 | $0.01500 | $0.02500 | $0.04000 | $2,000 |
📊 Quick Math: Summarizing 50K documents per month costs $18 with Mistral Small 3.2 versus $2,000 with Claude Opus 4.6. That's a $23,784/year difference. Make sure you actually need a flagship model before defaulting to one.
Notice how output costs start dominating at this size. Claude Sonnet 4.6's output cost ($0.015) is 1.67× its input cost ($0.009), even though there are 3× more input tokens. This is because output tokens cost 5× more per token on Claude models.
Large request: code generation (5,000 in / 3,000 out)
Generating functions, refactoring code, multi-step reasoning, or complex analysis. These output-heavy requests are where pricing differences hit hardest.
| Model | Input cost | Output cost | Cost per request | 10K req/mo |
|---|---|---|---|---|
| Mistral Small 3.2 | $0.00030 | $0.00054 | $0.00084 | $8 |
| DeepSeek V3.2 | $0.00140 | $0.00126 | $0.00266 | $27 |
| Llama 4 Maverick | $0.00135 | $0.00255 | $0.00390 | $39 |
| Mistral Large 3 | $0.00250 | $0.00450 | $0.00700 | $70 |
| GPT-5 mini | $0.00125 | $0.00600 | $0.00725 | $73 |
| GPT-5 | $0.00625 | $0.03000 | $0.03625 | $363 |
| GPT-5.2 | $0.00875 | $0.04200 | $0.05075 | $508 |
| Claude Sonnet 4.6 | $0.01500 | $0.04500 | $0.06000 | $600 |
| Grok 4 | $0.01500 | $0.04500 | $0.06000 | $600 |
| Claude Opus 4.6 | $0.02500 | $0.07500 | $0.10000 | $1,000 |
For code generation, the output side absolutely dominates. Claude Opus 4.6's output cost ($0.075) is 3× its input cost ($0.025) even though the request has fewer output tokens than input tokens (3,000 vs 5,000). This is the output multiplier effect in action.
⚠️ Warning: Output-heavy workloads like code generation amplify the pricing gap. Claude Opus 4.6 costs $0.10 per request — 37× more than DeepSeek V3.2 at $0.0027. Unless you've verified the quality difference justifies this premium for your specific codebase, you're overspending.
The output multiplier effect
Most providers charge 2–8× more for output tokens than input tokens. This matters because it means the ratio of input to output in your workload dramatically affects your total cost.
| Model | Input / 1M | Output / 1M | Output multiplier |
|---|---|---|---|
| DeepSeek V3.2 | $0.28 | $0.42 | 1.5× |
| Mistral Large 3 | $0.50 | $1.50 | 3× |
| Claude Opus 4.6 | $5.00 | $25.00 | 5× |
| GPT-5 mini | $0.25 | $2.00 | 8× |
| Gemini 2.5 Pro | $1.25 | $10.00 | 8× |
| GPT-5.2 | $1.75 | $14.00 | 8× |
Models with lower output multipliers (DeepSeek at 1.5×, Mistral Large at 3×) are disproportionately cheaper for output-heavy workloads. If your application generates long responses — detailed explanations, full code files, long-form content — prioritize models with low output multipliers.
💡 Key Takeaway: For output-heavy workloads, the output multiplier matters more than the base input price. DeepSeek V3.2's 1.5× multiplier makes it dramatically cheaper than models with 8× multipliers, even if their input price is similar.
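One way to see the multiplier effect is to compute a blended per-million-token price as the output share of a workload grows. A sketch comparing a low-multiplier model (DeepSeek V3.2, 1.5×) with a high-multiplier one (GPT-5 mini, 8×), using this guide's rates:

```python
# Blended $/1M tokens as the output share of a workload grows. Rates are this
# guide's figures for DeepSeek V3.2 (1.5x multiplier) and GPT-5 mini (8x).

def blended_price_per_m(output_share: float, in_price: float, out_price: float) -> float:
    """Effective $/1M tokens when `output_share` of all tokens are output."""
    return (1 - output_share) * in_price + output_share * out_price

for share in (0.1, 0.5, 0.9):
    deepseek = blended_price_per_m(share, 0.28, 0.42)
    gpt5_mini = blended_price_per_m(share, 0.25, 2.00)
    print(f"{share:.0%} output: DeepSeek ${deepseek:.2f}/M vs GPT-5 mini ${gpt5_mini:.2f}/M")
```

The two models start within a few cents of each other on input price, yet the gap widens sharply as output dominates, which is exactly why output-heavy workloads should weigh the multiplier more than the base rate.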
Scaling: how costs compound at volume
A single request cost looks trivial. The monthly bill doesn't. Here's how a $0.02 per-request cost scales:
| Daily requests | Monthly requests | Cost at $0.02/req | Cost at $0.002/req |
|---|---|---|---|
| 1,000 | 30,000 | $600 | $60 |
| 5,000 | 150,000 | $3,000 | $300 |
| 10,000 | 300,000 | $6,000 | $600 |
| 50,000 | 1,500,000 | $30,000 | $3,000 |
[stat] $324,000/year The annual difference between $0.02 and $0.002 per request at 50K requests/day
That 10× per-request savings turns into $27,000/month at 50K daily requests. This is why model selection is the single highest-leverage cost decision you'll make.
And these are just the base token costs. Add hidden costs like retries, failed requests, and context waste, and the real difference grows further.
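Those compounding effects are easy to fold into a projection. A minimal sketch that applies a flat overhead factor for retries and failed requests; the 40% default is an assumption, sitting in the middle of the 30-50% buffer discussed later in this guide.

```python
# Project a monthly bill from per-request cost and daily volume. The 40%
# overhead factor for retries and failed requests is an assumption (the
# midpoint of the 30-50% buffer many teams add for accurate budgeting).

def monthly_bill(cost_per_request: float, requests_per_day: int,
                 overhead: float = 0.40, days: int = 30) -> float:
    return cost_per_request * requests_per_day * days * (1 + overhead)

# 50K requests/day at $0.02/request
print(round(monthly_bill(0.02, 50_000), 2))  # 42000.0
```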
How to pick the right model for your budget
Follow this decision framework:
Step 1: Determine your request profile. Measure actual input and output token counts from your application. Don't guess — instrument your code to log token usage for a representative sample.
Step 2: Calculate cost per request for 3-5 candidate models. Use the formula above or our calculator to get exact numbers.
Step 3: Estimate monthly volume. Include growth projections. A workload that starts at 10K requests/day often hits 50K within months.
Step 4: Test quality on your actual data. Run 100-500 production-like requests through each candidate model. The cheapest model that meets your quality bar wins.
Step 5: Consider a tiered approach. Route simple requests to budget models and complex ones to premium models. A classifier that costs $0.0001 per request can save thousands by keeping 80% of traffic on cheap models.
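The tiered approach in Step 5 can be sketched as a router in front of your API calls. Everything here is illustrative: the model identifiers and the `classify_complexity` heuristic are hypothetical stand-ins for your own cheap classifier.

```python
# Tiered routing sketch (Step 5): a cheap check decides which model serves
# each request. The model identifiers and the classify_complexity heuristic
# are hypothetical stand-ins for your own classifier.

BUDGET_MODEL = "mistral-small-3.2"   # assumed identifier
PREMIUM_MODEL = "claude-opus-4.6"    # assumed identifier

def classify_complexity(prompt: str) -> str:
    # Placeholder heuristic: long or code-related prompts go to the premium tier.
    if len(prompt) > 2_000 or "refactor" in prompt.lower():
        return "complex"
    return "simple"

def route(prompt: str) -> str:
    return PREMIUM_MODEL if classify_complexity(prompt) == "complex" else BUDGET_MODEL

print(route("What are your opening hours?"))  # mistral-small-3.2
```

In production the classifier would itself be a tiny model call, but even this crude heuristic illustrates the economics: if it keeps 80% of traffic on the budget tier, the premium model's price only applies to the 20% that needs it.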
For a detailed guide on optimization strategies, read our guide to reducing AI API costs.
Per-request costs for reasoning models
Reasoning models (o3, o4-mini, DeepSeek R1) work differently — they generate internal "thinking" tokens that inflate output counts. A request that produces 500 visible output tokens might consume 3,000-10,000 total output tokens including reasoning.
| Model | Input / 1M | Output / 1M | Billed output | Effective cost (5K in / 500 visible out) |
|---|---|---|---|---|
| o3 | $2.00 | $8.00 | visible + hidden reasoning | $0.05-$0.15 |
| o4-mini | $1.10 | $4.40 | visible + hidden reasoning | $0.02-$0.08 |
| DeepSeek R1 V3.2 | $0.28 | $0.42 | visible + hidden reasoning | $0.003-$0.01 |
⚠️ Warning: Reasoning model costs are unpredictable because thinking token counts vary by problem complexity. Budget 3-5× more than the visible output suggests. Monitor actual token usage closely during your first week of production deployment.
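To budget for that variance, estimate a cost range rather than a point value. A sketch using the guide's o4-mini rates and its 3,000-10,000 total-output-token range (visible reply plus hidden reasoning):

```python
# Cost range for a reasoning model where billed output tokens (visible reply
# plus hidden thinking) land anywhere in the 3,000-10,000 range above.
# Rates are this guide's o4-mini figures ($1.10 in / $4.40 out per 1M).

def reasoning_cost_range(in_tok, in_price, out_price,
                         total_out_min=3_000, total_out_max=10_000):
    def cost(total_out_tokens):
        return in_tok / 1e6 * in_price + total_out_tokens / 1e6 * out_price
    return cost(total_out_min), cost(total_out_max)

# 5K input tokens, 500 visible output tokens (included in the billed total)
low, high = reasoning_cost_range(5_000, 1.10, 4.40)
print(f"${low:.3f} - ${high:.3f}")
```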
Frequently asked questions
How do I calculate the cost of a single AI API request?
Use the formula: (input tokens ÷ 1,000,000 × input price per million) + (output tokens ÷ 1,000,000 × output price per million). For example, a GPT-5 request with 2,000 input and 1,000 output tokens costs (2,000 × $1.25 ÷ 1M) + (1,000 × $10.00 ÷ 1M) = $0.0025 + $0.01 = $0.0125. Use our calculator to automate this for any model.
What is the cheapest AI model per request?
For simple tasks, Mistral Small 3.2 at $0.06/$0.18 per million tokens and GPT-5 nano at $0.05/$0.40 are the cheapest options, costing under $0.001 per typical request. For tasks needing more capability, DeepSeek V3.2 at $0.28/$0.42 offers strong performance at budget pricing. See our full budget model comparison.
Why is my actual API bill higher than my per-request estimate?
Several hidden costs inflate your real spend: failed requests that still bill for input tokens, automatic retries that multiply costs, context window waste from oversized prompts, and tool-calling overhead. Most teams should add 30-50% to their per-request estimates for accurate budgeting.
How much does GPT-5 cost per request compared to Claude?
For a typical medium request (3,000 in / 1,000 out): GPT-5 costs $0.014, Claude Sonnet 4.6 costs $0.024, and Claude Opus 4.6 costs $0.040. GPT-5 is 42% cheaper than Claude Sonnet and 65% cheaper than Claude Opus per request. However, Claude Sonnet 4.6 offers a 1M context window and computer-use capabilities that GPT-5 lacks.
Should I use a cheap model or an expensive model?
Match the model to the task complexity. Use budget models (GPT-5 nano, Mistral Small, DeepSeek V3.2) for classification, simple chat, and routine summarization. Use mid-tier models (GPT-5, Llama 4 Maverick, Mistral Large 3) for standard workloads. Reserve premium models (Claude Opus 4.6, GPT-5.2 pro) for tasks where quality directly impacts revenue. A tiered routing strategy cuts costs by 40-60%.
