Reasoning models are the most powerful — and the most deceptively priced — AI models available in 2026. The sticker price per million tokens only tells half the story. The real cost driver is thinking tokens: the internal chain-of-thought that reasoning models generate before producing your visible answer. These tokens are billed as output but never appear in the response.
Here's everything you need to know about reasoning model pricing across providers, thinking token overhead, and when the premium is actually justified.
📊 Stat: A single reasoning model request can cost 5–14× more than the same request on a standard model, due to thinking token overhead.
What are thinking tokens?
When you send a prompt to a reasoning model like OpenAI's o3 or o4-mini, the model doesn't jump straight to an answer. It first generates an internal chain of reasoning — sometimes hundreds or thousands of tokens — working through the problem step by step.
These thinking tokens are generated as output tokens, which means they're billed at the (higher) output token rate. You don't see them in the response, but they show up on your bill.
The volume of thinking tokens depends entirely on the problem's complexity:
- Simple question (factual lookup, basic classification): 200–500 thinking tokens
- Moderate reasoning (code generation, multi-step analysis): 2,000–5,000 thinking tokens
- Complex problem (mathematical proofs, architectural design, constraint satisfaction): 5,000–20,000 thinking tokens
- Extremely hard problem (competition math, novel algorithm design): 20,000–50,000+ thinking tokens
This unpredictability is what makes reasoning model budgeting difficult. A request that looks like a simple query can trigger deep reasoning and consume 10× more output tokens than expected.
⚠️ Warning: Thinking tokens are invisible in the response but fully visible on your invoice. A response with only 500 visible tokens is billed as 10,000 tokens if the model generated 9,500 thinking tokens internally. Always check actual token usage in your API response metadata.
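As a concrete sketch of that check, the helper below derives the effective cost of a request from the usage metadata the API returns. Field names follow OpenAI's usage schema (reasoning tokens are reported under completion_tokens_details.reasoning_tokens and are already included in completion_tokens); prices are passed in as USD per million tokens.

```python
# Sketch: effective cost per request from API usage metadata.
def request_cost(usage: dict, input_price: float, output_price: float) -> dict:
    """Prices are USD per 1M tokens."""
    prompt = usage["prompt_tokens"]
    completion = usage["completion_tokens"]  # visible + thinking, billed together
    thinking = usage.get("completion_tokens_details", {}).get("reasoning_tokens", 0)
    cost = prompt * input_price / 1e6 + completion * output_price / 1e6
    return {
        "visible_tokens": completion - thinking,
        "thinking_tokens": thinking,
        "thinking_share": thinking / completion if completion else 0.0,
        "cost_usd": round(cost, 6),
    }

# The warning's example: 500 visible tokens, 9,500 thinking tokens, o3 pricing
usage = {
    "prompt_tokens": 1_000,
    "completion_tokens": 10_000,
    "completion_tokens_details": {"reasoning_tokens": 9_500},
}
print(request_cost(usage, input_price=2.00, output_price=8.00))
```

Logging this per request makes the thinking-token share visible long before the invoice arrives.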
Reasoning model pricing at a glance
Here are the major reasoning models currently available, with per-million-token pricing:
| Model | Provider | Input (per 1M) | Output (per 1M) | Context Window | Notes |
|---|---|---|---|---|---|
| GPT-5.2 pro | OpenAI | $21.00 | $168.00 | 1M | Most expensive; highest capability |
| o3-pro | OpenAI | $20.00 | $80.00 | 1M | Premium reasoning |
| o1 | OpenAI | $15.00 | $60.00 | 200K | Original reasoning model |
| Grok 4 | xAI | $3.00 | $15.00 | 256K | Vision + reasoning |
| Magistral Medium | Mistral | $2.00 | $5.00 | 128K | Transparent reasoning |
| o3 | OpenAI | $2.00 | $8.00 | 1M | Advanced reasoning |
| o4-mini | OpenAI | $1.10 | $4.40 | 2M | Efficient reasoning, huge context |
| o3-mini | OpenAI | $1.10 | $4.40 | 500K | Previous-gen efficient reasoning |
| o1-mini | OpenAI | $1.10 | $4.40 | 128K | Original compact reasoning |
| Magistral Small | Mistral | $0.50 | $1.50 | 128K | Budget reasoning |
| DeepSeek R1 V3.2 | DeepSeek | $0.28 | $0.42 | 128K | Cheapest reasoning model |
For comparison, here are the equivalent non-reasoning models:
| Model | Provider | Input (per 1M) | Output (per 1M) |
|---|---|---|---|
| GPT-5.2 | OpenAI | $1.75 | $14.00 |
| GPT-5 | OpenAI | $1.25 | $10.00 |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 |
| GPT-5 mini | OpenAI | $0.25 | $2.00 |
| DeepSeek V3.2 | DeepSeek | $0.28 | $0.42 |
At sticker price, o3 ($2/$8) and GPT-5 ($1.25/$10) look similarly priced. But that comparison is misleading — o3 generates thinking tokens on top of your visible output, making the effective cost per request 5–14× higher.
The real cost: thinking token multiplier in action
Let's work through concrete examples. You ask a coding question with a 1,000-token prompt and expect a 500-token visible answer.
With GPT-5 (no thinking tokens):
- Input: 1,000 tokens × $1.25/1M = $0.00125
- Output: 500 tokens × $10.00/1M = $0.005
- Total: $0.00625 per request
With o3 (moderate reasoning — ~3,000 thinking tokens):
- Input: 1,000 tokens × $2.00/1M = $0.002
- Output: 3,500 tokens (500 visible + 3,000 thinking) × $8.00/1M = $0.028
- Total: $0.030 per request — 4.8× more expensive than GPT-5
With o3 (heavy reasoning — ~10,000 thinking tokens):
- Input: 1,000 tokens × $2.00/1M = $0.002
- Output: 10,500 tokens × $8.00/1M = $0.084
- Total: $0.086 per request — 13.8× more expensive
With o3-pro (heavy reasoning — ~10,000 thinking tokens):
- Input: 1,000 tokens × $20.00/1M = $0.02
- Output: 10,500 tokens × $80.00/1M = $0.84
- Total: $0.86 per request — 138× more expensive than GPT-5
📊 Quick Math: A single complex reasoning request on o3-pro can cost $0.86 — more than many developers spend on AI in an entire day. Before using premium reasoning models, verify that the accuracy improvement justifies a 50–140× cost increase over standard models.
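The arithmetic behind these examples can be sketched in a few lines (prices in USD per million tokens, taken from the pricing tables above):

```python
# Sketch of the per-request arithmetic above.
def cost(in_tok, vis_tok, think_tok, in_price, out_price):
    """Thinking tokens are billed at the output rate alongside visible tokens."""
    return in_tok * in_price / 1e6 + (vis_tok + think_tok) * out_price / 1e6

gpt5   = cost(1_000, 500,      0,  1.25, 10.00)  # $0.00625
o3_mod = cost(1_000, 500,  3_000,  2.00,  8.00)  # $0.030
o3_hvy = cost(1_000, 500, 10_000,  2.00,  8.00)  # $0.086
o3_pro = cost(1_000, 500, 10_000, 20.00, 80.00)  # $0.86

print(f"{o3_mod / gpt5:.1f}x, {o3_hvy / gpt5:.1f}x, {o3_pro / gpt5:.0f}x")
# 4.8x, 13.8x, 138x
```

Swapping in your own token counts and prices gives the effective multiplier for your workload.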
Monthly cost comparison: production workloads
For a production workload of 10,000 requests per day (typical for a SaaS backend), here's what you'd spend monthly at different reasoning intensities:
| Model | Avg Thinking Tokens | Cost/Request | Monthly Cost |
|---|---|---|---|
| DeepSeek V3.2 (standard) | 0 | $0.00049 | $147 |
| GPT-5 mini | 0 | $0.00125 | $375 |
| GPT-5 | 0 | $0.00625 | $1,875 |
| DeepSeek R1 V3.2 | ~2,000 | $0.00133 | $399 |
| o4-mini | ~2,000 | $0.01210 | $3,630 |
| Magistral Small | ~2,000 | $0.00425 | $1,275 |
| o3 | ~3,000 | $0.03000 | $9,000 |
| Magistral Medium | ~3,000 | $0.01950 | $5,850 |
| Grok 4 | ~3,000 | $0.05550 | $16,650 |
| o3-pro | ~5,000 | $0.46000 | $138,000 |
| GPT-5.2 pro | ~5,000 | $0.94500 | $283,500 |
DeepSeek R1 V3.2 stands out as remarkably cost-effective for a reasoning model. At $0.28/$0.42 per million tokens, even with 2,000 thinking tokens per request, it costs just $399/month — comparable to GPT-5 mini without reasoning. It's the only reasoning model that can compete on price with standard models in most budget model rankings.
💡 Key Takeaway: DeepSeek R1 V3.2 is the budget reasoning powerhouse. At $399/month for 10K daily requests with moderate reasoning, it costs 96% less than o3 ($9,000) and 99.7% less than o3-pro ($138,000) for the same workload. If you need reasoning on a budget, start here.
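A minimal sketch reproducing the monthly figures above, assuming the same workload (10,000 requests/day, 30-day month, 1,000 input and 500 visible output tokens per request):

```python
# Sketch: monthly cost for a fixed workload, prices in USD per 1M tokens.
def monthly(in_price, out_price, thinking, req_per_day=10_000, days=30):
    per_request = 1_000 * in_price / 1e6 + (500 + thinking) * out_price / 1e6
    return per_request * req_per_day * days

print(round(monthly(0.28, 0.42, 2_000)))    # DeepSeek R1 V3.2 -> 399
print(round(monthly(2.00, 8.00, 3_000)))    # o3 -> 9000
print(round(monthly(20.00, 80.00, 5_000)))  # o3-pro -> 138000
```

Adjust the thinking-token argument to your measured averages; it dominates every other variable.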
When reasoning models are worth the premium
Reasoning models aren't just "better GPT." They excel at specific tasks where step-by-step logical thinking produces measurably better results.
Worth the premium (accuracy improvements justify cost):
- Complex code generation and debugging — reasoning catches edge cases, handles multi-file dependencies, and produces more correct code on the first attempt
- Multi-step mathematical reasoning — standard models often fail at 3+ step problems where reasoning models maintain accuracy
- Logic puzzles and constraint satisfaction — scheduling, optimization, and rule-based problems
- Scientific analysis requiring careful deduction and evidence evaluation
- Legal and medical reasoning where errors have real consequences
- Agentic workflows where the model needs to plan and execute multi-step tasks
Not worth the premium (standard models perform equally well):
- Simple Q&A or chatbot conversations
- Text summarization (reasoning overhead adds cost without improving quality)
- Translation (language tasks don't benefit from chain-of-thought)
- Content generation (creative writing, marketing copy)
- Classification tasks (labels don't need reasoning)
- Data extraction and formatting
The accuracy test: If your accuracy on a task improves from 70% to 95% with a reasoning model, and errors cost you money (wrong code, bad analysis, incorrect recommendations), the 5–14× price increase easily pays for itself. If accuracy only improves from 90% to 92%, the premium rarely justifies the cost.
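The accuracy test can be made concrete with a simple break-even check. The numbers below are illustrative, not benchmarks: an upgrade pays off when the error-rate reduction times the cost of an error exceeds the added per-request model cost.

```python
# Sketch: break-even check for the accuracy test above (illustrative numbers).
def upgrade_pays_off(acc_std, acc_reason, cost_std, cost_reason, error_cost):
    saved = (acc_reason - acc_std) * error_cost  # expected savings per request
    premium = cost_reason - cost_std             # added model cost per request
    return saved > premium

# 70% -> 95% accuracy, errors cost $2 each to fix, GPT-5 vs o3 per-request cost:
print(upgrade_pays_off(0.70, 0.95, 0.00625, 0.030, error_cost=2.00))  # True
# 90% -> 92%, errors cost $0.10 each: the premium rarely pays off:
print(upgrade_pays_off(0.90, 0.92, 0.00625, 0.030, error_cost=0.10))  # False
```

The hard part in practice is estimating error_cost honestly; once you have it, the decision is arithmetic.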
Five strategies to control reasoning model costs
1. Use reasoning effort settings
OpenAI's o-series models support a reasoning_effort parameter with three levels: low, medium, and high. Lower effort = fewer thinking tokens = lower cost.
| Effort Level | Typical Thinking Tokens | Relative Cost |
|---|---|---|
| Low | 500–1,000 | 1× (baseline) |
| Medium | 2,000–5,000 | 3–5× |
| High | 5,000–20,000 | 10–20× |
For many tasks, medium gives 80% of high's quality at 40% of the thinking token cost. Start with medium and only escalate to high for problems that demonstrably benefit.
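A rough sketch of how effort level translates into output cost, using the midpoints of the thinking-token ranges in the table above and o3's $8/1M output rate (the midpoints are estimates, not guarantees):

```python
# Sketch: projected output cost per request at each reasoning_effort level.
EFFORT_THINKING = {"low": 750, "medium": 3_500, "high": 12_500}  # midpoint estimates

def output_cost(effort, visible=500, out_price=8.00):  # o3's $8/1M output rate
    return (visible + EFFORT_THINKING[effort]) * out_price / 1e6

for effort in ("low", "medium", "high"):
    print(f"{effort:>6}: ${output_cost(effort):.4f}")
```

Running the same projection with your own model's output rate shows what escalating from medium to high actually costs per request.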
2. Route by complexity
Don't send every request to a reasoning model. Use a cheap model (GPT-5 nano at $0.05/$0.40 or Mistral Small 3.2 at $0.06/$0.18) as a router to classify request difficulty. Only escalate complex requests to reasoning models.
Typical distribution for a coding assistant:
- 60% simple requests → GPT-5 mini ($0.25/$2.00)
- 30% moderate → DeepSeek R1 V3.2 ($0.28/$0.42)
- 10% complex → o3 ($2.00/$8.00)
This routing approach cuts reasoning model costs by 70–90% compared to sending everything to o3. Read our cost optimization guide for implementation details.
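A minimal sketch of this routing pattern. The keyword heuristic and model names are illustrative stand-ins: in production the classifier would itself be a cheap model call, not string matching.

```python
# Sketch of complexity routing; tiers mirror the distribution above.
TIERS = {"simple": "gpt-5-mini", "moderate": "deepseek-r1", "complex": "o3"}

def classify(prompt: str) -> str:
    """Toy heuristic standing in for a cheap classifier-model call."""
    hard_signals = ("prove", "optimize", "deadlock", "architecture")
    if any(word in prompt.lower() for word in hard_signals):
        return "complex"
    return "moderate" if len(prompt) > 400 else "simple"

def route(prompt: str) -> str:
    return TIERS[classify(prompt)]

print(route("What does HTTP 418 mean?"))          # gpt-5-mini
print(route("Prove this loop invariant holds."))  # o3
```

The key design choice is that misrouting downward is cheap to detect (retry on the bigger model) while misrouting upward silently burns budget, so bias the classifier toward the cheaper tier.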
3. Set max completion tokens
Cap your output tokens to prevent runaway thinking. If a task should take 500 tokens to answer, setting max_completion_tokens to 5,000 prevents the model from spending 50,000 tokens reasoning about edge cases.
This is especially important for o3-pro and GPT-5.2 pro, where uncapped thinking on a complex problem can generate $1+ per request. Hard limits protect your budget.
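A small guard like the one below bounds worst-case spend before a request is sent. Because max_completion_tokens caps visible and thinking tokens together, it also caps the output bill.

```python
# Sketch: worst-case cost bound for a capped request, prices in USD per 1M tokens.
def worst_case_cost(in_tokens, max_completion_tokens, in_price, out_price):
    """The cap covers visible + thinking output, so this is a hard upper bound."""
    return in_tokens * in_price / 1e6 + max_completion_tokens * out_price / 1e6

# o3-pro pricing: a 5,000-token cap bounds any single request at $0.42
print(f"${worst_case_cost(1_000, 5_000, 20.00, 80.00):.2f}")  # $0.42
```

Checking this bound against a per-request budget before dispatch turns "a runaway request cost $1+" into a rejected call.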
4. Consider DeepSeek R1 V3.2
At $0.28/$0.42 per million tokens, DeepSeek R1 V3.2 offers chain-of-thought reasoning at standard-model prices. For many use cases — code generation, math, logic problems — it delivers reasoning capability at a fraction of o3's cost. The tradeoff: smaller context window (128K vs o3's 1M) and less polish on edge cases.
5. Monitor thinking token usage
Track actual thinking token counts per request type. OpenAI's API returns thinking token counts in the usage metadata. Log this data and analyze it weekly:
- Are certain prompt patterns triggering excessive thinking?
- Can you rephrase prompts to reduce reasoning depth?
- Are there request types where thinking tokens add no measurable quality?
Use this data to continuously refine your routing rules and effort settings.
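A sketch of that weekly roll-up, assuming you log one row per request with the reasoning-token count from the usage metadata (the log schema here is hypothetical):

```python
# Sketch: weekly roll-up of logged usage by request type.
from collections import defaultdict

def thinking_report(log_rows):
    """log_rows: iterable of (request_type, thinking_tokens, total_output_tokens)."""
    totals = defaultdict(lambda: [0, 0])
    for req_type, thinking, total_out in log_rows:
        totals[req_type][0] += thinking
        totals[req_type][1] += total_out
    # share of output tokens spent on thinking, per request type
    return {t: round(think / out, 2) for t, (think, out) in totals.items()}

rows = [
    ("summarize", 1_800, 2_300),
    ("summarize", 2_200, 2_700),
    ("codegen",     900, 1_900),
]
print(thinking_report(rows))  # {'summarize': 0.8, 'codegen': 0.47}
```

A request type where 80% of output tokens are thinking, with no quality gain over a standard model, is an obvious candidate for downgrading in your routing rules.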
⚠️ Warning: Reasoning model costs are inherently unpredictable because thinking token volume varies by problem difficulty. Always set hard spending caps with your provider and max_completion_tokens on every request. A single runaway request on o3-pro can cost more than your entire daily budget.
Reasoning model comparison: cost efficiency ranking
For a standardized workload (1,000 input tokens, 500 visible output tokens, 3,000 thinking tokens), here's how every reasoning model compares on cost per request:
| Model | Cost/Request | Relative Cost |
|---|---|---|
| DeepSeek R1 V3.2 | $0.0018 | 1× (baseline) |
| Magistral Small | $0.0058 | 3.2× |
| o4-mini | $0.0165 | 9.2× |
| o3-mini | $0.0165 | 9.2× |
| Magistral Medium | $0.0195 | 10.8× |
| o3 | $0.0300 | 16.7× |
| Grok 4 | $0.0555 | 30.8× |
| o1 | $0.2250 | 125× |
| o3-pro | $0.3000 | 167× |
| GPT-5.2 pro | $0.6090 | 338× |
Per reasoning request, DeepSeek R1 costs 1/338th as much as GPT-5.2 pro. Even compared to o3 — the most commonly used production reasoning model — it's 16.7× cheaper, which mirrors what we see in our DeepSeek vs GPT-5 mini breakdown.
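The per-request costs in this ranking can be reproduced directly; the DeepSeek R1 baseline works out to $0.00175, which the table rounds to $0.0018.

```python
# Sketch reproducing the ranking's per-request costs (1,000 input tokens,
# 500 visible + 3,000 thinking output tokens; prices in USD per 1M).
def per_request(in_price, out_price, in_tok=1_000, out_tok=3_500):
    return in_tok * in_price / 1e6 + out_tok * out_price / 1e6

for name, inp, outp in [
    ("DeepSeek R1 V3.2",  0.28,   0.42),
    ("o3",                2.00,   8.00),
    ("GPT-5.2 pro",      21.00, 168.00),
]:
    print(f"{name}: ${per_request(inp, outp):.5f}")
```

Because output pricing dominates once thinking tokens are included, the ranking tracks output rates far more closely than input rates.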
The bottom line
Reasoning models are powerful but expensive — not because of their sticker price, but because of the hidden thinking token overhead. Before choosing a reasoning model:
- Estimate your thinking token ratio. Test with real prompts and check actual token usage in the API response.
- Compare total cost, not just per-token price. A "cheaper" reasoning model that generates more thinking tokens can end up costing more than an "expensive" one that thinks less.
- Route smartly. Use reasoning models only where they add measurable value. Send everything else to standard models.
- Start with DeepSeek R1. At $0.28/$0.42, it's the cheapest way to access reasoning capabilities. Escalate to o3 or o4-mini only when DeepSeek R1 falls short.
- Track your spend. Monitor thinking token usage weekly — it varies by prompt and can creep up.
Use our calculator to estimate monthly costs with thinking token overhead, or check the comparison pages to see how reasoning models stack up against standard models for your specific use case.
Frequently asked questions
Do all reasoning models charge for thinking tokens?
Yes — thinking tokens are billed as output tokens across all providers. The impact varies enormously by pricing: DeepSeek R1 charges $0.42/M for thinking tokens (barely noticeable), while o3-pro charges $80/M (budget-breaking). Always calculate your effective cost including estimated thinking token volume.
Can I see the thinking tokens in the API response?
OpenAI's API returns the thinking token count in the usage metadata (completion_tokens_details.reasoning_tokens), but not the actual content. You can see how many tokens were used for reasoning versus the visible response. This data is essential for cost monitoring and optimization.
How many thinking tokens does a typical request use?
It varies enormously by problem complexity. Simple tasks: 200–500 thinking tokens. Moderate reasoning: 2,000–5,000. Complex problems: 5,000–20,000+. Competition math problems can generate 50,000+ thinking tokens for a single response. The unpredictability is why spending caps and monitoring are non-negotiable.
Is o4-mini better than o3-mini?
Yes. o4-mini is the successor to o3-mini at the same price point ($1.10/$4.40). It offers improved reasoning capability and a larger 2M context window (versus o3-mini's 500K). There's no reason to use o3-mini for new projects — o4-mini is strictly better at the same price.
When should I use DeepSeek R1 versus o3?
Start with DeepSeek R1 V3.2 for all reasoning tasks. It costs 16.7× less than o3 per request. Only escalate to o3 when: (1) you need >128K context, (2) DeepSeek R1's accuracy measurably falls short on your specific task, or (3) you need OpenAI's ecosystem features (function calling format, specific API guarantees). For most coding and reasoning tasks, DeepSeek R1 delivers comparable quality at a fraction of the cost.
