If you're running large-scale AI workloads on OpenAI and paying full price for every API call, you're leaving money on the table. OpenAI's Batch API offers 50% off standard pricing in exchange for a 24-hour turnaround time.
This guide explains what the Batch API is, how it works, when to use it, and how to calculate your savings across every OpenAI model.
[stat] 50% The discount OpenAI gives on every Batch API call — same models, same quality, half the price
What is the Batch API?
The Batch API is OpenAI's asynchronous processing option. Instead of sending requests one at a time and getting instant responses, you submit a batch of requests (up to 50,000 at once) and receive results within 24 hours.
In exchange for waiting, you pay 50% less than standard API pricing. The quality is identical — same models, same outputs, same capabilities. You're just giving OpenAI scheduling flexibility.
Standard vs batch pricing across all models
| Model | Standard Input | Standard Output | Batch Input | Batch Output |
|---|---|---|---|---|
| GPT-5.2 | $1.75 | $14.00 | $0.875 | $7.00 |
| GPT-5.2 pro | $21.00 | $168.00 | $10.50 | $84.00 |
| GPT-5 | $1.25 | $10.00 | $0.625 | $5.00 |
| GPT-5 Mini | $0.25 | $2.00 | $0.125 | $1.00 |
| GPT-5 nano | $0.05 | $0.40 | $0.025 | $0.20 |
| o3 | $2.00 | $8.00 | $1.00 | $4.00 |
| o4-mini | $1.10 | $4.40 | $0.55 | $2.20 |
The savings scale linearly. Whether you're processing 1,000 or 1,000,000 requests, each one costs exactly half. For teams spending $10,000+/month on OpenAI, batch-eligible workloads can save $5,000/month or more.
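To see what a given workload costs under this table, the math is one multiply per token type. A minimal sketch in Python (the model keys here are shorthand for this article's table, not official API model IDs):

```python
# Prices in USD per 1M tokens, taken from the table above.
PRICES = {  # model: (standard input rate, standard output rate)
    "gpt-5.2": (1.75, 14.00),
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5-nano": (0.05, 0.40),
}

BATCH_DISCOUNT = 0.5  # batch pricing is a flat 50% off both token types


def run_cost(model: str, input_tokens: int, output_tokens: int,
             batch: bool = False) -> float:
    """Cost in USD for one run; token counts are totals across all requests."""
    in_rate, out_rate = PRICES[model]
    cost = (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate
    return cost * BATCH_DISCOUNT if batch else cost
```

For example, 10,000 requests at 100 input and 200 output tokens each on GPT-5 Mini comes to $4.25 standard and $2.125 batch, matching the worked example later in this guide.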
How the Batch API works
Here's the workflow:
- Prepare your requests in JSONL format (one JSON object per line, each containing a single API request).
- Upload the file to OpenAI's API.
- Submit the batch job with the file ID.
- Wait up to 24 hours for processing (usually faster — most batches complete in 2-6 hours).
- Download the results as a JSONL file with responses for each request.
The process is asynchronous. You don't get instant responses — OpenAI queues your batch, processes it during off-peak hours, and returns the results when done.
Sample JSONL input
```json
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-5.2", "messages": [{"role": "user", "content": "Summarize this article..."}]}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-5.2", "messages": [{"role": "user", "content": "Translate this text..."}]}}
```
Each line is a separate request. The custom_id field lets you match responses back to your original requests.
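Generating the file programmatically takes only a few lines. A sketch that builds one chat request per prompt (the model name is a placeholder; swap in whichever model you're batching):

```python
import json


def build_batch_file(prompts: dict[str, str], path: str,
                     model: str = "gpt-5-mini") -> None:
    """Write one /v1/chat/completions request per line, keyed by custom_id."""
    with open(path, "w", encoding="utf-8") as f:
        for custom_id, prompt in prompts.items():
            request = {
                "custom_id": custom_id,
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            f.write(json.dumps(request) + "\n")
```

Keeping `custom_id` values tied to your own database keys makes the matching step trivial when results come back.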
When to use the Batch API
The Batch API is ideal for workloads that don't need instant results. Here are the five best use cases:
1. Data processing and analysis
- Categorizing thousands of support tickets
- Analyzing customer feedback or reviews
- Tagging and labeling datasets
- Extracting structured data from documents
2. Content generation at scale
- Generating product descriptions for an entire catalog
- Creating personalized email campaigns
- Writing meta descriptions for thousands of web pages
- Translating content in bulk
3. Embeddings generation
- Creating vector embeddings for large document libraries
- Building search indexes
- Generating embeddings for recommendation systems
4. Evaluation and testing
- Running quality evaluations on model outputs
- A/B testing different prompts across large datasets
- Benchmarking model performance
5. Nightly or weekly batch jobs
- Report generation
- Data enrichment pipelines
- Scheduled content updates
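The same JSONL shape covers the embeddings use case too; a line simply targets /v1/embeddings instead of the chat endpoint (the model name here is illustrative):

```json
{"custom_id": "doc-1", "method": "POST", "url": "/v1/embeddings", "body": {"model": "text-embedding-3-small", "input": "Document text to embed..."}}
```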
💡 Key Takeaway: If you can wait 24 hours for results, the Batch API is a no-brainer. Most batches complete in 2-6 hours anyway — you rarely wait the full 24.
When NOT to use the Batch API
The Batch API has limitations that make it unsuitable for some use cases:
Real-time applications
- Chatbots and live customer support
- Interactive tools where users expect instant responses
- Applications that stream output progressively
Time-sensitive workflows
- Alerts and notifications
- Real-time decision systems
- Anything where waiting 24 hours defeats the purpose
Streaming use cases
The Batch API does not support streaming. Responses are returned only after the entire batch completes. For real-time chatbot costs, see our AI chatbot cost breakdown.
⚠️ Warning: Don't try to hack real-time responses by submitting single-request batches. The minimum processing time is still minutes, not seconds. Use the standard API for anything user-facing.
Cost savings calculation
Let's calculate savings for a real-world scenario: generating product descriptions for 10,000 items.
Assumptions:
- Model: GPT-5 Mini
- Input per request: 100 tokens (product name, category, specs)
- Output per request: 200 tokens (generated description)
- Total requests: 10,000
Standard API pricing:
- GPT-5 Mini: $0.25 input / $2.00 output per 1M tokens
- Input: 10,000 × 100 = 1M tokens → $0.25
- Output: 10,000 × 200 = 2M tokens → $4.00
- Total: $4.25
Batch API pricing (50% off):
- Batch GPT-5 Mini: $0.125 input / $1.00 output per 1M tokens
- Input: 1M tokens → $0.125
- Output: 2M tokens → $2.00
- Total: $2.125
Savings: $2.125 (50%)
📊 Quick Math: At 10,000 product descriptions per batch, you save $2.12 per run. Run this weekly and save $110/year. At 100,000 items weekly, that's $1,100/year saved.
Real savings at scale
Here's a cost comparison across different workload sizes using GPT-5 Mini:
| Requests | Input Tokens | Output Tokens | Standard Cost | Batch Cost | Monthly Savings |
|---|---|---|---|---|---|
| 1,000 | 100K | 200K | $0.43 | $0.21 | $0.22 |
| 10,000 | 1M | 2M | $4.25 | $2.13 | $2.12 |
| 100,000 | 10M | 20M | $42.50 | $21.25 | $21.25 |
| 1,000,000 | 100M | 200M | $425.00 | $212.50 | $212.50 |
At 1 million requests per month, you save $212.50/month or $2,550/year. That's real money.
Batch API with premium models
The 50% discount applies to all OpenAI models, including flagship and reasoning models. The savings are even larger on expensive models:
GPT-5.2 (flagship) at scale:
- 100K requests, 500 input + 1,000 output tokens each
- Standard: 50M input ($87.50) + 100M output ($1,400) = $1,487.50
- Batch: $43.75 + $700 = $743.75
- Savings: $743.75/month = $8,925/year
GPT-5.2 pro (reasoning) — the biggest savings:
- Same 100K requests
- Standard: 50M input ($1,050) + 100M output ($16,800) = $17,850
- Batch: $525 + $8,400 = $8,925
- Savings: $8,925/month = $107,100/year
[stat] $107,100/year Potential savings from switching 100K monthly GPT-5.2 pro requests to the Batch API
Combining strategies for maximum savings
You can stack cost-saving tactics for even better results:
1. Use the Batch API for async workloads
Save 50% on anything that can wait. This is the easiest win.
2. Choose efficient models
Use GPT-5 Mini ($0.25/$2.00) or GPT-5 nano ($0.05/$0.40) instead of GPT-5.2 ($1.75/$14.00) when quality allows. See our budget AI models guide for recommendations.
3. Optimize prompts
Shorter prompts = fewer input tokens. Use concise instructions. Read our guide on hidden costs of AI APIs for specific tactics to reduce context waste.
4. Limit output length
Set max_tokens to prevent verbose responses. Every unnecessary token costs money.
5. Route by urgency
- Urgent requests → Standard API (full price, instant response)
- Non-urgent requests → Batch API (50% off, 24-hour turnaround)
This hybrid approach maximizes both speed and cost efficiency.
6. Consider alternatives for non-OpenAI workloads
If you're not locked into OpenAI, compare batch savings against cheaper providers. DeepSeek V3.2 at $0.28/$0.42 standard pricing is already cheaper than most OpenAI batch prices. See our three-way provider comparison for details.
💡 Key Takeaway: Combining the Batch API with model downgrades and prompt optimization can cut costs by 70-80% versus naively using flagship models at standard pricing.
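The hybrid routing in strategy 5 has a simple cost model: if a fraction u of your traffic must be real-time, your blended bill is u at full price plus (1 - u) at half price. A minimal sketch (the function name is ours, not an API):

```python
BATCH_DISCOUNT = 0.5  # batch requests cost half the standard rate


def blended_multiplier(urgent_fraction: float) -> float:
    """Effective price multiplier vs. sending everything to the standard API."""
    return urgent_fraction * 1.0 + (1.0 - urgent_fraction) * BATCH_DISCOUNT
```

If 20% of your requests are user-facing and the rest go through the Batch API, you pay roughly 60% of the all-standard bill, a 40% overall saving before any model or prompt changes.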
Real-world batch workflow: content pipeline
Here's how a typical content team uses the Batch API to save thousands per month:
Step 1: Collect the day's work. Throughout the day, queue up content requests — blog outlines, product descriptions, email drafts, social media posts. Store them in a database or queue.
Step 2: Build the JSONL file. At end of day (or on a schedule), export queued requests into JSONL format. Each request includes the model, messages, and any parameters like max_tokens or temperature.
Step 3: Submit and wait. Upload and submit the batch. Most teams run this as a nightly cron job. Results are typically ready by morning.
Step 4: Process and distribute. Download the completed batch, match results to original requests via custom_id, and route outputs to their destinations — CMS, email platform, review queue.
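Step 4 is mechanical: each line of the output file carries the `custom_id` plus a response body in the familiar chat-completion shape. A sketch of the matching step, with error handling kept minimal (failed lines would go to a retry queue in practice):

```python
import json


def match_results(output_jsonl: str) -> dict[str, str]:
    """Map each custom_id to its generated text, skipping failed requests."""
    results = {}
    for line in output_jsonl.splitlines():
        record = json.loads(line)
        if record.get("error") is not None:
            continue  # request failed; handle separately
        body = record["response"]["body"]
        results[record["custom_id"]] = body["choices"][0]["message"]["content"]
    return results
```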
This workflow turns what would be hundreds of dollars in real-time API calls into a single batch at half the cost. Teams processing 10,000+ content pieces per month routinely save $500-$5,000/month depending on the model used.
The key mindset shift: stop thinking of AI API calls as synchronous operations. Most content generation, data processing, and analysis can be batched without any impact on your team's workflow. The content doesn't need to exist instantly — it needs to exist by the time someone looks at it.
How to get started
Using the Batch API is straightforward:
- Read the OpenAI docs: https://platform.openai.com/docs/guides/batch
- Prepare your JSONL file with requests.
- Upload and submit via the API or OpenAI dashboard.
- Poll for completion or set up a webhook for notifications.
- Download results and process them.
OpenAI provides SDKs for Python, Node.js, and other languages to simplify the process.
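With the Python SDK, the whole flow fits in one function. A sketch, assuming the `openai` package is installed and `OPENAI_API_KEY` is set; the polling interval is arbitrary:

```python
import time

# Batch statuses that will never change again.
TERMINAL = {"completed", "failed", "expired", "cancelled"}


def is_terminal(status: str) -> bool:
    return status in TERMINAL


def run_batch(jsonl_path: str, poll_seconds: int = 60) -> str:
    """Upload a JSONL file, submit the batch, poll, and return the output JSONL."""
    from openai import OpenAI  # imported here so the sketch reads without the SDK
    client = OpenAI()
    batch_file = client.files.create(file=open(jsonl_path, "rb"), purpose="batch")
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
    while not is_terminal(batch.status):
        time.sleep(poll_seconds)
        batch = client.batches.retrieve(batch.id)
    if batch.status != "completed":
        raise RuntimeError(f"batch ended with status {batch.status}")
    return client.files.content(batch.output_file_id).text
```

For a nightly cron job, a webhook or a slower polling cadence is kinder to your rate limits than a tight loop.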
Limitations to keep in mind
The Batch API is powerful, but it has trade-offs:
- 24-hour turnaround: Results aren't instant. Plan accordingly.
- No streaming: You get results only after the batch completes.
- Batch size limits: Maximum 50,000 requests per batch (submit multiple batches if needed).
- No real-time feedback: If a request fails, you won't know until the batch finishes.
- OpenAI-specific: the discount applies only to OpenAI workloads; other providers price and structure batch processing differently.
For workflows that fit these constraints, the 50% savings are well worth it.
✅ TL;DR: The Batch API cuts OpenAI costs in half for any workload that can tolerate a 24-hour delay. At scale, savings reach $100K+/year on premium models. Combine with model selection and prompt optimization for 70-80% total cost reduction.
Frequently asked questions
How much does the OpenAI Batch API save?
Exactly 50% on all models and token types. GPT-5.2 drops from $1.75/$14.00 to $0.875/$7.00 per million tokens. GPT-5 nano drops from $0.05/$0.40 to $0.025/$0.20. The discount is flat and predictable — no tiers or volume commitments required.
How long does batch processing take?
OpenAI's completion window is 24 hours, but most batches complete in 2-6 hours. Processing time depends on batch size and current API load. There's no way to expedite a batch — if you need faster results, use the standard API at full price.
Can I use the Batch API for chatbot conversations?
No. The Batch API is for asynchronous, non-interactive workloads. Chatbots require real-time responses and streaming, neither of which the Batch API supports. For chatbot cost optimization, see our chatbot cost breakdown and consider using cheaper models like DeepSeek V3.2.
What's the maximum batch size?
50,000 requests per batch. If you need to process more, submit multiple batches. There's no limit on the number of concurrent batches, so you can parallelize large workloads by splitting them into 50K-request chunks.
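Splitting an oversized workload into compliant chunks is a one-liner; a sketch (the cap comes from the 50,000-request limit above):

```python
MAX_BATCH_REQUESTS = 50_000  # per-batch request cap


def chunk_requests(requests: list, size: int = MAX_BATCH_REQUESTS) -> list[list]:
    """Split a request list into batch-sized chunks, preserving order."""
    return [requests[i:i + size] for i in range(0, len(requests), size)]
```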
Is the Batch API worth it for small workloads?
It depends on the model. For GPT-5 nano (already $0.05/$0.40), the savings per request are tiny — fractions of a cent. For GPT-5.2 pro ($21/$168), even a small batch of 100 requests saves meaningful money. Use our cost calculator to model your specific workload and see if the 50% discount moves the needle.
