OpenAI API pricing 2026: the real 30-day bill

Quick answer
The OpenAI API is pay-as-you-go, billed per million tokens, so your real "price" is three numbers: the input rate, the output rate, and how many of each your workload burns. As of July 2026, GPT-5.4-mini runs $0.75 per million input tokens and $4.50 per million output; GPT-5.4 is $2.50 in and $15 out; the flagship GPT-5.5 is $5 in and $30 out (OpenAI's published per-model rate sheet). Below we model the actual 30-day bill for three teams: a solo indie feature (about $30 a month), a funded startup with a core AI flow (about $2,738 a month), and an agency running an agentic product at scale (about $24,480 a month). One pattern decides all three bills, and it is not the sticker rate. It is output tokens and model choice.
These are modeled estimates from the published July 2026 rates, not vendor-quoted invoices. Your mix will differ. The point is the shape of the bill, so you can budget for the line item that actually moves.
The rates that actually bill you (July 2026)
OpenAI prices every text model on two meters: input tokens (what you send) and output tokens (what the model writes back). Output always costs more: about 6x the input rate on every model in this lineup. Here is the current standard lineup, per million tokens, from OpenAI's rate sheet:
| Model | Input | Cached input | Output |
|---|---|---|---|
| GPT-5.5 | $5.00 | $0.50 | $30.00 |
| GPT-5.4 | $2.50 | $0.25 | $15.00 |
| GPT-5.4-mini | $0.75 | $0.075 | $4.50 |
| GPT-5.4-nano | $0.20 | $0.02 | $1.25 |
Two rate-sheet facts that change the math later: cached input (a repeated system prompt or document you mark for prompt caching) is billed at roughly one tenth of the standard input rate, and the Batch API takes 50% off both input and output for any request you can run asynchronously instead of in real time. Neither shows up on the headline price, and both are where real bills get cut.
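To make the meters concrete, here is a minimal sketch of the per-request arithmetic. The rates are copied from the table above; the `request_cost` helper and its parameter names are ours, and it assumes the batch discount applies on top of the cached-input rate, which you should confirm against the rate sheet before relying on it.

```python
# Per-million-token rates copied from the table above (July 2026).
RATES = {
    "gpt-5.5":      {"input": 5.00, "cached": 0.50,  "output": 30.00},
    "gpt-5.4":      {"input": 2.50, "cached": 0.25,  "output": 15.00},
    "gpt-5.4-mini": {"input": 0.75, "cached": 0.075, "output": 4.50},
    "gpt-5.4-nano": {"input": 0.20, "cached": 0.02,  "output": 1.25},
}
BATCH_MULTIPLIER = 0.5  # Batch API: 50% off input and output


def request_cost(model, fresh_input, output, cached_input=0, batch=False):
    """Dollar cost of one request. Token counts are raw tokens, not millions.

    Assumption: the batch discount stacks on the cached-input rate.
    """
    rate = RATES[model]
    dollars = (
        fresh_input * rate["input"]
        + cached_input * rate["cached"]
        + output * rate["output"]
    ) / 1_000_000
    return dollars * BATCH_MULTIPLIER if batch else dollars


# One GPT-5.4 call with 2,500 input and 800 output tokens, priced three ways.
print(request_cost("gpt-5.4", 2_500, 800))                      # standard
print(request_cost("gpt-5.4", 700, 800, cached_input=1_800))    # 1,800 tokens cached
print(request_cost("gpt-5.4", 2_500, 800, batch=True))          # batch
```

Multiply any of these by your monthly request count and you have the line items used in the cohort tables that follow.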
Cohort A: the solo indie feature, about $30 a month
The setup: one developer, one AI feature live in a small app (a support reply drafter, say), running on GPT-5.4-mini because it is the cheapest model that is still good enough for production. Assume 300 requests a day, averaging 1,500 input tokens and 500 output tokens per call.
| Line item | Monthly tokens | Rate | Cost |
|---|---|---|---|
| Input | 13.5M | $0.75 / 1M | $10.13 |
| Output | 4.5M | $4.50 / 1M | $20.25 |
| Total | | | $30.38 |
Look at the split. Output is only 25% of the tokens this feature moves, but it is 67% of the bill. That is the single most useful thing to internalize about OpenAI pricing: the cheap meter is the one you send, the expensive meter is the one the model writes. Trimming a verbose system prompt saves pennies. Telling the model to answer in three sentences instead of three paragraphs saves dollars.
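The split above is easy to reproduce. This sketch recomputes Cohort A's bill and then compares two trims of equal size, 200 tokens off the prompt versus 200 tokens off the answer. The 200-token figure is a hypothetical chosen for illustration, not a measurement.

```python
REQUESTS_PER_MONTH = 300 * 30          # 300 requests a day, 30-day month
INPUT_RATE, OUTPUT_RATE = 0.75, 4.50   # GPT-5.4-mini, dollars per million tokens


def bill(input_tokens, output_tokens):
    """Return (input_cost, output_cost) in dollars for the month."""
    input_cost = REQUESTS_PER_MONTH * input_tokens / 1_000_000 * INPUT_RATE
    output_cost = REQUESTS_PER_MONTH * output_tokens / 1_000_000 * OUTPUT_RATE
    return input_cost, output_cost


base_in, base_out = bill(1_500, 500)
base_total = base_in + base_out
output_token_share = 500 / (1_500 + 500)
output_bill_share = base_out / base_total

# Hypothetical trims: 200 tokens off the prompt vs. 200 tokens off the answer.
saved_by_shorter_prompt = base_total - sum(bill(1_300, 500))
saved_by_shorter_answer = base_total - sum(bill(1_500, 300))

print(f"total ${base_total:.2f}; output is {output_token_share:.0%} of tokens "
      f"and {output_bill_share:.0%} of the bill")
print(f"shorter prompt saves ${saved_by_shorter_prompt:.2f}, "
      f"shorter answer saves ${saved_by_shorter_answer:.2f}")
```

The same 200 tokens are worth six times as much on the output side, because that is the ratio between the two rates.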
At this scale, cost control is not a spreadsheet exercise. Thirty dollars a month is a rounding error against a solo dev's time. The reason to understand the split now is that the same shape, scaled up 800x, is what an agency pays.
Cohort B: the funded startup, about $2,738 a month
The setup: a small SaaS with a real AI flow at its core (document analysis, in-app chat, whatever the product is), running GPT-5.4 for quality. Assume 5,000 requests a day, 2,500 input tokens and 800 output tokens each.
| Line item | Monthly tokens | Rate | Cost |
|---|---|---|---|
| Input | 375M | $2.50 / 1M | $937.50 |
| Output | 120M | $15.00 / 1M | $1,800.00 |
| Total | | | $2,737.50 |
Output is 66% of the bill again. But at this volume a second lever appears: most of those 2,500 input tokens per call are the same system prompt, instructions, and context on every request. Say 1,800 of the 2,500 are stable and cacheable. Prompt caching bills those repeated tokens at $0.25 per million instead of $2.50.
| Line item | Monthly tokens | Rate | Cost |
|---|---|---|---|
| Cached input | 270M | $0.25 / 1M | $67.50 |
| Fresh input | 105M | $2.50 / 1M | $262.50 |
| Output | 120M | $15.00 / 1M | $1,800.00 |
| Total (with caching) | | | $2,130.00 |
That is roughly $608 a month saved, about 22%, from one configuration change and no quality loss. (We are ignoring the small cache-write surcharge here; on a prompt read thousands of times a day it is noise.) And notice what caching does to the shape: once input is cheap, output is now 85% of the bill. There is no cache for output. The only way to cut it is to route to a cheaper model or generate less text. If you want to sanity-check any of these numbers against your own token counts, we keep a live token cost calculator that does exactly this math per model.
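If your cacheable share is not 1,800 of 2,500 tokens, the caching math is simple to rerun. The sketch below keeps the same simplifications as the text: every read of the stable prefix is a cache hit, and the cache-write surcharge is ignored. The function name and the sample cacheable sizes are ours.

```python
REQUESTS_PER_MONTH = 5_000 * 30
INPUT_TOKENS, OUTPUT_TOKENS = 2_500, 800
INPUT_RATE, CACHED_RATE, OUTPUT_RATE = 2.50, 0.25, 15.00  # GPT-5.4


def bill_with_caching(cacheable_tokens):
    """Return (total, output_cost) when `cacheable_tokens` of each prompt hit the cache.

    Simplification: 100% cache hit rate, no cache-write surcharge.
    """
    millions_of_requests = REQUESTS_PER_MONTH / 1_000_000
    cached = cacheable_tokens * millions_of_requests * CACHED_RATE
    fresh = (INPUT_TOKENS - cacheable_tokens) * millions_of_requests * INPUT_RATE
    output = OUTPUT_TOKENS * millions_of_requests * OUTPUT_RATE
    return cached + fresh + output, output


baseline, _ = bill_with_caching(0)
for cacheable in (0, 900, 1_800, 2_500):
    total, output = bill_with_caching(cacheable)
    print(f"{cacheable:>5} cacheable tokens: ${total:,.2f} "
          f"(saves ${baseline - total:,.2f}, output share {output / total:.0%})")
```

Even with the whole prompt cached, the output line does not move, which is why the output share keeps climbing as caching improves.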
Cohort C: the agency at scale, about $24,480 a month
The setup: an agency (or a Series A company) running an agentic product, where one user action fans out into many model calls. Assume 50,000 requests a day, 3,000 input and 1,200 output tokens each, split across two models: 20% of calls go to GPT-5.5 for the reasoning-heavy steps, 80% to GPT-5.4-mini for the routine ones.
| Model (share) | Input cost | Output cost | Subtotal |
|---|---|---|---|
| GPT-5.5 (20%) | 900M × $5 = $4,500 | 360M × $30 = $10,800 | $15,300 |
| GPT-5.4-mini (80%) | 3,600M × $0.75 = $2,700 | 1,440M × $4.50 = $6,480 | $9,180 |
| Total | | | $24,480 |
Here is the number that runs the whole business: the 20% of traffic sitting on GPT-5.5 is $15,300, or 62% of the monthly bill. One fifth of the requests, nearly two thirds of the cost. Model routing is not a tuning detail at this scale. It is the budget.
Move that same 20% of calls from GPT-5.5 to GPT-5.4 (still a strong model, half the input rate and half the output rate) and the reasoning slice drops from $15,300 to $7,650. The monthly bill falls from $24,480 to $16,830, a $7,650 (31%) cut with no change in volume. Send whatever portion of that work is asynchronous through the Batch API and its share halves again. This is why "which OpenAI model is cheapest" is the wrong question. The right one is "which requests genuinely need the expensive model, and can I prove the rest do not."
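The routing comparison can be checked with a small model-mix calculator. This is a sketch under the cohort's stated assumptions (same token counts on every call, regardless of model); `mix_cost` is a name we made up for illustration.

```python
RATES = {  # (input, output) dollars per million tokens
    "gpt-5.5": (5.00, 30.00),
    "gpt-5.4": (2.50, 15.00),
    "gpt-5.4-mini": (0.75, 4.50),
}
REQUESTS_PER_MONTH = 50_000 * 30
INPUT_TOKENS, OUTPUT_TOKENS = 3_000, 1_200


def mix_cost(shares):
    """Map each model to its monthly subtotal, given its share of all calls."""
    subtotals = {}
    for model, share in shares.items():
        input_rate, output_rate = RATES[model]
        calls = REQUESTS_PER_MONTH * share
        per_call = (INPUT_TOKENS * input_rate + OUTPUT_TOKENS * output_rate) / 1_000_000
        subtotals[model] = calls * per_call
    return subtotals


current = mix_cost({"gpt-5.5": 0.20, "gpt-5.4-mini": 0.80})
rerouted = mix_cost({"gpt-5.4": 0.20, "gpt-5.4-mini": 0.80})

for label, subtotals in (("current", current), ("rerouted", rerouted)):
    total = sum(subtotals.values())
    for model, cost in subtotals.items():
        print(f"{label:>8} {model:<13} ${cost:>10,.2f} ({cost / total:.0%} of bill)")
    print(f"{label:>8} {'total':<13} ${total:>10,.2f}")
```

Changing the shares in the two dictionaries is the quickest way to see how far a different split would move your own bill.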
The three levers, ranked by how much they move the bill
- Model routing. The gap between GPT-5.5 and GPT-5.4-mini is 6.7x on output. Sending a request to a model one tier too high is the most expensive mistake on this page. Route by task, not by default.
- Output discipline. Output is about 6x the input rate across this lineup and cannot be cached. Cap max tokens, ask for structured or terse answers, and stop paying for prose nobody reads.
- Caching and batch. Prompt caching cuts repeated input by ~90%; the Batch API cuts async work by 50%. Neither touches the sticker price, both are real money. Our OpenRouter pricing teardown walks through when a routing layer on top of these rates pays for itself and when it just adds a fee.
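"Route by task, not by default" and "cap max tokens" can live in one small lookup table. The sketch below is one possible shape, not a recommendation: the task labels, the model assigned to each, and the token caps are all hypothetical and should come from your own evaluations.

```python
# Hypothetical task labels and output caps; tune both against your own evals.
ROUTES = {
    "classify":       ("gpt-5.4-nano", 50),
    "draft_reply":    ("gpt-5.4-mini", 300),
    "analyze_doc":    ("gpt-5.4", 800),
    "plan_multistep": ("gpt-5.5", 1_200),
}
# Unknown tasks start on a cheap tier; escalation has to be added explicitly.
DEFAULT_ROUTE = ("gpt-5.4-mini", 300)

OUTPUT_RATES = {  # dollars per million output tokens
    "gpt-5.5": 30.00,
    "gpt-5.4": 15.00,
    "gpt-5.4-mini": 4.50,
    "gpt-5.4-nano": 1.25,
}


def route(task):
    """Return (model, output_token_cap) for a task label."""
    return ROUTES.get(task, DEFAULT_ROUTE)


def worst_case_output_cost(task):
    """Upper bound on output spend for one call if the model hits its cap."""
    model, cap = route(task)
    return cap * OUTPUT_RATES[model] / 1_000_000


for task in ("classify", "plan_multistep", "something_new"):
    print(task, route(task), worst_case_output_cost(task))
```

The cap would be passed as whatever output-token limit your client library exposes; it is left out here so the sketch makes no claim about a specific SDK signature.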
How OpenAI's rates compare to Claude at each tier
To keep this honest, OpenAI is not automatically the cheapest option. Here is where its lineup lands against Anthropic's, per million tokens, from Anthropic's per-token rate sheet (July 2026):
At the flagship tier, GPT-5.5 ($5 / $30) and Claude Opus 4.8 ($5 / $25) match on input, and Opus is cheaper on output. At the workhorse tier, GPT-5.4 ($2.50 / $15) sits just below Claude Sonnet 4.6 ($3 / $15) on input and matches it on output, while Anthropic's Sonnet 5 introductory rate ($2 / $10 through August 31, 2026) undercuts both. At the small-model tier, GPT-5.4-mini ($0.75 / $4.50) edges out Claude Haiku 4.5 ($1 / $5) on both meters. Both providers give the same two discounts: prompt-cache reads at roughly 0.1x input, and a 50% Batch API cut. The takeaway is not "switch." It is that at agency volume, a few cents per million times billions of tokens is a real line item, so benchmark the two on your actual workload before you commit a year of spend.
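For a first-pass comparison before a real benchmark, you can price one workload at each tier. The rates below are the ones quoted in the paragraph above. The sketch assumes both providers count the same number of tokens for the same text, which will not hold exactly because the tokenizers differ, so treat the output as a rough ordering only.

```python
# (input, output) dollars per million tokens, as quoted in the paragraph above.
TIERS = {
    "flagship": {
        "GPT-5.5": (5.00, 30.00),
        "Claude Opus 4.8": (5.00, 25.00),
    },
    "workhorse": {
        "GPT-5.4": (2.50, 15.00),
        "Claude Sonnet 4.6": (3.00, 15.00),
        "Claude Sonnet 5 (intro)": (2.00, 10.00),
    },
    "small": {
        "GPT-5.4-mini": (0.75, 4.50),
        "Claude Haiku 4.5": (1.00, 5.00),
    },
}


def monthly_bill(rates, input_millions, output_millions):
    """Dollar cost for a month, given token volumes in millions."""
    input_rate, output_rate = rates
    return input_millions * input_rate + output_millions * output_rate


def compare(input_millions, output_millions):
    """Return {tier: [(model, dollars), ...]} sorted cheapest first."""
    return {
        tier: sorted(
            ((model, monthly_bill(rates, input_millions, output_millions))
             for model, rates in models.items()),
            key=lambda pair: pair[1],
        )
        for tier, models in TIERS.items()
    }


# Cohort B's volume: 375M input and 120M output tokens a month.
for tier, ranked in compare(375, 120).items():
    print(tier + ":", ", ".join(f"{model} ${cost:,.2f}" for model, cost in ranked))
```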
Where the sticker price misleads you
If you budget from the headline "$2.50 per million" number, you leave out the output meter entirely, and in every cohort above output spend ran roughly twice input spend before caching. Two estimating habits make the gap worse: teams underestimate agentic fan-out (one user action becoming twenty model calls) and overestimate how much the input rate matters. For a wider set of "how much does the API actually cost" data points from people running real workloads, an r/homeassistant thread from 2026 is a useful gut check, and Azure's OpenAI Service pricing page is worth reading if you route through Azure, where the same GPT-5.4 rates apply but very large context requests are billed at a higher tier.
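Two quick calculations show how far the headline rate sits from what a token really costs. The first computes a blended rate across both meters for Cohort B; the second prices a single user action that fans out into twenty calls, using Cohort C's token sizes on GPT-5.4-mini. Both helper names are ours, and the twenty-call fan-out is the illustrative figure from the paragraph above, not a measured one.

```python
def blended_rate(input_tokens, output_tokens, input_rate, output_rate):
    """Effective dollars per million tokens across both meters."""
    cost = input_tokens * input_rate + output_tokens * output_rate
    return cost / (input_tokens + output_tokens)


def cost_per_user_action(calls_per_action, input_tokens, output_tokens,
                         input_rate, output_rate):
    """Dollar cost of one user action that fans out into several model calls."""
    per_call = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return calls_per_action * per_call


# Cohort B on GPT-5.4: 2,500 input and 800 output tokens per call.
effective = blended_rate(2_500, 800, 2.50, 15.00)
print(f"headline $2.50 per 1M, effective ${effective:.2f} per 1M")

# Twenty-call fan-out at Cohort C's token sizes on GPT-5.4-mini.
action_cost = cost_per_user_action(20, 3_000, 1_200, 0.75, 4.50)
print(f"${action_cost:.3f} per user action")
```

Budgeting per user action rather than per request is usually the safer unit for an agentic product, since the fan-out factor is the number most likely to drift.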
Math check: across all three cohorts, output tokens were 66% to 85% of the bill, and at agency scale one model tier accounted for 62% of spend. Budget the output meter and the model mix, not the input sticker.
Written by Camille Forster
Frequently asked questions
How much does the OpenAI API actually cost per month?
It depends entirely on volume and model. Modeled from the July 2026 rates: a solo indie feature on GPT-5.4-mini (300 requests a day) runs about $30 a month; a funded startup on GPT-5.4 (5,000 requests a day) about $2,738; an agency running an agentic product on a GPT-5.5 and GPT-5.4-mini mix (50,000 requests a day) about $24,480. Output tokens and model choice drive the bill far more than the input sticker rate.
Is the OpenAI API cheaper than ChatGPT Plus?
They bill differently. ChatGPT Plus is a flat per-seat subscription for the chat product; the API is pay-as-you-go per token for building your own app. For light personal use the API can cost a few dollars a month, well under a Plus seat. For a production feature serving many users, the API scales with usage and can far exceed a subscription. They are not substitutes: one is a product, the other is infrastructure.
Why is my output token bill higher than my input bill?
Because OpenAI prices output tokens at about 6x the input rate across the current lineup, and output cannot be cached. In every cohort we modeled, output was 66% to 85% of the total bill despite being a minority of the tokens. Capping max output tokens and asking for terse or structured answers is the fastest way to cut spend.
Does the Batch API really cut costs in half?
Yes, for work you can run asynchronously. The Batch API applies a 50% discount to both input and output tokens for requests submitted as a batch job rather than in real time. It is ideal for bulk classification, summarization, or overnight processing, and it stacks with prompt caching. It is not usable for interactive, latency-sensitive requests.
Which OpenAI model is cheapest for production?
GPT-5.4-nano ($0.20 input, $1.25 output per million tokens) is the cheapest, followed by GPT-5.4-mini ($0.75 / $4.50). Most production apps land on GPT-5.4-mini as the best value that is still capable, and reserve GPT-5.4 or GPT-5.5 only for the specific requests that genuinely need more reasoning. Routing by task instead of defaulting to a top model is the single biggest cost lever.
How much can prompt caching save?
For workloads with a large, stable system prompt or context repeated across calls, prompt caching bills those repeated tokens at about one tenth of the standard input rate. In our startup cohort it cut roughly $608 a month, about 22% of the bill, with no quality change. The savings scale with how much of your input is repeated and how often it is read.

