LLM Token Cost Calculator (2026)

Estimate per-request, monthly, and yearly cost across OpenAI, Anthropic, and Google models, or price a custom model of your own.

Diego Aguirre5 min read

LLM Token Cost Calculator

Start from a preset
Cost per request
$0.000525
Per day
$0.11
Per month
$3.15
Per year
$38.32
Input tokens (43%)
$0.000225/req
Output tokens (57%)
$0.000300/req

At 200 requests/day, GPT-4o mini costs about $0.11 per day, roughly $3.15 per month and $38.32 per year.

Drop this calculator into your own site

Paste a single iframe into any blog, docs page, or pricing page. No script, no account, no tracking added to your visitors. The calculator renders instantly. You can also browse every embeddable calculator in the embed gallery.

Token Cost Calculator

Copy the snippet and paste it anywhere you write HTML.

<iframe
  src="https://www.budgetforge.dev/embed/token-cost-calculator?model=gpt-4o-mini"
  title="LLM Token Cost Calculator by BudgetForge"
  width="100%"
  height="720"
  loading="lazy"
  style="border:1px solid #e2e8f0;border-radius:16px;max-width:560px"
></iframe>

How to use the calculator

Start from a preset if you want realistic numbers in one click, then adjust anything. Set the model, the average input tokens per request, the average output tokens per request, and your requests per day. The cost per request updates the moment you stop typing, and the tool projects it to a day, a 30-day month, and a 365-day year. Input tokens are everything you send to the model: the system prompt, the conversation history, any retrieved context, and the user message. Output tokens are only what the model writes back. The split matters because output is priced several times higher than input on every model in the table, so a chatty assistant costs more than a chatty user.

If your model is not in the list, choose Custom model and two fields appear for the input and output rate in dollars per million tokens. Type the numbers from your provider’s pricing page and every projection updates exactly as it does for the built-in models. That covers negotiated enterprise rates, regional pricing, and anything released after this page was last updated. The rates are dated July 2026 and they live in a single pricing constant, so when a vendor moves a number you change it in one place and every figure on this page moves with it. Treat the output as a close estimate for planning, not a quote.

The models and their July 2026 rates

OpenAI

GPT-4o lists at 2.50 USD per million input tokens and 10.00 USD output, with GPT-4o mini far cheaper at 0.15 and 0.60. o3 and GPT-4.1 both sit at 2.00 and 8.00, and GPT-4.1 mini at 0.40 and 1.60. The mini tiers are the volume workhorses; the full tiers earn their price on harder reasoning. The current sheet is on the OpenAI pricing page.

Anthropic Claude family

Claude spans a wide price range in 2026, which is the whole reason it is worth modelling carefully. Opus 4 is the reasoning flagship at 15.00 input and 75.00 output per million tokens. Sonnet 4 sits in the middle at 3.00 and 15.00, and Haiku 3.5 is the cheap workhorse at 0.80 and 4.00. The published numbers are on the Anthropic pricing page.

  • Opus 4: long-context reasoning, hard tool use, multi-step agent planning.
  • Sonnet 4: a balanced default for most production traffic.
  • Haiku 3.5: routing, classification, and short replies at high volume.

Google Gemini

Gemini 2.5 Pro runs 1.25 input and 10.00 output per million tokens, and Gemini 2.5 Flash runs 0.30 and 2.50. Flash is one of the cheapest credible options on the board for high-volume, low-stakes work such as tagging, routing, and first-pass moderation. The rates come from the Google AI pricing page.

Three cohorts with real numbers

The presets map to three common shapes of workload. Here is the math spelled out so you can trust the tool. If you want the wider picture that adds retries, orchestration overhead, and platform fees on top of raw tokens, use the full-workflow cost calculator; this page is deliberately just the model API line.

Indie side project

200 requests a day, 1,500 input tokens, 500 output tokens, on GPT-4o mini. Input is 1,500 divided by a million, times 0.15, which is 0.000225 USD. Output is 500 divided by a million, times 0.60, which is 0.0003 USD. Per request that is 0.000525 USD. At 200 requests a day you are near 0.11 USD a day, about 3.15 USD a month, and roughly 38 USD a year. A side project on a small model is close to free.

Funded startup

5,000 requests a day, 2,500 input tokens, 800 output tokens, on GPT-4o. Input is 2,500 divided by a million, times 2.50, which is 0.00625 USD. Output is 800 divided by a million, times 10.00, which is 0.008 USD. Per request that is 0.01425 USD, so 5,000 requests a day is about 71.25 USD a day, roughly 2,138 USD a month, and about 26,006 USD a year. This is where model choice starts to matter to the runway.

Agency / high volume

50,000 requests a day, 3,000 input tokens, 1,200 output tokens, on Claude Haiku 3.5. Input is 3,000 divided by a million, times 0.80, which is 0.0024 USD. Output is 1,200 divided by a million, times 4.00, which is 0.0048 USD. Per request that is 0.0072 USD, so 50,000 requests a day is about 360 USD a day, roughly 10,800 USD a month, and about 131,400 USD a year. Flip the same traffic to Claude Opus 4 and the monthly figure jumps past 200,000 USD, which is why volume work belongs on a small model unless quality testing forces an upgrade.

What the calculator does not include

The figure here is the model API line, and only that. A production bill has more in it, and pretending otherwise is how teams get surprised in month two. Budget for the items below on top of whatever this tool reports.

  • Network egress and data transfer between your services and the API.
  • Hosting and compute for your own application servers and queues.
  • Observability, tracing, and logging tooling for the LLM calls.
  • Vector store and embedding storage cost for retrieval.
  • Evaluation and test-harness runs, which quietly burn tokens of their own.
  • Human review and labeling time, the cost everyone forgets to count.

One more line that this calculator does not model: routing aggregators. If you call models through a layer such as OpenRouter, it adds a small pass-through margin on top of the base vendor rate, so your real per-call cost sits a little above the figure here. We work through that markup in the OpenRouter pricing teardown, including when the convenience is worth the spread.

Sources

  1. OpenAI API pricing (platform.openai.com/pricing, July 2026)
  2. Anthropic pricing (anthropic.com/pricing, July 2026)
  3. Google AI pricing (ai.google.dev/pricing, July 2026)

FAQ

How accurate is the BudgetForge token cost calculator in 2026?

The math is exact for the rates listed, and those rates were checked against each provider's public pricing page in July 2026. Vendors change prices without much notice, so treat the output as a close estimate, not a contract. The per-million-token rates live in one file and are dated, so when a price moves the fix is a one-line change. Always confirm against your latest invoice before committing a budget.

How is the cost per request calculated?

Cost per request is average input tokens divided by one million, times the model's input rate, plus average output tokens divided by one million, times the output rate. The daily figure is cost per request times requests per day. The monthly figure multiplies the daily figure by thirty, and the yearly figure multiplies it by 365. Every number on the page comes from that single formula, so you can sanity-check it on the back of an envelope.

Why is output priced higher than input?

Generation is the expensive part. On every model in the table, output tokens cost several times more than input tokens: GPT-4o is 2.50 in and 10.00 out, Claude Sonnet 4 is 3.00 in and 15.00 out, Gemini 2.5 Flash is 0.30 in and 2.50 out. That is why a verbose response costs far more than a verbose prompt, and why trimming output length is usually the fastest way to cut a bill. The calculator shows the input-versus-output split so you can see where the money actually goes.

Which model is cheapest for high-volume work?

For high-volume, low-stakes work, the small tiers win by a wide margin. GPT-4o mini (0.15 in, 0.60 out), Gemini 2.5 Flash (0.30 in, 2.50 out), and Claude Haiku 3.5 (0.80 in, 4.00 out) are all built for volume. Because request count multiplies every fraction of a cent, a model that is ten times cheaper per token is a ten-times-cheaper bill. Start small, measure quality on real traffic, and only escalate the specific intents that fail.

Can I price a model that is not in the list?

Yes. Pick 'Custom model' in the dropdown and two fields appear for the input and output rate in dollars per million tokens. Type the numbers from your provider's pricing page and the calculator projects per-request, daily, monthly, and yearly cost exactly as it does for the built-in models. This is handy for regional pricing, negotiated enterprise rates, or a model released after this page was last updated.

What does this calculator not include?

The figure here is the model API line and only that. A production bill also carries network egress, hosting and compute for your own servers, observability and tracing, vector-store and embedding storage, evaluation runs, and human review time. Routing aggregators such as OpenRouter add a small pass-through margin on top of the base rate too. Budget for those separately on top of whatever this tool reports.

D

Written by

Diego Aguirre

Diego Aguirre writes the money-honest desk at BudgetForge: pragmatic breakdowns of what software actually costs to run, with the math shown and the adjectives left out.