Free tool

LLM / AI API cost calculator

Estimate what your AI feature really costs per month — from tokens-per-request and volume — then see how much model routing and prompt caching could cut it. No signup. Every price is editable.

💡 Prices are editable. Claude rates are pre-filled from Anthropic's published pricing; others use commonly published figures. Confirm against the provider before budgeting: Anthropic · OpenAI · Google.

Your workload

$
$

Rough guide: a paragraph ≈ 100 tokens; a page ≈ 500. Output is usually pricier than input.

Estimated monthly cost
$0
$0per day
$0per year
$0per request
input $0 + output $0
Same workload, every model

Model comparison

Model$/1M in$/1M outMonthlyYearly

Comparison uses each model's pre-filled price and your token + volume inputs above. Edit a price and it recalculates.

What good engineering saves

Routing + caching estimate

Two of the biggest levers. Drag to model your own workload.

40%
50%
0% lower — about $0/mo

Optimized estimate: $0/mo vs $0/mo today.

Illustrative. Caching credits the cacheable share at ~10% of input price (the published cache-read rate); routing assumes routed requests drop to ~20% of current cost (e.g. a frontier model → a small model). Real results depend on your workload — which is exactly what we measure.

This is the work, not a guess.

Model routing, prompt caching, batching and right-sized infra — engineered into your product so cost-per-user stays flat as you scale. Book a free build audit and we'll measure where your spend actually goes.

Book a Build Audit
FAQ

Questions about AI cost

How do I calculate the cost of an LLM or AI API?

Multiply average input tokens per request by your input price, add average output tokens times your output price, then multiply by request volume. Providers price per million tokens, and output usually costs 4–5× input. This tool does that across daily, monthly and yearly volume and compares models side by side.

Why is my AI API bill higher than expected?

Usually three things: output tokens add up faster than people expect, a large system prompt or context is re-sent uncached on every call, and easy requests hit an expensive frontier model a cheaper one could handle. Routing, caching and batching typically cut spend 30–90% without lowering quality.

How much can routing and caching save?

Industry-typical: model routing ~30–60% on mixed traffic, prompt caching 50–90% on the cache-eligible portion (cache reads cost ≈10% of normal input tokens), and batch APIs ~50% off jobs that can wait up to 24 hours. Actual savings depend on your workload.

Are these prices current?

Claude prices are pre-filled from Anthropic's published rates; other providers use commonly published figures. Every price field is editable — paste your provider's exact current pricing in. Always confirm on the provider's official pricing page before budgeting.

Related