Question 1

How do I calculate the cost of an LLM or AI API?

Accepted Answer

Multiply your average input tokens per request by the price per input token, add your average output tokens multiplied by the price per output token, then multiply the sum by the number of requests you make. Most providers quote prices per million tokens, so divide the quoted price by 1,000,000 to get a per-token price; output tokens usually cost several times more than input tokens. This calculator does the arithmetic across daily, monthly and yearly volume and lets you compare models side by side.
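As a rough illustration of that arithmetic, here is a minimal Python sketch. The function names, the 30-day month, and the example prices ($3 and $15 per million tokens) are assumptions made for this example only, not any provider's actual rates or this calculator's code.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one request; prices are quoted per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


def projected_costs(requests_per_day, input_tokens, output_tokens,
                    input_price_per_m, output_price_per_m):
    """Scale the per-request cost to daily, monthly and yearly volume."""
    per_request = request_cost(input_tokens, output_tokens,
                               input_price_per_m, output_price_per_m)
    daily = per_request * requests_per_day
    return {
        "per_request": per_request,
        "daily": daily,
        "monthly": daily * 30,   # assumes a 30-day month
        "yearly": daily * 365,
    }


# Placeholder prices for illustration only: $3 input / $15 output per million tokens.
costs = projected_costs(
    requests_per_day=10_000,
    input_tokens=1_500,
    output_tokens=400,
    input_price_per_m=3.00,
    output_price_per_m=15.00,
)
for period, dollars in costs.items():
    print(f"{period}: ${dollars:,.4f}")
```

With these placeholder numbers, one request costs (1,500 × 3 + 400 × 15) / 1,000,000 = $0.0105, which works out to about $105 per day at 10,000 requests.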

Question 2

Why is my AI API bill higher than expected?

Accepted Answer

Three usual culprits: output tokens (priced 4–5× input on most models) add up faster than people expect; every request re-sends a large system prompt or context that could be cached; and easy requests go to an expensive frontier model when a cheaper one could handle them. Model routing, prompt caching and batching typically cut total spend by 30–90% without lowering answer quality.
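To see how quickly output tokens come to dominate a bill, the following sketch computes the share of a request's cost that comes from output. The function name and the example token counts are hypothetical; the 5× multiple is taken from the 4–5× range above.

```python
def output_cost_share(input_tokens, output_tokens, output_price_multiple):
    """Fraction of a request's cost that comes from output tokens.

    output_price_multiple is the output price expressed as a multiple of the
    input price (for example 5 when output costs 5x input).
    """
    input_cost = float(input_tokens)                     # in units of the input price
    output_cost = output_tokens * output_price_multiple  # same units
    total = input_cost + output_cost
    if total == 0:
        return 0.0
    return output_cost / total


# Hypothetical request: twice as many input tokens as output tokens.
share = output_cost_share(input_tokens=1_000, output_tokens=500, output_price_multiple=5)
print(f"Output tokens account for {share:.0%} of the request cost")
```

In this example the request has twice as many input tokens as output tokens, yet output accounts for 2,500 of the 3,500 price units, roughly 71% of the cost.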

Question 3

How much can model routing and prompt caching save?

Accepted Answer

Commonly published figures: intelligent model routing reduces inference cost by roughly 30–60% on mixed workloads, prompt caching cuts cost by 50–90% on the cache-eligible portion (cache reads cost about a tenth of normal input tokens), and provider batch APIs give around 50% off for jobs that can complete within 24 hours. Actual savings depend on your workload, which is what a build audit measures.
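A back-of-the-envelope way to combine these levers is sketched below. It is a deliberately simplified model, not how this calculator or any provider computes savings: it assumes caching only discounts input cost, ignores any cache-write surcharge, and treats routing and batching as independent multipliers on whatever cost remains. All parameter names and example values are hypothetical.

```python
def estimate_optimized_cost(input_cost, output_cost,
                            cached_input_fraction=0.0, cache_read_multiple=0.1,
                            routed_fraction=0.0, cheap_price_ratio=1.0,
                            batch_fraction=0.0, batch_discount=0.5):
    """Estimate spend after caching, routing and batching (simplified model)."""
    # Caching: the cached share of input is billed at a fraction of the normal price.
    input_after_cache = input_cost * (
        (1 - cached_input_fraction) + cached_input_fraction * cache_read_multiple
    )
    subtotal = input_after_cache + output_cost

    # Routing: a share of traffic moves to a model priced at cheap_price_ratio
    # times the original model's price.
    routing_factor = (1 - routed_fraction) + routed_fraction * cheap_price_ratio

    # Batching: a share of the remaining spend gets the batch discount.
    batch_factor = 1 - batch_fraction * batch_discount

    return subtotal * routing_factor * batch_factor


# Hypothetical monthly baseline: $1,000 of input cost and $2,000 of output cost.
baseline = 1_000 + 2_000
optimized = estimate_optimized_cost(
    input_cost=1_000, output_cost=2_000,
    cached_input_fraction=0.8,   # 80% of input tokens served from cache
    routed_fraction=0.5,         # half of requests sent to a cheaper model
    cheap_price_ratio=0.2,       # cheaper model costs 20% of the original
)
savings = 1 - optimized / baseline
print(f"Optimized: ${optimized:,.2f} ({savings:.0%} saved)")
```

Under these assumptions, caching takes input cost from $1,000 to $280, and routing then scales the $2,280 subtotal by 0.6, giving about $1,368 against a $3,000 baseline. Real workloads rarely decompose this cleanly, so treat the result as an order-of-magnitude estimate.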

Question 4

Are these AI model prices current?

Accepted Answer

Claude model prices are pre-filled from Anthropic's published rates; other providers are pre-filled with commonly published figures. Every price field in the calculator is editable, so you can paste in your provider's exact current pricing. Always confirm against the provider's official pricing page before budgeting.

LLM / AI API cost calculator
