
How to Calculate LLM API Costs: A Developer's Guide


Every AI model provider charges by the token, but the pricing structure has grown more complex in 2026. Between input tokens, output tokens, prompt caching, batch discounts, and thinking tokens, the final bill can be hard to predict. This guide walks through the exact formula and shows you how to estimate costs before you deploy.

The Basic Formula

At its core, every LLM API call costs:

Cost per request = (input_tokens / 1,000,000) × input_price
  + (output_tokens / 1,000,000) × output_price

Most providers quote prices per 1 million tokens, so dividing your token count by 1,000,000 gives the multiplier to apply to the listed price.
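The formula translates directly into a small helper. This is a minimal sketch: the function name and parameters are illustrative, and prices are assumed to be in US dollars per 1 million tokens. The $2.50 input / $15.00 output prices in the example are illustrative values.

```python
PRICE_UNIT = 1_000_000  # providers typically quote prices per 1M tokens


def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price: float, output_price: float) -> float:
    """Return the dollar cost of one request.

    input_price and output_price are dollars per 1,000,000 tokens.
    """
    return (input_tokens / PRICE_UNIT) * input_price + (output_tokens / PRICE_UNIT) * output_price


# Example: 500 input + 300 output tokens at $2.50 / $15.00 per 1M tokens
print(f"${cost_per_request(500, 300, 2.50, 15.00):.5f}")  # prints $0.00575
```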

Step 1: Estimate Your Token Volume

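Tokens are the fundamental unit. As a rough guide for English text, one token averages about four characters, or roughly three-quarters of a word, so 1,000 words comes to around 1,300 tokens. Exact counts depend on each model's tokenizer.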

For production estimates, check your application logs or your provider's usage dashboard. Most developers underestimate their output volume: responses are often 2–4× longer than prompts.
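A quick way to get realistic numbers from your logs is to average the token counts of requests you have already made. This sketch assumes a hypothetical log format (a list of dicts with input_tokens and output_tokens keys) and uses made-up sample values; map the field names to whatever your logs or your provider's usage export actually contain.

```python
from statistics import mean

# Hypothetical usage records with made-up values, e.g. parsed from application logs
usage_log = [
    {"input_tokens": 400, "output_tokens": 900},
    {"input_tokens": 500, "output_tokens": 1_200},
    {"input_tokens": 600, "output_tokens": 1_500},
]


def summarize_usage(records: list[dict]) -> dict:
    """Average input/output tokens per request and the output-to-input ratio."""
    avg_in = mean(r["input_tokens"] for r in records)
    avg_out = mean(r["output_tokens"] for r in records)
    return {
        "avg_input": avg_in,
        "avg_output": avg_out,
        "output_to_input": avg_out / avg_in,
    }


print(summarize_usage(usage_log))
```

Multiplying these averages by your expected request count gives the monthly token volume that the remaining steps build on.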

Step 2: Account for Input/Output Ratio

Output tokens typically cost 3–10× more than input tokens, so your input/output ratio has an outsized effect on your bill:

Scenario | Input tokens | Output tokens | I/O ratio | GPT-5.4 cost
Chatbot | 500 | 300 | 1.7:1 | $0.00575
Code review | 3,000 | 800 | 3.75:1 | $0.01950
Content gen | 200 | 2,000 | 1:10 | $0.03050

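Costs are per request. The content generation scenario costs 5.3× more than the chatbot with only 2.75× as many total tokens (2,200 vs. 800), because most of its tokens are output, which is billed at 6× the input price. Code review processes the most tokens of all (3,800) yet costs about a third less than content generation.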
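The table's figures can be reproduced in a few lines. The per-1M-token prices below ($2.50 input, $15.00 output) are not stated in this guide; they are the rates implied by the table's own numbers, so treat them as an assumption and substitute your provider's current list prices.

```python
# Assumed GPT-5.4 prices in dollars per 1M tokens (back-solved from the table above)
INPUT_PRICE = 2.50
OUTPUT_PRICE = 15.00

# Scenario name -> (input tokens, output tokens) per request
scenarios = {
    "Chatbot": (500, 300),
    "Code review": (3_000, 800),
    "Content gen": (200, 2_000),
}


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Per-request cost in dollars using the assumed prices."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


costs = {name: request_cost(i, o) for name, (i, o) in scenarios.items()}
for name, (i, o) in scenarios.items():
    print(f"{name:12} {i:>6} in {o:>6} out  ${costs[name]:.5f}")

# Cost ratio between content generation and the chatbot
print(f"Content gen / Chatbot: {costs['Content gen'] / costs['Chatbot']:.1f}x")  # 5.3x
```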

Step 3: Add Caching Discounts

If your application sends the same system prompt or context prefix repeatedly, prompt caching can cut your input cost by 50–90%. The math:

Cached input cost = (uncached_tokens / 1M) × input_price
  + (cached_tokens / 1M) × cache_read_price
  + (new_cache_writes / 1M) × cache_write_price

Anthropic and OpenAI charge roughly 10% of the input price for cache reads. At an 80% cache hit rate, your effective input price drops to about 28% of list price (0.2 × 100% + 0.8 × 10%, ignoring cache-write charges). DeepSeek takes this even further, with cache reads at 0.8% of the input price.
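Here is the caching formula as code, along with the effective-price check for an 80% hit rate. The cache-read price is assumed to be 10% of the input price, matching the figure quoted for Anthropic and OpenAI; the $2.50 list price, the cache-write price, and the token counts are illustrative parameters, not any provider's published rates.

```python
def cached_input_cost(uncached_tokens: int, cached_tokens: int, new_cache_writes: int,
                      input_price: float, cache_read_price: float,
                      cache_write_price: float) -> float:
    """Input-side cost in dollars with prompt caching; all prices are per 1M tokens."""
    return (uncached_tokens * input_price
            + cached_tokens * cache_read_price
            + new_cache_writes * cache_write_price) / 1_000_000


def effective_input_fraction(hit_rate: float, read_fraction: float = 0.10) -> float:
    """Effective input price as a fraction of list price, ignoring cache writes.

    Cache misses pay the full price; cache hits pay read_fraction of it.
    """
    return (1 - hit_rate) * 1.0 + hit_rate * read_fraction


print(f"{effective_input_fraction(0.80):.0%}")  # 28%

# 1M input tokens, 80% served from cache, no new cache writes, $2.50 list price
cost = cached_input_cost(uncached_tokens=200_000, cached_tokens=800_000, new_cache_writes=0,
                         input_price=2.50, cache_read_price=0.25, cache_write_price=3.125)
print(f"${cost:.2f}")  # $0.70, i.e. 28% of the $2.50 uncached cost
```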

Step 4: Multiply by Request Volume

Monthly cost = cost per request × requests per month. A developer using a coding assistant 50 times a day for 30 days makes 1,500 requests a month. Even with an expensive model at $0.02 per request, that comes to only $30/month. Scale that to 10,000 users, though, and it becomes $300,000/month, at which point the price differences between models matter enormously.
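The scaling arithmetic above can be checked with a one-line helper. The function name and the 30-day default are illustrative choices for this sketch.

```python
def monthly_cost(cost_per_request: float, requests_per_day: int,
                 days: int = 30, users: int = 1) -> float:
    """Monthly spend in dollars: per-request cost x daily requests x days x users."""
    return cost_per_request * requests_per_day * days * users


print(f"${monthly_cost(0.02, 50):,.2f}")                # $30.00 for one developer
print(f"${monthly_cost(0.02, 50, users=10_000):,.2f}")  # $300,000.00 for 10,000 users
```

Because every factor multiplies, a 10× cheaper per-request price cuts the 10,000-user bill by the same 10×, which is why model choice dominates at scale.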

Common Mistakes

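Most estimation errors come from the factors covered above: underestimating output volume, ignoring the gap between input and output prices, and leaving caching or batch discounts out of the math. The easiest way to avoid them is to use the calculator, which factors in caching, batch, and I/O ratio automatically across all 19 models.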
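For a quick estimate outside the calculator, the steps in this guide can be combined into one function. This is a sketch, not the calculator's actual logic: the names are hypothetical, cache writes are ignored, and the batch discount is modeled as a flat percentage off each request. OpenAI and Anthropic batch APIs have offered 50% discounts, but check your provider's current terms; the prices shown are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Dollar prices per 1M tokens (illustrative values, not a provider's price sheet)."""
    input_price: float
    output_price: float
    cache_read_price: float


def estimate_monthly_cost(p: Pricing, input_tokens: int, output_tokens: int,
                          cache_hit_rate: float, requests_per_month: int,
                          batch_discount: float = 0.0) -> float:
    """Estimate monthly spend from per-request tokens, cache hits, and a batch discount.

    cache_hit_rate is the share of input tokens read from cache (0.0 to 1.0);
    batch_discount is a fraction (0.5 means 50% off). Cache writes are ignored.
    """
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    per_request = (uncached * p.input_price
                   + cached * p.cache_read_price
                   + output_tokens * p.output_price) / 1_000_000
    return per_request * (1 - batch_discount) * requests_per_month


prices = Pricing(input_price=2.50, output_price=15.00, cache_read_price=0.25)
# Code-review-sized requests, 1,500 per month: no optimizations vs. 80% cache hits + batch
print(f"${estimate_monthly_cost(prices, 3_000, 800, 0.0, 1_500):,.2f}")
print(f"${estimate_monthly_cost(prices, 3_000, 800, 0.8, 1_500, batch_discount=0.5):,.2f}")
```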
