Every AI model provider charges by the token, but the pricing structure has grown more complex in 2026. Between input tokens, output tokens, prompt caching, batch discounts, and thinking tokens, the final bill can be hard to predict. This guide walks through the exact formula and shows you how to estimate costs before you deploy.
The Basic Formula
At its core, every LLM API call costs:
Cost per request =
(input_tokens / 1,000,000) × input_price
+ (output_tokens / 1,000,000) × output_price
Prices are always quoted per 1 million tokens. Divide your token count by 1M to get the price multiplier.
Step 1: Estimate Your Token Volume
Tokens are the fundamental unit. Here's a rough guide:
- 1 token ≈ 0.75 English words (or ~4 characters)
- A typical ChatGPT response (~150 words) ≈ 200 output tokens
- A full-page document (~750 words) ≈ 1,000 tokens
- A code file (~500 lines) ≈ 3,000–5,000 tokens
For production estimates, look at your application logs or provider dashboard. Most developers underestimate their output volume — responses are often 2–4× longer than prompts.
Step 2: Account for Input/Output Ratio
Output tokens cost 3–10× more than input tokens. This means your input/output ratio dramatically affects your bill:
| Scenario | Input | Output | I/O Ratio | GPT-5.4 Cost |
|---|---|---|---|---|
| Chatbot | 500 | 300 | 1.7:1 | $0.00575 |
| Code review | 3000 | 800 | 3.75:1 | $0.01950 |
| Content gen | 200 | 2000 | 1:10 | $0.03050 |
Per-request cost. The content generation scenario costs 5.3× more than chatbot despite similar total tokens — because output is 6× the input price.
Step 3: Add Caching Discounts
If your application sends the same system prompt or context prefix repeatedly, prompt caching can cut your input cost by 50–90%. The math:
Cached input cost =
(uncached_tokens / 1M) × input_price
+ (cached_tokens / 1M) × cache_read_price
+ (new_cache_writes / 1M) × cache_write_price
Anthropic and OpenAI charge ~10% of input price for cache reads. At 80% cache hit rate, your effective input price drops to ~28% of list price. DeepSeek takes this even further with cache reads at 0.8% of input price.
Step 4: Multiply by Request Volume
Monthly cost = cost per request × requests per month. A daily coding assistant making 50 requests/day × 30 days = 1,500 requests/month. Even an expensive model at $0.02/request = $30/month. Scale that to 10,000 users and it becomes $300,000/month — at which point the difference between budget models matters enormously.
Common Mistakes
- Forgetting output costs more.A model that's cheap on input but expensive on output can cost more overall than a balanced model — especially for content generation workloads.
- Ignoring batch discounts.If your workload isn't latency-sensitive, OpenAI and Anthropic batch pricing cuts costs by 50%. A $2.50/M model becomes $1.25/M — below most budget-tier pricing.
- Not testing cache hit rates. The difference between 50% and 90% cache hit rate on DeepSeek can be 5× on your input bill. Instrument your app and measure.
- Comparing only input price. A model with $0.10/M input and $2.00/M output can cost more than one with $0.50/M input and $0.50/M output — it depends entirely on your I/O ratio.
The easiest way to avoid these mistakes? Use the calculator — it factors in caching, batch, and I/O ratio automatically across all 19 models.