Not every application needs premium reasoning. For high-volume chat, simple classification, content moderation, or prototyping, the cheapest models on the market deliver 80% of the capability at 1% of the cost. Here's a data-driven ranking of the most affordable AI model APIs in 2026.

Ranked by Input Price (Cheapest First)

#	Model	Provider	Input $/M	Output $/M	Cache Read $/M	SWE-bench
1	Gemini Flash Lite	Google	$0.10	$0.40	$0.01	45.0
2	DeepSeek V4 Flash	DeepSeek	$0.14	$0.28	$0.0028	79.0
3	Gemini 3.5 Flash	Google	$0.25	$1.50	$0.025	65.0
4	Qwen 3 Coder	Alibaba	$0.30	$1.50	—	77.0
5	MiniMax M3	MiniMax	$0.30	$1.20	—	80.5
6	DeepSeek V4 Pro	DeepSeek	$0.435	$0.87	$0.003625	80.6

DeepSeek V4 Flash is the standout: it scores 79.0 on SWE-bench, just behind Claude Sonnet 4.6 at 79.6, for $0.14/M input vs Sonnet's $3.00/M. That's about 21× cheaper on input for comparable coding ability.

Total Monthly Cost at Different Scales

Per-token price matters, but total cost is what hits your credit card. Here's how the cheapest models compare at three usage levels, from 5M input + 2M output tokens/month up to 500M input + 200M output, with no caching:

Model	5M in / 2M out	50M in / 20M out	500M in / 200M out
Gemini Flash Lite	$1.30	$13.00	$130.00
DeepSeek V4 Flash	$1.26	$12.60	$126.00
Gemini 3.5 Flash	$4.25	$42.50	$425.00
Qwen 3 Coder	$4.50	$45.00	$450.00
— vs premium for reference —
Claude Fable 5	$150.00	$1,500.00	$15,000.00

At 500M input + 200M output/month, Flash Lite saves $14,870/month vs Fable 5, and DeepSeek V4 Flash saves slightly more at $14,874/month.
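The table's arithmetic is easy to reproduce. The sketch below assumes the per-million prices from the ranking table and no caching or batch discounts; the names `PRICES`, `SCALES`, and `monthly_cost` are illustrative and not part of any provider SDK.

```python
# Prices in USD per million tokens (input, output), copied from the ranking table.
PRICES = {
    "Gemini Flash Lite": (0.10, 0.40),
    "DeepSeek V4 Flash": (0.14, 0.28),
    "Gemini 3.5 Flash": (0.25, 1.50),
    "Qwen 3 Coder": (0.30, 1.50),
}

# Monthly volumes in millions of tokens: (input, output).
SCALES = [(5, 2), (50, 20), (500, 200)]


def monthly_cost(input_m, output_m, input_price, output_price):
    """Uncached monthly cost in USD; volumes are in millions of tokens."""
    return input_m * input_price + output_m * output_price


for model, (p_in, p_out) in PRICES.items():
    costs = [monthly_cost(i, o, p_in, p_out) for i, o in SCALES]
    print(model, "  ".join(f"${c:,.2f}" for c in costs))
```

Swap in your own volumes in `SCALES` to see how the ordering holds up for your workload.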

The Hidden Cost: No Caching, No Batch

Not every ultra-cheap model supports prompt caching or batch processing; Qwen 3 Coder and MiniMax M3, for example, list no cache-read price in the ranking table. On those models you pay full price for every token. In contrast, a “more expensive” model like GPT-5.6 Luna with 90% cache hits can close part of the gap in practice:

Gemini Flash Lite (no cache): 5M × $0.10 + 2M × $0.40 = $1.30

GPT-5.6 Luna (90% cache): 0.5M × $1.00 + 4.5M × $0.10 + 2M × $6.00 = $12.95

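Flash Lite still wins by a wide margin. Without caching, Luna would cost 5M × $1.00 + 2M × $6.00 = $17.00, so caching narrows the gap from about 13× to about 10×. Caching only discounts input tokens, so at higher output volumes Luna's $6.00/M output price widens the gap again.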
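To try other cache hit rates, the same calculation can be written as a small function. This is a minimal sketch that assumes cached input tokens are billed at the cache-read price, uncached ones at the normal input price, and that cache-write surcharges (if any) are ignored; `cost_with_cache` is a hypothetical helper, not a provider API.

```python
def cost_with_cache(input_m, output_m, input_price, output_price,
                    cache_read_price=None, hit_rate=0.0):
    """Monthly cost in USD; volumes in millions of tokens, prices in USD per million.

    Assumption: a model with no cache-read price pays full input price on everything.
    """
    if cache_read_price is None:
        cache_read_price = input_price
    cached = input_m * hit_rate   # input tokens served from cache
    fresh = input_m - cached      # input tokens billed at full price
    return fresh * input_price + cached * cache_read_price + output_m * output_price


flash_lite = cost_with_cache(5, 2, 0.10, 0.40)                      # no cache
luna_cached = cost_with_cache(5, 2, 1.00, 6.00, 0.10, hit_rate=0.9)
luna_uncached = cost_with_cache(5, 2, 1.00, 6.00)

print(f"Flash Lite:      ${flash_lite:.2f}")
print(f"Luna, 90% cache: ${luna_cached:.2f}")
print(f"Luna, no cache:  ${luna_uncached:.2f}")
```

With the prices quoted in this section, Luna comes to about $12.95 at a 90% hit rate and $17.00 with no cache hits, against $1.30 for Flash Lite.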

Best Cheap Model by Task

Simple chat / Q&A: Gemini Flash Lite vs DeepSeek V4 Flash — Flash Lite is cheaper per token, but V4 Flash is far more capable for only $0.04/M more on input.
Coding assistant: DeepSeek V4 Flash vs Qwen 3 Coder — DeepSeek wins on both price and benchmarks.
Content moderation / classification: Gemini Flash Lite at $0.10/M input is unbeatable for simple classification tasks.
Multimodal on a budget: Gemini 3.5 Flash at $0.25/M — the only budget model with image input.
Best overall value: MiniMax M3 vs DeepSeek V4 Pro — M3 at $0.30/M with 80.5 SWE-bench is the dark horse of the budget tier.

Use the calculator to compare models with your exact token mix — the cheapest model changes based on your input/output ratio and cache hit rate.
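As a rough stand-in for that comparison, the sketch below picks the cheapest model for a given input/output mix. It assumes the uncached prices from the ranking table and ignores caching, batch discounts, and quality differences; `PRICES` and `cheapest` are illustrative names.

```python
# Prices in USD per million tokens (input, output), copied from the ranking table.
PRICES = {
    "Gemini Flash Lite": (0.10, 0.40),
    "DeepSeek V4 Flash": (0.14, 0.28),
    "Gemini 3.5 Flash": (0.25, 1.50),
    "Qwen 3 Coder": (0.30, 1.50),
    "MiniMax M3": (0.30, 1.20),
    "DeepSeek V4 Pro": (0.435, 0.87),
}


def cheapest(input_m, output_m, prices=PRICES):
    """Return (model, cost in USD) with the lowest uncached cost for this mix."""
    costs = {
        name: input_m * p_in + output_m * p_out
        for name, (p_in, p_out) in prices.items()
    }
    best = min(costs, key=costs.get)
    return best, costs[best]


# Input-heavy, the article's 5:2 mix, and an output-heavy mix (millions of tokens).
for input_m, output_m in [(10, 1), (5, 2), (1, 1)]:
    model, cost = cheapest(input_m, output_m)
    print(f"{input_m}M in / {output_m}M out -> {model} (${cost:.2f})")
```

Between the two cheapest models the break-even is simple: 0.10·in + 0.40·out equals 0.14·in + 0.28·out when output is one third of input. Below that ratio Flash Lite is cheaper; above it, as in the 5M/2M mix used throughout this article, DeepSeek V4 Flash is.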

Cheapest LLM APIs for High-Volume Applications in 2026
