A single Claude Opus 4 conversation with full 200K context costs $4.50. Run that 100 times a day and you're spending $13,500/month on one workflow. This page breaks down every major LLM's API pricing, shows what different workloads actually cost, and explains the strategies that cut bills by 60% or more.
LLM API Pricing Comparison (March 2026)
Current per-million-token pricing for every major model, sourced from official API pricing pages. Prices shown are standard (non-batch, non-cached) rates.
Frontier Models
These models handle complex reasoning, novel code generation, and multi-step planning. Use them when accuracy matters more than cost.
| Model | Input | Output | Context Window |
|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | 1M tokens |
| Claude Opus 4 | $15.00 | $75.00 | 200K tokens |
| GPT-5.4 | $2.50 | $15.00 | 1M tokens |
| GPT-5.4 Pro | $30.00 | $180.00 | 1M tokens |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M tokens |
Mid-Tier Models
Best for general coding, analysis, content generation, and most production workloads. The sweet spot for cost vs. capability.
| Model | Input | Output | Context Window |
|---|---|---|---|
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M tokens |
| GPT-5.1 | $1.25 | $10.00 | 400K tokens |
| Mistral Large | $0.50 | $1.50 | 262K tokens |
| Mistral Medium 3.1 | $0.40 | $2.00 | 131K tokens |
| DeepSeek V3.2 | $0.26 | $0.38 | 164K tokens |
Budget Models
For classification, extraction, summarization, and high-volume tasks where cost per request needs to stay below $0.001.
| Model | Input | Output | Context Window |
|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K tokens |
| GPT-5.4 Mini | $0.75 | $4.50 | 400K tokens |
| GPT-5.4 Nano | $0.20 | $1.25 | 400K tokens |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M tokens |
| Gemini 2.5 Flash Lite | $0.10 | $0.40 | 1M tokens |
| Mistral Small | $0.15 | $0.60 | 262K tokens |
| Llama 3.3 Nemotron 49B | $0.10 | $0.40 | 131K tokens |
Cost by Use Case
Token counts vary dramatically by workload. A chat message uses 500 tokens. A coding agent tool call uses 50,000. Same model, 100x the cost.
Coding Agents
The most expensive LLM use case. A typical Claude Code session with Opus runs 20-50 tool calls, each sending 10K-100K tokens of accumulated context. Per-session cost: $3-15.
The cost driver is not the model's output. It's the context. Each tool call re-sends the full conversation history plus retrieved file contents. By mid-session, you're paying frontier prices on 80K+ tokens of context per call, most of which the model already saw.
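The accumulation dynamic can be sketched numerically. The base context size and per-call growth rate below are illustrative assumptions, not measured values:

```python
# Sketch: input cost of a session where every tool call re-sends the
# accumulated context. Base context and growth rate are assumptions.

def session_input_cost(n_calls: int, base_context: int,
                       growth_per_call: int, price_per_m: float) -> float:
    """Total input cost when call i sends base_context + i * growth_per_call tokens."""
    total_tokens = sum(base_context + i * growth_per_call for i in range(n_calls))
    return total_tokens * price_per_m / 1_000_000

# 35 calls, starting at 20K tokens of context, growing 3K per call, on Opus 4.6
print(f"${session_input_cost(35, 20_000, 3_000, 5.00):.2f}")
```

With these assumptions the session lands around $12, consistent with the $3-15 range above; most of it comes from the later calls, where the re-sent context is largest.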
RAG (Retrieval-Augmented Generation)
Moderate cost per query, but volume adds up. A typical RAG pipeline retrieves 5-10 chunks of ~500 tokens each, plus the user query and system prompt. Total: 5,000-10,000 input tokens per query.
| Model | Cost/Query | Monthly Cost (30K queries) |
|---|---|---|
| Claude Sonnet 4.6 | $0.015-0.030 | $450-900 |
| GPT-5.4 | $0.013-0.025 | $375-750 |
| Gemini 2.5 Flash | $0.002-0.003 | $45-90 |
| DeepSeek V3.2 | $0.001-0.003 | $30-75 |
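The per-query figures can be reproduced from the retrieval parameters above; the chunk and prompt sizes are the illustrative numbers from the text:

```python
# Sketch: RAG input cost per query from chunk count, chunk size, and prompt overhead.

def rag_query_cost(n_chunks: int, chunk_tokens: int, prompt_tokens: int,
                   price_per_m_input: float) -> float:
    input_tokens = n_chunks * chunk_tokens + prompt_tokens
    return input_tokens * price_per_m_input / 1_000_000

# 10 chunks x 500 tokens + ~1,000 tokens of query/system prompt, on Sonnet 4.6
print(rag_query_cost(10, 500, 1_000, 3.00))  # 0.018
```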
Chat and Customer Support
Average conversation: 5-10 turns, 500-2,000 tokens per turn of accumulated context. Total per conversation: 10,000-30,000 input tokens.
| Model | Cost/Conversation | Monthly Cost (30K conversations) |
|---|---|---|
| GPT-5.4 Mini | $0.008-0.022 | $225-660 |
| Claude Haiku 4.5 | $0.010-0.030 | $300-900 |
| Gemini 2.5 Flash | $0.003-0.009 | $90-270 |
| Mistral Small | $0.002-0.005 | $45-150 |
Cost Optimization Strategies
Three strategies account for most achievable cost reduction. In order of impact:
1. Context Compression
Reduce input tokens by 50-70% before they reach the model. The single biggest lever because input tokens dominate total spend. Morph Compact preserves semantic content while stripping redundancy.
2. Model Routing
Send each task to the cheapest model that can handle it. Classification to Nano ($0.20/M), coding to Sonnet ($3/M), complex reasoning to Opus ($5/M). A 3-tier routing setup cuts costs 40-60% vs. using one model for everything.
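A minimal routing table might look like the sketch below. The task categories are illustrative assumptions; a production router would classify incoming tasks with heuristics or a cheap classifier model:

```python
# Sketch: 3-tier model routing. Model names and prices come from the tables above;
# the task taxonomy is an assumption.

ROUTES = {
    "classification": ("gpt-5.4-nano", 0.20),      # $/M input
    "coding":         ("claude-sonnet-4.6", 3.00),
    "reasoning":      ("claude-opus-4.6", 5.00),
}

def route(task_type: str) -> tuple[str, float]:
    """Return (model, input price per M tokens), defaulting to the mid tier."""
    return ROUTES.get(task_type, ROUTES["coding"])

model, price = route("classification")
print(model, price)  # gpt-5.4-nano 0.2
```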
3. Prompt Caching
Anthropic and OpenAI offer prompt caching that reduces repeated input costs by ~90%. If your system prompt is 2,000 tokens and you make 10,000 requests/day, that is 20M prompt tokens/day; on Sonnet ($3/M uncached vs. ~$0.30/M for cache reads), caching saves roughly $54/day on that prompt alone.
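The savings arithmetic, under the simplifying assumption of a flat ~90% discount on cached reads (real pricing also adds a small cache-write surcharge, ignored here):

```python
# Sketch: daily savings from caching a static system prompt.
# Assumes every request reads the prompt from cache at ~10% of the base
# input price; ignores the cache-write surcharge.

def daily_cache_savings(prompt_tokens: int, requests_per_day: int,
                        price_per_m: float, cached_discount: float = 0.90) -> float:
    daily_prompt_tokens = prompt_tokens * requests_per_day
    return daily_prompt_tokens * price_per_m / 1_000_000 * cached_discount

print(daily_cache_savings(2_000, 10_000, 3.00))  # 54.0
```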
Other Strategies
Batch API Processing
Both Anthropic and OpenAI offer 50% discounts on batch requests with 24-hour turnaround. Good for offline workloads like data processing, evaluation runs, and content generation.
Shorter Context Windows
Send only what the model needs. A coding agent that sends relevant functions instead of full files uses 80% fewer input tokens. Subagent architectures (search in separate context, return only results) compound this.
Context Compression Impact
Context compression is not a marginal optimization. It changes the economics of LLM usage at the architectural level. When input tokens are 70-85% of your bill and you can cut them by 60%, you are cutting total costs by 42-51%.
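That arithmetic in one line: the total reduction is the input share of the bill times the compression ratio.

```python
# Sketch: total-cost reduction when only input tokens shrink.

def total_cost_reduction(input_share: float, compression: float) -> float:
    """Fraction of the total bill saved, given input's share of spend."""
    return input_share * compression

print(f"{total_cost_reduction(0.70, 0.60):.0%}")  # 42%
print(f"{total_cost_reduction(0.85, 0.60):.0%}")  # 51%
```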
Before vs. After: Coding Agent
Figures assume Opus 4.6 input pricing ($5/M) across a 10-developer team.
| Metric | Without Compression | With Compact (60%) | Savings |
|---|---|---|---|
| Avg input tokens/call | 80,000 | 32,000 | 60% |
| Cost per call (input) | $0.40 | $0.16 | $0.24 |
| Calls per session (avg) | 35 | 35 | 0 |
| Sessions/dev/day | 5 | 5 | 0 |
| Monthly cost (input) | $21,000 | $8,400 | $12,600 |
Before vs. After: RAG Pipeline
Figures assume Sonnet 4.6 input pricing ($3/M) at 10,000 queries/day.
| Metric | Without Compression | With Compact (50%) | Savings |
|---|---|---|---|
| Avg input tokens/query | 8,000 | 4,000 | 50% |
| Cost per query (input) | $0.024 | $0.012 | $0.012 |
| Monthly cost (input) | $7,200 | $3,600 | $3,600 |
Compound effect
Context compression does more than cut costs. Shorter inputs mean faster response times (fewer tokens to process), higher rate limit headroom (fewer input tokens per minute consumed), and reduced context rot (less noise for the model to filter through). The cost savings are the most measurable benefit, but not the only one.
When to Use Which Model
Model selection is the second-biggest cost lever after context compression. Across the tables above, input prices span $0.10/M (Gemini 2.5 Flash Lite) to $30/M (GPT-5.4 Pro), a 300x spread; output prices span $0.40/M to $180/M.
| Task | Recommended Model | Input Cost/M | Why |
|---|---|---|---|
| Complex reasoning | Claude Opus 4.6 | $5.00 | Best accuracy on multi-step problems |
| Code generation | Claude Sonnet 4.6 | $3.00 | Near-Opus quality, 40% cheaper than Opus |
| General coding | GPT-5.4 / GPT-5.1 | $1.25-2.50 | Strong coding, lower cost than Claude |
| Long-context analysis | Gemini 2.5 Pro | $1.25 | 1M context at mid-tier pricing |
| RAG queries | Gemini 2.5 Flash | $0.30 | Fast, cheap, 1M context for large retrievals |
| Classification | GPT-5.4 Nano | $0.20 | Sufficient accuracy, minimal cost |
| Summarization | Mistral Small | $0.15 | Good quality at bottom-tier pricing |
| Cost-sensitive bulk | DeepSeek V3.2 | $0.26 | Best cost/quality for open-weight tasks |
A practical approach: start every new task on the cheapest plausible model. Run 50-100 test cases. If accuracy is insufficient, move up one tier. Most teams find that 60-70% of their API calls can run on budget models without quality loss.
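The escalation loop described above, as a sketch. `run_eval` is a hypothetical callback that returns accuracy on your test cases; the tier list and the 0.95 threshold are assumptions:

```python
# Sketch: walk model tiers from cheapest to priciest, stopping at the first
# that clears the accuracy bar. Tier list and threshold are assumptions.

TIERS = ["gemini-2.5-flash-lite", "gpt-5.4-mini",
         "claude-sonnet-4.6", "claude-opus-4.6"]

def cheapest_sufficient_model(run_eval, threshold: float = 0.95):
    for model in TIERS:
        if run_eval(model) >= threshold:
            return model
    return None  # nothing passes; revisit the task split or the threshold

# Toy eval scores standing in for a 50-100 case test run
scores = {"gemini-2.5-flash-lite": 0.80, "gpt-5.4-mini": 0.90,
          "claude-sonnet-4.6": 0.96, "claude-opus-4.6": 0.98}
print(cheapest_sufficient_model(scores.get))  # claude-sonnet-4.6
```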
Monthly Cost Estimator
Use these reference points to estimate your monthly LLM spend. Multiply your daily request count by the per-request cost, then multiply by 30.
| Avg Input Tokens | Opus 4.6 ($5/M) | Sonnet 4.6 ($3/M) | GPT-5.4 Mini ($0.75/M) | Flash ($0.30/M) |
|---|---|---|---|---|
| 1,000 tokens | $0.005 | $0.003 | $0.001 | $0.0003 |
| 5,000 tokens | $0.025 | $0.015 | $0.004 | $0.002 |
| 10,000 tokens | $0.050 | $0.030 | $0.008 | $0.003 |
| 50,000 tokens | $0.250 | $0.150 | $0.038 | $0.015 |
| 100,000 tokens | $0.500 | $0.300 | $0.075 | $0.030 |
Example: A team running 5,000 requests/day at 50K average input tokens on Sonnet 4.6 pays $0.15/request x 5,000 = $750/day = $22,500/month. With 60% context compression via Compact, that drops to $9,000/month. The same workload on Gemini 2.5 Flash costs $2,250/month uncompressed, or $900/month with the same compression.
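The worked example as a reusable helper:

```python
# Sketch: monthly input-token spend from the estimator's assumptions.

def monthly_cost(requests_per_day: int, avg_input_tokens: int,
                 price_per_m_input: float, days: int = 30) -> float:
    per_request = avg_input_tokens * price_per_m_input / 1_000_000
    return per_request * requests_per_day * days

print(f"${monthly_cost(5_000, 50_000, 3.00):,.0f}")  # $22,500
print(f"${monthly_cost(5_000, 20_000, 3.00):,.0f}")  # $9,000 after 60% compression
```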
Frequently Asked Questions
How much does it cost to use Claude API?
Claude API pricing varies by model. Opus 4.6 costs $5/M input tokens and $25/M output tokens. Sonnet 4.6 costs $3/$15. Haiku 4.5 costs $1/$5. A typical API call costs between $0.001 and $0.50 depending on context size and model. Most production workloads average $0.01-0.05 per request on Sonnet.
How do I reduce LLM API costs?
Three strategies cover most of the achievable reduction. Context compression (50-70% input reduction), model routing (send simple tasks to cheap models), and prompt caching (~90% savings on repeated inputs). Together these can cut total API spend by 60-80%. Morph Compact handles the compression layer.
Which LLM is cheapest?
Among capable models: DeepSeek V3.2 at $0.26/M input, Llama 3.3 Nemotron at $0.10/M via API providers, Gemini 2.5 Flash Lite at $0.10/M, and GPT-5.4 Nano at $0.20/M. The cheapest option depends on whether you need reasoning quality (Flash or DeepSeek) or just classification (Nano).
How much do coding agents cost per month?
For a 10-person engineering team using frontier models: $3,000-15,000/month. The primary cost driver is accumulated context in the agent's conversation, not output tokens. Teams using context compression via Compact report 40-60% cost reductions on agent workloads.
What is the cost difference between GPT and Claude?
At the mid-tier: GPT-5.4 ($2.50/M input) is 17% cheaper than Claude Sonnet 4.6 ($3/M). At the frontier: Claude Opus 4.6 ($5/M input) is 83% cheaper than GPT-5.4 Pro ($30/M). Budget tiers are comparable: GPT-5.4 Mini ($0.75/M) vs. Claude Haiku 4.5 ($1/M). The right choice depends on your specific accuracy requirements, not just price.
How does context compression reduce LLM costs?
Context compression strips redundant tokens from input while preserving semantic meaning. A 100K-token context compressed by 60% becomes 40K tokens. At $5/M (Opus pricing), that saves $0.30 per request. Over 1,000 daily requests, that compounds to $9,000/month in savings. The compression happens before the API call, so you also get faster response times and higher effective rate limits.
Is it cheaper to self-host LLMs?
It depends on volume. Self-hosting eliminates per-token costs but adds GPU infrastructure. Running Llama 70B requires 2x A100 GPUs (~$2,160/month). This breaks even with API pricing at roughly 50,000-100,000 requests/month. Below that volume, APIs are cheaper. Above it, self-hosting wins on pure cost but adds operational complexity.
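The break-even point under the assumptions above, treating the GPU bill as fixed and the API bill as a per-request average (the $0.02-0.04 range is an assumed average, not a quoted price):

```python
# Sketch: requests/month at which a fixed GPU cost equals API spend.

def breakeven_requests_per_month(gpu_cost_per_month: float,
                                 api_cost_per_request: float) -> float:
    return gpu_cost_per_month / api_cost_per_request

# $2,160/month for 2x A100 vs. an assumed $0.02-0.04 per API request
print(f"{breakeven_requests_per_month(2_160, 0.04):,.0f}")  # 54,000
print(f"{breakeven_requests_per_month(2_160, 0.02):,.0f}")  # 108,000
```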
Cut Your LLM Costs by 60%
Morph Compact compresses context before it reaches the model. Same accuracy, 50-70% fewer input tokens, immediate cost reduction across any LLM API.