LLM Cost Calculator: Compare API Pricing Across Every Major Model (2026)

Compare LLM API costs across Claude, GPT, Gemini, DeepSeek, Llama, and Mistral. Real pricing per million tokens, cost-per-task breakdowns, and strategies to cut your bill by 60%.

March 27, 2026 · 6 min read

A single Claude Opus 4 conversation with full 200K context costs $4.50. Run that 100 times a day and you're spending $13,500/month on one workflow. This page breaks down every major LLM's API pricing, shows what different workloads actually cost, and explains the strategies that cut bills by 60% or more.

Cheapest input (per M tokens): $0.10
Most expensive output (per M tokens): $75
Savings via context compression: 60%
Price gap, cheapest to priciest: 750x

The Hidden Cost of Context

The per-token price is not what makes LLM APIs expensive. The context window is.

Every token you send as input gets billed whether or not the model uses it. A coding agent that sends a full 50,000-token file to make a 200-token edit pays for all 50,000 input tokens. The same edit with context compression sends 20,000 tokens. Same result, 60% lower cost.

This compounds fast in agentic workflows. A coding agent runs 20-50 tool calls per session, each call accumulating more context. By the 30th call, the agent might be sending 100K+ tokens per request. At Claude Opus 4.6 pricing ($5/M input), that single request costs $0.50. Multiply across a full session and you're at $3-15 per session, per developer.
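The per-session arithmetic above can be sketched in a few lines. This is an illustrative model, not a measurement: the starting context size, per-call growth rate, and call count are assumptions chosen to match the ranges in the text, and the price is the Opus 4.6 list rate.

```python
def session_cost(calls: int, base_tokens: int, growth_per_call: int,
                 price_per_m_input: float) -> float:
    """Sum input cost across tool calls as accumulated context grows linearly."""
    total_tokens = sum(base_tokens + i * growth_per_call for i in range(calls))
    return total_tokens / 1_000_000 * price_per_m_input

# 30 calls, starting at 10K tokens of context and accumulating ~3K more per
# call, at $5/M input (Claude Opus 4.6): lands inside the $3-15/session range.
cost = session_cost(calls=30, base_tokens=10_000, growth_per_call=3_000,
                    price_per_m_input=5.00)
print(f"${cost:.2f}")
```

With these assumptions the session bills about 1.6M input tokens, roughly $8, before counting any output tokens.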

The 80/20 of LLM costs

Input tokens typically account for 70-85% of total API spend. Output tokens are expensive per-unit but low-volume. Cutting input token count is where the money is.

LLM API Pricing Comparison (March 2026)

Current per-million-token pricing for every major model, sourced from official API pricing pages. Prices shown are standard (non-batch, non-cached) rates.

Frontier Models

These models handle complex reasoning, novel code generation, and multi-step planning. Use them when accuracy matters more than cost.

| Model | Input | Output | Context Window |
| --- | --- | --- | --- |
| Claude Opus 4.6 | $5.00 | $25.00 | 1M tokens |
| Claude Opus 4 | $15.00 | $75.00 | 200K tokens |
| GPT-5.4 | $2.50 | $15.00 | 1M tokens |
| GPT-5.4 Pro | $30.00 | $180.00 | 1M tokens |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M tokens |

Mid-Tier Models

Best for general coding, analysis, content generation, and most production workloads. The sweet spot for cost vs. capability.

| Model | Input | Output | Context Window |
| --- | --- | --- | --- |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M tokens |
| GPT-5.1 | $1.25 | $10.00 | 400K tokens |
| Mistral Large | $0.50 | $1.50 | 262K tokens |
| Mistral Medium 3.1 | $0.40 | $2.00 | 131K tokens |
| DeepSeek V3.2 | $0.26 | $0.38 | 164K tokens |

Budget Models

For classification, extraction, summarization, and high-volume tasks where cost per request needs to stay below $0.001.

| Model | Input | Output | Context Window |
| --- | --- | --- | --- |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K tokens |
| GPT-5.4 Mini | $0.75 | $4.50 | 400K tokens |
| GPT-5.4 Nano | $0.20 | $1.25 | 400K tokens |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M tokens |
| Gemini 2.5 Flash Lite | $0.10 | $0.40 | 1M tokens |
| Mistral Small | $0.15 | $0.60 | 262K tokens |
| Llama 3.3 Nemotron 49B | $0.10 | $0.40 | 131K tokens |

Cost by Use Case

Token counts vary dramatically by workload. A chat message uses 500 tokens. A coding agent tool call uses 50,000. Same model, 100x the cost.

Coding Agents

The most expensive LLM use case. A typical Claude Code session with Opus runs 20-50 tool calls, each sending 10K-100K tokens of accumulated context. Per-session cost: $3-15.

Per coding session: $3-15
Per day (10 devs): $150-750
Per month (10 devs): $3K-15K
Input token share: 70-85%

The cost driver is not the model's output. It's the context. Each tool call re-sends the full conversation history plus retrieved file contents. By mid-session, you're paying frontier prices on 80K+ tokens of context per call, most of which the model already saw.

RAG (Retrieval-Augmented Generation)

Moderate cost per query, but volume adds up. A typical RAG pipeline retrieves 5-10 chunks of ~500 tokens each, plus the user query and system prompt. Total: 5,000-10,000 input tokens per query.

| Model | Cost/Query | Monthly Cost (1,000 queries/day) |
| --- | --- | --- |
| Claude Sonnet 4.6 | $0.015-0.030 | $450-900 |
| GPT-5.4 | $0.013-0.025 | $375-750 |
| Gemini 2.5 Flash | $0.002-0.003 | $45-90 |
| DeepSeek V3.2 | $0.001-0.003 | $30-75 |

Chat and Customer Support

Average conversation: 5-10 turns, 500-2,000 tokens per turn of accumulated context. Total per conversation: 10,000-30,000 input tokens.

| Model | Cost/Conversation | Monthly Cost (1,000 conversations/day) |
| --- | --- | --- |
| GPT-5.4 Mini | $0.008-0.022 | $225-660 |
| Claude Haiku 4.5 | $0.010-0.030 | $300-900 |
| Gemini 2.5 Flash | $0.003-0.009 | $90-270 |
| Mistral Small | $0.002-0.005 | $45-150 |

Cost Optimization Strategies

Three strategies account for most achievable cost reduction. In order of impact:

1. Context Compression

Reduce input tokens by 50-70% before they reach the model. The single biggest lever because input tokens dominate total spend. Morph Compact preserves semantic content while stripping redundancy.

2. Model Routing

Send each task to the cheapest model that can handle it. Classification to Nano ($0.20/M), coding to Sonnet ($3/M), complex reasoning to Opus ($5/M). A 3-tier routing setup cuts costs 40-60% vs. using one model for everything.
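A 3-tier router can be as simple as a lookup table. This is a minimal sketch: the task-type labels are illustrative assumptions, and the model names and prices come from the tables above.

```python
# Task tier -> (model, input price per M tokens). Labels are illustrative.
TIERS = {
    "classification": ("gpt-5.4-nano", 0.20),
    "coding":         ("claude-sonnet-4.6", 3.00),
    "reasoning":      ("claude-opus-4.6", 5.00),
}

def route(task_type: str) -> tuple[str, float]:
    """Return (model, input $/M) for a task, defaulting to the mid tier."""
    return TIERS.get(task_type, TIERS["coding"])

model, price = route("classification")
print(model, price)
```

In production the lookup is usually fed by a cheap classifier or a heuristic on the prompt, but the economics are the same: every request that resolves to the bottom tier costs 15-25x less than the top tier.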

3. Prompt Caching

Anthropic and OpenAI offer prompt caching that reduces repeated input costs by ~90%. If your system prompt is 2,000 tokens and you make 10,000 requests/day, caching saves roughly $54/day on Sonnet alone (20M prompt tokens at $3/M, billed at ~10% when cached).
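The caching arithmetic can be sketched as follows. The ~10% cached-read rate is an assumption matching the ~90% discount cited above; check your provider's pricing page for exact multipliers and cache-write surcharges.

```python
def daily_cache_savings(prompt_tokens: int, requests_per_day: int,
                        price_per_m: float, cached_rate: float = 0.10) -> float:
    """Daily savings when a repeated prompt is billed at the cached rate."""
    tokens = prompt_tokens * requests_per_day
    full_cost = tokens / 1_000_000 * price_per_m
    cached_cost = full_cost * cached_rate
    return full_cost - cached_cost

# 2,000-token system prompt, 10,000 requests/day, Sonnet 4.6 at $3/M:
print(daily_cache_savings(2_000, 10_000, 3.00))  # ~$54/day
```

Note this covers only the shared prefix; per-request context (user messages, retrieved documents) is still billed at the full rate unless it repeats.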

Other Strategies

Batch API Processing

Both Anthropic and OpenAI offer 50% discounts on batch requests with 24-hour turnaround. Good for offline workloads like data processing, evaluation runs, and content generation.

Shorter Context Windows

Send only what the model needs. A coding agent that sends relevant functions instead of full files uses 80% fewer input tokens. Subagent architectures (search in separate context, return only results) compound this.

Context Compression Impact

Context compression is not a marginal optimization. It changes the economics of LLM usage at the architectural level. When input tokens are 70-85% of your bill and you can cut them by 60%, you are cutting total costs by 42-51%.
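The 42-51% figure is just the product of the two fractions. A one-line sketch of that reasoning:

```python
def total_savings(input_share: float, compression: float) -> float:
    """Fraction of total spend saved when compression only touches input tokens."""
    return input_share * compression

# Input tokens at 70-85% of spend, compressed by 60%:
low = total_savings(0.70, 0.60)    # ~0.42 -> 42% of the total bill
high = total_savings(0.85, 0.60)   # ~0.51 -> 51% of the total bill
```

The same formula shows why compressing output tokens matters far less: their share of spend is the small factor in the product.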

Before vs. After: Coding Agent

| Metric | Without Compression | With Compact (60%) | Savings |
| --- | --- | --- | --- |
| Avg input tokens/call | 80,000 | 32,000 | 60% |
| Cost per call (input) | $0.40 | $0.16 | $0.24 |
| Calls per session (avg) | 35 | 35 | 0 |
| Sessions/dev/day | 5 | 5 | 0 |
| Monthly cost (input, 10 devs) | $21,000 | $8,400 | $12,600 |

Before vs. After: RAG Pipeline

| Metric | Without Compression | With Compact (50%) | Savings |
| --- | --- | --- | --- |
| Avg input tokens/query | 8,000 | 4,000 | 50% |
| Cost per query (input) | $0.024 | $0.012 | $0.012 |
| Monthly cost (input, 10K queries/day) | $7,200 | $3,600 | $3,600 |

Compound effect

Context compression does more than cut costs. Shorter inputs mean faster response times (fewer tokens to process), higher rate limit headroom (fewer input tokens per minute consumed), and reduced context rot (less noise for the model to filter through). The cost savings are the most measurable benefit, but not the only one.

When to Use Which Model

Model selection is the second-biggest cost lever after context compression. The right model for the task can differ by 50-750x in price.

| Task | Recommended Model | Input Cost/M | Why |
| --- | --- | --- | --- |
| Complex reasoning | Claude Opus 4.6 | $5.00 | Best accuracy on multi-step problems |
| Code generation | Claude Sonnet 4.6 | $3.00 | Near-Opus quality, 40% cheaper |
| General coding | GPT-5.4 / GPT-5.1 | $1.25-2.50 | Strong coding, lower cost than Claude |
| Long-context analysis | Gemini 2.5 Pro | $1.25 | 1M context at mid-tier pricing |
| RAG queries | Gemini 2.5 Flash | $0.30 | Fast, cheap, 1M context for large retrievals |
| Classification | GPT-5.4 Nano | $0.20 | Sufficient accuracy, minimal cost |
| Summarization | Mistral Small | $0.15 | Good quality at bottom-tier pricing |
| Cost-sensitive bulk | DeepSeek V3.2 | $0.26 | Best cost/quality for open-weight tasks |

A practical approach: start every new task on the cheapest plausible model. Run 50-100 test cases. If accuracy is insufficient, move up one tier. Most teams find that 60-70% of their API calls can run on budget models without quality loss.

Monthly Cost Estimator

Use these reference points to estimate your monthly LLM spend. Multiply your daily request count by the per-request cost, then multiply by 30.

| Avg Input Tokens | Opus 4.6 ($5/M) | Sonnet 4.6 ($3/M) | GPT-5.4 Mini ($0.75/M) | Flash ($0.30/M) |
| --- | --- | --- | --- | --- |
| 1,000 tokens | $0.005 | $0.003 | $0.001 | $0.0003 |
| 5,000 tokens | $0.025 | $0.015 | $0.004 | $0.002 |
| 10,000 tokens | $0.050 | $0.030 | $0.008 | $0.003 |
| 50,000 tokens | $0.250 | $0.150 | $0.038 | $0.015 |
| 100,000 tokens | $0.500 | $0.300 | $0.075 | $0.030 |

Example: A team running 5,000 requests/day at 50K average input tokens on Sonnet 4.6 pays $0.15/request x 5,000 = $750/day = $22,500/month. With 60% context compression via Compact, that drops to $9,000/month. The same workload on Gemini 2.5 Flash: $2,250/month before compression, $900/month after.
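The estimator formula above ("daily requests x per-request cost x 30") is small enough to write down directly. A sketch using the Sonnet example:

```python
def monthly_cost(requests_per_day: int, avg_input_tokens: int,
                 price_per_m: float, days: int = 30) -> float:
    """Monthly input spend: requests/day x tokens/request x $/token x days."""
    return requests_per_day * avg_input_tokens / 1_000_000 * price_per_m * days

sonnet = monthly_cost(5_000, 50_000, 3.00)           # $22,500
sonnet_compact = monthly_cost(5_000, 20_000, 3.00)   # $9,000 (60% compression)
```

Output tokens are excluded here; for most workloads they add a minority share on top (the 15-30% remainder cited earlier).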

Sonnet, no compression: $22,500
Sonnet + Compact: $9,000
Flash, no compression: $2,250
Flash + Compact: $900

Frequently Asked Questions

How much does it cost to use Claude API?

Claude API pricing varies by model. Opus 4.6 costs $5/M input tokens and $25/M output tokens. Sonnet 4.6 costs $3/$15. Haiku 4.5 costs $1/$5. A typical API call costs between $0.001 and $0.50 depending on context size and model. Most production workloads average $0.01-0.05 per request on Sonnet.

How do I reduce LLM API costs?

Three strategies cover most of the achievable reduction. Context compression (50-70% input reduction), model routing (send simple tasks to cheap models), and prompt caching (~90% savings on repeated inputs). Together these can cut total API spend by 60-80%. Morph Compact handles the compression layer.

Which LLM is cheapest?

Among capable models: DeepSeek V3.2 at $0.26/M input, Llama 3.3 Nemotron at $0.10/M via API providers, Gemini 2.5 Flash Lite at $0.10/M, and GPT-5.4 Nano at $0.20/M. The cheapest option depends on whether you need reasoning quality (Flash or DeepSeek) or just classification (Nano).

How much do coding agents cost per month?

For a 10-person engineering team using frontier models: $3,000-15,000/month. The primary cost driver is accumulated context in the agent's conversation, not output tokens. Teams using context compression via Compact report 40-60% cost reductions on agent workloads.

What is the cost difference between GPT and Claude?

At the mid-tier: GPT-5.4 ($2.50/M input) is 17% cheaper than Claude Sonnet 4.6 ($3/M). At the frontier: Claude Opus 4.6 ($5/M input) is 83% cheaper than GPT-5.4 Pro ($30/M). Budget tiers are comparable: GPT-5.4 Mini ($0.75/M) vs. Claude Haiku 4.5 ($1/M). The right choice depends on your specific accuracy requirements, not just price.

How does context compression reduce LLM costs?

Context compression strips redundant tokens from input while preserving semantic meaning. A 100K-token context compressed by 60% becomes 40K tokens. At $5/M (Opus pricing), that saves $0.30 per request. Over 1,000 daily requests, that compounds to $9,000/month in savings. The compression happens before the API call, so you also get faster response times and higher effective rate limits.

Is it cheaper to self-host LLMs?

It depends on volume. Self-hosting eliminates per-token costs but adds GPU infrastructure. Running Llama 70B requires 2x A100 GPUs (~$2,160/month). This breaks even with API pricing at roughly 50,000-100,000 requests/month. Below that volume, APIs are cheaper. Above it, self-hosting wins on pure cost but adds operational complexity.
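The break-even point follows from equating GPU rent to API spend. A rough sketch, where the GPU cost comes from the figure above and the ~10K tokens/request is an assumption, not a measurement:

```python
def breakeven_requests(gpu_monthly: float, tokens_per_request: int,
                       api_price_per_m: float) -> float:
    """Monthly request volume where self-host GPU cost equals API spend."""
    per_request_cost = tokens_per_request / 1_000_000 * api_price_per_m
    return gpu_monthly / per_request_cost

# $2,160/month for 2x A100, ~10K input tokens/request, API at $3/M:
print(round(breakeven_requests(2_160, 10_000, 3.00)))  # ~72,000 requests/month
```

Halve the tokens per request and the break-even volume doubles, which is why compression also shifts the self-host calculus toward APIs.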

Cut Your LLM Costs by 60%

Morph Compact compresses context before it reaches the model. Same accuracy, 50-70% fewer input tokens, immediate cost reduction across any LLM API.