A single Claude Opus 4 conversation with full 200K context costs $4.50. Run that 100 times a day and you're spending $13,500/month on one workflow. This page breaks down every major LLM's API pricing, shows what different workloads actually cost, and explains the strategies that cut bills by 60% or more.
LLM API Pricing Comparison (March 2026)
Current per-million-token pricing for every major model, sourced from official API pricing pages. Prices shown are standard (non-batch, non-cached) rates.
Frontier Models
These models handle complex reasoning, novel code generation, and multi-step planning. Use them when accuracy matters more than cost.
| Model | Input | Output | Context Window |
|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | 1M tokens |
| Claude Opus 4 | $15.00 | $75.00 | 200K tokens |
| GPT-5.4 | $2.50 | $15.00 | 1M tokens |
| GPT-5.4 Pro | $30.00 | $180.00 | 1M tokens |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M tokens |
Mid-Tier Models
Best for general coding, analysis, content generation, and most production workloads. The sweet spot for cost vs. capability.
| Model | Input | Output | Context Window |
|---|---|---|---|
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M tokens |
| GPT-5.1 | $1.25 | $10.00 | 400K tokens |
| Mistral Large | $0.50 | $1.50 | 262K tokens |
| Mistral Medium 3.1 | $0.40 | $2.00 | 131K tokens |
| DeepSeek V3.2 | $0.26 | $0.38 | 164K tokens |
Budget Models
For classification, extraction, summarization, and high-volume tasks where cost per request needs to stay below $0.001.
| Model | Input | Output | Context Window |
|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K tokens |
| GPT-5.4 Mini | $0.75 | $4.50 | 400K tokens |
| GPT-5.4 Nano | $0.20 | $1.25 | 400K tokens |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M tokens |
| Gemini 2.5 Flash Lite | $0.10 | $0.40 | 1M tokens |
| Mistral Small | $0.15 | $0.60 | 262K tokens |
| Llama 3.3 Nemotron 49B | $0.10 | $0.40 | 131K tokens |
Cost by Use Case
Token counts vary dramatically by workload. A chat message uses 500 tokens. A coding agent tool call uses 50,000. Same model, 100x the cost.
Coding Agents
The most expensive LLM use case. A typical Claude Code session with Opus runs 20-50 tool calls, each sending 10K-100K tokens of accumulated context. Per-session cost: $3-15.
The cost driver is not the model's output. It's the context. Each tool call re-sends the full conversation history plus retrieved file contents. By mid-session, you're paying frontier prices on 80K+ tokens of context per call, most of which the model already saw.
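The accumulation dynamic can be sketched numerically. The base context size and per-call growth rate below are illustrative assumptions, not measured values:

```python
# Sketch: input cost of a session where every tool call re-sends the
# accumulated context. Base context and growth rate are assumptions.

def session_input_cost(n_calls: int, base_context: int,
                       growth_per_call: int, price_per_m: float) -> float:
    """Total input cost when call i sends base_context + i * growth_per_call tokens."""
    total_tokens = sum(base_context + i * growth_per_call for i in range(n_calls))
    return total_tokens * price_per_m / 1_000_000

# 35 calls, starting at 20K tokens of context, growing 3K per call, on Opus 4.6
print(f"${session_input_cost(35, 20_000, 3_000, 5.00):.2f}")
```

With these assumptions the session lands around $12, consistent with the $3-15 range above; most of it comes from the later calls, where the re-sent context is largest.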
RAG (Retrieval-Augmented Generation)
Moderate cost per query, but volume adds up. A typical RAG pipeline retrieves 5-10 chunks of ~500 tokens each, plus the user query and system prompt. Total: 5,000-10,000 input tokens per query.
| Model | Cost/Query | Monthly Cost (30K queries) |
|---|---|---|
| Claude Sonnet 4.6 | $0.015-0.030 | $450-900 |
| GPT-5.4 | $0.013-0.025 | $375-750 |
| Gemini 2.5 Flash | $0.002-0.003 | $45-90 |
| DeepSeek V3.2 | $0.001-0.003 | $30-75 |
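The per-query figures can be reproduced from the retrieval parameters above; the chunk and prompt sizes are the illustrative numbers from the text:

```python
# Sketch: RAG input cost per query from chunk count, chunk size, and prompt overhead.

def rag_query_cost(n_chunks: int, chunk_tokens: int, prompt_tokens: int,
                   price_per_m_input: float) -> float:
    input_tokens = n_chunks * chunk_tokens + prompt_tokens
    return input_tokens * price_per_m_input / 1_000_000

# 10 chunks x 500 tokens + ~1,000 tokens of query/system prompt, on Sonnet 4.6
print(rag_query_cost(10, 500, 1_000, 3.00))  # 0.018
```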
Chat and Customer Support
Average conversation: 5-10 turns, 500-2,000 tokens per turn of accumulated context. Total per conversation: 10,000-30,000 input tokens.
| Model | Cost/Conversation | Monthly Cost (30K conversations) |
|---|---|---|
| GPT-5.4 Mini | $0.008-0.022 | $225-660 |
| Claude Haiku 4.5 | $0.010-0.030 | $300-900 |
| Gemini 2.5 Flash | $0.003-0.009 | $90-270 |
| Mistral Small | $0.002-0.005 | $45-150 |
Cost Optimization Strategies
Three strategies account for most achievable cost reduction. In order of impact:
1. Context Compression
Reduce input tokens by 50-70% before they reach the model. The single biggest lever because input tokens dominate total spend. Morph Compact preserves semantic content while stripping redundancy.
2. Model Routing
Send each task to the cheapest model that can handle it. Classification to Nano ($0.20/M), coding to Sonnet ($3/M), complex reasoning to Opus ($5/M). A 3-tier routing setup cuts costs 40-60% vs. using one model for everything.
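A minimal routing table might look like the sketch below. The task categories are illustrative assumptions; a production router would classify incoming tasks with heuristics or a cheap classifier model:

```python
# Sketch: 3-tier model routing. Model names and prices come from the tables above;
# the task taxonomy is an assumption.

ROUTES = {
    "classification": ("gpt-5.4-nano", 0.20),      # $/M input
    "coding":         ("claude-sonnet-4.6", 3.00),
    "reasoning":      ("claude-opus-4.6", 5.00),
}

def route(task_type: str) -> tuple[str, float]:
    """Return (model, input price per M tokens), defaulting to the mid tier."""
    return ROUTES.get(task_type, ROUTES["coding"])

model, price = route("classification")
print(model, price)  # gpt-5.4-nano 0.2
```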
3. Prompt Caching
Anthropic and OpenAI offer prompt caching that reduces repeated input costs by ~90%. If your system prompt is 2,000 tokens and you make 10,000 requests/day, that is 20M prompt tokens/day; on Sonnet ($3/M uncached vs. ~$0.30/M for cache reads), caching saves roughly $54/day on that prompt alone.
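The savings arithmetic, under the simplifying assumption of a flat ~90% discount on cached reads (real pricing also adds a small cache-write surcharge, ignored here):

```python
# Sketch: daily savings from caching a static system prompt.
# Assumes every request reads the prompt from cache at ~10% of the base
# input price; ignores the cache-write surcharge.

def daily_cache_savings(prompt_tokens: int, requests_per_day: int,
                        price_per_m: float, cached_discount: float = 0.90) -> float:
    daily_prompt_tokens = prompt_tokens * requests_per_day
    return daily_prompt_tokens * price_per_m / 1_000_000 * cached_discount

print(daily_cache_savings(2_000, 10_000, 3.00))  # 54.0
```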
Other Strategies
Batch API Processing
Both Anthropic and OpenAI offer 50% discounts on batch requests with 24-hour turnaround. Good for offline workloads like data processing, evaluation runs, and content generation.
Shorter Context Windows
Send only what the model needs. A coding agent that sends relevant functions instead of full files uses 80% fewer input tokens. Subagent architectures (search in separate context, return only results) compound this.
Context Compression Impact
Context compression is not a marginal optimization. It changes the economics of LLM usage at the architectural level. When input tokens are 70-85% of your bill and you can cut them by 60%, you are cutting total costs by 42-51%.
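That arithmetic in one line: the total reduction is the input share of the bill times the compression ratio.

```python
# Sketch: total-cost reduction when only input tokens shrink.

def total_cost_reduction(input_share: float, compression: float) -> float:
    """Fraction of the total bill saved, given input's share of spend."""
    return input_share * compression

print(f"{total_cost_reduction(0.70, 0.60):.0%}")  # 42%
print(f"{total_cost_reduction(0.85, 0.60):.0%}")  # 51%
```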
Before vs. After: Coding Agent
Figures assume Opus 4.6 input pricing ($5/M) across a 10-developer team.
| Metric | Without Compression | With Compact (60%) | Savings |
|---|---|---|---|
| Avg input tokens/call | 80,000 | 32,000 | 60% |
| Cost per call (input) | $0.40 | $0.16 | $0.24 |
| Calls per session (avg) | 35 | 35 | 0 |
| Sessions/dev/day | 5 | 5 | 0 |
| Monthly cost (input) | $21,000 | $8,400 | $12,600 |
Before vs. After: RAG Pipeline
Figures assume Sonnet 4.6 input pricing ($3/M) at 10,000 queries/day.
| Metric | Without Compression | With Compact (50%) | Savings |
|---|---|---|---|
| Avg input tokens/query | 8,000 | 4,000 | 50% |
| Cost per query (input) | $0.024 | $0.012 | $0.012 |
| Monthly cost (input) | $7,200 | $3,600 | $3,600 |
Compound effect
Context compression does more than cut costs. Shorter inputs mean faster response times (fewer tokens to process), higher rate limit headroom (fewer input tokens per minute consumed), and reduced context rot (less noise for the model to filter through). The cost savings are the most measurable benefit, but not the only one.
When to Use Which Model
Model selection is the second-biggest cost lever after context compression. Across the tables above, input prices span $0.10/M (Gemini 2.5 Flash Lite) to $30/M (GPT-5.4 Pro), a 300x spread; output prices span $0.40/M to $180/M.
| Task | Recommended Model | Input Cost/M | Why |
|---|---|---|---|
| Complex reasoning | Claude Opus 4.6 | $5.00 | Best accuracy on multi-step problems |
| Code generation | Claude Sonnet 4.6 | $3.00 | Near-Opus quality, 40% cheaper than Opus |
| General coding | GPT-5.4 / GPT-5.1 | $1.25-2.50 | Strong coding, lower cost than Claude |
| Long-context analysis | Gemini 2.5 Pro | $1.25 | 1M context at mid-tier pricing |
| RAG queries | Gemini 2.5 Flash | $0.30 | Fast, cheap, 1M context for large retrievals |
| Classification | GPT-5.4 Nano | $0.20 | Sufficient accuracy, minimal cost |
| Summarization | Mistral Small | $0.15 | Good quality at bottom-tier pricing |
| Cost-sensitive bulk | DeepSeek V3.2 | $0.26 | Best cost/quality for open-weight tasks |
A practical approach: start every new task on the cheapest plausible model. Run 50-100 test cases. If accuracy is insufficient, move up one tier. Most teams find that 60-70% of their API calls can run on budget models without quality loss.
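The escalation loop described above, as a sketch. `run_eval` is a hypothetical callback that returns accuracy on your test cases; the tier list and the 0.95 threshold are assumptions:

```python
# Sketch: walk model tiers from cheapest to priciest, stopping at the first
# that clears the accuracy bar. Tier list and threshold are assumptions.

TIERS = ["gemini-2.5-flash-lite", "gpt-5.4-mini",
         "claude-sonnet-4.6", "claude-opus-4.6"]

def cheapest_sufficient_model(run_eval, threshold: float = 0.95):
    for model in TIERS:
        if run_eval(model) >= threshold:
            return model
    return None  # nothing passes; revisit the task split or the threshold

# Toy eval scores standing in for a 50-100 case test run
scores = {"gemini-2.5-flash-lite": 0.80, "gpt-5.4-mini": 0.90,
          "claude-sonnet-4.6": 0.96, "claude-opus-4.6": 0.98}
print(cheapest_sufficient_model(scores.get))  # claude-sonnet-4.6
```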
Monthly Cost Estimator
Use these reference points to estimate your monthly LLM spend. Multiply your daily request count by the per-request cost, then multiply by 30.
| Avg Input Tokens | Opus 4.6 ($5/M) | Sonnet 4.6 ($3/M) | GPT-5.4 Mini ($0.75/M) | Flash ($0.30/M) |
|---|---|---|---|---|
| 1,000 tokens | $0.005 | $0.003 | $0.001 | $0.0003 |
| 5,000 tokens | $0.025 | $0.015 | $0.004 | $0.002 |
| 10,000 tokens | $0.050 | $0.030 | $0.008 | $0.003 |
| 50,000 tokens | $0.250 | $0.150 | $0.038 | $0.015 |
| 100,000 tokens | $0.500 | $0.300 | $0.075 | $0.030 |
Example: A team running 5,000 requests/day at 50K average input tokens on Sonnet 4.6 pays $0.15/request x 5,000 = $750/day = $22,500/month. With 60% context compression via Compact, that drops to $9,000/month. The same workload on Gemini 2.5 Flash costs $2,250/month uncompressed, or $900/month with the same compression.
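The worked example as a reusable helper:

```python
# Sketch: monthly input-token spend from the estimator's assumptions.

def monthly_cost(requests_per_day: int, avg_input_tokens: int,
                 price_per_m_input: float, days: int = 30) -> float:
    per_request = avg_input_tokens * price_per_m_input / 1_000_000
    return per_request * requests_per_day * days

print(f"${monthly_cost(5_000, 50_000, 3.00):,.0f}")  # $22,500
print(f"${monthly_cost(5_000, 20_000, 3.00):,.0f}")  # $9,000 after 60% compression
```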
Frequently Asked Questions
How much does it cost to use Claude API?
Claude API pricing varies by model. Opus 4.6 costs $5/M input tokens and $25/M output tokens. Sonnet 4.6 costs $3/$15. Haiku 4.5 costs $1/$5. A typical API call costs between $0.001 and $0.50 depending on context size and model. Most production workloads average $0.01-0.05 per request on Sonnet.
How do I reduce LLM API costs?
Three strategies cover most of the achievable reduction. Context compression (50-70% input reduction), model routing (send simple tasks to cheap models), and prompt caching (~90% savings on repeated inputs). Together these can cut total API spend by 60-80%. Morph Compact handles the compression layer.
Which LLM is cheapest?
Among capable models: DeepSeek V3.2 at $0.26/M input, Llama 3.3 Nemotron at $0.10/M via API providers, Gemini 2.5 Flash Lite at $0.10/M, and GPT-5.4 Nano at $0.20/M. The cheapest option depends on whether you need reasoning quality (Flash or DeepSeek) or just classification (Nano).
How much do coding agents cost per month?
For a 10-person engineering team using frontier models: $3,000-15,000/month. The primary cost driver is accumulated context in the agent's conversation, not output tokens. Teams using context compression via Compact report 40-60% cost reductions on agent workloads.
What is the cost difference between GPT and Claude?
At the mid-tier: GPT-5.4 ($2.50/M input) is 17% cheaper than Claude Sonnet 4.6 ($3/M). At the frontier: Claude Opus 4.6 ($5/M input) is 83% cheaper than GPT-5.4 Pro ($30/M). Budget tiers are comparable: GPT-5.4 Mini ($0.75/M) vs. Claude Haiku 4.5 ($1/M). The right choice depends on your specific accuracy requirements, not just price.
How does context compression reduce LLM costs?
Context compression strips redundant tokens from input while preserving semantic meaning. A 100K-token context compressed by 60% becomes 40K tokens. At $5/M (Opus pricing), that saves $0.30 per request. Over 1,000 daily requests, that compounds to $9,000/month in savings. The compression happens before the API call, so you also get faster response times and higher effective rate limits.
Is it cheaper to self-host LLMs?
It depends on volume. Self-hosting eliminates per-token costs but adds GPU infrastructure. Running Llama 70B requires 2x A100 GPUs (~$2,160/month). This breaks even with API pricing at roughly 50,000-100,000 requests/month. Below that volume, APIs are cheaper. Above it, self-hosting wins on pure cost but adds operational complexity.
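The break-even point under the assumptions above, treating the GPU bill as fixed and the API bill as a per-request average (the $0.02-0.04 range is an assumed average, not a quoted price):

```python
# Sketch: requests/month at which a fixed GPU cost equals API spend.

def breakeven_requests_per_month(gpu_cost_per_month: float,
                                 api_cost_per_request: float) -> float:
    return gpu_cost_per_month / api_cost_per_request

# $2,160/month for 2x A100 vs. an assumed $0.02-0.04 per API request
print(f"{breakeven_requests_per_month(2_160, 0.04):,.0f}")  # 54,000
print(f"{breakeven_requests_per_month(2_160, 0.02):,.0f}")  # 108,000
```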
Cut Your LLM Costs by 60%
Morph Compact compresses context before it reaches the model. Same accuracy, 50-70% fewer input tokens, immediate cost reduction across any LLM API.