Current Model Pricing (March 2026)
All prices per million tokens (MTok). Verified against Anthropic's official pricing page on March 5, 2026.
| Model | Input (per MTok) | Output (per MTok) | Status |
|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | Current flagship |
| Claude Opus 4.5 | $5.00 | $25.00 | Active |
| Claude Opus 4.1 | $15.00 | $75.00 | Active (legacy pricing) |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Current mid-tier |
| Claude Sonnet 4.5 | $3.00 | $15.00 | Active |
| Claude Sonnet 4 | $3.00 | $15.00 | Active |
| Claude Haiku 4.5 | $1.00 | $5.00 | Current budget |
| Claude Haiku 3.5 | $0.80 | $4.00 | Active |
| Claude Haiku 3 | $0.25 | $1.25 | Active (cheapest) |
Legacy Pricing Gap
Opus 4.1 and earlier Opus models cost $15/$75 per MTok, which is 3x the input and 3x the output price of Opus 4.6. If you are still using Opus 4.1 or Opus 4 for production workloads, switching to Opus 4.6 cuts your bill by 67% with no capability regression.
The Sonnet line has been price-stable since Claude 3: every version from Sonnet 3.7 through Sonnet 4.6 costs $3/$15. Haiku pricing varies more. Haiku 3 at $0.25/$1.25 is 4x cheaper than Haiku 4.5 at $1/$5, but Haiku 4.5 handles substantially more complex tasks.
Batch Processing Discount
The Batch API provides a flat 50% discount on all token costs. You submit requests asynchronously, and Anthropic processes them within a 24-hour window.
| Model | Batch Input (per MTok) | Batch Output (per MTok) |
|---|---|---|
| Claude Opus 4.6 | $2.50 | $12.50 |
| Claude Sonnet 4.6 | $1.50 | $7.50 |
| Claude Haiku 4.5 | $0.50 | $2.50 |
| Claude Haiku 3 | $0.125 | $0.625 |
Haiku 3 via Batch API costs $0.125 per million input tokens. That is 40x cheaper than standard Opus 4.6 input pricing. For workloads that tolerate 24-hour turnaround (data processing, bulk classification, offline analysis), batch Haiku is the cheapest way to use Claude.
Prompt Caching
Prompt caching stores repeated portions of your prompt so subsequent requests read from cache at 10% of the base input price. Two cache durations exist.
| Operation | Multiplier | Duration | Break-even |
|---|---|---|---|
| 5-minute cache write | 1.25x base input | 5 minutes | 1 cache hit |
| 1-hour cache write | 2x base input | 1 hour | 2 cache hits |
| Cache hit (read) | 0.1x base input | Same as write | Immediate savings |
Concrete example: a 10,000-token system prompt on Opus 4.6. Standard cost per request: $0.05 input. With 5-minute caching, the first request costs $0.0625 (write), and every subsequent request within 5 minutes costs $0.005 (hit). After one cache hit, you are saving money. After 10 requests, you have spent $0.1075 instead of $0.50.
Automatic Caching
Add a single cache_control field at the top level of your request, and Anthropic automatically manages cache breakpoints as conversations grow. For fine-grained control, place cache_control on individual content blocks.
Long Context Pricing
Claude Opus 4.6, Sonnet 4.6, Sonnet 4.5, and Sonnet 4 support a 1M token context window (beta, Tier 4+ organizations). Requests exceeding 200K input tokens are charged at premium rates.
| Model | Standard (≤200K) | Long Context (>200K) |
|---|---|---|
| Opus 4.6 Input | $5.00 / MTok | $10.00 / MTok |
| Opus 4.6 Output | $25.00 / MTok | $37.50 / MTok |
| Sonnet 4.6/4.5/4 Input | $3.00 / MTok | $6.00 / MTok |
| Sonnet 4.6/4.5/4 Output | $15.00 / MTok | $22.50 / MTok |
The 200K threshold is based on total input tokens, including cache reads and writes. Output token count does not affect which tier applies. If your total input hits 200,001 tokens, the entire request uses premium pricing, not just the overflow.
Check Your API Response
Sum input_tokens + cache_creation_input_tokens + cache_read_input_tokens in the response usage object. If the total exceeds 200,000, you were billed at long-context rates.
Fast Mode (Opus 4.6 Only)
Fast mode is a research preview that provides significantly faster output from Opus 4.6 at 6x the standard rate. The tradeoff is straightforward: $30/$150 instead of $5/$25. Fast mode includes the full 1M context window with no additional long-context surcharge.
Fast mode stacks with prompt caching and data residency multipliers, but is not available with the Batch API. If you need both speed and cost efficiency, standard Sonnet 4.6 at $3/$15 may be a better fit than Opus fast mode at $30/$150.
Third-Party Platform Pricing
Claude models are available on AWS Bedrock, Google Vertex AI, and Microsoft Foundry. Starting with Claude Sonnet 4.5 and Haiku 4.5, these platforms offer two endpoint types:
| Endpoint Type | Routing | Price Premium |
|---|---|---|
| Global | Dynamic routing across regions | Standard pricing |
| Regional | Data stays in specific region | +10% premium |
The Claude API (direct from Anthropic) is global-only and uses standard pricing. The 10% regional premium only applies on third-party platforms. Earlier models (Sonnet 4, Opus 4, and prior) keep their existing pricing structure on all platforms.
Cost Optimization Guide
The four discount mechanisms stack multiplicatively. Here is how to layer them for maximum savings.
Batch + Prompt Caching on Haiku 3: $0.0125 per million cached input tokens.
| Configuration | Opus 4.6 | Sonnet 4.6 | Haiku 3 |
|---|---|---|---|
| Standard | $5.00 | $3.00 | $0.25 |
| Batch only | $2.50 | $1.50 | $0.125 |
| Cache hit only | $0.50 | $0.30 | $0.025 |
| Batch + cache hit | $0.25 | $0.15 | $0.0125 |
Opus 4.6 with batch processing and prompt caching costs $0.25 per million cached input tokens. That is the same base price as standard Haiku 3 input. For offline workloads with repeated context (document analysis, code review across a repository), this combination makes Opus-level reasoning accessible at Haiku-level pricing.
Right-size Your Model
Use Haiku for classification, extraction, and simple generation. Use Sonnet for code generation, summarization, and multi-step reasoning. Use Opus only for tasks where Sonnet measurably underperforms. Morph routes between models automatically based on task complexity, so you pay Haiku prices for easy tasks and Opus prices only when the task demands it.
FAQ
How much does the Claude API cost?
Opus 4.6: $5/$25 per MTok. Sonnet 4.6: $3/$15. Haiku 4.5: $1/$5. Haiku 3: $0.25/$1.25. All prices are per million tokens. Batch processing gives 50% off across all models.
What is the cheapest Claude API model?
Claude Haiku 3 at $0.25 input / $1.25 output per million tokens. With batch processing, that drops to $0.125/$0.625. Haiku 3 handles classification, extraction, and simple generation well.
How does prompt caching save money?
Cache hits cost 10% of the standard input price. A 5-minute cache write costs 1.25x, so after just one cache hit you break even. For applications with consistent system prompts, caching cuts input costs by up to 90%.
What is Claude fast mode?
Fast mode for Opus 4.6 costs $30/$150 per MTok (6x standard). It provides faster output and includes the full 1M context window with no long-context surcharge. Available as a research preview.
Does Claude charge more for long context?
Yes. Requests exceeding 200K input tokens use premium pricing: 2x input, 1.5x output. This applies to Opus 4.6, Sonnet 4.6, 4.5, and 4. Requests under 200K use standard rates even with 1M context enabled.
Is Claude cheaper than OpenAI?
Per-token, OpenAI's GPT-5 ($1.25/$10) and o3 ($2/$8) are cheaper than Claude Sonnet 4.6 ($3/$15) and Opus 4.6 ($5/$25). But effective cost depends on tokens consumed per task, and Claude often completes coding tasks in fewer turns. See our full pricing comparison.
Do discounts stack?
Yes. Batch (50% off), prompt caching (90% off cache hits), long-context premium, and data residency all stack multiplicatively. The cheapest possible Opus input cost is $0.25 per MTok (batch + cache hit).
What are token limits for each model?
Opus 4.6 and Sonnet 4.6: 200K default context, 1M beta. Haiku 4.5: 200K. Haiku 3: 200K. Output limits vary by model. Maximum output for Opus 4.6 is 128K tokens per response.
Route to the right Claude model automatically
Morph selects between Opus, Sonnet, and Haiku based on task complexity. You pay Haiku prices for easy tasks, Opus prices only when the problem demands it.