Compacting Agent History
Most compaction methods trade accuracy for compression. Task-aware compaction breaks the tradeoff.
LongCodeBench · Long Code QA · 108 repos · contexts up to 1M tokens
How much can you compress without losing accuracy?
[Chart: accuracy vs. compression ratio on LongCodeBench. Flash-Compact reaches 14.84× compression at 58.71% accuracy, above the full-context baseline. Comparison methods: Full Context, RAG, Selective-Ctx, LongCodeZip, LLMLingua2, Claude Code, Codex.]
Compacting Tool Calls
Agents solve problems faster when they operate on cleaner context.
Claude Opus 4.6 · SWE-bench Pro · same pass rate
Fewer tokens per task, fewer rounds to solve
[Chart: total tokens per task and agent rounds, with vs. without Flash Compact.]
Does Compaction Make Sense For You?
Plug in your agent's numbers. The math does the rest.
Tokens per step (agent response + user input + tool calls): 1K–20K
Compaction trigger: 10–90% of the context window (50% = 100K)
Fixed: 200K context window · 5× compression · 100 steps
No Compaction: $52.13 (500K final context)
With Compaction: $23.98 (via Gemini Flash)
Saved: $28.14 (54% reduction)
[Charts: context size over steps and cumulative cost over steps, with vs. without compaction.]
Assumptions
Claude Opus 4.6: $15/M input (fresh), $1.5/M input (cached), $75/M output
Gemini Flash: $0.15/M input, $0.6/M output
5× compression ratio · 1000 output tokens/turn
200K context window · 100 fixed steps
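Under these assumptions, the savings estimate can be reproduced with a simple cost model. This is a sketch, not the calculator's published logic: the per-step token mix (4K input + 1K output) and the cache-invalidation behavior after compaction are our assumptions.

```python
# Simplified cost model for the calculator above. Pricing constants come
# from the stated assumptions; the token-growth and caching model is a guess.

OPUS_IN_FRESH = 15 / 1e6    # $/token, uncached input
OPUS_IN_CACHED = 1.5 / 1e6  # $/token, cached input
OPUS_OUT = 75 / 1e6         # $/token, output
FLASH_IN = 0.15 / 1e6       # Gemini Flash input
FLASH_OUT = 0.6 / 1e6       # Gemini Flash output

def run(steps=100, new_input=4000, output=1000,
        window=200_000, trigger=0.5, ratio=5, compact=True):
    context, cost = 0, 0.0
    for _ in range(steps):
        if compact and context >= trigger * window:
            # Gemini Flash reads the full context, emits a 5x-smaller summary.
            cost += context * FLASH_IN + (context // ratio) * FLASH_OUT
            context //= ratio
            # Assumption: compaction invalidates the prompt cache,
            # so the next Opus call pays fresh-input rates on the summary.
            cost += context * OPUS_IN_FRESH
        else:
            cost += context * OPUS_IN_CACHED  # prior context hits the cache
        cost += new_input * OPUS_IN_FRESH + output * OPUS_OUT
        context += new_input + output
    return context, cost

final_ctx, base = run(compact=False)
_, compacted = run(compact=True)
print(f"no compaction: {final_ctx:,} tokens, ${base:.2f}")
print(f"with compaction: ${compacted:.2f} (saved {1 - compacted/base:.0%})")
```

With these defaults the model lands in the same ballpark as the numbers above (~$50 uncompacted, 500K final context); the residual gap comes from the guessed token mix.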
Cut your agent's token bill
Morph Compaction is available as an API. Drop it into any agent framework in under 10 lines.
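The integration point looks roughly like this. Everything here is a hypothetical sketch: `compact()` is a local stand-in for what would be a call to the Morph Compaction API (whose real endpoint and schema are not shown in this document), and the token count is a crude chars/4 estimate.

```python
# Hypothetical agent loop showing where compaction slots in.
# `compact()` is a stand-in, NOT the real Morph API surface.

def compact(messages):
    """Stand-in: replace old messages with one compacted summary message.
    A real integration would call the Morph Compaction API here."""
    summary = {"role": "user",
               "content": f"[compacted summary of {len(messages)} messages]"}
    return [messages[0], summary]  # keep the system prompt, swap in summary

def agent_loop(llm, user_goal, max_tokens=100_000, steps=10):
    messages = [{"role": "system", "content": "You are a coding agent."},
                {"role": "user", "content": user_goal}]
    for _ in range(steps):
        # Rough token estimate: ~4 chars per token.
        if sum(len(m["content"]) // 4 for m in messages) > max_tokens:
            messages = compact(messages)  # the only line you add
        messages.append({"role": "assistant", "content": llm(messages)})
    return messages
```

The loop stays unchanged except for the threshold check and the single `compact()` call; the model keeps seeing a short, clean history instead of the full transcript.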