Compaction Benchmarks

Token economics of multi-turn coding agents.

Compacting Agent History

Most compaction methods trade accuracy for compression. Task-aware compaction breaks the tradeoff.

LongCodeBench · Long Code QA · 108 repos · contexts up to 1M tokens
How much can you compress without losing accuracy?
[Chart: accuracy (y-axis, 49–59%) vs. compression ratio (x-axis, 0×–15×; more is better) for Full Context, RAG, Selective-Ctx, LongCodeZip, LLMLingua2, Claude Code, Codex, and Flash-Compact, against a full-context baseline. Flash-Compact sits in the upper right: 14.84× compression at 58.71% accuracy.]
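The page does not define its compression metric explicitly; the sketch below assumes the conventional reading, original context tokens divided by compacted tokens, so the function name and example figures are illustrative:

```python
# Assumption: "compression ratio" = original tokens / compacted tokens.
def compression_ratio(original_tokens: int, compacted_tokens: int) -> float:
    """Ratio of original context size to compacted context size."""
    return original_tokens / compacted_tokens

# Under this reading, a 1M-token context compacted to ~67,385 tokens
# corresponds to roughly 14.84x.
```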

Compacting Tool Calls

Agents solve problems faster when they operate on cleaner context.

Claude Opus 4.6 · SWE-bench Pro · same pass rate
Fewer tokens per task, fewer rounds to solve
[Chart: total tokens per task and agent rounds, with vs. without Flash Compact; both drop with Flash Compact.]

Does Compaction Make Sense For You?

Plug in your agent's numbers. The math does the rest.

[Calculator inputs: tokens added per step (1K / 5K / 20K presets; agent response + user input + tool calls) and compaction trigger threshold (10%–90% of the context window; 50% = 100K).]
200K context window · 5× compression · 100 steps
No Compaction: $52.13 (500K final context)
With Compaction: $23.98 (via Gemini Flash)
Saved: $28.14 (54% reduction)
[Charts: context size over steps and cumulative cost over steps, with vs. without compaction.]
Assumptions
Claude Opus 4.6: $15/M input (fresh), $1.5/M input (cached), $75/M output
Gemini Flash: $0.15/M input, $0.6/M output
5× compression ratio · 1000 output tokens/turn
200K context window · 100 fixed steps
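The assumptions above pin down a simple cost model. This is a sketch of one plausible implementation; billing the prior context at the cached rate every turn and compacting at a fixed threshold are simplifying assumptions, so the with-compaction figure is approximate, but the no-compaction path reproduces the $52.13 above:

```python
# Pricing from the stated assumptions, in dollars per token.
OPUS_IN_FRESH = 15.0 / 1e6   # fresh input
OPUS_IN_CACHED = 1.5 / 1e6   # cached input
OPUS_OUT = 75.0 / 1e6        # output
FLASH_IN = 0.15 / 1e6
FLASH_OUT = 0.60 / 1e6

def simulate(tokens_per_step=5_000, steps=100, window=200_000,
             trigger=0.5, compression=5.0, out_per_turn=1_000,
             compact=True):
    """Return (final context size, total cost) over a fixed-step agent run."""
    context = 0   # tokens currently in the agent's history
    cost = 0.0
    for _ in range(steps):
        if compact and context > trigger * window:
            # Flash reads the full history and writes the compacted version.
            compacted = int(context / compression)
            cost += context * FLASH_IN + compacted * FLASH_OUT
            context = compacted
        # Opus: prior context billed at the cached rate (a simplification),
        # this step's new tokens at the fresh rate, plus the response.
        cost += (context * OPUS_IN_CACHED
                 + tokens_per_step * OPUS_IN_FRESH
                 + out_per_turn * OPUS_OUT)
        context += tokens_per_step
    return context, cost
```

With `compact=False` this yields a 500K final context at about $52.13; with compaction the context stays bounded near the trigger threshold and the total cost drops well below the no-compaction run.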

Cut your agent's token bill

Morph Compaction is available as an API. Drop it into any agent framework in under 10 lines.
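As an illustration of what that integration looks like, here is a sketch of where a compaction call slots into an agent loop. The URL, payload, and response shape are hypothetical placeholders, not Morph's documented API: once the history crosses a token threshold, swap it for the compacted version before the next model turn.

```python
# Illustrative only: COMPACT_URL, the request payload, and the response
# shape are hypothetical placeholders, not Morph's documented API.
import json
import urllib.request

COMPACT_URL = "https://example.com/v1/compact"  # placeholder endpoint
THRESHOLD_TOKENS = 100_000                      # e.g. 50% of a 200K window

def maybe_compact(messages, count_tokens, api_key):
    """Return messages unchanged below the threshold, else the compacted history."""
    if count_tokens(messages) < THRESHOLD_TOKENS:
        return messages
    req = urllib.request.Request(
        COMPACT_URL,
        data=json.dumps({"messages": messages}).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["messages"]  # hypothetical response shape
```

Calling `maybe_compact(history, counter, key)` before each model invocation keeps the loop itself unchanged, which is why the drop-in claim is plausible for most frameworks.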