Warp Grep Benchmarks
Fast agentic code search performance on real-world repositories
F1 Score Comparison
F1 is the harmonic mean of precision and recall, balancing the two. Each system ran in its native harness with a maximum of 15 steps.
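As a concrete reference, here is a minimal sketch of how F1 could be computed for a single query, scoring retrieved files against a ground-truth set. The file-level granularity and set representation are assumptions for illustration, not details of the benchmark harness:

```python
def f1_score(retrieved: set[str], relevant: set[str]) -> float:
    """F1 as the harmonic mean of precision and recall."""
    if not retrieved or not relevant:
        return 0.0
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)  # fraction of returned files that matter
    recall = hits / len(relevant)      # fraction of relevant files found
    return 2 * precision * recall / (precision + recall)

# Example: 3 of 4 returned files are relevant; 3 of 5 relevant files are found.
print(f1_score({"a.py", "b.py", "c.py", "d.py"},
               {"a.py", "b.py", "c.py", "e.py", "f.py"}))  # ~0.667
```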
Surface queries: Agentic and semantic search achieve similar F1; semantic search returns results in ~5s.
Deep logic queries: On bug tracing, code paths, and control flow, agentic search shows 2x–6x better performance.
Warp Grep achieves 0.73 F1 in just 3.8 steps, 3x fewer than comparable agentic approaches.
Agent Capabilities Improvement
We ran the official SWE-bench evaluation with and without Warp Grep as the code search tool. All runs used Claude Opus 4.5 (20251101) as the base model.
The agent using Warp Grep consumed 39% fewer input tokens, required 26% fewer reasoning turns, and solved 10% more tasks—demonstrating that better search directly improves agent effectiveness.
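For clarity on how these deltas are derived, the sketch below computes signed relative change against the baseline run. The raw totals are hypothetical values chosen only to reproduce the reported percentages; actual run counts are not published in this section:

```python
def relative_change(baseline: float, treatment: float) -> float:
    """Signed relative change of treatment vs. baseline, as a percentage."""
    return (treatment - baseline) / baseline * 100

# Hypothetical raw totals, not measured data from the evaluation.
baseline  = {"input_tokens": 1_000_000, "turns": 100, "solved": 100}
with_warp = {"input_tokens":   610_000, "turns":  74, "solved": 110}

for metric in baseline:
    delta = relative_change(baseline[metric], with_warp[metric])
    print(f"{metric}: {delta:+.0f}%")
# input_tokens: -39%, turns: -26%, solved: +10%
```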
Build the best coding agents today
Join 500+ teams using Morph to reduce token costs and apply edits at lightning speed.