Warp Grep Benchmarks

Fast agentic code search performance on real-world repositories

F1 Score Comparison

F1 is the harmonic mean of precision and recall. Each system ran in its native harness with a maximum of 15 steps.

Surface queries: Agentic and semantic search perform similarly; semantic search returns results in roughly 5 seconds.

Deep logic queries: Bug tracing, code paths, and control flow; here agentic search performs 2x–6x better.
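For reference, F1 can be computed from the retrieved and relevant result sets as in the sketch below. This is only illustrative: the benchmark's exact scoring granularity (file-level vs. span-level matches) is not specified here, and the function name and example sets are assumptions.

```python
def f1_score(retrieved: set[str], relevant: set[str]) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if not retrieved or not relevant:
        return 0.0
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: 3 of the 4 returned files are relevant,
# out of 5 relevant files in total.
print(f1_score({"a.py", "b.py", "c.py", "d.py"},
               {"a.py", "b.py", "c.py", "e.py", "f.py"}))
# precision = 0.75, recall = 0.60, F1 ≈ 0.67
```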

Average Steps to Complete
Warp Grep: 3.8
SWE Grep: 3.7
Claude Haiku: 12.4
Gemini Flash: 10.8
GLM 4.5: 14.5
mgrep: 1

Warp Grep achieves 0.73 F1 in just 3.8 steps, roughly 3x fewer than comparable agentic approaches.
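For intuition about what the step budget means, the sketch below shows one plausible shape of a step-bounded agentic search loop: each iteration issues one grep-style tool call, and a caller-supplied decision function (standing in for the LLM) either refines the pattern or stops. The `decide` callable and the overall structure are assumptions for illustration, not the implementation of Warp Grep or any other system in the chart.

```python
import subprocess
from typing import Callable

# A "decision" maps the search history to either a refined pattern or a final
# answer. In a real system this would be an LLM call; here it is a callable
# the caller supplies, so the loop itself stays self-contained.
Decision = dict  # {"done": bool, "files": list[str], "next_pattern": str}

def run_grep(pattern: str, repo_path: str, max_hits: int = 50) -> list[str]:
    """One grep-style tool call: return matching 'path:line:text' strings."""
    proc = subprocess.run(
        ["grep", "-rn", "--include=*.py", pattern, repo_path],
        capture_output=True, text=True,
    )
    return proc.stdout.splitlines()[:max_hits]

def agentic_search(
    query: str,
    repo_path: str,
    decide: Callable[[str, list[tuple[str, list[str]]]], Decision],
    max_steps: int = 15,
) -> list[str]:
    """Step-bounded agentic search: each loop iteration is one tool call."""
    history: list[tuple[str, list[str]]] = []
    pattern = query
    for _ in range(max_steps):
        hits = run_grep(pattern, repo_path)
        history.append((pattern, hits))
        decision = decide(query, history)
        if decision.get("done"):
            return decision.get("files", [])
        pattern = decision["next_pattern"]
    # Step budget exhausted: fall back to the files seen in the last round of hits.
    last_hits = history[-1][1] if history else []
    return sorted({line.split(":", 1)[0] for line in last_hits})
```

Bounding the loop at `max_steps=15` mirrors the 15-step harness limit mentioned above; the step counts in the chart are how many iterations each system actually used.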

Agent Capabilities Improvement

We ran the official SWE-bench evaluation with and without Warp Grep as the code search tool. All runs used Claude 4.5 Opus (20251101) as the base model.

The agent using Warp Grep consumed 39% fewer input tokens, required 26% fewer reasoning turns, and solved 10% more tasks—demonstrating that better search directly improves agent effectiveness.

                  Without Warp Grep    With Warp Grep
Input Tokens      14K                  9K        (39% fewer)
Agent Turns       35.0                 26.0      (26% fewer)
Tasks Solved      74.4%                81.9%     (10% more)
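The "fewer"/"more" figures follow directly from the with/without values. A quick check (note the token counts are rounded to the nearest thousand for display, so the recomputed token percentage lands near, rather than exactly at, the reported 39%):

```python
def pct_change(without: float, with_tool: float) -> float:
    """Signed percentage change relative to the baseline (without the tool)."""
    return (with_tool - without) / without * 100

print(f"Input tokens: {pct_change(14_000, 9_000):+.1f}%")  # ~-35.7% on the rounded 14K/9K figures
print(f"Agent turns:  {pct_change(35.0, 26.0):+.1f}%")     # -25.7%, reported as 26% fewer
print(f"Tasks solved: {pct_change(74.4, 81.9):+.1f}%")     # +10.1%, reported as 10% more
```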

Build the best coding agents today

Join 500+ teams using Morph to reduce token costs and apply edits at lightning speed.

40k tok/s of prefill
The fastest way to find relevant context
Better • Faster • Cheaper Context Collection