Warp Grep Benchmarks

Fast agentic code search performance on real-world repositories

F1 Score Comparison

F1 is the harmonic mean of precision and recall. Each system ran in its native harness with a maximum of 15 steps.

Surface queries: Agentic and semantic search perform similarly; semantic search returns results in roughly 5 seconds.

Deep logic queries: Bug tracing, code paths, and control flow; here agentic search performs 2x–6x better.
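For reference, F1 can be computed from the retrieved and relevant result sets as in the sketch below. This is only illustrative: the benchmark's exact scoring granularity (file-level vs. span-level matches) is not specified here, and the function name and example sets are assumptions.

```python
def f1_score(retrieved: set[str], relevant: set[str]) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if not retrieved or not relevant:
        return 0.0
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: 3 of the 4 returned files are relevant,
# out of 5 relevant files in total.
print(f1_score({"a.py", "b.py", "c.py", "d.py"},
               {"a.py", "b.py", "c.py", "e.py", "f.py"}))
# precision = 0.75, recall = 0.60, F1 ≈ 0.67
```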

Average Steps to Complete
Warp Grep: 3.8
SWE Grep: 3.7
Claude Haiku: 12.4
Gemini Flash: 10.8
GLM 4.5: 14.5
mgrep: 1

Warp Grep achieves 0.73 F1 in just 3.8 steps, roughly 3x fewer than comparable agentic approaches.
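For intuition about what the step budget means, the sketch below shows one plausible shape of a step-bounded agentic search loop: each iteration issues one grep-style tool call, and a caller-supplied decision function (standing in for the LLM) either refines the pattern or stops. The `decide` callable and the overall structure are assumptions for illustration, not the implementation of Warp Grep or any other system in the chart.

```python
import subprocess
from typing import Callable

# A "decision" maps the search history to either a refined pattern or a final
# answer. In a real system this would be an LLM call; here it is a callable
# the caller supplies, so the loop itself stays self-contained.
Decision = dict  # {"done": bool, "files": list[str], "next_pattern": str}

def run_grep(pattern: str, repo_path: str, max_hits: int = 50) -> list[str]:
    """One grep-style tool call: return matching 'path:line:text' strings."""
    proc = subprocess.run(
        ["grep", "-rn", "--include=*.py", pattern, repo_path],
        capture_output=True, text=True,
    )
    return proc.stdout.splitlines()[:max_hits]

def agentic_search(
    query: str,
    repo_path: str,
    decide: Callable[[str, list[tuple[str, list[str]]]], Decision],
    max_steps: int = 15,
) -> list[str]:
    """Step-bounded agentic search: each loop iteration is one tool call."""
    history: list[tuple[str, list[str]]] = []
    pattern = query
    for _ in range(max_steps):
        hits = run_grep(pattern, repo_path)
        history.append((pattern, hits))
        decision = decide(query, history)
        if decision.get("done"):
            return decision.get("files", [])
        pattern = decision["next_pattern"]
    # Step budget exhausted: fall back to the files seen in the last round of hits.
    last_hits = history[-1][1] if history else []
    return sorted({line.split(":", 1)[0] for line in last_hits})
```

Bounding the loop at `max_steps=15` mirrors the 15-step harness limit mentioned above; the step counts in the chart are how many iterations each system actually used.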

Agent Capabilities Improvement

We ran the official SWE-bench evaluation with and without Warp Grep as the code search tool. All runs used Claude 4.5 Opus (20251101) as the base model.

The agent using Warp Grep consumed 39% fewer input tokens, required 26% fewer reasoning turns, and solved 10% more tasks—demonstrating that better search directly improves agent effectiveness.

                  Without Warp Grep    With Warp Grep
Input Tokens      14K                  9K        (39% fewer)
Agent Turns       35.0                 26.0      (26% fewer)
Tasks Solved      74.4%                81.9%     (10% more)
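The "fewer"/"more" figures follow directly from the with/without values. A quick check (note the token counts are rounded to the nearest thousand for display, so the recomputed token percentage lands near, rather than exactly at, the reported 39%):

```python
def pct_change(without: float, with_tool: float) -> float:
    """Signed percentage change relative to the baseline (without the tool)."""
    return (with_tool - without) / without * 100

print(f"Input tokens: {pct_change(14_000, 9_000):+.1f}%")  # ~-35.7% on the rounded 14K/9K figures
print(f"Agent turns:  {pct_change(35.0, 26.0):+.1f}%")     # -25.7%, reported as 26% fewer
print(f"Tasks solved: {pct_change(74.4, 81.9):+.1f}%")     # +10.1%, reported as 10% more
```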

Build the best coding agents today

Join 500+ teams using Morph to reduce token costs and apply edits at lightning speed.

40k tok/s of prefill
The fastest way to find relevant context
Better • Faster • Cheaper Context Collection