We build AI infrastructure. Model routing, context compaction, code search. Our revenue depends on AI adoption growing. So when we say the AI bubble is real, understand that we have every incentive to say the opposite. But the data doesn't support the hype, and pretending otherwise doesn't help anyone, including us.
Yes, AI Is Overhyped
The gap between AI product demos and production reality is the defining feature of the current AI market. Vendors claim 40-70% developer productivity gains. Independent research tells a different story.
Bain's 2025 Technology Report surveyed software firms on their generative AI rollouts. Two-thirds have deployed the tools. The measured productivity gains: 10-15%. Not 40%. Not 70%. And even those modest gains often don't translate into faster shipping, because writing code accounts for only 25-35% of the development lifecycle. The bottleneck moves; it doesn't disappear.
The METR study is more damning. In a randomized controlled trial with 16 experienced open-source developers across 246 real tasks, developers using AI tools (Cursor Pro with Claude 3.5/3.7 Sonnet) completed tasks 19% slower than without AI. The developers themselves believed AI made them 20% faster. Perception and reality moved in opposite directions.
CodeRabbit analyzed 470 GitHub pull requests and found AI-generated code produces 1.7x more issues than human-written code. Logic and correctness bugs rose 75%. Security vulnerabilities appeared 1.5-2x as often. Performance inefficiencies appeared 8x more often. Code readability problems increased more than 3x.
| Source | Claim or finding | Context |
|---|---|---|
| Vendor marketing | 40-70% productivity gain | Self-reported, controlled demos, junior tasks |
| Bain (2025) | 10-15% productivity gain | Survey of firms that deployed GenAI tools |
| METR (2025) | 19% slower with AI | RCT, 16 experienced devs, 246 real tasks |
| CodeRabbit (2025) | 1.7x more bugs in AI code | 470 GitHub PRs analyzed |
| Faros AI (2026) | 98% more PRs, 91% longer reviews | High AI adoption teams, net throughput flat |
| MIT/Copilot study | 26% more tasks completed | Gains concentrated in junior devs (27-39%) |
Addy Osmani, engineering lead at Google, called it the "70% problem": AI can produce 70% of a solution rapidly, but the remaining 30% (edge cases, security, production integration) is as time-consuming as it ever was. The 70% looks impressive in a demo. The 30% is where production software lives.
The perception gap is the core problem
METR's finding that developers believe they're 20% faster while actually being 19% slower is not a footnote. It's the mechanism behind the entire bubble. AI feels productive. It generates code fast. But the time spent reviewing, debugging, and fixing AI output often exceeds the time saved generating it. When productivity is measured by feelings instead of outcomes, every tool looks like a 10x improvement.
The AI Slop Problem Is Real
"Slop" was Merriam-Webster's 2025 Word of the Year. The American Dialect Society picked it too. The definition: digital content of low quality produced in quantity by means of artificial intelligence. Mentions of "AI slop" across the internet increased ninefold from 2024 to 2025, with negative sentiment hitting 54% in October 2025.
Slop is everywhere. Around 40% of videos recommended to children on YouTube appear to be AI-generated. A fake rock band amassed over 1 million Spotify plays in weeks. Nine of the top 100 fastest-growing YouTube channels feature AI-generated content like zombie football and cat soap operas. Amazon is flooded with AI-generated books. Google search results are polluted with AI-generated SEO content.
In software, slop takes a specific form: vibe coding. Andrej Karpathy coined the term in February 2025 for a development approach where you "give in to vibes, embrace exponentials, and forget the code even exists." The output can look functional in a demo. In production, it falls apart.
The numbers: an estimated 8,000+ startups that built production apps with vibe coding now need full or partial rebuilds, at $50K to $500K each. Total cleanup cost: $400 million to $4 billion. Forrester predicts 75% of technology decision-makers will face moderate to severe technical debt by 2026, up from 50% in 2025. Unmanaged AI-generated code drives maintenance costs to 4x traditional levels by year two.
This isn't anti-AI. It's anti-careless-AI. There's a difference between using AI with review, testing, and engineering discipline, and dumping AI output into production because it compiled.
9x increase in 'AI slop' mentions
From 2024 to 2025, online mentions of AI slop grew ninefold. Negative sentiment peaked at 54% in October 2025.
8,000+ startups need rebuilds
Vibe-coded production apps hit the 'spaghetti point' around month 3, where adding features breaks existing ones. Cost: $50K-500K each.
4x maintenance costs by year 2
AI-generated code without governance accumulates technical debt that compounds. First-year costs already run 12% higher when factoring in review and churn.
History Rhymes: AI Winters
This isn't the first time AI has been overhyped. The pattern has played out twice before, and understanding those cycles clarifies what's different now, and what isn't.
The first AI winter (1974-1980). In the 1960s, researchers promised machines that could translate languages, understand speech, and match human reasoning within a decade. In 1973, the UK Parliament commissioned the Lighthill Report to evaluate AI research. The report called it an "utter failure" to achieve its "grandiose objectives." Funding dried up. DARPA cut AI research budgets. The field went quiet for nearly a decade.
The brief revival (1980-1987). Expert systems, rule-based programs that encoded domain knowledge, brought AI back. Corporations invested hundreds of millions. Japan launched its Fifth Generation Computer project. The market for AI hardware (Lisp machines) hit $400 million by 1986.
The second AI winter (1987-2000). Expert systems proved too expensive to maintain, too difficult to update, and unable to learn. The specialized AI hardware market collapsed as general-purpose computers became cheap enough to run the same software. Roger Schank and Marvin Minsky, two of AI's founding researchers, had warned during the 1980s boom that "enthusiasm for AI had spiraled out of control" and that "disappointment would certainly follow." It did. More than a decade of reduced funding followed.
| Factor | 1980s expert systems | 2024-2026 generative AI |
|---|---|---|
| Promise | Encode all human knowledge in rules | Replace white-collar work with chatbots |
| Investment | $400M/yr in AI hardware | $539B projected capex in 2026 |
| Revenue gap | Systems too expensive to maintain | $12B consumer revenue vs $500B+ infrastructure spend |
| Real value | Some expert systems worked in narrow domains | Real gains in specific tasks (code completion, search, routing) |
| Warning signs | Minsky and Schank warned in 1985 | Altman says bubble is ongoing in 2025 |
| What survived | Statistical methods, neural network research | TBD: infrastructure layer looks durable |
The pattern is consistent: overpromise, massive capital influx, reality check, correction. What's different in 2026 is that AI has real commercial value in specific applications. Code completion saves time for junior developers. Model routing cuts API costs. Context compaction keeps agents coherent. These aren't promises. They're deployed systems with measurable results. The question is whether the correction kills the real value along with the hype, or whether the market learns to distinguish between the two.
The dot-com parallel
The dot-com crash destroyed $5 trillion in market value between 2000 and 2002. It killed Pets.com, Webvan, and hundreds of companies with no revenue model. It did not kill the internet. Amazon, launched in 1994, survived the crash and grew into a $2 trillion company. The infrastructure layer (hosting, payments, search) was the part that survived because it solved real problems for real businesses. AI infrastructure is in the same position: boring, measurable, and useful regardless of whether chatbot wrappers survive.
Where AI Actually Delivers (The Boring Stuff)
The AI applications that work aren't the ones getting the most attention. They're not chatbots. They're not AGI. They're infrastructure components that do one thing, do it measurably, and save money doing it.
Model routing: send each request to the cheapest model that can handle it
Most AI applications send every request to their most expensive model. A simple formatting task goes to the same model as a complex reasoning problem. An LLM router classifies prompt difficulty in ~430ms and routes easy requests to cheap, fast models and hard requests to expensive, capable ones. The result: 40-70% cost reduction with under 2% quality loss on hard tasks. No model improvement needed. Just better allocation of existing resources.
This works because most requests are easy. In a typical coding agent session, 60-70% of prompts are simple (formatting, boilerplate, file reads) and 30-40% are complex (architecture decisions, bug fixes, multi-file refactors). Routing the easy 60-70% to a model that costs 10x less saves real money at scale.
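To make the allocation concrete, here is a minimal sketch of difficulty-based routing in Python. The length/keyword heuristic, model names, and threshold are illustrative assumptions for this post, not how Morph's Model Router classifies prompts (hitting the ~430ms, under-2%-quality-loss numbers above takes a trained classifier, not keyword matching).

```python
# Minimal sketch of difficulty-based routing. The heuristic, model names,
# and threshold below are illustrative assumptions, not the Model Router's
# actual classifier.

CHEAP_MODEL = "small-fast-model"       # hypothetical cheap model id
CAPABLE_MODEL = "large-capable-model"  # hypothetical expensive model id

# Keywords that loosely suggest architecture or debugging work (assumed signals).
HARD_SIGNALS = ("refactor", "architecture", "race condition", "debug", "multi-file")


def classify(prompt: str) -> str:
    """Label a prompt 'easy' or 'hard' with a crude length/keyword heuristic."""
    text = prompt.lower()
    if len(prompt) > 2000 or any(signal in text for signal in HARD_SIGNALS):
        return "hard"
    return "easy"


def route(prompt: str) -> str:
    """Return the model id the prompt should be sent to."""
    return CAPABLE_MODEL if classify(prompt) == "hard" else CHEAP_MODEL


if __name__ == "__main__":
    print(route("Format this JSON blob"))                     # small-fast-model
    print(route("Refactor the auth module across services"))  # large-capable-model
```

The napkin math follows directly: if the cheap model costs one-tenth as much and handles 65% of traffic, blended cost per request is roughly 0.65 x 0.1 + 0.35 x 1.0 = 0.415, a savings of close to 60% before any latency wins.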
Context compaction: keep agents coherent without burning tokens
Coding agents degrade as their context window fills up. By the time an agent has read 20 files, executed 15 tool calls, and processed their results, the context is 80% full of low-signal tokens. The agent starts forgetting earlier instructions, repeating work, and making contradictory decisions.
Flash Compact reduces context by 50-70% at 33,000 tok/s with zero hallucination. Every surviving sentence is copied verbatim from the original. No paraphrasing, no invented details. This isn't a summary. It's the same content with the noise removed. Agents that compact their context maintain coherent state for 2-3x longer sessions.
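A toy illustration of the extractive idea follows; the scoring heuristic is an assumption made for this sketch, not Flash Compact's algorithm. The point it shows: sentences are kept or dropped whole, so nothing that survives is ever paraphrased.

```python
# Toy sketch of extractive compaction: keep whole sentences verbatim, drop
# low-signal ones. The scoring heuristic is an assumption for illustration,
# not Flash Compact's algorithm.
import re

# Phrases that typically mark low-signal tool chatter (assumed examples).
LOW_SIGNAL = ("tool call succeeded", "reading file", "no output", "ok, done")


def compact(transcript: str, keep_ratio: float = 0.4) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    scored = []
    for i, sentence in enumerate(sentences):
        score = len(sentence)  # crude proxy for information content
        if any(phrase in sentence.lower() for phrase in LOW_SIGNAL):
            score = 0           # boilerplate chatter is the first to go
        scored.append((score, i, sentence))
    budget = max(1, int(len(sentences) * keep_ratio))
    keep = {i for _, i, _ in sorted(scored, reverse=True)[:budget]}
    # Reassemble in original order; every surviving sentence is copied verbatim.
    return " ".join(s for i, s in enumerate(sentences) if i in keep)
```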
Code search: find the right files in seconds, not minutes
Cognition (the team behind Devin) measured that coding agents spend 60% of their time searching for context. Most of that search is wasteful: reading entire files to find one function, scanning directory trees, grepping for strings across repositories. WarpGrep achieves 0.73 F1 on SWE-bench code search, meaning it finds the right files and the right regions within those files. The agent spends less time searching and more time editing.
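For contrast with reading entire files, here is a minimal keyword-ranked search baseline in Python. It is not WarpGrep's approach; it only shows the shape of the problem: rank files by how much of the query they cover and return line regions rather than whole files.

```python
# Minimal baseline for query-aware code search: rank files by how many query
# terms they cover, return matching line regions instead of whole files.
# Illustrative only; this is not how WarpGrep works.
from pathlib import Path


def search(repo: str, query: str, top_k: int = 5, max_regions: int = 3):
    terms = {t.lower() for t in query.split() if len(t) > 2}
    hits = []
    for path in Path(repo).rglob("*.py"):
        lines = path.read_text(errors="ignore").splitlines()
        matches = [i for i, line in enumerate(lines)
                   if any(t in line.lower() for t in terms)]
        if not matches:
            continue
        # Score by query-term coverage, not raw match count, so one noisy
        # file full of a single common word doesn't dominate the ranking.
        covered = {t for t in terms if any(t in line.lower() for line in lines)}
        hits.append((len(covered), path, matches[:max_regions], lines))
    hits.sort(key=lambda h: h[0], reverse=True)
    return [{"file": str(path),
             "score": score,
             "regions": [(n + 1, lines[n].strip()) for n in region_lines]}
            for score, path, region_lines, lines in hits[:top_k]]
```

The benchmark question is the same one this baseline fumbles: which files, and which regions within them, does the fix actually touch.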
Model Router
Classifies prompt difficulty in ~430ms. Routes easy requests to cheap models, hard requests to capable ones. 40-70% cost savings, under 2% quality loss.
Flash Compact
33,000 tok/s context compaction. 50-70% token reduction. Zero hallucination. Every surviving sentence is verbatim from the original.
WarpGrep
0.73 F1 code search on SWE-bench. 8 parallel tool calls per turn, 4 turns, sub-6s. Agents find the right files instead of reading everything.
None of these require believing in AGI, replacing developers, or reinventing software engineering. They solve specific, measurable problems: cost, token efficiency, search accuracy. The ROI calculation fits on a napkin. That's what makes them durable.
Developer Fatigue Is Justified
Stack Overflow's 2025 survey found positive sentiment for AI tools dropped to 60%, down from over 70% in 2023 and 2024. For the first time, trust declined. More developers actively distrust the accuracy of AI tools (46%) than trust it (33%). Only 3% report "highly trusting" the output.
The fatigue isn't about AI being useless. It's about the signal-to-noise ratio in AI tooling being terrible. JavaScript framework fatigue was monthly. AI tool fatigue is daily. Developers face new coding assistants, chat interfaces, agent frameworks, MCP servers, context management tools, prompt libraries, and "swarm architectures" every week. Most ship in permanent beta. Half the documentation is wrong. APIs change biweekly. Features get deprecated in months.
The cognitive cost is real. Harvard Business Review documented "AI brain fry" in March 2026: mental fatigue from excessive use or oversight of AI tools beyond one's cognitive capacity. Developers report buzzing sensations, mental fog, difficulty focusing, and slower decision-making after extended AI oversight sessions. Using a small set of AI tools correlated with productivity gains; adding more tools reduced them.
The subscription overhead compounds the problem. A developer in 2026 might pay for GitHub Copilot ($19/mo), Claude Pro ($20/mo), Cursor Pro ($20/mo), ChatGPT Plus ($20/mo), and a handful of specialized AI tools. That's $100+/month in AI subscriptions before considering the time spent evaluating which ones to keep. Many of these tools are thin wrappers around the same underlying models with different UI.
The result: developers are retreating to fewer, proven tools rather than chasing every new release. This is rational. The winning strategy in a noisy market is to pick tools that solve a specific problem with measurable impact and ignore everything else.
What Survives When the Bubble Pops
Economist Ruchir Sharma says the AI surge checks every box on his four-part bubble checklist: overinvestment, overvaluation, over-ownership, and over-leverage. Bill Gurley of Benchmark warns that the AI bubble is about to burst and a reset is coming. Sam Altman himself says the bubble is ongoing. A Time article from March 2026 argues we should prepare for an AI bubble now.
When (not if) the correction happens, the shakeout follows the same pattern as every previous technology bubble. Three categories of companies will exist:
| Category | Examples | Outcome |
|---|---|---|
| AI wrappers | Chatbot-as-a-product, AI-for-everything apps, thin API wrappers with UI | Die. No defensible value. The underlying model provider can replicate the product in a feature update. |
| AI hype plays | Companies that added 'AI' to their name/pitch for valuation, AI-generated content farms | Die. When investor sentiment shifts, the music stops. Revenue never materialized. |
| AI infrastructure | Model routing, context management, code search, cost optimization, deployment tooling | Survive. Measurable ROI. Customer demand exists regardless of hype cycle. The cheaper and faster the infrastructure, the more AI gets used. |
The infrastructure layer survives because it solves problems that get worse as AI adoption grows, not better. More AI usage means more API costs, more context management challenges, more code search overhead. Companies spending $100K/month on LLM APIs need routing to cut that to $30K-60K. Companies running agents for hours need compaction to keep them coherent. Companies with 10-million-line codebases need search that actually finds the right file.
These problems exist whether the AI bubble inflates further or pops tomorrow. A model router saves money regardless of whether GPT-5 ships on schedule. Context compaction keeps agents working regardless of which model they run. Code search finds files regardless of market sentiment.
The dot-com crash didn't slow down e-commerce. It cleared out companies that had no business model and left the ones solving real problems. The AI correction will do the same. The question for any AI product is: does this deliver measurable value that customers will pay for after the hype is gone?
Frequently Asked Questions
Is the AI bubble real?
Yes. Over $500 billion per year is projected for AI infrastructure spending in 2026-2027, while U.S. consumer AI revenue is around $12 billion annually. Economist Ruchir Sharma says the AI surge checks every box on his four-part bubble checklist: overinvestment, overvaluation, over-ownership, and over-leverage. Sam Altman stated in 2025 that he believes an AI bubble is ongoing. The question isn't whether a bubble exists but which parts of AI survive the correction.
Does AI actually improve developer productivity?
The evidence is mixed. Bain found software teams see 10-15% gains, not the 40-70% vendors claim. METR's randomized trial showed experienced developers were 19% slower with AI. CodeRabbit found 1.7x more bugs in AI code. Junior developers see larger gains (27-39% in the MIT/Copilot study) while senior developers see minimal improvement (8-13%). The headline numbers from AI vendors don't match independent research.
What is AI slop?
Merriam-Webster's 2025 Word of the Year. Defined as digital content of low quality produced in quantity by artificial intelligence. In software, it means vibe-coded apps that need $50K-500K rebuilds, AI-generated code with 1.7x more bugs, and technical debt that compounds at 4x traditional maintenance costs by year two.
Will there be another AI winter?
A full winter (complete funding collapse) is unlikely because AI has real commercial value in specific applications. A correction is more likely: chatbot wrappers and AI-for-everything startups fail, while infrastructure companies solving cost and accuracy problems survive. This mirrors the dot-com crash, which killed speculative companies but built the foundation for Amazon, Google, and the modern internet.
What parts of AI are not overhyped?
AI infrastructure with measurable ROI. Model routing cuts API costs 40-70%. Context compaction reduces tokens 50-70% at 33,000 tok/s. Code search achieves 0.73 F1 on SWE-bench. These deliver value whether the bubble inflates or pops because they're quantifiable: cost reduction, latency, accuracy.
Is AI developer fatigue real?
Yes. Stack Overflow's 2025 survey found positive sentiment for AI tools dropped to 60%. More developers distrust AI tool accuracy (46%) than trust it (33%). Harvard Business Review documented "AI brain fry" in 2026. The rational response is fewer, proven tools that solve specific problems, not more subscriptions.
AI Infrastructure That Delivers Results, Not Hype
Morph builds the infrastructure layer that survives the correction. Model Router: $0.001/request, 40-70% cost savings. Flash Compact: 33,000 tok/s, 50-70% token reduction. WarpGrep: 0.73 F1 code search. Measurable. Boring. Real.