Did We Take a Wrong Turn on MCP?
April 15, 2026 · Restorative · 7 min read
MCP promised to standardize AI tool use. Instead, it turned deterministic operations into token-burning chitchat. Here's why the backlash is justified.
The complaint shows up constantly in AI communities: someone burns through their entire usage allowance in nineteen minutes. They added a few MCP servers, let the agent run, and watched the context window fill up with tool descriptions, handshakes, and responses to responses. Meanwhile I hit my usage limit maybe once a week, and I'm not being conservative about how I work.
That gap made me curious. Then I added one MCP server to my own setup and nearly hit a usage pause within minutes. That's when the question stopped being theoretical.
What MCP Was Supposed to Solve
The original promise was real and the problem it addressed was legitimate. Before MCP, every AI tool integration was a custom job. You wrote glue code, maintained it, rewrote it when the model changed, and repeated that for every tool in your stack. MCP offered a standard protocol: one interface, many tools, any model that speaks the spec.
For a moment it looked like infrastructure progress. A universal connector layer that would let AI agents reach into databases, APIs, file systems, and external services without bespoke integration work every time.
The standardization problem was worth solving. The solution, though, introduced a different problem that we're only now starting to name clearly.
The Token Cost Nobody Was Accounting For
Here's what actually happens when you wire up MCP servers to an agent. Every tool you expose gets described in the context. The agent reasons about which tool to call. It makes the call. The result comes back into context. The agent reasons about what to do with the result. If it needs to call another tool, the whole cycle repeats, and every step of that cycle is burning tokens.
For a single, well-scoped task this is manageable. But agents don't run single well-scoped tasks in isolation. They run chains of reasoning where each step generates more context that the next step has to process. Add a few MCP servers with verbose tool descriptions and you've turned what used to be a direct API call into a probabilistic conversation about whether to make that call, followed by another probabilistic conversation about what the result means.
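The cost of that cycle compounds in a way that's easy to underestimate. Here's a back-of-envelope sketch of how context grows per tool-call step; every token count below is hypothetical, since real numbers depend on your model and on how verbose each MCP server's tool descriptions are:

```python
# Rough sketch of context growth across an agent tool-call loop.
# All token counts are hypothetical placeholders; measure your own setup.

TOOL_DESCRIPTION_TOKENS = 600   # per exposed tool, resent on every turn
REASONING_TOKENS = 300          # "which tool should I call, and why?"
RESULT_TOKENS = 800             # tool output fed back into context

def context_after(steps: int, tools_exposed: int) -> int:
    """Approximate tokens in context after `steps` tool-call cycles."""
    base = tools_exposed * TOOL_DESCRIPTION_TOKENS  # fixed overhead
    per_step = REASONING_TOKENS + RESULT_TOKENS     # grows linearly
    return base + steps * per_step

for steps in (1, 3, 6):
    print(steps, context_after(steps, tools_exposed=20))
```

The point of the sketch is the shape, not the numbers: a fixed overhead proportional to how many tools you expose, plus a linear term for every step, and the whole running total is re-read by the model on each turn.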
Deterministic operations that served us well for decades (a database query, a file read, a search request) got wrapped in layers of language model inference. We took the fastest, most reliable parts of our stack and made them slower, more expensive, and less predictable.
Perplexity Walking Away Is Worth Paying Attention To
When Perplexity's engineering team started moving away from MCP internally in favor of direct API integrations, the AI community mostly shrugged. One company's internal architecture decision. But the reasoning behind that move matters more than the decision itself.
The core issue is that MCP optimizes for flexibility and standardization at the cost of efficiency and determinism. If your product depends on fast, reliable tool use at scale, that tradeoff stops making sense quickly. You end up paying model inference costs for decisions that don't need model inference. The agent doesn't need to reason about whether to query a database. You already know it needs to query the database. Just query it.
This isn't an edge case critique from an unusually demanding use case. This is the natural ceiling you hit when you take a protocol designed to make tool use accessible and try to run it in production at any meaningful scale.
The Context Window Bloat Problem
There's a subtler issue underneath the token cost: what context window bloat does to agent reasoning quality.
When you load multiple MCP servers, the tool descriptions alone can consume a significant portion of your available context before the agent has done anything. Then as the agent works, the results of tool calls accumulate. By the time you're three or four steps into a task, the model is reasoning from a context that's mostly scaffolding and intermediate results rather than the actual information relevant to finishing the job.
Models don't perform equally well across the full context window. Attention degrades. Important information from early in the context gets weighted less heavily. You end up with an agent that's technically capable of accessing everything it needs but practically worse at using that information than a simpler setup with less noise.
The people burning through usage in nineteen minutes aren't just paying more. They're also getting worse results for the extra cost.
This Is the Same Pattern Everywhere
I keep seeing this pattern across domains. A new capability arrives, it gets applied maximally because the capability itself is exciting, and the question of whether maximum application produces better outcomes gets skipped entirely.
At ChangeNOW this year I was thinking about exactly this while talking with people building AI tools for genuinely difficult problems. Jin from Peace Therapist is running an NGO helping people recover mentally from war. The tools she needs to build have to work reliably and efficiently because the stakes are real and the resources are limited. The idea of burning through compute budget on agent overhead that doesn't contribute to the outcome is not an abstract architectural concern for her; it's a direct constraint on what she can actually do.
I went to ChangeNOW specifically thinking about vibe coding and the tendency to build things without intention, just because the capability is there. Another habit tracker, another task list, another wrapper around a model that does something the existing tools already do. MCP servers feel like the infrastructure equivalent of that impulse. We added the connectors because we could. The question of whether the connected system produces better outcomes than a more direct approach came later, if it came at all.
What Better Looks Like
This isn't an argument against tool use or against giving agents access to external systems. It's an argument for being deliberate about how that access is structured.
Direct API calls where the path is known and deterministic. MCP where genuine flexibility and standardization across unknown or changing tools justifies the overhead. Explicit scoping of what tools an agent actually needs for a specific task rather than loading everything and letting the model sort it out.
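Explicit scoping is the easiest of those three to act on. A minimal sketch of the idea, with an illustrative registry and task names (nothing here is a real MCP API; it's the pattern, not a library):

```python
# Sketch of per-task tool scoping: instead of exposing every server to
# every task, select the minimal tool set before the agent starts.
# Registry entries and task names are illustrative.

REGISTRY = {
    "db_query": "Run a read-only SQL query",
    "file_read": "Read a file from the workspace",
    "web_search": "Search the web",
    "calendar": "Read calendar events",
}

TASK_SCOPES = {
    "report_generation": ["db_query", "file_read"],
    "research": ["web_search"],
}

def tools_for(task: str) -> dict[str, str]:
    """Return only the tool descriptions this task actually needs."""
    return {name: REGISTRY[name] for name in TASK_SCOPES.get(task, [])}
```

A research task then carries one tool description in context instead of four, and the fixed overhead stops scaling with the size of your tool collection.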
The goal is to keep deterministic operations deterministic. Use language model inference for the parts of a task that genuinely require reasoning, judgment, or handling of ambiguity. Don't use it as a universal connector layer for operations that have reliable programmatic paths.
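In code, that split looks less like an agent framework and more like ordinary routing. A minimal sketch, where `call_model` is a hypothetical stand-in for whatever inference client you use and the "database" is just a dict:

```python
# Keep deterministic operations deterministic: the known, fixed-path
# step goes straight to code; only the step needing judgment touches
# the model. `call_model` and USERS are illustrative stand-ins.

USERS = {"42": {"id": "42", "name": "Ada"}}  # stand-in for a real database

def call_model(prompt: str) -> str:
    # Hypothetical inference call; replace with your actual client.
    return f"(model output for: {prompt[:40]})"

def fetch_user(user_id: str) -> dict:
    # Deterministic path: no inference about *whether* to query.
    return USERS[user_id]

def summarize_account(user_id: str) -> str:
    # Only the summary, which genuinely needs judgment, uses the model.
    record = fetch_user(user_id)
    return call_model(f"Summarize this account: {record}")
```

The lookup stays fast, cheap, and predictable; the model sees one well-framed prompt instead of mediating every hop.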
The engineers moving away from MCP aren't rejecting the idea of AI agents using tools. They're rejecting the assumption that every operation should route through probabilistic inference when a direct call would be faster, cheaper, and more reliable.
The Uncomfortable Question
MCP solved a real standardization problem and in doing so made tool use accessible to a much wider range of developers and use cases. That's not nothing. The protocol served a purpose.
But there's a meaningful difference between a protocol that makes experimentation accessible and a production architecture that scales. We grabbed MCP for both roles, and now the production costs are becoming visible in the form of nineteen-minute usage burndowns and degraded agent performance.
The shift away from MCP by people who've run it at scale isn't a rejection of AI agents. It's engineers learning what the protocol is actually good for and adjusting the architecture accordingly. That's how technology matures.
The question worth sitting with is whether we'll apply that learning intentionally or just keep adding servers until the costs force the correction.