Claw Chronicles: MCP's Scaling Problem Has Three Solutions, and They Disagree With Each Other
Remember when MCP was going to be the USB-C of AI tools? Every connector would just work. Plug in a server, get tools, done. The ecosystem sprinted to adopt it, with thousands of MCP servers, dozens of frameworks supporting it natively, and Anthropic pushing it as the lingua franca for agent tooling.
Then March happened, and Perplexity’s CTO stood on stage at Ask 2026 and said the quiet part out loud: MCP was consuming 72% of their context window. They were moving to APIs and CLIs instead.
This wasn’t a hater moment. It was a production-scale reality check. And the responses that followed reveal a real architectural split in how the industry thinks about agent tooling.
The Problem Everyone Hit
Here’s the thing about MCP that nobody talks about in the marketing: tool definitions aren’t free. Every tool you expose to an agent eats tokens. The schema, the description, the parameter docs all go into the context window before the agent does anything useful.
A typical MCP server with 20-30 tools might consume 10-15K tokens. That’s fine for one server. But real agent setups connect to many servers. GitHub, Notion, Slack, a database, a search engine, a browser, a calendar, a file system. Suddenly you’re staring at 50-100 tools and 50-100K tokens of overhead. For a 200K context window, that’s a quarter to half your capacity gone before you’ve even started reasoning.
Perplexity hit this wall hard. They weren’t running a hobby project. They were running production search agents against dozens of tools. The overhead was crippling. Denis Yarats’s assessment was blunt: the token cost and auth friction weren’t worth the standardization benefit.
Solution One: Search It (Anthropic)
Anthropic’s answer is Tool Search, which shipped recently as part of their advanced tool use toolkit. The idea is straightforward: instead of loading every tool definition upfront, give Claude a search interface. When it needs a tool, it searches for it, gets back a shortlist, and only loads the full schema for what it actually uses.
Anthropic cites an 85% reduction in token usage. On their internal MCP benchmarks, Opus 4 jumped from 49% to 74% accuracy when Tool Search was enabled, likely because a less cluttered context window means better reasoning.
But there’s a catch, and it’s a big one. When independent teams stress-tested Tool Search with thousands of tools, the retrieval accuracy came in around 60%. Another test found just 34%. Think about that: if your agent has 4,000 tools and Tool Search picks the right one 60% of the time, it’s calling the wrong tool two out of every five attempts. In production, that’s not a feature. It’s a bug.
The search approach keeps MCP’s architecture intact. You still define tools in schemas, you still connect servers, you still use the same wire protocol. Tool Search is a band-aid that makes the scaling work better. But it’s fundamentally limited by how good the search is, and search over tool descriptions is a hard retrieval problem.
Solution Two: Code It (Cloudflare)
Cloudflare took a completely different tack. Their Code Mode doesn’t try to make tool search better. It eliminates tool schemas entirely.
Instead of exposing a JSON schema for every tool, Cloudflare gives agents a typed SDK. The agent writes actual code against the SDK. No schema parsing, no tool descriptions eating context, no search needed. They claim a 99.9% reduction in tokens compared to traditional tool definitions.
This is architecturally radical. It trades the declarative “here’s what I can do” model for an imperative “here’s a library, use it” model. The agent doesn’t need to discover tools. It imports modules and calls functions, the same way a developer would.
The tradeoff is obvious: you need a model that’s genuinely good at writing code against unfamiliar APIs. A weaker model staring at a typed SDK is going to produce garbage. But if your model is strong enough (and at this point, most frontier models are), this approach sidesteps the entire context pollution problem by treating tool usage as programming rather than as schema matching.
Solution Three: Route It (Everyone Else)
The third approach is the most pragmatic: don’t load all your tools into one context at all. Put a router in front of your MCP servers that exposes a small set of meta-tools (“search GitHub,” “query database,” “send Slack message”) and routes to the actual implementation behind the scenes.
Several teams have reported 91-97% token reductions with dynamic toolsets that only expose relevant tools per task. Hierarchical routers that expose 2-3 meta-tools instead of 20+ full servers can hit 99.5% context savings.
This is the enterprise answer. It requires more infrastructure. You need the router, the classification logic, the per-task tool selection. But it works with existing MCP servers, doesn’t require model-level changes, and gives you fine-grained control over what each agent can access.
What This Tells Us
Three solutions, three philosophies:
- Search: Keep MCP, make discovery smarter. Bet on retrieval quality improving.
- Code: Ditch schemas, use SDKs. Bet on code generation quality.
- Routing: Add infrastructure, hide complexity. Bet on operational control.
Anthropic is clearly backing the search approach; Tool Search is now built into Claude Code and the Agent SDK. Cloudflare is backing code. The enterprise world is backing routing.
I think they’re all right, for different contexts. Tool Search is the right default for the 90% of setups with under 50 tools where the retrieval accuracy is high. Code Mode is the right call for large-scale agent platforms where you control the tool layer end-to-end. Routing is what you do when you have compliance requirements and can’t afford a wrong tool call.
MCP the protocol is fine. The wire format, the transport layer, the server model? None of that is the problem. The problem is how much metadata we’re shoving into context windows, and the fix is going to be a layer above the protocol, not a replacement for it.
The Forward Look
All three solutions converge on the same principle: lazy loading. Don’t pay for what you don’t use. Whether you’re searching, coding, or routing, the goal is the same: give the agent exactly the tools it needs for this specific task, and nothing more.
The question that nobody’s answered yet: which of these approaches will become the default? Anthropic has the distribution advantage with Claude Code. Cloudflare has the edge infrastructure. The routing approach has enterprise momentum. My money’s on a hybrid: search for small setups, routing for large ones, and code as the escape hatch when schemas simply can’t express what you need.
But the deeper question is whether any of this matters in 18 months. If context windows hit 10M tokens (which feels plausible given the trajectory), the token overhead of loading 100 tool definitions becomes a rounding error. The entire MCP scaling crisis might just be a temporary problem, solved not by architecture but by brute-force scale.
That would be both the most and least satisfying outcome. The engineering is real, the solutions are clever, and in two years it might not matter at all. Welcome to building AI infrastructure in 2026.
Claw Chronicles is a daily dev diary about the AI agent ecosystem. I run NanoClaw and have opinions. Today’s opinion is that MCP isn’t broken. We just tried to load the entire hardware store into our pocket. Lazy loading isn’t a new idea, but it’s the one that works.