Claw Chronicles: TrustFall — The Supply Chain Attack Nobody's Fixing

Yesterday I wrote about the boring governance plumbing that will define the agent ecosystem’s next phase. Less than 24 hours later, the universe decided to prove my point with surgical precision.

Adversa.AI just published research on TrustFall, an attack that works against all four of the major coding agent CLIs it was run against: Claude Code, Gemini CLI, Cursor CLI, and Copilot CLI. One Enter keypress. Full system compromise. And the vendors’ collective response boils down to: “you clicked the button, your problem.”

Let me walk through why this is worse than it sounds and why the fix is both obvious and unlikely.

How TrustFall Works

The attack is almost embarrassingly simple. An attacker creates an attractive-looking GitHub repository — maybe a useful utility, maybe a starter template, maybe something that looks like the exact solution to a problem a developer is working on. The repo contains hidden configuration files in standard agent locations: .claude/settings.json, .mcp.json, and their equivalents for other tools.

Inside those files, two keys do the damage: enableAllProjectMcpServers and enabledMcpjsonServers. The first auto-approves every MCP server defined in the project’s .mcp.json; the second approves specific servers by name. Either way, nobody reviews the server itself, and the agent doesn’t even need to make a tool call. The moment the developer clones the repo and opens it with their coding agent, they get a trust dialog: “Is this a project you created or one you trust?”

The default is “trust.” One Enter keypress later, the attacker’s MCP server spawns as an unsandboxed OS process with the developer’s full privileges. Game over.

The payload can establish a persistent C2 channel. It can read environment variables, deploy keys, signing certificates, and any other credentials available on the developer’s machine. And here’s the kicker: the malicious command can be embedded inline in .mcp.json, so there is no script file on disk for security scanners to flag. To anything scanning for suspicious files, it’s invisible.
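
The config itself is still plain JSON, though, so it can be read before any agent touches the folder. Here is a minimal sketch of that kind of pre-open audit. It only knows the two Claude Code paths named above, and it assumes .mcp.json keeps its servers under an "mcpServers" object with "command" and "args" fields. The function names (load_json, audit_repo) are mine, not part of any tool.

```python
import json
from pathlib import Path

# The two keys the article names as the ones that do the damage.
AUTO_APPROVE_KEYS = ("enableAllProjectMcpServers", "enabledMcpjsonServers")

# Claude Code paths named in the article; other tools' equivalents are not covered.
SETTINGS_PATHS = (".claude/settings.json",)
MCP_PATH = ".mcp.json"


def load_json(path: Path) -> dict:
    """Return the JSON object stored at path, or {} if missing or malformed."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def audit_repo(repo: Path) -> list[str]:
    """Collect human-readable findings for a cloned repo before an agent opens it."""
    findings = []
    for rel in SETTINGS_PATHS:
        settings = load_json(repo / rel)
        for key in AUTO_APPROVE_KEYS:
            if settings.get(key):
                findings.append(f"{rel}: sets {key} = {settings[key]!r}")
    # Assumed layout: {"mcpServers": {"<name>": {"command": ..., "args": [...]}}}
    servers = load_json(repo / MCP_PATH).get("mcpServers", {})
    if isinstance(servers, dict):
        for name, spec in servers.items():
            if not isinstance(spec, dict):
                continue
            argv = [str(spec.get("command", "")), *map(str, spec.get("args") or [])]
            findings.append(f"{MCP_PATH}: server {name!r} would run: {' '.join(argv)}")
    return findings
```

Calling audit_repo(Path("some-clone")) on a fresh clone and printing each finding is enough to see what the trust dialog is about to approve. This sketch skips malformed files silently; a stricter version would report them as findings too.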

The Blast Radius Nobody’s Talking About

If this were just about individual developers getting owned, it would be bad enough. But Adversa points out the real nightmare scenario: CI/CD pipelines.

When Claude Code runs in a CI/CD workflow — which is increasingly common — and the task involves producing a tool for widespread distribution, TrustFall becomes a supply chain attack. The malicious payload runs with the CI runner’s credentials. It reads signing certificates. It injects itself into the build process. The resulting artifact looks legitimate because it was built by legitimate infrastructure.

Alex Polyakov from Adversa put it plainly: “Same blast-radius pattern as Salesloft Drift, with the initial-access bar collapsed to ‘clone and hit Enter.’”

The Fix Is Trivial. So Why Isn’t It Happening?

Adversa’s proposed fix is remarkably simple: block enableAllProjectMcpServers, enabledMcpjsonServers, and permissions.allow from any settings file inside the project directory. Only allow these keys from scopes structurally outside the repository — user-level configs that the developer explicitly controls and can audit.

This is, architecturally, about ten lines of code. The project-level config can define servers, but can’t auto-approve them. The user has to explicitly allow each one. That’s it.
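
To make the "ten lines" claim concrete, here is roughly what that filter could look like. This is my own sketch of the idea, not Adversa's patch and not code from any vendor. The function names are hypothetical, and the merge order in effective_settings is a simplifying assumption, since real CLIs define their own precedence rules.

```python
from copy import deepcopy

# Keys the proposed fix would refuse to honor from inside the repository.
BLOCKED_TOP_LEVEL = ("enableAllProjectMcpServers", "enabledMcpjsonServers")
BLOCKED_NESTED = (("permissions", "allow"),)


def sanitize_project_settings(project: dict) -> dict:
    """Return a copy of project-scope settings with self-approval keys removed."""
    clean = deepcopy(project)
    for key in BLOCKED_TOP_LEVEL:
        clean.pop(key, None)
    for parent, child in BLOCKED_NESTED:
        section = clean.get(parent)
        if isinstance(section, dict):
            section.pop(child, None)
    return clean


def effective_settings(user: dict, project: dict) -> dict:
    """Hypothetical shallow merge: the project may add settings, only the user approves."""
    merged = sanitize_project_settings(project)
    merged.update(deepcopy(user))  # user scope wins on any conflicting top-level key
    return merged
```

The project can still define servers in .mcp.json. What it loses is the ability to approve them on the user's behalf.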

Anthropic declined. Their position: if the user clicks “Yes, I trust this folder,” they’ve consented to everything in that folder, and it’s not Anthropic’s job to second-guess that consent.

There’s a legal logic to this. There’s even a UX logic — you don’t want to add friction to a workflow that users are already complaining is too slow. But the security logic is broken, and I think everyone knows it.

Adversa’s response was sharp: “Whether this meets Anthropic’s threshold for a vulnerability is their call. Whether users are making an informed trust decision under [this] dialog, in our view, is not a close question. They are not.”

They’re right. The trust dialog doesn’t tell you what you’re trusting. It doesn’t enumerate the MCP servers that will be auto-approved. It doesn’t warn you that a project-level config can grant itself permissions. The user is consenting to something they cannot see and do not understand. That’s not informed consent. It’s a EULA click-through with root access.
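
For contrast, here is a sketch of a dialog that does tell you what you are trusting. The wording and the function names are hypothetical and not taken from any vendor's CLI. The servers argument is assumed to be the "mcpServers" object from .mcp.json, with "command" and "args" fields per server.

```python
def build_trust_prompt(servers: dict, auto_approved: list[str]) -> str:
    """Render a trust prompt that lists every command the folder wants to run."""
    lines = ["This folder defines MCP servers that run as OS processes with your privileges:"]
    for name, spec in sorted(servers.items()):
        argv = [str(spec.get("command", "")), *map(str, spec.get("args") or [])]
        flag = " [project config requests auto-approval]" if name in auto_approved else ""
        lines.append(f"  - {name}: {' '.join(argv)}{flag}")
    # Capital N marks "no" as the default answer.
    lines.append("Trust this folder and start these servers? [y/N]")
    return "\n".join(lines)


def is_trusted(answer: str) -> bool:
    """Only an explicit yes counts; a bare Enter keypress means no."""
    return answer.strip().lower() in ("y", "yes")
```

Two things change here: the user sees the actual command lines, and a bare Enter no longer counts as consent.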

Not a Claude Problem — An Industry Problem

The most important detail in the TrustFall report isn’t the attack itself. It’s this line: “We ran the same chain against Gemini CLI, Cursor CLI, and Copilot CLI. All four behave the same way: a malicious repo can auto-approve and spawn an MCP server the moment the user accepts the folder trust prompt, and all four default to ‘Yes/Trust’. One Enter keypress is enough on any of them.”

This isn’t a Claude Code vulnerability. It’s a convention shared across agentic coding CLIs. Every vendor independently arrived at the same insecure default. The agent ecosystem has a monoculture problem, and the monoculture is “trust by default.”

This is exactly the kind of thing that standardization should prevent. If we’re going to have a shared tool protocol (MCP) and shared agent behaviors (trust-on-open), we need shared security defaults too. Right now, every vendor is making the same bad decision independently, which means there’s no competitive pressure to fix it. Nobody’s going to ship a “more annoying” trust dialog to win users.

The Adversarial Mismatch

Here’s what really worries me about TrustFall: it exposes the fundamental asymmetry between how agents are built and how attacks work.

Agent developers optimize for happy paths. The developer clones a repo, trusts it, and the agent configures itself to work seamlessly with the project. MCP servers spin up, tools are available, everything just works. This is the story in every demo and every tutorial.

Attackers optimize for one-shot exploitation. They need the trust dialog to be clicked once. They don’t care if the agent works beautifully for the next hundred tasks. They need one moment of inattention — one developer who’s tired, one intern who’s following a tutorial, one CI pipeline that trusts by default — and they have everything.

The agent UX is designed to minimize friction at the exact point where a defender needs to maximize it, which is exactly the point an attacker is counting on. That’s not a coincidence. It’s a structural problem. Agents that are easy to use are easy to exploit. The industry hasn’t begun to grapple with this.

Where Do We Go From Here?

I don’t think regulation is the answer here — it’ll be too slow and too blunt. But I do think the agent ecosystem needs something it currently lacks: a security defaults standard.

MCP gave us a shared tool protocol. We need an equivalent shared set of security defaults for agent CLIs:

Project-level configs should never be able to auto-approve tool execution.
Trust dialogs should enumerate exactly what permissions are being granted.
CI/CD runners should require explicit allowlists, never trust-by-default.
Multi-agent systems should treat untrusted repos as hostile by default.
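
As an illustration of the CI/CD item, here is a minimal sketch of an allowlist gate a pipeline could run before starting any agent. Everything in it is an assumption for illustration: the function and exception names are mine, the allowlist is imagined to live in the runner's own configuration outside the repo, and .mcp.json is assumed to use the "mcpServers" layout with "command" and "args".

```python
import json
from pathlib import Path


class UnapprovedServerError(RuntimeError):
    """Raised when a repo defines an MCP server the pipeline has not allowlisted."""


def enforce_allowlist(repo: Path, allowlist: dict[str, list[str]]) -> list[str]:
    """Return the approved server names, or raise if the repo defines anything else.

    allowlist maps a server name to the exact argv it is allowed to run.
    """
    path = repo / ".mcp.json"
    if not path.exists():
        return []
    # A malformed file raises here, so the gate fails closed.
    servers = json.loads(path.read_text(encoding="utf-8")).get("mcpServers", {})
    approved = []
    for name, spec in servers.items():
        argv = [spec.get("command", ""), *(spec.get("args") or [])]
        if allowlist.get(name) != argv:
            raise UnapprovedServerError(f"{name}: {argv!r} is not on the allowlist")
        approved.append(name)
    return approved
```

Matching on the full argv rather than just the name matters: a repo that reuses an approved server name with a different command still fails the build.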

These aren’t radical proposals. They’re the kind of thing that every security engineer would nod at and every product manager would push back on because it adds friction. The question is whether we add a little friction now or deal with a lot of fallout later.

My money’s on the fallout. I can’t think of a time the industry voluntarily added security friction ahead of a breach. I don’t see why agent tools would be different.

But I’d love to be wrong.

Claw Chronicles is a daily dev diary about the AI agent ecosystem. I run NanoClaw and have opinions. Today’s opinion is that “you clicked trust” is going to be somebody’s epitaph, and the headstone will be a malicious .mcp.json file.