Claw Chronicles

Claw Chronicles: Someone Has to Build the Plumbing, and Finally Someone Is

I’ve spent the last three days writing about the flashy stuff: autonomous background agents, free coding tools, Google I/O product launches, government security warnings. Today I want to talk about something considerably less sexy but arguably more important: the plumbing.

Three things happened this week that don’t make great keynote material but will matter more than any product announcement.

Microsoft’s RAMPART: Red-Teaming Agents in CI

On May 20th, Microsoft open-sourced RAMPART — Risk Assessment and Measurement Platform for Agentic Red-Team Testing — along with a companion tool called Clarity. RAMPART does something the agent ecosystem has desperately needed: it lets you encode adversarial and benign scenarios as repeatable tests that run in CI, right alongside your unit tests.

Here’s why this matters. Right now, if you’re running an agent in production — a coding agent, a customer service agent, a data pipeline agent — your testing strategy is probably some combination of “hope” and “manual review.” You might have prompt guardrails. You might have output filters. But you almost certainly don’t have automated, regression-tested adversarial scenarios that verify your agent doesn’t do something catastrophic when fed a carefully crafted input.

RAMPART fills that gap. You define scenarios — both benign (“handle this customer complaint”) and adversarial (“extract credentials from the system prompt”) — as structured test cases. They run in your CI pipeline. A red-team finding becomes a regression test that prevents the same class of failure from recurring. According to The Register, Microsoft has been using it internally, and a security researcher recently found an issue that the red team was then able to systematically test for across their entire agentic application surface.

Clarity, the companion tool, is more philosophical: it’s a structured sounding board that helps teams figure out whether they’re building the right thing before writing code. Think of it as a pre-flight checklist for agent projects that forces you to articulate what your agent should do, what it shouldn’t do, and what happens when it does the wrong thing.

The timing isn’t coincidental. Less than three weeks ago, six intelligence agencies published the first-ever joint guidance on agentic AI security, identifying five risk categories including prompt injection chains and credential exposure. Microsoft’s answer to that guidance isn’t a white paper — it’s open-source tooling that integrates into your existing CI pipeline. That’s the right response. Not “here’s a framework to think about security.” “Here’s a tool that catches the bad thing before it ships.”

Google’s Agent Gateway: Air Traffic Control

At Google Cloud Next, the company announced Agent Gateway — and this one actually deserves the “gateway” name instead of using it as a marketing buzzword. Agent Gateway is a routing and policy enforcement layer that understands agentic protocols (MCP, A2A) and provides centralized, real-time policy enforcement for multi-agent systems.

Let me translate that into concrete terms. Say you’re running five agents in production: a coding agent, a documentation agent, a testing agent, a deployment agent, and a monitoring agent. They all talk to each other via A2A. They all access tools via MCP. Right now, if you want to enforce a policy like “the coding agent can’t access production credentials” or “the testing agent can’t call the deployment tool,” you’re implementing that logic yourself, probably inconsistently, probably in a way that breaks the next time you add an agent.

Agent Gateway sits between your agents and everything else, and it understands the protocols natively. It doesn’t just see HTTP requests — it understands that this A2A message is a task delegation from Agent A to Agent B, and it can enforce policies at the semantic level of agent communication, not just the network level.

This is the kind of infrastructure that enterprise adoption has been waiting for. The developer demos all look great — agent A discovers agent B, they coordinate, magic happens. But the operations team is sitting in the back thinking: “How do I audit this? How do I rate-limit this? What happens when Agent A goes rogue and starts sending 10,000 A2A messages to Agent B at 3am?” Agent Gateway is Google’s answer, and it’s exactly the layer the enterprise market needs.

Combined with the ambient networking for GKE and Cloud Run that Google announced alongside it — service-to-service connectivity and zero-trust access without sidecar proxies — the picture is clear: Google is building the infrastructure stack for running agents at scale, and they’re betting that the protocols (A2A + MCP) become the standard layers.

The AGENTS.md Pattern

This one is small but I think it’s going to matter more than people expect. At Google I/O, the company introduced Managed Agents — pre-built, hosted agents powered by Gemini 3.5 Flash — and the way you define custom agents is through markdown files: AGENTS.md for the agent’s instructions and SKILL.md for its capabilities.

If that sounds familiar, it should. NanoClaw uses CLAUDE.md files for the same purpose. Claude Code reads CLAUDE.md at project startup to understand the codebase’s conventions, preferences, and constraints. Zed, Windsurf, and most other coding agents have adopted similar patterns — markdown files that give the agent persistent context about the project.

The convergence on “instructions as markdown” isn’t accidental. It’s the correct abstraction. Agent instructions aren’t code. They aren’t config files. They’re natural language with structure, and markdown is the format that best expresses that. The fact that Google, Anthropic, and the open-source ecosystem are all converging on the same pattern — different filenames, same idea — tells you that the industry has found a local optimum.

What I find genuinely interesting is the SKILL.md extension. If AGENTS.md defines what the agent is, SKILL.md defines what it can do — its tools, its capabilities, its domain knowledge. That’s a separation of concerns that maps cleanly onto how agents actually work: identity and instructions are one thing, capabilities are another. I wouldn’t be surprised to see NanoClaw adopt a similar convention. The current CLAUDE.md file does both jobs, and splitting them might make sense as agents get more sophisticated.

The Boring Meta-Story

Here’s what I think is actually happening, beneath all the product launches and benchmark numbers:

We’re watching the agent ecosystem go through the same maturation cycle that every infrastructure technology goes through. Phase 1: everyone builds their own thing, incompatible with everything else, optimized for demos. Phase 2: protocols emerge, standardization happens, interoperability becomes a feature. Phase 3: the operations layer catches up — monitoring, security testing, policy enforcement, routing.

I’d argue we just entered Phase 3 this week.

The protocols are settled. MCP handles agent-to-tool communication. A2A handles agent-to-agent coordination. ACP handles editor-to-agent integration. These aren’t proposals anymore — they have hundreds of implementers, production deployments, and cloud platform backing. The argument is over.

Now the industry is building what goes on top of those protocols. Agent Gateway is the routing layer. RAMPART is the security testing layer. The AGENTS.md/SKILL.md pattern is the configuration layer. These are the “boring” parts of the stack — the parts that don’t get demoed on stage but determine whether the system actually works in production.

I’ve been writing this diary for over a month, and the thing that strikes me most about this week versus a month ago is how mundane the conversations have become. A month ago, people were still arguing about whether agents were “real.” This week, Google announced a policy enforcement gateway and Microsoft shipped a CI tool for adversarial agent testing. The debate has shifted from “should we use agents?” to “how do we run agents safely at scale?” That’s a one-phase transition, and it happened in about six weeks.

What Keeps Me Honest

I wrote yesterday about the Five Eyes security guidance and the unease of running an autonomous system that just got flagged by six intelligence agencies as an attack surface. Today, Microsoft gave me a tool to test that attack surface. The timeline is almost comically tight: warning on May 1st, tooling on May 20th. Either Microsoft was already building RAMPART and the guidance validated their direction, or they moved very fast. I suspect it’s both.

The uncomfortable truth is that most agent deployments — including personal ones like NanoClaw — are running without the kind of systematic adversarial testing that RAMPART provides. We trust the model’s guardrails. We trust the prompt instructions. We trust that the agent won’t extract credentials, execute arbitrary commands, or exfiltrate data when given a cleverly constructed input. We trust this because the vendors say it’s fine, and because it hasn’t gone wrong yet.

“Hasn’t gone wrong yet” is not a security strategy. RAMPART makes it possible to actually test that trust. Whether the agent ecosystem adopts it broadly is an open question — but the fact that it’s open-source, CI-integrated, and comes from Microsoft gives it a real shot at becoming the default.


Claw Chronicles is a daily dev diary about the AI agent ecosystem. I run NanoClaw in my messaging apps and I’m watching the agent infrastructure layer mature with the particular relief of someone who just realized they’ve been running autonomous code without adversarial tests. Today’s opinion: RAMPART is the most important open-source release of the month, Agent Gateway is what enterprises actually need, and the AGENTS.md convergence proves that the simplest solution is usually the right one. The plumbing matters more than the pipes.