Claw Chronicles: OpenAI’s sandbox play and the week everyone copied NanoClaw’s homework

Four days ago, OpenAI shipped a major update to their Agents SDK. Two days after that, TechCrunch reported that Emergent, a Bengaluru-based startup valued at $300 million, launched Wingman, an autonomous AI agent that lives inside WhatsApp, Telegram, and iMessage.

I don’t think either of these events is the story on its own. The story is what they signal together: the messaging-first agent model that projects like NanoClaw, OpenClaw, and Hermes have been building toward for the past year is no longer an experiment. It’s a category. And the big players are moving in.

The Agents SDK Gets Real

I’ll start with OpenAI’s announcement, because it’s the more technically interesting one.

The updated Agents SDK now includes native sandbox execution. Not “here’s a Docker container, good luck.” Not “integrate with E2B yourself.” Native. Built-in. The SDK gives agents a controlled workspace where they can read files, write files, install dependencies, run code, and use tools, all within a sandbox whose orchestration OpenAI manages.

They’re also shipping a “Manifest” abstraction for describing an agent’s workspace. You can mount local files, define output directories, and pull in data from S3, GCS, Azure Blob, and Cloudflare R2. The same manifest works across sandbox providers: Blaxel, Cloudflare Workers, Daytona, E2B, Modal, Runloop, and Vercel are all supported out of the box.
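To make the idea concrete, here is a minimal sketch of what a provider-neutral workspace description could look like. The names (`Mount`, `WorkspaceManifest`, `for_provider`) and the bucket URI are my own assumptions for illustration, not the Agents SDK's actual classes or fields.

```python
from dataclasses import asdict, dataclass


# Hypothetical types for illustration only; NOT the Agents SDK's real API.
@dataclass(frozen=True)
class Mount:
    source: str            # local path or bucket URI
    target: str            # path inside the sandbox workspace
    read_only: bool = True


@dataclass(frozen=True)
class WorkspaceManifest:
    mounts: tuple = ()
    output_dir: str = "/workspace/out"

    def for_provider(self, provider: str) -> dict:
        """Render the same workspace description for any provider name."""
        return {
            "provider": provider,
            "mounts": [asdict(m) for m in self.mounts],
            "output_dir": self.output_dir,
        }


manifest = WorkspaceManifest(
    mounts=(
        Mount("./data", "/workspace/data"),
        Mount("s3://example-bucket/reports", "/workspace/reports"),
    )
)

# Only the provider name changes; the workspace description does not.
on_e2b = manifest.for_provider("e2b")
on_modal = manifest.for_provider("modal")
```

The point of the sketch is the shape, not the fields: the environment is data, and the provider is a parameter applied to it.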

This sounds mundane. It’s not. What OpenAI is building is a portable agent environment layer. The same agent workspace definition works locally, on Cloudflare, on Modal, on E2B. You write your agent once, describe its environment once, and run it anywhere. That’s the kind of boring infrastructure that actually matters, because it removes the “which sandbox provider do I use?” decision from the critical path of building an agent.

The durable execution stuff is even more interesting. The SDK supports snapshotting and rehydration. If your sandbox container dies, the agent’s state is externalized, and it can be restored in a fresh container from the last checkpoint. That’s not just a reliability feature. That’s an acknowledgment that long-running agents will fail, and the system should handle it gracefully.
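Here is a toy version of the snapshot-and-rehydrate pattern, assuming the agent's state fits in a small JSON document on disk. `CheckpointedAgent` and its methods are hypothetical names of mine; the SDK's real mechanism is surely more involved.

```python
import json
import tempfile
from pathlib import Path


class CheckpointedAgent:
    """Toy agent whose progress lives outside the process that runs it."""

    def __init__(self, checkpoint_path: Path):
        self.checkpoint_path = checkpoint_path
        self.state = {"step": 0, "notes": []}

    def snapshot(self) -> None:
        # Write to a temp file, then rename, so a crash mid-write
        # never leaves a half-written checkpoint behind.
        tmp = self.checkpoint_path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.state))
        tmp.replace(self.checkpoint_path)

    @classmethod
    def rehydrate(cls, checkpoint_path: Path) -> "CheckpointedAgent":
        # Start fresh, then load the last checkpoint if one exists.
        agent = cls(checkpoint_path)
        if checkpoint_path.exists():
            agent.state = json.loads(checkpoint_path.read_text())
        return agent

    def run_step(self, note: str) -> None:
        self.state["step"] += 1
        self.state["notes"].append(note)
        self.snapshot()  # checkpoint after every completed step


path = Path(tempfile.mkdtemp()) / "agent.json"
first = CheckpointedAgent(path)
first.run_step("cloned repo")
first.run_step("installed deps")
del first  # stand-in for the container dying

second = CheckpointedAgent.rehydrate(path)
```

After the "crash," `second` picks up at step 2 with both notes intact, because nothing important ever lived only in the first object's memory.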

OpenAI explicitly separates the harness from the compute layer. Credentials stay out of the sandbox. Model-generated code runs in isolation. The reasoning is security (“agent systems should be designed assuming prompt-injection and exfiltration attempts”), but the effect is architectural cleanliness. The part that thinks is separate from the part that acts. That’s how NanoClaw has been structured from day one, and seeing OpenAI arrive at the same conclusion independently is validating.
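A rough sketch of that split, using nothing but a child process with a scrubbed environment. To be loud about it: this is not a real sandbox (there is no filesystem or network isolation), and `run_untrusted`, `SAFE_VARS`, and `EXAMPLE_API_KEY` are made-up names. It only illustrates that the side holding the credential never hands it to the side running generated code.

```python
import os
import subprocess
import sys

# Variables the child may need just to start; nothing secret. (Assumed list.)
SAFE_VARS = ("PATH", "LD_LIBRARY_PATH", "SYSTEMROOT")


def run_untrusted(code: str, timeout: float = 10.0) -> str:
    """Run generated code in a child process that sees only SAFE_VARS.

    NOT a real sandbox: no filesystem or network isolation happens here.
    """
    child_env = {k: os.environ[k] for k in SAFE_VARS if k in os.environ}
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=child_env,          # the harness's other variables are not passed on
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout.strip()


# Harness side: holds the credential (hypothetical variable name).
os.environ["EXAMPLE_API_KEY"] = "held-by-harness"

# "Model-generated" code that goes looking for the credential.
probe = "import os; print(os.environ.get('EXAMPLE_API_KEY', 'not visible'))"
seen_by_child = run_untrusted(probe)
```

The probe comes back empty-handed: the key exists in the harness process and nowhere else.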

Wingman and the $300M Vote of Confidence

Now let me talk about Emergent, because this is the one that made me sit up.

Emergent started as a vibe-coding platform: think Cursor for non-developers. Eight million builders have used it, and it has 1.5 million monthly active users. They raised $70 million in January at a $300 million valuation, backed by SoftBank, Khosla, and Lightspeed. These are not small numbers.

With Wingman, they’re pivoting from “help people build software” to “help people operate autonomously through software.” The agent lives in messaging apps: WhatsApp, Telegram, iMessage. It runs tasks in the background across email, calendars, CRMs, and workplace tools. It handles routine stuff on its own and asks for approval on consequential decisions. They call these “trust boundaries.”
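A trust boundary can be as small as one flag and one callback. This is my own minimal sketch of an approval gate, not Emergent's implementation; `Action`, `execute`, and `ask_human` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    name: str
    consequential: bool          # the trust boundary: does this need a human?
    run: Callable[[], str]


def execute(action: Action, ask_human: Callable[[str], bool]) -> str:
    """Run routine actions directly; gate consequential ones on approval."""
    if action.consequential and not ask_human(f"Approve '{action.name}'?"):
        return f"skipped: {action.name}"
    return action.run()


archive = Action("archive newsletter", False, lambda: "archived")
refund = Action("issue refund", True, lambda: "refund sent")


def deny(prompt: str) -> bool:
    # Stand-in for a real chat prompt; here the human always says no.
    return False


routine_result = execute(archive, deny)   # runs without asking anyone
gated_result = execute(refund, deny)      # blocked until a human says yes
```

In a messaging-first agent, `ask_human` would be a message in the chat thread and the answer would be a reply, which is a large part of why the chat interface fits this pattern so well.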

That’s roughly the NanoClaw architecture. Container runtime, messaging interface, tool access with human approval gates. Emergent just built a VC-funded, polished-product version of it.

When a startup with that kind of backing independently arrives at the same design (messaging-first, trust boundaries, background execution), it means the market is speaking. People want their AI agents in the apps they already use, not in some new dashboard they’ll forget to open.

The Emergent CEO nailed it: “A lot of real work already happens through chat, voice, and email — asking for something, following up, sharing context, making a decision. Increasingly, they’ll be the main ways we work with agents too.”

Yes. That’s exactly it. The insight the claw project has been operating on is that messaging apps are the universal interface. Everyone has WhatsApp or Telegram or iMessage open. Nobody wants to download a new app to talk to their AI assistant. The agent should come to you, not the other way around.

The Convergence Nobody Planned

OpenAI, the company with the biggest model, is building infrastructure for agents that run in sandboxes with filesystem access, tool use, and durable execution. Emergent, a startup with 8 million users, is building a consumer-facing agent that lives in messaging apps. The New York Times ran a piece about “code overload”: companies going from 25,000 to 250,000 lines of code per month thanks to AI tools, creating downstream bottlenecks in review, testing, and maintenance. Stanford’s AI Index showed agents jumping from 12% to 66% task success on OSWorld, within 6 percentage points of human performance.

And on Reddit, the r/AI_Agents community is having a debate titled “state of AI agent coders April 2026: agents vs skills vs workflows,” arguing about whether the right abstraction is autonomous agents, predefined skills, or scripted workflows.

All of this is happening in the same week. And what it tells me is that we’re in the messy middle of a transition. The models are good enough that people are building real things with them. The infrastructure is starting to mature. The use cases are expanding beyond “write code” into “operate software.” But the right patterns haven’t settled yet.

The “agents vs skills vs workflows” debate is the most honest signal of where we are. It’s not a technical disagreement. It’s a conceptual one. Do you want an AI that thinks for itself (agent), an AI that follows a recipe (skill), or an AI that executes a predetermined sequence (workflow)? The answer, of course, is “it depends,” but the fact that the community is still arguing about it means nobody has found the abstraction that makes the others obsolete.

My read: agents for novel tasks, skills for repeatable tasks, workflows for tasks that need to be deterministic. The claw ecosystem already does this. NanoClaw acts as an agent by default but can run scheduled tasks (workflows) and has defined skills for common operations. The boundaries are porous and that’s fine. Rigid taxonomies in a space moving this fast are a trap.
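As a sketch of how porous those boundaries can be, here is one router covering all three. This is not how NanoClaw is actually implemented; the registries, task names, and `route` function are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registries: skills are named recipes, workflows are fixed sequences.
SKILLS: Dict[str, Callable[[str], str]] = {
    "summarize": lambda text: text[:20],
}
WORKFLOWS: Dict[str, List[str]] = {
    "daily-digest": ["fetch", "summarize", "send"],
}


def route(task: str, payload: str = "") -> Tuple[str, str]:
    """Pick the most deterministic handler that knows the task."""
    if task in WORKFLOWS:                        # deterministic: fixed sequence
        return "workflow", " -> ".join(WORKFLOWS[task])
    if task in SKILLS:                           # repeatable: a known recipe
        return "skill", SKILLS[task](payload)
    return "agent", f"needs a plan: {task}"      # novel: let the model decide


digest = route("daily-digest")
summary = route("summarize", "Agents SDK adds native sandboxes")
novel = route("book flights for the offsite")
```

The ordering is the design choice: try the most predictable option first and fall back to open-ended reasoning only when nothing more constrained applies.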

The Code Overload Elephant

Then there’s the NYT piece, the shadow hanging over all this excitement.

The companies in the piece are producing roughly 10x more code with AI tools (the 25,000-to-250,000 lines per month figure). That code needs to be reviewed, tested, maintained, and eventually rewritten. The NYT article describes organizations where sales and marketing teams are being forced to accelerate to keep up with the engineering velocity AI has unlocked, creating “a lot of stress.”

This is the uncomfortable flip side. Yes, AI makes it easier to write code. No, it doesn’t make it easier to maintain code. The companies seeing 10x output increases are also seeing 10x more surface area for bugs, security vulnerabilities, and technical debt.

If Emergent’s Wingman can genuinely handle routine tasks autonomously, if it can triage emails, schedule meetings, and update CRMs without human intervention, then the code overload problem becomes the code trust problem. When an AI writes the code and another AI operates the software, who’s responsible when something breaks?

I don’t have a clean answer for this. I think it’s going to be one of the defining tensions of the next 18 months.

One Prediction

By Q3 2026, at least one of the major coding agent companies will launch a “code health” product, an AI agent specifically designed to review, test, and maintain the code that other AI agents generate. The AI that writes the code and the AI that reviews the code will be from the same company, and nobody will think that’s weird.

NanoClaw is going to need its own version of this too. If my agent is scheduling tasks, writing files, and updating my wiki, I need a second agent, or a verification layer, making sure the first one didn’t make a mess. Trust boundaries are good for preventing catastrophic actions. They don’t help with the slow accumulation of small errors.

The models are getting better at verifying their own work, as I noted yesterday about Opus 4.7. But “better” isn’t “perfect,” and at the scale these tools are being deployed, “not perfect” compounds fast.
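To put a number on “compounds fast,” here is the back-of-envelope arithmetic. The 99% per-action accuracy is an illustrative assumption of mine, not a measured figure, and so is treating each action as independent.

```python
def chance_of_clean_run(per_action_accuracy: float, actions: int) -> float:
    """Probability that every one of `actions` independent actions is correct."""
    return per_action_accuracy ** actions


# Assumed 99% accuracy per action, at three different scales.
clean_runs = {n: chance_of_clean_run(0.99, n) for n in (10, 100, 1000)}

for n, p in clean_runs.items():
    print(f"{n:>5} actions: {p:.1%} chance of zero mistakes")
```

Under those assumptions, an agent that is right 99% of the time gets through 10 actions cleanly about 90% of the time, through 100 actions about 37% of the time, and through 1,000 actions essentially never.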

Claw Chronicles is a daily dev diary about the AI agent space. I run NanoClaw and have opinions. This week validated a lot of them, which is either reassuring or a sign I’m not thinking weirdly enough.