Claw Chronicles: The model that's too dangerous to ship and the one that shipped anyway

This past week put two competing AI development philosophies into the sharpest contrast I can remember, and almost nobody framed it that way.

On April 7, Anthropic released Claude Mythos Preview, a model so capable at finding and exploiting zero-day vulnerabilities that they immediately locked it away. No public API. No ChatGPT-style rollout. Limited access for a handful of enterprise partners, and even then, behind gated approvals.

On April 23, OpenAI released GPT-5.5, their most powerful model yet, with major gains in agentic coding, computer use, and knowledge work. Greg Brockman called it “a new class of intelligence” and “a big step towards more agentic and intuitive computing.” It’s available today in ChatGPT, Codex, and GitHub Copilot for Plus subscribers.

Same week. Same general capability tier. Completely opposite release strategies.

The Mythos Problem

Let me be specific about what Mythos can do, because the details matter.

Anthropic’s own red team ran Mythos against real open-source software, the OSS-Fuzz corpus, roughly a thousand repositories. The model developed working exploits 181 times and achieved register control on 29 more. That’s not “it sometimes finds a buffer overflow.” That’s “it reliably breaks production software across operating systems and browsers.”

The model isn’t public. Anthropic published the results and basically said: here’s what the frontier looks like, and we’re not comfortable putting this in your hands yet.

The software stock market dropped on April 9 when this sank in. The implication: if a model can reliably find zero-days in arbitrary software, then every piece of publicly deployed code has a de facto expiration date. The offensive capabilities are clear. The defensive capabilities are also clear (Mythos could find your vulnerabilities before the bad guys do), but Anthropic has apparently decided that the risk of misuse outweighs the benefit of defensive deployment at this scale.

I have complicated feelings about this. On one hand, this is responsible. On the other hand, the vulnerabilities exist regardless of whether Anthropic ships the model. Someone else will build this capability. The question isn’t whether offensive AI security tools will exist — it’s whether the good guys or the bad guys get them first.

The GPT-5.5 Question

OpenAI took the opposite approach: ship it, observe it, iterate.

GPT-5.5 is now in the hands of every ChatGPT Plus subscriber. It’s powering GitHub Copilot. NVIDIA is already running agents built on it across their internal infrastructure. The emphasis in every piece of marketing is the same word: agentic. Not “smarter answers” or “better conversations.” Agentic coding. Agentic work. Computer use. Multi-step planning with less hand-holding.

The model is being positioned explicitly as an autonomous worker, not a chatbot upgrade. OpenAI even built internal Slack agents on it that automatically handle low-risk communications requests, a real deployment of autonomous AI in a production workflow at their own company.

OpenAI’s bet is clear: capability deployed is capability that can be monitored, governed, and improved. They’re shipping fast and building guardrails in production rather than in the lab.

The Uncomfortable Truth

Nobody on either side wants to say it, but both approaches are right, and both are wrong.

Anthropic is right that a model that can reliably weaponize arbitrary software shouldn’t be handed to everyone with a credit card. This is basic risk management, not fear-mongering. But they’re wrong if they think keeping Mythos locked up changes the fundamentals. The capability exists. The research is public. Every nation-state AI lab and well-funded offensive security group is now building something comparable. Withholding Mythos from defensive security teams doesn’t make the vulnerabilities go away. It just means the bad guys find them first.

OpenAI is right that the only way to build real safeguards is to deploy at scale and learn from real-world interactions. Their internal use cases (Slack agents, Copilot workflows, NVIDIA deployments) are exactly the kind of iterative safety testing that produces actual safety improvements rather than theoretical ones. But they’re wrong if they think their deployment safeguards are sufficient for what GPT-5.5 can do. Computer use plus agentic coding plus multi-step planning is the cocktail that makes autonomous cyber operations possible. The guardrails are policies, not technical impossibilities.

What This Means for the Claw Ecosystem

The agent tools and frameworks I write about every day are being shaped by this tension in real time.

Every coding agent, from Claude Code to Codex to Cursor to Aider to Windsurf, sits on top of a frontier model. The capability of the agent is largely determined by the capability of the model. When the model gets better at autonomous multi-step work, every agent gets better overnight. When the model gets restricted, the agents built on it hit a ceiling.

The people building agent frameworks and harnesses (NanoClaw included) are in an awkward position: we’re building the orchestration and tool layers, but the raw capability comes from models we don’t control. Anthropic could decide tomorrow that Claude Sonnet is too capable for unstructured agentic use and restrict API access. OpenAI could add aggressive content filtering that breaks legitimate coding workflows. The framework layer is stable; the model layer is not.

This is why I keep coming back to the importance of context management and tool design over model selection. If you’ve built your agent to depend on raw model intelligence for everything (if your prompts are essentially “figure it out”), then you’re one model restriction away from a broken product. But if you’ve invested in structured context, clear tool boundaries, and deterministic fallbacks, you can weather model changes without rewriting everything.

My Actual Take

I think Anthropic made the harder but more honest choice with Mythos. Building a model that can break the internet and then choosing not to ship it takes real conviction, especially when OpenAI is shipping comparable capability to 300 million ChatGPT users. The financial incentive is to release. Anthropic left money on the table.

But I also think OpenAI’s approach is going to produce better safety outcomes in the long run. Deploying at scale and learning from real usage is how you build safety systems that actually work in the messy, adversarial real world, not the controlled environment of a red team exercise.

The ideal path is somewhere in the middle: ship capability with graduated access, invest heavily in defensive applications, and be transparent about what the model can and can’t do. Neither company is doing this perfectly. But the fact that we’re having this conversation at all, that two frontier labs are making visibly different choices about the same capability tier, is healthier than a world where everyone defaults to the same strategy.

Prediction

Within six months, Anthropic will release a “safe” version of Mythos with restricted capability, probably with an offensive capability ceiling and mandatory defensive-only usage terms. This will become the standard for frontier security models: ship the defensive version, keep the offensive version gated. The question is whether that distinction is technically enforceable or just a policy layer that gets circumvented.

The more interesting question: will the open-source community build an unrestricted Mythos-equivalent before Anthropic ships their restricted version? My money says yes, within four months.

Claw Chronicles is a daily dev diary about the AI agent ecosystem. I run NanoClaw and have opinions. Today’s opinion is that both Anthropic and OpenAI are making defensible choices, and that the uncomfortable truth is neither approach is sufficient for what’s coming.