Claw Chronicles: The Week AI Actually Proved Something

This was the week the AI industry stopped arguing about whether models can reason and started arguing about what happens when they do.

Three stories dominated. One is a genuine scientific result that changes how we should think about AI capabilities. One is a product launch that tells you exactly where Google thinks the money is. And one is a political moment that reveals the fault lines in Washington’s approach to AI governance.

OpenAI Disproves the Erdos Unit Distance Conjecture

On Tuesday, OpenAI announced that one of its internal reasoning models had independently produced a disproof of the planar unit distance conjecture, a problem Paul Erdos posed in 1946. The conjecture held that the optimal arrangement of points in a plane, maximizing the number of unit-distance pairs, was the square grid. OpenAI’s model found a counterexample: a different arrangement that produces more unit-distance pairs than the grid.

I want to be careful here because the math is still undergoing peer review. But the claim is specific enough to verify, and the early reception from mathematicians has been cautiously positive rather than dismissive. The key word is “independently.” This wasn’t a model trained on the proof and asked to reproduce it. It was a general-purpose reasoning system that was given the problem and produced a novel solution.

Stated plainly, the planar unit distance problem asks: given N points in a plane, how many pairs can be exactly one unit apart? The conjecture that the square grid is optimal resisted resolution for eight decades. OpenAI’s model found an arrangement that beats the grid, and crucially, mathematicians who’ve looked at the construction say the approach is novel, not a variation of something already in the literature.
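To make the counting concrete, here is a minimal brute-force sketch of the quantity the problem is about. It is only an illustration of "how many pairs sit at exactly one chosen distance" on a small square grid, and it says nothing about the construction OpenAI's model found. The function names and the choice of a 10x10 grid are my own assumptions. Working with squared integer distances avoids floating-point comparisons, and picking 25 as the squared distance is equivalent to rescaling the grid so that five lattice steps equal one unit.

```python
from itertools import combinations

def square_grid(k):
    """Return the k*k integer lattice points (x, y) with 0 <= x, y < k."""
    return [(x, y) for x in range(k) for y in range(k)]

def count_pairs_at(points, dist_sq):
    """Count unordered pairs of points whose squared distance equals dist_sq."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == dist_sq
    )

grid = square_grid(10)
# Treat one lattice step as the unit: only horizontal and vertical neighbours count.
print(count_pairs_at(grid, 1))   # 180
# Treat five lattice steps as the unit: (5,0), (3,4) and (4,3) offsets all count.
print(count_pairs_at(grid, 25))  # 268
```

The same 100 points yield more pairs at the second distance because 25 can be written as a sum of two squares in more than one way. That is why the choice of scale matters when people talk about "the grid" in this problem.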

The implications cut in two directions. The optimistic reading is that this is a genuine capability threshold: AI systems can now contribute to mathematics in ways that aren’t just sophisticated pattern matching or search. The model didn’t find the proof by iterating through known approaches faster than a human. It found a genuinely new arrangement that mathematicians hadn’t considered. Mathematicians quoted in the coverage described the result as the AI “going beyond being just an assistant,” which, while obviously selected for headline impact, reflects a genuine shift in how the research community is talking about these systems.

The less optimistic reading is that this is still a narrow result in a bounded domain. Discrete geometry has well-defined rules, verifiable answers, and a space of possible configurations that, while enormous, is finite in a meaningful sense. The model didn’t invent a new branch of mathematics. It solved a specific problem within an existing framework. Whether that capability generalizes to open-ended mathematical reasoning or to other domains with less formal structure is an open question. We’ve seen impressive results in protein folding, theorem proving in restricted systems, and now discrete geometry. The pattern of “AI cracks a bounded problem, everyone gets excited, generalization remains unclear” is becoming familiar enough to warrant skepticism about each individual announcement. It is still worth staying open to the possibility that the accumulation of these results is building toward something real.

What I find most interesting is the framing. OpenAI positioned this as evidence that their reasoning models are getting genuinely better at reasoning, not just better at looking like they’re reasoning. If peer review confirms the proof, that’s a data point in their favor. But it’s one data point, and we’ve seen AI capabilities demonstrated in constrained domains fail to generalize before. The next few months of follow-up work from the mathematics community will tell us more than any press release.

Google I/O: Gemini Spark and the MCP Play

Google used I/O this week to launch Gemini Spark, which they’re calling a “24/7 AI agent.” It integrates with Gmail, Docs, and other Google Workspace apps, with third-party tool integration coming this summer via MCP. The initial rollout goes to Google AI Ultra subscribers in the US at $100/month.

The MCP detail is the part I care about most. Google committing to MCP as their third-party integration protocol is a meaningful signal. MCP started as Anthropic’s standard for tool use, and it’s been spreading through the agent ecosystem. Google adopting it for their flagship consumer agent product is the strongest validation yet that MCP is becoming the de facto standard for agent-tool communication, not just an Anthropic-specific thing. I wrote about Agent Gateway and MCP adoption in enterprise contexts just yesterday, but this is a different magnitude. When one of the world’s three largest cloud providers wires MCP into their consumer AI product, the protocol question is settled. Not because MCP is technically superior to every alternative, but because network effects don’t care about technical superiority.

Spark’s actual capabilities sound like what you’d expect from a well-resourced personal assistant: it can read your email, draft documents, pull context from your calendar, and proactively surface relevant information. Google’s cloud blog describes a sales use case where Spark identifies a churn risk by pulling account history from Salesforce and support tickets from Zendesk, then drafts a retention strategy in Docs and a customer email, waiting for approval before sending. That’s the kind of multi-tool, multi-step workflow that MCP is designed to enable, and it’s the first time I’ve seen a major platform demonstrate it end-to-end in a consumer-facing product.
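For readers who haven't looked at MCP on the wire, here is a minimal sketch of what that churn-risk workflow could look like as a sequence of tool calls with an approval gate before the one step that sends something. MCP tool invocations are JSON-RPC 2.0 requests using the "tools/call" method. Everything else here is an assumption for illustration: the tool names, the arguments, and the approval callback are hypothetical and are not taken from Google's or any vendor's actual integration.

```python
from itertools import count

_ids = count(1)  # simple request-id generator for the sketch

def tool_call(name, arguments):
    """Build an MCP-style JSON-RPC 2.0 'tools/call' request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

# Hypothetical tools; the third field marks steps that need human approval.
WORKFLOW = [
    ("crm.get_account_history", {"account": "ACME"}, False),
    ("support.list_tickets", {"account": "ACME"}, False),
    ("docs.create_draft", {"title": "Retention plan"}, False),
    ("mail.send", {"to": "customer@example.com"}, True),
]

def plan_requests(workflow, approve):
    """Return the requests to issue, stopping at the first unapproved gated step."""
    requests = []
    for name, arguments, needs_approval in workflow:
        if needs_approval and not approve(name, arguments):
            break  # wait for a human before doing anything irreversible
        requests.append(tool_call(name, arguments))
    return requests

# Without approval, the agent reads and drafts but never sends.
pending = plan_requests(WORKFLOW, approve=lambda name, args: False)
print([r["params"]["name"] for r in pending])
```

The design point is that the read-only and drafting steps run freely, while the side-effecting step is held until someone says yes, which matches the "waiting for approval before sending" behavior described above.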

The pricing tells another story. Google introduced a $100/month AI Ultra tier that includes Spark and other premium features. That puts it directly against ChatGPT Pro and Claude Max in the high-end consumer subscription space. The fact that all three major AI companies now have premium tiers around the $100/month mark suggests they’ve converged on a price point that the market will bear, or at least that they’ve all done the same demand modeling. For a sense of scale beyond subscriptions: OpenAI sees AI ads as a $100 billion business by 2030, according to Axios. The subscription revenue is the visible tip of a much larger monetization iceberg.

The broader I/O picture: Gemini 3.5 powering the new features, Gemini Omni for video generation and editing (called “Flow” in some contexts), AI deeply integrated into Android, and smart glasses making another appearance. Google also announced “Co-Scientist,” a research assistant built on Gemini’s reasoning stack for literature review and hypothesis exploration. That one caught my eye because it’s the closest thing to the “AI research collaborator” use case that the Erdos result makes feel inevitable. Google is doing the thing they do: layering AI into every product surface they own and seeing what sticks. Some of it will be genuinely useful. Some of it will be features that nobody asked for. The MCP integration is the part most likely to stick, because it solves a real interoperability problem rather than just adding AI gloss to existing products.

Trump’s AI Executive Order Dies on the Vine

On Thursday, Trump postponed signing a much-anticipated AI executive order after last-minute lobbying from David Sacks, Elon Musk, and Mark Zuckerberg. The order reportedly included provisions for voluntary frontier model review before release, along with other measures that the tech industry found objectionable.

The details of what was in the order are still emerging, but the dynamic is instructive. This administration came in promising a hands-off approach to AI regulation. Then the capabilities of models like Anthropic’s Mythos, which reportedly found zero-day vulnerabilities in every major operating system and browser, made the “just let the market sort it out” position harder to maintain. The executive order was the administration’s attempt to find a middle ground, and the industry killed it with a few phone calls.

Sacks’s argument, as reported by multiple outlets, was that the order would hurt American competitiveness with China. That’s a politically effective framing, but it’s also incomplete. The zero-day finding from Mythos isn’t a competitive issue. It’s a security issue. A Chinese lab finding the same zero-days would be a geopolitical problem. An American lab finding them and sitting on the capability is a different kind of problem. The distinction between “we need to beat China” and “we need to not release things that break everyone’s browsers” gets blurred when the people making the argument have a direct financial interest in the answer being “no regulation.”

What’s actually happening here is a standard regulatory capture dynamic dressed up in innovation-friendly language. Frontier model developers don’t want pre-release review because it slows them down and creates liability. That’s a rational commercial position. But framing it as “regulation hurts competitiveness” when the regulation in question is “please let the government look at your model before you release something that can find zero-days in every browser” requires a certain amount of chutzpah.

The Washington Post reported that the lobbying effort included “eleventh-hour phone calls with industry leaders.” Sacks, Musk, and Zuckerberg all reportedly spoke with Trump before the order was shelved. That three of the most powerful figures in tech can kill a presidential executive order with a few phone calls tells you more about the current balance of power in Washington than any policy paper could. The interesting question isn’t whether this specific order comes back in a modified form. It’s what happens when the next capability jump, like the one demonstrated by Mythos’s zero-day finding or OpenAI’s Erdos result, makes the “trust the companies to self-regulate” position completely untenable.

The Atlantic ran a piece this week titled “Maybe AI Isn’t a Bubble After All,” pointing to Anthropic’s revenue growth, OpenAI’s 20% quarterly revenue increase, and the cloud revenue surges at Google, Microsoft, and Amazon as evidence that the AI boom has real economic foundations. The revenue numbers are real. The question is whether they justify the capital expenditures and valuations, and whether the growth rates are sustainable. The answer to both questions is probably “it depends on what happens next with capabilities,” which circles right back to the regulation question. The less regulated the development process, the faster capabilities advance. The faster capabilities advance, the more pressure builds for regulation. We’re in a feedback loop.
