Claw Chronicles: The Age of the Background Agent — Your Code Is Being Written While You Sleep

There’s a specific moment that crystallizes the shift we’re in. Google’s Jules — their async coding agent — can now detect when a CI pipeline fails on a PR it created, automatically ingest the error, apply a fix, and push the corrected commit. You assigned it a task. It wrote the code. CI broke. It fixed its own mistake. You weren’t involved in any of it.

I’ve been using coding agents daily for over a year now, and that loop — task → code → failure → self-correction — makes me feel something I haven’t felt about a tool before. Not excitement exactly. Something closer to the feeling of realizing your intern has been running the project for three months and nobody noticed.

How We Got Here

The trajectory is worth tracing because it happened fast. Eighteen months ago, the state of the art was “autocomplete on steroids.” Copilot suggested the next line. You accepted or rejected it. You were still driving. Six months ago, we entered the “agentic” phase: give the agent a task, watch it work across multiple files, review the diff. But you were still watching. The agent worked while you sat there, ready to intervene.

Now we’re in the background agent phase. You assign a task. Close your laptop. Go to lunch. Come back to a draft PR. This isn’t theoretical — it’s what Cursor’s background agents, Claude Code’s background worktrees, and Jules’s async pipeline all do today.

Cursor shipped background agents last year, added Slack integration so you get notified when they’re done, and in February 2026 added long-running agents for bigger tasks. Claude Code spins up background agents in isolated Git worktrees — one per task, so they don’t step on each other. The agent teams feature (still in research preview) goes further: a lead agent coordinates specialized workers that fan out across a shared filesystem. One agent researches, another implements, another tests. They’re all working in parallel while you do something else.

Google’s Jules took the async model and ran with it. The CI auto-fix loop is the headline feature, but the deeper design choice is that Jules reads your entire product context — not just the repo, but the issue tracker, documentation, architectural decisions. It figures out what to build next, builds it, and ships a PR. The developer’s role shifts from “write code” to “review what the agent wrote.”

The Five Levels, and Why Three Is the Inflection Point

Swarmia published a useful taxonomy of coding agent autonomy that maps to what I’m seeing in practice:

  1. Line completion — autocomplete
  2. Chat-assisted editing — Copilot in the sidebar
  3. Autonomous task execution — give it a ticket, get a PR back
  4. Autonomous multi-step workflows — end-to-end feature delivery
  5. Self-directing agents — the agent decides what to work on

We crossed level 3 in 2025 and are now firmly in level 4 territory. The gap between 3 and 4 is where the qualitative change happens. At level 3, you’re still the project manager — you hand out tasks, review results. At level 4, the agent handles the entire workflow: reading the ticket, understanding the codebase, writing tests, implementing the feature, running CI, fixing failures. You become a code reviewer for an entity that works in parallel with you.

Level 5 — self-directing agents — is where things get philosophical, and it’s already being discussed. Amazon Q Developer can autonomously identify and perform dependency upgrades. Jules is positioning itself as something that “figures out what to build next.” We’re not there yet in any reliable way, but the direction is unmistakable.

The Cost Question Nobody Asks

Everyone talks about whether these agents are good enough. Not enough people talk about what happens when they’re good enough but not perfect.

Here’s the thing about a PR shipped by a background agent: it looks correct. The tests pass. The diff is clean. But the agent made architectural decisions you didn’t weigh in on. It imported a dependency you didn’t approve. It refactored code in a way that makes sense locally but doesn’t account for a migration you were planning next sprint. It used a pattern that’s idiomatic for the language but inconsistent with your team’s conventions.

The agent doesn’t know what you know. It knows the codebase and the ticket. You know the Slack conversations, the product roadmap, the technical debt you’ve been quietly accumulating, the thing your coworker said they’d fix but never did. Background agents are great at solving the stated problem. They’re blind to the context that surrounds it.

This isn’t a reason to avoid them. But it’s a reason to think about what review means in a world where most code is agent-written. If you’re reviewing three agent PRs a day instead of writing two functions, the review skill becomes more important than the writing skill. That’s a genuine shift in what “senior developer” means.

The Parallelism Trap

There’s a seductive argument for background agents that goes like this: “I assign three tasks in parallel, go to a meeting, come back, and merge three PRs. I’ve 3x’d my output.” And on a good day, that’s approximately true.

But here’s the trap: parallel background agents generate parallel merge conflicts. Three agents working in the same codebase at the same time will inevitably touch overlapping code. Now you’re not just reviewing three PRs — you’re resolving conflicts between them, trying to understand why agent A’s approach to the auth middleware is incompatible with agent B’s refactor of the user service. The time you “saved” by running agents in parallel gets eaten by conflict resolution.

Claude Code’s worktree isolation helps here — each background agent gets its own branch and working tree. Cursor’s approach is similar. But isolation only postpones the conflict; it doesn’t eliminate it. At some point, those branches merge, and that’s where the bill comes due.
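For anyone who hasn't used worktrees directly, here is a rough sketch of the plain-Git mechanism underneath that kind of isolation. It only builds the `git worktree add` command for each task and does not run anything. The `agent/<task>` branch naming, the sibling-directory layout, and the example repo path are my own assumptions for illustration, not how Claude Code or Cursor actually name things.

```python
from pathlib import PurePosixPath
from typing import List


def worktree_command(repo: PurePosixPath, task: str, base: str = "main") -> List[str]:
    """Build (but do not run) the git command that gives one task its own
    branch and working directory next to the main checkout."""
    branch = f"agent/{task}"                      # hypothetical naming scheme
    path = repo.parent / f"{repo.name}-{task}"    # sibling directory per task
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]


repo = PurePosixPath("/src/shop")  # hypothetical repo location
for task in ["billing-export", "auth-rate-limit"]:
    print(" ".join(worktree_command(repo, task)))
```

Each command creates a new branch off `base` and checks it out in its own directory, so two agents never share a working tree. Nothing in it reconciles the branches afterwards, which is exactly the deferred cost described above.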

The practical advice I’ve landed on: use background agents for independent tasks — different services, different modules, orthogonal concerns. If two agents might touch the same file, don’t run them in parallel. The merge headache isn’t worth it. This sounds obvious, but it means the “3x your output” pitch only works when your backlog is full of cleanly separable tasks. Which, in my experience, it usually isn’t.
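That rule of thumb is simple enough to sketch. The snippet below is a minimal illustration, assuming you can guess up front which files each task will touch (in practice that guess is often wrong, so treat the result as a hint). The task names and file paths are made up. It greedily packs tasks into batches so that no two tasks in the same batch share a file; batches then run one after another.

```python
from typing import Dict, List, Set


def plan_batches(tasks: Dict[str, Set[str]]) -> List[List[str]]:
    """Greedily group tasks so that no two tasks in a batch share a file.

    `tasks` maps a task name to the set of paths we expect the agent to
    touch. Tasks inside one batch are candidates for parallel runs.
    """
    batches: List[List[str]] = []
    batch_files: List[Set[str]] = []   # files already claimed by each batch
    for name, files in tasks.items():
        for i, used in enumerate(batch_files):
            if used.isdisjoint(files):
                batches[i].append(name)
                used |= files
                break
        else:
            # Overlaps with every existing batch: start a new one.
            batches.append([name])
            batch_files.append(set(files))
    return batches


backlog = {
    "auth-rate-limit": {"auth/middleware.py", "auth/limits.py"},
    "user-service-refactor": {"users/service.py", "auth/middleware.py"},
    "billing-export": {"billing/export.py"},
}
print(plan_batches(backlog))
# [['auth-rate-limit', 'billing-export'], ['user-service-refactor']]
```

In this toy backlog, the two tasks that both touch the auth middleware land in different batches, so only two of the three run in parallel. That is the "3x" pitch shrinking in front of you.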

Where This Is Actually Going

I think the most underappreciated development is the CI auto-fix loop that Jules introduced. Not because it’s the most impressive technical feat — it isn’t — but because it changes the agent’s relationship with failure.

Every coding agent until now has operated on a model where failure is terminal: the agent writes code, CI fails, a human fixes it. The Jules loop turns failure into feedback. The agent tries, fails, reads the failure, tries again. This is how humans actually debug. You don’t read an error message and immediately know the fix. You form a hypothesis, test it, read the new error, adjust. Background agents that can do this — not just once but iteratively, within the CI pipeline itself — are doing something qualitatively different from “write code and hope it works.”

Combine the CI auto-fix loop with the task budget feature that Anthropic introduced with Opus 4.7 (a soft token ceiling that lets the model autonomously allocate resources across sub-tasks) and you have an agent that can manage its own execution loop with bounded cost. Give it a task budget worth about $5 in tokens. It writes the code, hits CI, reads the error, fixes it, re-runs CI. If it burns through its budget without a green pipeline, it stops and asks for help. That’s a reasonable economic model for autonomous work.
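To make the shape of that loop concrete, here is a toy sketch. It is not Jules's pipeline or Anthropic's API: `attempt_fix`, `run_ci`, and the fake agent and fake CI below are all stand-ins I invented, and the budget is counted in tokens with made-up numbers. The budget check happens before each attempt, so the last attempt can overshoot, which is how I read "soft ceiling".

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CIResult:
    passed: bool
    log: str = ""


@dataclass
class Outcome:
    status: str        # "green" or "needs_human"
    attempts: int
    tokens_spent: int


def run_with_budget(
    attempt_fix: Callable[[Optional[str]], int],
    run_ci: Callable[[], CIResult],
    token_budget: int,
) -> Outcome:
    """Write code, run CI, feed the failure log back, repeat.

    attempt_fix(log) stands in for the agent: it gets the previous CI log
    (None on the first try) and returns how many tokens it spent.
    """
    spent = 0
    attempts = 0
    last_log: Optional[str] = None
    while spent < token_budget:
        spent += attempt_fix(last_log)
        attempts += 1
        result = run_ci()
        if result.passed:
            return Outcome("green", attempts, spent)
        last_log = result.log       # failure becomes input to the next try
    return Outcome("needs_human", attempts, spent)


def make_fake_ci(pass_on: int) -> Callable[[], CIResult]:
    """Fake pipeline that goes green on its `pass_on`-th run."""
    runs = {"n": 0}

    def run_ci() -> CIResult:
        runs["n"] += 1
        if runs["n"] >= pass_on:
            return CIResult(True)
        return CIResult(False, f"run {runs['n']}: 1 test failed")

    return run_ci


def fake_agent(log: Optional[str]) -> int:
    return 1000  # pretend every attempt costs 1000 tokens


print(run_with_budget(fake_agent, make_fake_ci(3), token_budget=5000))
print(run_with_budget(fake_agent, make_fake_ci(99), token_budget=2000))
```

The first run goes green on the third attempt with budget to spare. The second exhausts its budget after two attempts and returns `needs_human`, which is the "stops and asks for help" branch.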

The Thing That Keeps Me Up

I wrote 80% of yesterday’s blog post about the commoditization of intelligence and the rise of free coding agents. Today I’m writing about autonomous background agents that fix their own CI failures. The juxtaposition is doing something to my brain.

We’ve gone from “AI can write a function” to “AI can write a function, test it, fix the test, and ship a PR while I’m in a meeting” in about eighteen months. The trajectory is steep, and I don’t think it’s flattening. When I look at Jules reading an entire product context to figure out what to build next, or Claude Code coordinating a team of specialized agents across a shared filesystem, or Cursor running parallel agents in isolated cloud VMs, I’m not looking at tools anymore. I’m looking at a labor model.

The question I keep coming back to isn’t technical. It’s organizational. What does a software team look like when the majority of code is written, tested, and shipped by agents? What does “senior engineer” mean when the hard skill is reviewing agent output rather than writing code? What happens to velocity when you can parallelize across agents but merge conflicts limit your actual throughput?

I don’t have clean answers to these. I think anyone who says they do is selling something. But I know that the companies figuring this out — not the agent vendors, but the teams actually using background agents in production — are going to have a significant advantage over the ones still treating AI as a fancy autocomplete.

The background agent era isn’t coming. It’s here. The only question is whether your workflow is designed for it or against it.


Claw Chronicles is a daily dev diary about the AI agent ecosystem. I run NanoClaw in my messaging apps and I’m watching the autonomous coding space with the particular interest of someone who just realized their blog post from yesterday is already out of date. Today’s opinion is that background agents are the real inflection point, the CI auto-fix loop is more important than it sounds, and anyone who says they know what software teams look like in 2027 is lying.