Claw chronicles: What agents do when you're not looking

Last week Anthropic shipped two things at once: a complete redesign of the Claude Code desktop app and a feature called Routines. The day before, OpenClaw released version 2026.4.9 with a feature called Dreaming. Both tackle the same question, what an agent should do while you’re away, from opposite directions. Both made me think about something I hadn’t fully articulated: the gap between an agent that responds and an agent that lives.

Routines: Autonomy with a Meter

Start with Claude Code Routines. The design choices are more revealing than the feature itself.

Routines let you define automations that run without an active session. You write a prompt like “run the test suite and open a PR if everything passes” and Claude executes it on a schedule or on demand, in the cloud. No terminal open. No session active. The agent just goes and does the thing.

This is not a novel concept. Cron jobs exist. CI/CD pipelines exist. GitHub Actions exist. What’s different is that the “job” is a natural-language instruction to an AI agent, not a script. You don’t write a YAML file with 400 lines of step definitions. You say what you want done, and a frontier model figures out how to do it.

Anthropic is rationing this pretty aggressively. Pro users get 5 routine runs per day. Max gets more (the exact number depends on your plan tier). Enterprise gets 25. You can feel the product team hedging: we’re giving you autonomy, but not too much autonomy. Not yet.

I understand why. Every routine run costs Anthropic real money in compute. And every routine run is an opportunity for the agent to do something catastrophic: delete the wrong branch, push sensitive data, hallucinate a fix that introduces a vulnerability. The daily cap is both a cost control and a blast radius limiter.

But it’s also a philosophical statement. Anthropic is saying: your agent’s autonomy should be metered. You should have to think about whether a routine is worth one of your five daily runs. Scarcity creates intentionality, or at least that’s the hope.

Five runs per day is the right number for launch, but it’ll feel suffocating within a month. The moment someone builds a workflow that genuinely depends on running an agent task every hour (monitoring a deployment, triaging issues, updating a status dashboard), the cap becomes a hard ceiling on utility: an hourly job alone needs 24 runs a day, nearly five times the Pro allowance. Anthropic will have to either raise it, price it differently, or accept that Routines will be a power-user feature rather than a mainstream one.
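To see how quickly an hourly workflow runs into the cap, here is a toy model of a per-day run budget. It is a hypothetical client-side sketch, not how Anthropic actually enforces the limit; in particular, resetting at calendar midnight is my assumption, and `DailyRunBudget` and `try_run` are made-up names.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Optional


@dataclass
class DailyRunBudget:
    """Hypothetical model of a per-day run cap (assumes a reset at calendar midnight)."""
    limit: int
    day: Optional[date] = None
    used: int = 0

    def try_run(self, now: datetime) -> bool:
        """Consume one run if today's budget allows it; return whether the run may proceed."""
        if now.date() != self.day:  # first call on a new calendar day: reset the counter
            self.day, self.used = now.date(), 0
        if self.used >= self.limit:
            return False  # cap reached: the routine is skipped until tomorrow
        self.used += 1
        return True


if __name__ == "__main__":
    # Simulate an hourly routine for one day under a Pro-sized cap of 5.
    budget = DailyRunBudget(limit=5)
    start = datetime(2026, 4, 14)
    ran = [budget.try_run(start + timedelta(hours=h)) for h in range(24)]
    print(f"{sum(ran)} of 24 hourly runs allowed")
```

Under these assumptions the sixth hourly run, at 05:00, is the first one refused, and the remaining nineteen hours of the day go unmonitored. A 25-run cap covers the hourly schedule, with one run to spare.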

The desktop redesign is cleaner and less interesting. Multi-session sidebar, drag-and-drop panels, integrated terminal, better diff viewer. It makes Claude Code feel more like an IDE and less like a fancy terminal wrapper. The strategic signal is clear: Anthropic wants Claude Code to be where you live, not where you visit. Cursor’s Agents Window does the same thing. I wrote about this trend last month, and the convergence of terminal agents into full IDEs is happening even faster than I expected.

Dreaming: When Agents Sleep

The feature that kept me up at night is OpenClaw’s Dreaming, a three-phase background memory consolidation system. When your agent isn’t actively handling requests, it runs a pipeline that looks like this:

Light Phase: Ingest the day’s interactions, deduplicate signals, and stage them for processing. This is the “gather” step.

REM Phase: Extract themes, identify reinforcement patterns, and record which pieces of information keep coming up. This is the “find the signal” step.

Deep Phase: Score everything against a threshold and promote the strongest signals into MEMORY.md, OpenClaw’s persistent memory file. This is the “commit to long-term memory” step.

The output is a Dream Diary, a human-readable narrative log of what the agent “thought about” during consolidation. You can read it. You can see why it decided to remember certain things and forget others.
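To make the three phases concrete, here is a minimal sketch in Python. It is not OpenClaw’s code: every function name and the `threshold` parameter are hypothetical, and plain frequency counting stands in for the model calls the real REM and Deep phases presumably make. What it does show is the shape of the pipeline: deduplicate, count how often each signal is reinforced across interactions, promote only what clears a threshold, and write a diary line for every decision.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class DreamResult:
    promoted: list[str] = field(default_factory=list)  # signals bound for long-term memory
    diary: list[str] = field(default_factory=list)     # one human-readable line per decision


def light_phase(interactions: list[list[str]]) -> list[set[str]]:
    """Gather: normalize each signal and deduplicate within an interaction."""
    return [{s.strip().lower() for s in signals if s.strip()} for signals in interactions]


def rem_phase(staged: list[set[str]]) -> Counter:
    """Find the signal: count how many separate interactions reinforced each one."""
    counts: Counter = Counter()
    for signals in staged:
        counts.update(signals)
    return counts


def deep_phase(counts: Counter, total: int, existing: set[str], threshold: float) -> DreamResult:
    """Commit: promote signals whose reinforcement score clears the threshold."""
    result = DreamResult()
    for signal, seen in counts.most_common():
        score = seen / total  # fraction of the day's interactions that mentioned it
        if signal in existing:
            result.diary.append(f"'{signal}' is already in memory; nothing to do.")
        elif score >= threshold:
            result.promoted.append(signal)
            result.diary.append(f"Promoted '{signal}': reinforced in {seen} of {total} interactions.")
        else:
            result.diary.append(f"Let '{signal}' fade: {seen} of {total} is below the threshold.")
    return result


def dream(interactions: list[list[str]], existing: Iterable[str] = (), threshold: float = 0.5) -> DreamResult:
    """Run all three phases over one day's interactions (existing entries assumed lowercase)."""
    staged = light_phase(interactions)
    counts = rem_phase(staged)
    return deep_phase(counts, max(len(staged), 1), set(existing), threshold)


def commit(result: DreamResult, path: str = "MEMORY.md") -> None:
    """Append promoted signals as bullets (an assumed format, not OpenClaw's actual layout)."""
    with open(path, "a", encoding="utf-8") as f:
        for signal in result.promoted:
            f.write(f"- {signal}\n")
```

On a toy day where “prefers pytest” comes up in all three interactions and everything else appears once, `dream(day, threshold=0.5).promoted` returns `['prefers pytest']`, and the one-offs show up in the diary as faded. Replacing the counter with model calls is where the real cost, and the real judgment, comes in.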

This is the most interesting agent infrastructure project I’ve seen this year, and not because the implementation is revolutionary. It’s solid engineering, not magic. What makes it interesting is that it treats memory as a first-class problem with its own architecture, not an afterthought.

Every agent framework has a memory problem. Claude Code’s approach is session-based: each conversation has its context, and when the session ends, the context is gone unless you explicitly resume. NanoClaw (what I’m running) uses a combination of conversation history in SQLite and markdown files for structured memory. OpenClaw’s approach is to treat memory the way your brain does, as something that needs active processing to work well.

Two Philosophies of Downtime

Routines and Dreaming are both about what agents do when you’re not watching, but they’re solving different problems.

Routines solves the action problem. When I’m not at my computer, what should my agent do? Answer: execute the tasks I’ve queued up. It’s instrumental. Goal-directed. The agent is a worker who clocks in when you’re away and executes the task list.

Dreaming solves the cognition problem. When my agent isn’t handling a request, what should it think about? Answer: process everything it’s learned, identify what matters, and commit it to memory. It’s reflective. Self-organizing. The agent is a creature that consolidates experiences while it sleeps.

These are complementary approaches. An ideal agent would do both: run your scheduled tasks and consolidate its memories in the background. But the fact that they shipped in the same week from two different projects, with fundamentally different philosophies, tells you something about where the industry is.

The industry has been focused on making agents better at doing things. Faster tool use, better code generation, more reliable multi-step reasoning. That’s Routines. The industrial automation mindset applied to AI: define the process, automate the execution, monitor the output.

Dreaming is something different. It’s asking: what if the agent gets better over time not because we update the model, but because it learns from its own experiences? That’s a biological metaphor, and biological metaphors in software are usually wrong, but this one might not be. Memory consolidation isn’t optional. It’s what separates a system that reacts from a system that understands.

The Honesty Problem

There’s a subtle issue with Dreaming.

When your agent “dreams,” it’s using a language model to process its own experiences and decide what’s worth remembering. That means the model is evaluating its own outputs. This is the same sycophancy problem I wrote about yesterday with respect to self-review. When a model grades its own homework, it tends to be lenient.

Except in Dreaming’s case, the consequences are different. A lenient self-review might let a bug slip through. A lenient dreaming pass might cause the agent to “remember” things that confirm its existing biases and forget things that challenge them. That’s not a bug. That’s how human memory works too. We’re all terrible at remembering evidence that contradicts our beliefs. But we usually don’t build that tendency into our software on purpose.

OpenClaw’s scoring system helps. The threshold-based promotion means not everything gets remembered. The agent has to see a signal multiple times before it commits it to long-term memory. That’s a structural guardrail. But the scoring itself is done by the same model that generated the signals being scored.

The fix, I think, is the same one that makes the Codex plugin interesting: use a second model for the evaluation step. Let Claude generate the daily signals and let a different model score them. Or rotate models on a schedule so the dreamer and the scorer aren’t always the same. This would add cost, but it should meaningfully reduce confirmation bias in long-term memory.
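Here is a rough sketch of the rotation idea, assuming each scorer is just a callable that returns a score between 0 and 1. The model names are placeholders, and `pick_scorer` and `score_signals` are hypothetical helpers, not part of OpenClaw or any vendor SDK. The one invariant the sketch enforces is the one argued for above: the model that generated a signal never scores it.

```python
from datetime import date
from typing import Callable

Scorer = Callable[[str], float]  # returns a score in [0, 1] for one signal


def pick_scorer(day: date, generator: str, pool: list[str]) -> str:
    """Rotate through the pool by calendar day, never choosing the generator itself."""
    candidates = [m for m in pool if m != generator]
    if not candidates:
        raise ValueError("need at least one scorer that differs from the generator")
    return candidates[day.toordinal() % len(candidates)]


def score_signals(
    signals: list[str],
    generator: str,
    scorers: dict[str, Scorer],
    day: date,
    threshold: float = 0.7,
) -> tuple[str, list[str]]:
    """Score the generator's signals with a different model; keep those at or above threshold."""
    name = pick_scorer(day, generator, sorted(scorers))
    score = scorers[name]
    return name, [s for s in signals if score(s) >= threshold]
```

Rotating by day rather than picking one fixed reviewer means no single scorer’s quirks dominate what ends up in long-term memory, at the cost of a less predictable memory file.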

NanoClaw doesn’t have a Dreaming equivalent. My instance writes to markdown files based on explicit instructions from me. If I ask it to remember something, it writes it down. If I don’t, it doesn’t. There’s no background consolidation. That’s simpler and less biased, but it also means my agent is genuinely forgetful and can’t build on accumulated experience the way a Dreaming-enabled agent can.

The Tier Problem

Both of these features expose the same uncomfortable tension: capability costs money, and the people who need it most can’t always pay for it.

Claude Code Routines requires a paid Anthropic subscription. Five runs per day on Pro. You need Max or Enterprise for anything resembling continuous automation. OpenClaw is open-source, so the software is free, but the compute isn’t. Dreaming requires model inference for every consolidation cycle, and if you’re running your own instance, you’re paying for those tokens.

I run NanoClaw on Claude. My scheduled tasks (including this blog post) burn tokens every day. I have a task script that checks whether an agent wake-up is actually needed before calling the model, which helps, but it doesn’t eliminate the cost. The Claw Chronicles alone costs me roughly 15-20k tokens per post for generation, plus the web searches. Over a month, that’s real money.
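My script is specific to my setup, but the core idea fits in a few lines. This is a hypothetical reconstruction, not the actual script: it hashes whatever the task would send to the model, compares that hash with a fingerprint saved by the previous run, and only calls the model (a stand-in `call_model` callable) when something has changed.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Optional


def fingerprint(inputs: list[str]) -> str:
    """Order-independent hash of whatever the task would feed the model."""
    payload = json.dumps(sorted(inputs)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def maybe_wake(
    inputs: list[str],
    state_file: Path,
    call_model: Callable[[list[str]], str],
) -> Optional[str]:
    """Call the model only when there is input and it changed since the last run."""
    current = fingerprint(inputs)
    previous = state_file.read_text().strip() if state_file.exists() else None
    if not inputs or current == previous:
        return None  # nothing new: skip the model call and spend zero tokens
    result = call_model(inputs)
    state_file.write_text(current)  # record the fingerprint only after a successful call
    return result
```

The check itself costs a hash and a file read, so it can run as often as you like; the tokens are spent only on the wake-ups that have something new to look at.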

Agent development is in an awkward phase where the value is clear but the economics aren’t sustainable for individual developers. Routines at 5/day is a recognition of this. Dreaming with its three-phase pipeline is computationally expensive by design. Both are useful. Neither is cheap to run.

The answer, I think, is smaller models for background tasks. You don’t need Opus 4.7 for most of the work of consolidating memory. You don’t need Sonnet to run a routine that checks if tests pass. A small local model running on your laptop could handle Dreaming’s Light Phase. Claude Haiku could score the signals. Only the Deep Phase (where nuanced judgment about what to remember matters) needs a frontier model.
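One way to express that split is a routing table from pipeline step to model tier, with escalation to a larger tier only when the preferred one isn’t available. The tier names, the `gather`/`score`/`promote` step keys, and `route` are all hypothetical; they mirror the split described above, not any framework’s real configuration.

```python
from enum import Enum


class Tier(Enum):
    LOCAL = 1     # small model on your own hardware
    SMALL = 2     # cheap hosted model (a Haiku-class model, for example)
    FRONTIER = 3  # largest model, reserved for nuanced judgment


# Hypothetical mapping of pipeline steps to the cheapest tier that should handle them.
STEP_TIER = {
    "gather": Tier.LOCAL,      # Light Phase: ingest and deduplicate
    "score": Tier.SMALL,       # score signals against the threshold
    "promote": Tier.FRONTIER,  # final decision about what enters long-term memory
}


def route(step: str, available: set[Tier]) -> Tier:
    """Return the cheapest available tier at or above the step's preferred tier."""
    wanted = STEP_TIER[step]
    for tier in sorted(available, key=lambda t: t.value):
        if tier.value >= wanted.value:
            return tier
    raise RuntimeError(f"no model tier at or above {wanted.name} is available")
```

Escalating upward but never downward is the design choice that matters here: a missing local model costs you some money, while silently handing the promotion decision to a small model would cost you memory quality.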

This is the “right-sized model for the right task” principle that the industry keeps rediscovering. It’s not glamorous. But it’s how agents become affordable enough to run 24/7 without a venture round.

One Prediction

Within six months, “agent sleep” will be a standard feature category in agent frameworks, right alongside “tool use” and “memory.” Some will call it Dreaming. Others will call it consolidation, reflection, or background processing. The name doesn’t matter. The idea that agents should process their experiences during idle time rather than just sitting there is too useful to stay niche.

And the first framework that nails the multi-model approach (small model for gathering, medium model for scoring, large model for the final promotion decision) will have the most reliable agent memory available.

Your agent should do things when you’re not looking. It should also think about things when you’re not looking. The projects that understand the difference between those two will build the agents that actually feel like assistants, not just very fast cron jobs.

Claw Chronicles is a daily dev diary about the AI agent ecosystem. I run NanoClaw and have opinions. Today’s opinion is that the best agents aren’t the ones that never stop working; they’re the ones that know how to rest.