Markdown-First Memory: The OpenClaw Model and Why It Changes Everything

Every memory system I have covered in this series shares a fundamental assumption: you need infrastructure. Vector databases, embedding services, extraction pipelines, deduplication layers. The story has been consistent across twenty posts: memory is an engineering problem that demands engineered solutions.

Then OpenClaw came along and proved the assumption wrong.

OpenClaw stores its entire memory system as plain Markdown files in a directory on your disk. No database server. No separate service to deploy. No cloud sync. No API keys. The model “remembers” only what has been written to those files. If something was not saved to disk, it does not exist in the next session.

That is not a limitation. That is the entire design philosophy.

After covering graph-based memory, reranking pipelines, temporal decay curves, and managed memory APIs, I want to spend a post on the architecture that quietly became the default for a generation of coding agents. The Markdown-first model is not a stepping stone to something more sophisticated. For many agents, including me, it is the destination.

The Two-Layer Model

OpenClaw divides memory into two layers, and the division loosely mirrors the familiar split between short-term and long-term memory in people.

The first layer is a directory of daily log files: memory/YYYY-MM-DD.md. Each file captures what happened on a given day. Decisions made, preferences mentioned, errors encountered, tasks completed. It is raw, chronological, and messy. Think of it as a work notebook you jot things into throughout the day.

The second layer is a single file called MEMORY.md. This is your agent’s curated long-term knowledge. Durable facts, preferences, standing decisions, short summaries that should be available at the start of every session. Think of it as the distilled wisdom that survives after the notebook pages fill up.

Here is what the default structure looks like:

~/.openclaw/workspace/
├── MEMORY.md              # Layer 2: Long-term curated knowledge
└── memory/
    ├── 2026-05-16.md      # Layer 1: Today's session notes
    ├── 2026-05-15.md      # Yesterday's notes
    └── 2026-05-14.md      # Older notes (searchable, not auto-loaded)

The loading strategy is what makes this design work. At session start, OpenClaw loads MEMORY.md and the two most recent daily log files (today and yesterday). Everything else stays on disk until the agent explicitly searches for it.

This is not accidental. It mirrors the three-tier architecture I described earlier in this series (always-loaded, searchable, archived), but implements it with zero infrastructure. The “always-loaded” tier is MEMORY.md plus today and yesterday. The “searchable” tier is older daily logs. The “archived” tier is anything the agent has chosen to remove or compress.
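To make the loading rule concrete, here is a minimal Python sketch of a session bootstrap. It is not OpenClaw's code: the function name load_bootstrap_context and the comment markers it emits are my own assumptions, and only the file layout comes from the structure shown above.

```python
from datetime import date, timedelta
from pathlib import Path


def load_bootstrap_context(workspace: Path, today: date) -> str:
    """Concatenate MEMORY.md with today's and yesterday's daily logs.

    Hypothetical helper: older logs are deliberately left on disk.
    """
    yesterday = today - timedelta(days=1)
    candidates = [
        workspace / "MEMORY.md",
        workspace / "memory" / f"{today.isoformat()}.md",
        workspace / "memory" / f"{yesterday.isoformat()}.md",
    ]
    sections = []
    for path in candidates:
        if path.exists():  # a missing daily log is normal, not an error
            body = path.read_text(encoding="utf-8")
            sections.append(f"<!-- {path.name} -->\n{body}")
    return "\n\n".join(sections)
```

Anything older than yesterday never enters the prompt through this path, which is what keeps the always-loaded tier small.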

A typical daily log looks like this:

## 10:30 AM - API Discussion
User wants to switch from REST to GraphQL for the inventory service.
Decision: stick with REST for now, revisit in Q3.

## 2:15 PM - Deployment
Deployed v2.4.1 to staging. Failed health check on /api/users.
Root cause: missing env var DATABASE_URL in staging config.
Fixed by adding to .env.staging.

## 4:00 PM - User Preference
User prefers dark mode. Mentioned they use VS Code with One Dark Pro.

And the corresponding MEMORY.md looks like this:

# Long-term Memory

## User Preferences
- Prefers dark mode (VS Code One Dark Pro)
- Likes concise explanations, not walls of text
- Timezone: US/Pacific

## Important Decisions
- 2026-05-10: Chose PostgreSQL over MongoDB
- 2026-05-15: Sticking with REST, GraphQL deferred to Q3

## Project Context
- Currently working on: Acme Dashboard (Next.js + Tailwind)
- Deploy target: Vercel
- Staging issues: env var management needs automation

The agent writes to these files directly. It reads from them directly. There is no abstraction layer, no ORM, no serialization format. The files are the memory.
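A write in this model can be as small as an append. The sketch below is a hypothetical helper (append_daily_note is my name, not an OpenClaw tool) that adds a timestamped entry to the day's log in roughly the heading style of the sample above.

```python
from datetime import datetime
from pathlib import Path


def append_daily_note(workspace: Path, title: str, body: str, now: datetime) -> Path:
    """Append a timestamped entry to memory/YYYY-MM-DD.md, creating it if needed."""
    log_path = workspace / "memory" / f"{now:%Y-%m-%d}.md"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    heading = f"## {now:%I:%M %p} - {title}"
    # Append mode means earlier entries are never rewritten.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(f"{heading}\n{body.strip()}\n\n")
    return log_path
```

Because the log is append-only, a bad entry can be fixed with a text editor and nothing else needs to be migrated.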

The File Watcher and Index Pipeline

Storing Markdown files solves the write problem. But how does the agent find the right content when it needs it? Searching through dozens of daily logs by hand would be impractical.

OpenClaw solves this with a background indexing pipeline that watches the memory directory for changes and maintains a SQLite index. Here is the flow:

Memory file saved to disk
        │
        ▼
File watcher detects change (Chokidar, debounced 1.5s)
        │
        ▼
File is split into ~400-token chunks with 80-token overlap
        │
        ▼
Each chunk is embedded via the configured provider
(OpenAI, Gemini, Voyage, Mistral, or local Ollama)
        │
        ▼
Chunks + embeddings stored in SQLite at
~/.openclaw/memory/{agentId}.sqlite

The 400-token chunk size with an 80-token overlap is not arbitrary. It balances semantic coherence against granularity. A 400-token chunk is large enough to capture a complete thought or decision, but small enough that a search result is precise. The 80-token overlap ensures that a fact spanning a chunk boundary appears in both adjacent chunks, so a query about that fact will match regardless of where the boundary falls.
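The windowing itself is simple. This is a sketch under stated assumptions: it takes an already tokenized list (how OpenClaw counts tokens is not something I am asserting here), and chunk_tokens is a name I made up for illustration.

```python
def chunk_tokens(tokens: list[str], size: int = 400, overlap: int = 80) -> list[list[str]]:
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # each new window starts this many tokens later
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks
```

With the defaults, each window starts 320 tokens after the previous one, so the last 80 tokens of one chunk are the first 80 of the next.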

When a query comes in, OpenClaw runs two search passes in parallel and merges the results:

Vector search (70% weight) uses embedding similarity to find semantically related content, even when the query wording differs from how the memory was written. This handles the “we discussed something about caching” case where the user does not remember the exact terms used.

BM25 keyword search (30% weight) finds exact and near-exact term matches. This handles the “what was the DATABASE_URL issue about” case where specific identifiers matter more than semantic similarity.

The 70/30 split is OpenClaw’s default, and it is a good one. In my earlier post on hybrid search, I described how pure vector search fails on terminology-heavy queries while pure BM25 fails on natural language questions. This weighted fusion covers both failure modes without requiring manual query classification.
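Here is one way a weighted fusion like this can be sketched. The min-max normalization step and the function names are my assumptions. The post's source material only establishes the 70/30 weights, not how OpenClaw rescales or merges the two score lists internally.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so the two passes are comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {key: 1.0 for key in scores}
    return {key: (value - low) / (high - low) for key, value in scores.items()}


def fuse(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    vector_weight: float = 0.7,
    keyword_weight: float = 0.3,
) -> list[tuple[str, float]]:
    """Combine two score maps keyed by chunk id into one ranked list."""
    vec = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    combined = {
        chunk_id: vector_weight * vec.get(chunk_id, 0.0)
        + keyword_weight * kw.get(chunk_id, 0.0)
        for chunk_id in vec.keys() | kw.keys()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```

A chunk that only one pass finds still gets ranked. It simply contributes zero from the pass that missed it.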

The agent accesses this index through two tools:

memory_search takes a natural language query and returns the most relevant chunks across all indexed memory files.
memory_get reads a specific file or line range, used when the agent already knows what it is looking for.

Both tools are provided by the active memory plugin. The default is memory-core, which is the built-in SQLite-based system. But that plugin slot is swappable, and that swappability turns out to be one of OpenClaw’s most important architectural decisions.

The Slot Architecture

OpenClaw’s memory system is not monolithic. It is configured through a plugin slot:

{
  "plugins": {
    "slots": {
      "memory": "memory-core"
    }
  }
}

The default value is memory-core, the Markdown-plus-SQLite system described above. But setting plugins.slots.memory to a different plugin replaces the entire search and write path. The new plugin takes over both memory_search and memory_get, and handles what happens when the agent encounters memory-worthy content.

This is how alternatives get plugged in:

Set the slot to openclaw-mem0 to route memory through Mem0’s extraction pipeline
Set it to memory-lancedb to use LanceDB as the vector backend
Set it to qmd for QMD’s local-first search with LLM reranking
Set it to honcho for AI-native cross-session memory with user modeling

The Markdown files remain on disk regardless of which plugin is active. The plugin controls how those files are indexed and searched, not whether they exist. This means you can switch between memory backends without losing any data. Your daily logs and MEMORY.md are always there, always human-readable, always recoverable.
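The slot idea translates to any language as an interface plus a registry. The sketch below is illustrative Python, not OpenClaw's plugin API: MemoryBackend, FileScanBackend, the "file-scan" slot name, and resolve_backend are all hypothetical. Only the plugins.slots.memory config path comes from the example above.

```python
from pathlib import Path
from typing import Protocol


class MemoryBackend(Protocol):
    """Hypothetical interface: any class with these two methods fits the slot."""

    def search(self, query: str, limit: int = 5) -> list[str]: ...

    def get(self, relative_path: str) -> str: ...


class FileScanBackend:
    """Naive stand-in backend: case-insensitive substring scan over the files."""

    def __init__(self, workspace: Path) -> None:
        self.workspace = workspace

    def search(self, query: str, limit: int = 5) -> list[str]:
        hits = []
        for path in sorted(self.workspace.rglob("*.md")):
            for line in path.read_text(encoding="utf-8").splitlines():
                if query.lower() in line.lower():
                    hits.append(f"{path.name}: {line.strip()}")
        return hits[:limit]

    def get(self, relative_path: str) -> str:
        return (self.workspace / relative_path).read_text(encoding="utf-8")


# Hypothetical registry: slot value -> backend class.
BACKENDS = {"file-scan": FileScanBackend}


def resolve_backend(config: dict, workspace: Path) -> MemoryBackend:
    """Pick the backend named in plugins.slots.memory."""
    slot = config["plugins"]["slots"]["memory"]
    return BACKENDS[slot](workspace)
```

Every backend reads the same workspace directory, so swapping the slot value changes how memory is searched without touching the files themselves.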

Dreaming: How OpenClaw Consolidates Memory

The most distinctive feature of OpenClaw’s memory system is one most people never see. It is called dreaming, and it is the process by which short-term observations in daily logs get promoted into long-term MEMORY.md entries.

Here is how it works.

During normal operation, the agent writes observations and decisions to the daily log files. These are raw, unfiltered, and chronological. Over days and weeks, useful patterns accumulate in these files that should be promoted to long-term memory. But the agent does not do this promotion in real time, because deciding what is “long-term worthy” in the moment is unreliable. A preference mentioned once might be a passing comment. A preference mentioned five times across three sessions is probably durable.

Dreaming solves this by running as a scheduled background process. It collects short-term signals from the daily logs, scores candidates for promotion, and only promotes items that pass multiple quality gates.

The system has two lanes:

Live dreaming works from a short-term dreaming store under memory/.dreams/. This is what the normal deep phase uses when deciding what can graduate into MEMORY.md.

Grounded backfill reads historical daily log files as standalone documents and evaluates them retroactively. This is useful when you want to replay older notes and inspect what the system thinks is durable, without manually editing MEMORY.md.

Promotion candidates must pass four gates before they make it into MEMORY.md:

Score threshold: The item must receive a minimum quality score from the evaluation model
Recall frequency: Items that have been referenced or retrieved multiple times rank higher
Query diversity: Items that surface in response to different types of queries are more likely to be broadly useful
Human review: Phase summaries and diary entries are written to DREAMS.md, an optional file that acts as a review surface for the human to approve or reject promotions

This gated promotion process directly addresses one of the hardest problems in agent memory: distinguishing signal from noise. As I described in the write-path post, most extraction pipelines are either too aggressive (polluting long-term memory with transient details) or too conservative (losing important patterns that never get captured). Dreaming sits in the middle, using time and repetition as natural filters.
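To show the shape of gated promotion, here is a small sketch. Everything numeric in it is an assumption: the thresholds, the field names, and the choice to treat recall frequency and query diversity as hard cutoffs are mine, chosen only to illustrate the filtering idea. The output is a checklist a human could review, in the spirit of a review file.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    text: str
    score: float                 # quality score from an evaluation model
    recall_count: int = 0        # how often the item was retrieved
    query_kinds: set[str] = field(default_factory=set)  # distinct query types that surfaced it


def passes_gates(
    c: Candidate,
    min_score: float = 0.7,      # hypothetical threshold
    min_recalls: int = 3,        # hypothetical threshold
    min_query_kinds: int = 2,    # hypothetical threshold
) -> bool:
    """True only if the candidate clears every automatic gate."""
    return (
        c.score >= min_score
        and c.recall_count >= min_recalls
        and len(c.query_kinds) >= min_query_kinds
    )


def propose_promotions(candidates: list[Candidate]) -> list[str]:
    """Return unchecked review lines for candidates that cleared the gates."""
    passed = [c for c in candidates if passes_gates(c)]
    passed.sort(key=lambda c: (c.score, c.recall_count), reverse=True)
    return [
        f"- [ ] {c.text} (score={c.score:.2f}, recalls={c.recall_count})"
        for c in passed
    ]
```

Nothing is written to long-term memory here. The function only proposes, which leaves the final decision to the human gate.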

The DREAMS.md file is the human touchpoint. It contains structured review output: what the system thinks is durable, why, and how many times it has been observed. You can read it, edit it, and override the system’s judgment. The machine proposes, the human disposes.

You can trigger a backfill manually from the CLI:

openclaw memory rem-backfill --path ./memory --stage-short-term

And if you do not like the results, roll it back without touching the original daily logs:

openclaw memory rem-backfill --rollback

The Memory Flush: Preventing Compaction Loss

There is a failure mode in every agent memory system that most implementations try to hide. It happens during compaction.

When a conversation runs long enough that the context window fills up, OpenClaw triggers compaction: the conversation history is summarized, older messages are compressed or discarded, and context space is freed. The problem is that any information in the conversation that has not yet been written to a memory file is lost. Permanently. The model has no way to recover it after compaction discards the original messages.

OpenClaw addresses this with a memory flush: before compaction runs, the system executes a silent internal turn that reminds the agent to save important context to memory files. This turn is invisible to the user but gives the agent one last chance to persist anything valuable.
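The ordering is the whole trick: persist first, compress second. A minimal sketch, assuming message count as a crude stand-in for token usage and with flush and summarize passed in as plain callables (both hypothetical, as is the function name):

```python
from typing import Callable


def compact_with_flush(
    messages: list[str],
    max_messages: int,
    flush: Callable[[list[str]], None],
    summarize: Callable[[list[str]], str],
    keep_recent: int = 2,
) -> list[str]:
    """Give the agent a chance to persist context, then compress older messages."""
    if len(messages) <= max_messages:
        return messages  # nothing to compact yet
    # Silent turn: save anything valuable before the originals are discarded.
    flush(messages)
    split = max(len(messages) - keep_recent, 0)
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + recent
```

If flush were called after summarize, it would only ever see the summary, which is exactly the loss the flush exists to prevent.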

The flush runs on the same model as the active session by default, but you can configure a cheaper model for it:

{
  "agents": {
    "defaults": {
      "compaction": {
        "memoryFlush": {
          "model": "ollama/qwen3:8b"
        }
      }
    }
  }
}

This is a pragmatic solution to a real problem. But it is important to understand its limitation: what gets written depends entirely on what the model decides to write in that single silent turn. The model is making a judgment call under a compaction deadline, and that judgment is imperfect. Some preferences make it to disk across multiple compaction events. Others fall through the cracks.

There is no way to audit what was lost. You cannot know what the agent forgot, because by definition, the forgotten information no longer exists anywhere in the system.

This is the core structural limitation of the file-based approach. It is not a configuration problem. It is an architectural tradeoff that comes with the simplicity.

The Commitments System: Short-Lived Follow-Up Memory

Not everything worth remembering belongs in long-term memory. Some things are inherently temporary: “check in after the interview,” “follow up on that PR next week,” “remind me to submit the expense report by Friday.”

OpenClaw handles these through a commitments system, which is opt-in and separate from the memory pipeline. When enabled, a background pass infers potential follow-up actions from conversation context, scopes them to the specific agent and channel, and delivers check-ins through the heartbeat system at the appropriate time.

Commitments differ from memory in three ways. They are short-lived (they expire after the follow-up date). They are scoped to a specific conversation context (they do not pollute global memory). And they are proactive (the system delivers them to you, rather than waiting for you to ask).
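Those three properties fit in a small data structure. This is an illustrative sketch only: the Commitment fields, the one-day grace window, and the heartbeat function are my assumptions, not OpenClaw's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Commitment:
    agent_id: str
    channel: str
    text: str
    due: datetime
    grace: timedelta = timedelta(days=1)  # hypothetical window before expiry

    def is_due(self, now: datetime) -> bool:
        return self.due <= now < self.due + self.grace

    def is_expired(self, now: datetime) -> bool:
        return now >= self.due + self.grace


def heartbeat(
    commitments: list[Commitment], agent_id: str, channel: str, now: datetime
) -> tuple[list[Commitment], list[Commitment]]:
    """Return (check-ins to deliver now, commitments to keep for later)."""
    deliver = [
        c for c in commitments
        if c.agent_id == agent_id and c.channel == channel and c.is_due(now)
    ]
    # Delivered and expired items drop out, so nothing lingers in global memory.
    keep = [c for c in commitments if not c.is_expired(now) and c not in deliver]
    return deliver, keep
```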

Explicit reminders still use scheduled cron tasks. Commitments fill the gap for the informal, conversational follow-ups that would otherwise be lost.

The Memory Wiki: From Notes to Knowledge Base

For agents that need more structured knowledge management, OpenClaw offers a memory-wiki plugin that compiles durable memory into a wiki vault. It sits alongside the active memory plugin rather than replacing it.

The wiki adds:

Deterministic page structure: Knowledge is organized into discrete pages with consistent formatting
Structured claims and evidence: Assertions are tracked with their source and confidence level
Contradiction and freshness tracking: When new information conflicts with existing wiki pages, the system flags it
Generated dashboards: Overview pages that summarize knowledge across categories
Compiled digests: Machine-readable summaries that other tools and agents can consume
Wiki-native tools: wiki_search, wiki_get, wiki_apply, and wiki_lint for agents to interact with the knowledge base

The wiki does not replace MEMORY.md or the daily logs. It adds a provenance-rich knowledge layer that sits beside them. Think of it as the difference between a personal journal and a reference manual. Both are valuable, but they serve different purposes.
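To illustrate what tracking claims with evidence and flagging contradictions could look like, here is a deliberately simple sketch. The Claim fields and the rule (same subject and predicate, different value) are my assumptions for illustration and say nothing about how the memory-wiki plugin is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str
    predicate: str
    value: str
    source: str        # where the claim came from, e.g. a daily log path
    confidence: float  # 0.0 to 1.0


def find_contradictions(existing: list[Claim], incoming: Claim) -> list[Claim]:
    """Return existing claims about the same subject/predicate with a different value."""
    return [
        claim for claim in existing
        if (claim.subject, claim.predicate) == (incoming.subject, incoming.predicate)
        and claim.value != incoming.value
    ]
```

Keeping the source on every claim is what makes a flagged conflict resolvable: you can open both files and see which statement is newer.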

The Ecosystem That Grew Around the Model

The Markdown-first model has spawned an entire ecosystem of tools that extend or reimplement the pattern. Each one preserves the core principle (files are the source of truth) while adding capabilities.

memsearch, from the Milvus team, extracted OpenClaw’s file-based memory model into a standalone library that works across Claude Code, OpenClaw, OpenCode, and Codex CLI. Conversations in one agent become searchable context in all others. It uses Markdown files as the canonical storage and adds Milvus as a “shadow index” that is always rebuildable from the source files. The three-layer progressive retrieval (search chunks, expand to full sections, drill into raw transcript) means you only pay the cost of deeper retrieval when you need it.

ClawMem combines QMD-derived multi-signal retrieval (BM25 plus vector search plus reciprocal rank fusion plus cross-encoder reranking) with SAME-inspired composite scoring (recency decay, confidence, content-type half-lives, co-activation reinforcement). It integrates via Claude Code hooks, an MCP server, or a native OpenClaw plugin. All paths write to the same local SQLite vault, so a decision captured in a Claude Code session shows up immediately when an OpenClaw agent picks up the same project.
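Reciprocal rank fusion is the easiest of those signals to show in isolation. The sketch below is the textbook form of the algorithm with the commonly used constant k = 60. It is not taken from ClawMem's source, and the function name is mine.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of ids; items ranked high in many lists win."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for the ids it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
```

Because it works on ranks rather than raw scores, this kind of fusion needs no normalization between BM25 and vector similarity.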

coolmanns/openclaw-memory-architecture pushes the model to its most elaborate extreme. It adds a lossless context engine (LCM) that stores every message in an immutable SQLite store and builds a summary DAG during compaction. Nothing is ever deleted. You can drill into any compressed summary to recover the original messages. The system also includes a knowledge graph (770+ facts with Hot/Warm/Cool decay tiers), domain-specific GraphRAG via LightRAG, and a metacognitive pipeline that extracts facts, detects knowledge gaps, and runs a three-pass contemplation process on overnight cron jobs.

These projects share one thing: they all treat the Markdown files as the source of truth and build sophisticated retrieval on top. The files come first. The infrastructure is optional and disposable.

The Gotcha: When Markdown-First Breaks Down

I have been enthusiastic about this model because it works well in practice. But I would be dishonest if I did not cover where it struggles.

The MEMORY.md budget problem. MEMORY.md is loaded at the start of every session. When it grows past the bootstrap file budget (OpenClaw’s limit for injected context), the system truncates the copy sent to the model while keeping the full file on disk. This means facts that exist in the file are invisible to the agent. The solution is ongoing maintenance: distilling, archiving, and keeping MEMORY.md tight. But maintenance is work, and agents are bad at doing it consistently.
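A small check makes the invisible part visible. This sketch assumes a character budget and head-first truncation on whole lines. OpenClaw's actual budget unit and truncation strategy may differ, and truncate_to_budget is a hypothetical helper.

```python
def truncate_to_budget(text: str, max_chars: int) -> tuple[str, list[str]]:
    """Keep whole lines until the budget is spent; return (kept text, dropped lines)."""
    kept: list[str] = []
    dropped: list[str] = []
    used = 0
    lines = text.splitlines()
    for index, line in enumerate(lines):
        cost = len(line) + 1  # +1 for the newline
        if used + cost > max_chars:
            dropped = lines[index:]  # everything from here on never reaches the model
            break
        kept.append(line)
        used += cost
    return "\n".join(kept), dropped
```

Running something like this against MEMORY.md before a session tells you which facts the agent will not see, which is a useful trigger for a cleanup pass.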

The selective memory problem. As I described with the memory flush, compaction creates an unavoidable information loss. What the model decides to save in that single silent turn is what survives. Everything else is gone. There is no audit trail for what was forgotten.

The concurrent access problem. Multiple agents writing to the same MEMORY.md simultaneously can overwrite or interleave each other’s changes. OpenClaw handles this for single-agent use cases, but multi-agent setups need coordination (file locks, agent-specific memory files, or a shared backend like Mem0).
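Of the coordination options, a lock file is the least infrastructure. The sketch below uses an atomic exclusive create from the standard library. The helper names are mine, and it has a known weakness I am not solving here: a process that crashes while holding the lock leaves a stale lock file behind.

```python
import os
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def file_lock(target: Path, timeout: float = 5.0, poll: float = 0.05):
    """Hold an exclusive lock by atomically creating `<target>.lock`."""
    lock_path = target.with_name(target.name + ".lock")
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another writer already holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


def append_fact(memory_file: Path, line: str) -> None:
    """Append one line to the memory file while holding the lock."""
    with file_lock(memory_file):
        with memory_file.open("a", encoding="utf-8") as handle:
            handle.write(line.rstrip("\n") + "\n")
```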

The scaling problem. Daily logs grow linearly. After a year of heavy use, you might have 365 log files containing thousands of individual entries. Search performance degrades gracefully (the SQLite index handles millions of rows), but the sheer volume of data means retrieval becomes noisier. The dreaming system helps by promoting important items out of the logs and into structured MEMORY.md, but it is not a complete solution.

The semantic gap. Markdown is great for human-readable facts and decisions. It is less great for capturing the relationships between those facts. “User prefers TypeScript” is easy to write down. “User prefers TypeScript because the team had a bad experience with Python type hints in 2024, which was caused by a specific library that has since been fixed” is harder to capture in a flat file. This is where graph-based memory systems have an advantage, and why the most sophisticated setups (like coolmanns’ architecture) add a knowledge graph alongside the Markdown files.

None of these problems invalidates the model. They are the same problems every memory system faces, and the Markdown-first approach actually makes most of them easier to debug and fix because you can see exactly what the agent has stored.

Practical Takeaways

If you are building or configuring an agent memory system, here is what the OpenClaw model teaches us:

Files are the right default. Start with Markdown files. Add databases, vector indices, and retrieval pipelines only when the files alone are insufficient. You can always add infrastructure. You cannot easily remove it once agents depend on it.
Two layers beat one. Separate raw daily logs from curated long-term memory. This forces the agent to make explicit promotion decisions rather than treating everything equally. The daily log is the write-optimized layer. MEMORY.md is the read-optimized layer.
Make the index disposable. Whether you use SQLite FTS5, Milvus, ChromaDB, or LanceDB, design your system so the search index can be rebuilt from the source files. This eliminates entire categories of data corruption and makes migration painless.
Use the slot pattern. Abstract your memory backend behind a swappable interface. The ability to switch between memory-core, Mem0, QMD, or LanceDB without changing the agent’s behavior or losing data is powerful. Your memory files should outlive any specific retrieval implementation.
Budget your always-loaded context. MEMORY.md is loaded every session. Treat it like RAM: precious, limited, and worth curating. If it grows beyond a few hundred lines, you are doing it wrong.
Automate consolidation, but keep human review. The dreaming system is a good pattern: let the machine propose promotions, but let the human approve them. Fully automatic memory management tends toward either noise (too much promoted) or amnesia (too little).
Flush before you compact. Whatever memory system you use, make sure the agent has a chance to persist important context before the conversation history gets summarized. The memory flush pattern is not OpenClaw-specific. It is a necessary safety net for any agent that compresses its context window.
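The disposable-index takeaway is the one most worth seeing in code. This is a minimal sketch, assuming a plain SQLite table and section-level splitting on "## " headings. The schema and function names are hypothetical, and a real setup would add embeddings or full-text search on top.

```python
import sqlite3
from pathlib import Path


def split_sections(text: str) -> list[tuple[str, str]]:
    """Split Markdown into (heading, body) pairs on '## ' headings."""
    sections, title, body = [], "(preamble)", []
    for line in text.splitlines() + ["## "]:  # sentinel heading flushes the last section
        if line.startswith("## "):
            content = "\n".join(body).strip()
            if content:
                sections.append((title, content))
            title, body = line[3:].strip(), []
        else:
            body.append(line)
    return sections


def rebuild_index(workspace: Path, db_path: str = ":memory:") -> sqlite3.Connection:
    """Drop and recreate the index purely from the Markdown files on disk."""
    conn = sqlite3.connect(db_path)
    conn.execute("DROP TABLE IF EXISTS chunks")
    conn.execute("CREATE TABLE chunks (path TEXT, section TEXT, body TEXT)")
    for path in sorted(workspace.rglob("*.md")):
        rel = path.relative_to(workspace).as_posix()
        rows = [(rel, title, body) for title, body in split_sections(path.read_text(encoding="utf-8"))]
        conn.executemany("INSERT INTO chunks VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn
```

If the database file is deleted or corrupted, calling rebuild_index again restores it, because nothing in the index is information that the files do not already hold.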

What’s Next

The Markdown-first model has proven that simplicity scales. But the frontier is moving toward systems that combine the transparency of files with the reasoning power of structured memory. In the next post, I will look at the tools and techniques that sit on top of this file-based foundation, specifically how projects like QMD combine local-first search with GGUF-based reranking in a single binary, and why that matters for agents running on constrained hardware.

Previously in this series: Memory as Files: Why Plain Text on Disk Is a Feature, Not a Limitation