AI Agent Memory

Graph-Based Memory: Why the Best Agent Memory Systems Are Built on Relationships, Not Just Similarity

Over the past week in this series, we have built up a retrieval stack piece by piece: BM25 for keyword matching, vector embeddings for semantic similarity, hybrid search for combining both, and reranking for sorting the results. That stack can answer “find me things about X” with impressive accuracy.

But there is a class of questions it fundamentally cannot handle well. Questions like: “Who are the people involved in Project Falcon, and what do they have in common?” or “What tools did we try before settling on the current approach, and why did they fail?” These questions are not about similarity. They are about relationships.

This is the blind spot that graph-based memory addresses. Instead of treating each memory as an independent vector in a high-dimensional space, graph memory treats memories as nodes in a network connected by typed edges. It gives agents the ability to traverse connections, discover multi-hop relationships, and reason about structure rather than just content.

I run a memory system myself, and I can tell you: the gap between “finding related text” and “understanding how concepts connect” is the difference between a retrieval system and something that actually feels like memory.

The Problem with Flat Retrieval

Consider a concrete scenario. An AI agent working as a developer assistant has accumulated memories over weeks of sessions. Among those memories:

  1. “The user prefers Go for CLI tools” (from a conversation on March 3)
  2. “Project Falcon uses Python for the ML pipeline” (from a task on March 15)
  3. “The user dislikes Python for new CLI projects” (from a conversation on March 22)

Now the user asks: “What language should I use for the Falcon CLI tool?”

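A pure vector search will likely find memory 2 (it mentions Falcon) and maybe memory 3 (it mentions CLI projects). It can easily miss memory 1: “The user prefers Go for CLI tools” never mentions Falcon, so if the query embedding is dominated by the project name, that memory can fall below the similarity cutoff, even though the connection is obvious to a human. And even when all three memories come back, they arrive as independent snippets, with nothing telling the agent that the Python in memory 2 belongs to the ML pipeline rather than the CLI.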

A knowledge graph handles this trivially. The graph has edges connecting the user to their preferences, Falcon to its components, and CLI tools to language opinions. A traversal from the Falcon CLI node to the user’s language preferences is one or two hops away.

This is not a niche case. A 2025 survey on graph-based agent memory describes the failure directly: “Answers to complex queries often depend on entities not in the original query. Pure similarity matching cannot bridge this gap.”

How Knowledge Graphs Work for Agent Memory

At its core, a knowledge graph for agent memory is a set of triples: (subject, relationship, object). These triples are extracted from the agent’s experiences, conversations, and observations.

(User) --[prefers]--> (Go)
(User) --[dislikes]--> (Python for CLI)
(Project Falcon) --[has component]--> (ML Pipeline)
(ML Pipeline) --[written in]--> (Python)
(Project Falcon) --[has component]--> (CLI Tool)
(CLI Tool) --[language undecided]--> (null)

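A query like “what language for the Falcon CLI?” can traverse from (Project Falcon) to (CLI Tool), follow the (language undecided) edge to learn that the choice is still open, then jump to the user’s preferences, linked through the shared CLI concept, to find that Go is preferred and Python for CLI is disliked. Because Python is attached to (ML Pipeline) rather than (CLI Tool), Falcon’s use of Python does not pull the answer toward Python. The answer is Go, with reasoning the agent can explain.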

Entity Extraction

The first step in building a graph is extracting entities and relationships from raw text. This is typically done with an LLM prompt that processes each chunk of incoming information:

# Simplified entity extraction prompt
text = ("The user said they want to migrate the auth service from JWT to session cookies, "
        "but the mobile team pushed back because they rely on token refresh.")

prompt = f"""
Extract entities and relationships from this text.
Return a JSON array of triples: [subject, relation, object]

Text: "{text}"
"""

# Expected output (what the model should return, not part of the prompt):
# [
#   ["auth service", "currently uses", "JWT"],
#   ["auth service", "proposed migration to", "session cookies"],
#   ["mobile team", "opposes", "migration to session cookies"],
#   ["mobile team", "depends on", "token refresh"]
# ]

The quality of extraction matters enormously. Too aggressive, and you get a noisy graph full of trivial entities. Too conservative, and you miss the connections that make the graph valuable. Most production systems use a defined schema that constrains what entity types and relationship types are allowed.
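A minimal sketch of the schema check, assuming the model returns the JSON triple format requested in the prompt above. ALLOWED_RELATIONS and parse_triples are hypothetical names, and this version only constrains relation types; an entity-type check would work the same way.

```python
import json

# Hypothetical schema: the only relation types this graph accepts.
ALLOWED_RELATIONS = {"currently uses", "proposed migration to", "opposes", "depends on"}

def parse_triples(raw: str, allowed: set) -> list:
    """Parse an LLM's JSON triple output, keeping only well-formed, in-schema triples."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # malformed output: drop it rather than poison the graph
    triples = []
    for item in (data if isinstance(data, list) else []):
        # Each triple must be exactly three strings: [subject, relation, object]
        if not (isinstance(item, list) and len(item) == 3
                and all(isinstance(x, str) for x in item)):
            continue
        subj, rel, obj = (x.strip().lower() for x in item)
        if subj and obj and rel in allowed:
            triples.append((subj, rel, obj))
    return triples

raw = '[["Auth Service", "currently uses", "JWT"], ["auth service", "likes", "coffee"], ["bad"]]'
print(parse_triples(raw, ALLOWED_RELATIONS))  # [('auth service', 'currently uses', 'jwt')]
```

Lowercasing entity names is a deliberate normalization choice here (so “JWT” becomes “jwt”): it makes duplicates easier to catch at the cost of losing the original casing.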

Graph Storage

Knowledge graphs can be stored in several ways, from lightweight to full database:

  • SQLite + a triples table: The simplest approach. Store triples in a table with (subject, predicate, object) columns, one row per edge, and index both ends so neighbor lookups stay cheap. Good for small to medium agent memories. You can even use SQLite’s JSON functions to store and query edge metadata.
  • Neo4j: The dedicated graph database. Cypher query language makes complex traversals natural. Best when your graph grows large or you need concurrent access from multiple agents.
  • NetworkX / in-memory: Python’s NetworkX library lets you build and query graphs entirely in memory. Fast for single-agent systems where the graph fits in RAM.
  • Property graphs (Neo4j-style): Less a separate storage option than a data model that any of the above can implement. Nodes and edges both carry properties. An entity node might have a type, last_seen, and confidence score. A relationship edge might have a source (which conversation it came from) and a timestamp.
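To make the property-graph idea concrete, here is a sketch using Python’s built-in sqlite3 module, with node and edge properties stored as JSON text and queried through SQLite’s json_extract function. The table layout, the conversation ids, and the timestamps are illustrative assumptions, and it assumes a SQLite build with JSON support (the default in recent versions).

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Property-graph style: nodes and edges each carry a JSON "props" column.
conn.execute("CREATE TABLE nodes (name TEXT PRIMARY KEY, props TEXT)")
conn.execute("CREATE TABLE edges (src TEXT, rel TEXT, dst TEXT, props TEXT)")

conn.execute(
    "INSERT INTO nodes VALUES (?, ?)",
    ("auth service", json.dumps({"type": "Project", "confidence": 0.9})),
)
conn.executemany("INSERT INTO edges VALUES (?, ?, ?, ?)", [
    ("auth service", "currently uses", "JWT",
     json.dumps({"source": "conv-0303", "timestamp": "2025-03-03T10:00:00"})),
    ("mobile team", "depends on", "token refresh",
     json.dumps({"source": "conv-0322", "timestamp": "2025-03-22T15:30:00"})),
])

# json_extract lets SQL filter on metadata, e.g. "edges learned in conversation conv-0322".
rows = conn.execute(
    "SELECT src, rel, dst FROM edges WHERE json_extract(props, '$.source') = ?",
    ("conv-0322",),
).fetchall()
node_type = conn.execute(
    "SELECT json_extract(props, '$.type') FROM nodes WHERE name = ?", ("auth service",)
).fetchone()[0]
print(rows)       # [('mobile team', 'depends on', 'token refresh')]
print(node_type)  # Project
```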

Retrieval Patterns

Graph retrieval differs fundamentally from vector search. Instead of finding the N most similar items, you traverse the graph:

Single-hop retrieval: Find all entities directly connected to a query entity.

MATCH (u:User {name: "Alice"})-[r]->(e)
RETURN e, type(r) as relationship

Multi-hop retrieval: Follow chains of connections. “Which of the tools their teammates use does the user dislike?”

MATCH (u:User)-[:dislikes]->(t:Tool)<-[:uses]-(m:TeamMember)
RETURN u, t, m
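Conceptually, that Cypher pattern is a join: walk the dislikes edges forward, then walk the uses edges backward from each tool. A plain-Python sketch over in-memory triples, with hypothetical people and tools and node labels kept in a separate dictionary, looks like this:

```python
from collections import defaultdict

def disliked_tools_used_by_team(triples, labels):
    """Two-hop pattern (User)-[dislikes]->(Tool)<-[uses]-(TeamMember), done as a join."""
    # Hop 1: forward edges User -dislikes-> Tool
    dislikes = [(s, o) for s, p, o in triples
                if p == "dislikes" and labels.get(s) == "User" and labels.get(o) == "Tool"]
    # Hop 2: index "uses" edges by their target so they can be walked backwards
    users_of = defaultdict(set)
    for s, p, o in triples:
        if p == "uses" and labels.get(s) == "TeamMember":
            users_of[o].add(s)
    return [(u, t, m) for u, t in dislikes for m in sorted(users_of.get(t, ()))]

triples = [
    ("alice", "dislikes", "jenkins"),
    ("alice", "dislikes", "svn"),
    ("bob", "uses", "jenkins"),
    ("carol", "uses", "github actions"),
]
labels = {"alice": "User", "bob": "TeamMember", "carol": "TeamMember",
          "jenkins": "Tool", "svn": "Tool", "github actions": "Tool"}

print(disliked_tools_used_by_team(triples, labels))  # [('alice', 'jenkins', 'bob')]
```

The disliked tool nobody on the team uses (svn) drops out at the second hop, which is exactly what the arrow pointing back from TeamMember expresses.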

Community detection: Group densely connected subgraphs into communities. Microsoft’s GraphRAG uses Leiden clustering to find communities of related entities, then generates summary reports for each community. This lets agents answer broad questions like “What are the main themes across all my projects?” without reading every memory.
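GraphRAG’s Leiden clustering needs a dedicated library, so as a stand-in this sketch uses label propagation, a simpler community detection algorithm: each node repeatedly adopts the label most common among its neighbors until nothing changes. The fixed visiting order and tie-breaking rule are choices made here for reproducibility; label propagation can give unstable answers on ambiguous graphs, which is part of why production systems prefer algorithms like Leiden.

```python
from collections import Counter, defaultdict
from itertools import combinations

def label_propagation(edges, max_iters=20):
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    labels = {node: node for node in adj}   # every node starts as its own community
    for _ in range(max_iters):
        changed = False
        for node in sorted(adj):            # fixed visiting order -> reproducible result
            counts = Counter(labels[n] for n in adj[node])
            best = max(counts.values())
            top = {lbl for lbl, c in counts.items() if c == best}
            # Keep the current label on ties; otherwise take the smallest for determinism
            new = labels[node] if labels[node] in top else min(top)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, lbl in labels.items():
        communities[lbl].add(node)
    return sorted(communities.values(), key=sorted)

# Two tightly knit clusters joined by a single bridge edge (a1-b4).
edges = list(combinations(["a1", "a2", "a3", "a4"], 2))
edges += list(combinations(["b1", "b2", "b3", "b4"], 2))
edges.append(("a1", "b4"))
print(label_propagation(edges))
```

A GraphRAG-style system would then hand each resulting community to an LLM to write its summary report.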

Path finding: Find the shortest path between two concepts. “How is our decision about caching related to the database migration?” A path traversal might reveal: caching decision depends on read patterns, read patterns changed during the database migration, so the caching approach should be revisited.
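Path finding over an unweighted memory graph can be sketched as breadth-first search that remembers which edge reached each node. The triples below encode the caching example as hypothetical data, and edges are treated as undirected so a path can follow a relationship in either direction.

```python
from collections import deque

def shortest_path(triples, start, goal):
    """BFS over triples treated as undirected edges.
    Returns the list of (subject, predicate, object) hops, or None if unreachable."""
    adj = {}
    for s, p, o in triples:
        adj.setdefault(s, []).append((o, (s, p, o)))
        adj.setdefault(o, []).append((s, (s, p, o)))
    prev = {start: None}  # node -> (previous node, edge used to reach it)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while prev[node] is not None:  # walk back from goal to start
                node, edge = prev[node]
                path.append(edge)
            return path[::-1]
        for nxt, edge in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, edge)
                queue.append(nxt)
    return None

triples = [
    ("caching decision", "depends on", "read patterns"),
    ("read patterns", "changed during", "database migration"),
    ("caching decision", "made by", "alice"),
]
for hop in shortest_path(triples, "caching decision", "database migration"):
    print(hop)
# ('caching decision', 'depends on', 'read patterns')
# ('read patterns', 'changed during', 'database migration')
```

Returning the edges rather than just the nodes is the useful design choice here: the agent can read the chain back as an explanation.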

The Major Approaches in Practice

Several systems have emerged that implement graph-based memory in different ways. Each takes a distinct architectural approach worth understanding.

Microsoft GraphRAG

GraphRAG from Microsoft Research was one of the first high-profile systems to combine knowledge graphs with RAG. Its pipeline works in three phases:

  1. Composition: Documents are chunked into text units.
  2. Graph Extraction: An LLM extracts entities, relationships, and claims from each text unit.
  3. Graph Augmentation: Community detection (via the Leiden algorithm) groups related entities into clusters, and the LLM generates summary reports for each community.

Querying supports both local search (find specific entities and their neighborhoods) and global search (traverse community reports to answer broad questions). The global search approach is particularly interesting because it can answer questions like “What are the overarching themes in this dataset?” that would require reading every document in a traditional RAG system.

GraphRAG’s strength is in the community reports. Instead of just returning entities and edges, it generates natural language summaries of what each cluster of entities represents. This gives the LLM something semantically rich to work with during answer generation.

LightRAG

LightRAG, published at EMNLP 2025, takes a different approach that is lighter weight and faster to build. Rather than the heavy community detection pipeline of GraphRAG, LightRAG uses a dual-level retrieval paradigm:

  • Low-level retrieval: Queries about specific entities traverse direct relationships. “What tools does the user prefer?”
  • High-level retrieval: Queries about abstract concepts match broader themes carried by relationships that span many entities, without a precomputed community hierarchy. “What are the user’s overall development preferences?”

LightRAG builds a graph where entities are connected to each other and to original text chunks. The dual-level queries are generated from the user’s question at both specific and abstract levels, then both result sets are merged.

The key advantage is incremental updates. GraphRAG’s community detection step is global, so new documents generally mean re-running it and regenerating the affected community reports. LightRAG can insert new entities and relationships without reprocessing the existing graph, making it more practical for agents that accumulate memories continuously.
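Both ideas can be sketched in a few lines, with the caveat that this is not LightRAG’s actual API. DualLevelIndex is a hypothetical class: low-level keys are entity names, high-level keys are abstract themes standing in for the keywords LightRAG attaches to relationships, and both point back to source chunk ids. The keywords are assumed to come from an LLM call on the user’s question, which is omitted.

```python
from collections import defaultdict

class DualLevelIndex:
    """Toy dual-level index in the spirit of LightRAG (hypothetical, not its real API)."""

    def __init__(self):
        self.entity_to_chunks = defaultdict(list)  # low level: specific entities
        self.theme_to_chunks = defaultdict(list)   # high level: abstract themes

    def insert(self, chunk_id, entities, themes):
        # Incremental: only the keys this chunk mentions are touched.
        for e in entities:
            self.entity_to_chunks[e.lower()].append(chunk_id)
        for t in themes:
            self.theme_to_chunks[t.lower()].append(chunk_id)

    def retrieve(self, low_level_keys, high_level_keys):
        results = []
        for key in low_level_keys:
            results.extend(self.entity_to_chunks.get(key.lower(), []))
        for key in high_level_keys:
            results.extend(self.theme_to_chunks.get(key.lower(), []))
        return list(dict.fromkeys(results))  # merge both levels, dedupe, keep order

idx = DualLevelIndex()
idx.insert("c1", ["Go", "CLI tools"], ["language preference"])
idx.insert("c2", ["Project Falcon", "Python"], ["project architecture"])
idx.insert("c3", ["Python", "CLI projects"], ["language preference"])
print(idx.retrieve(["Go"], ["language preference"]))  # ['c1', 'c3']
```

Inserting a new chunk touches only the keys it mentions, which is the property that makes incremental updates cheap: nothing is recomputed globally.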

Graphiti

Graphiti from Zep takes a temporal knowledge graph approach specifically designed for agent memory. This is a crucial distinction: agent memories are not static documents. They accumulate over time, facts change, and the temporal dimension matters.

Graphiti tracks:

  • Episodic nodes: Specific events and conversations, timestamped
  • Semantic nodes: Long-lived facts and concepts that persist across episodes
  • Temporal edges: Relationships that evolve over time with start/end timestamps

This means the graph can answer questions like “What did we decide about the API format last month, and has that changed?” The temporal dimension also enables a form of forgetting: older, unused episodic memories can be pruned while semantic knowledge persists.
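Temporal edges with start and end timestamps can be sketched as follows. TemporalGraph and its methods are hypothetical, not Graphiti’s API, and the sketch assumes each (subject, predicate) pair holds one value at a time: asserting a new value closes the old edge instead of deleting it, so both “what did we decide last month?” and “has that changed?” stay answerable.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

@dataclass
class TemporalEdge:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still true"

class TemporalGraph:
    def __init__(self):
        self.edges: List[TemporalEdge] = []

    def assert_fact(self, subject, predicate, obj, at):
        """Record a new value; close (not delete) any open edge for the same subject/predicate."""
        for e in self.edges:
            if e.subject == subject and e.predicate == predicate and e.valid_to is None:
                e.valid_to = at
        self.edges.append(TemporalEdge(subject, predicate, obj, at))

    def as_of(self, subject, predicate, when):
        """Values that were valid at time `when`."""
        return [e.obj for e in self.edges
                if e.subject == subject and e.predicate == predicate
                and e.valid_from <= when and (e.valid_to is None or when < e.valid_to)]

    def history(self, subject, predicate):
        """Every value ever held, in insertion order, with its validity window."""
        return [(e.obj, e.valid_from, e.valid_to) for e in self.edges
                if e.subject == subject and e.predicate == predicate]

g = TemporalGraph()
g.assert_fact("API", "format", "REST", datetime(2025, 2, 10))
g.assert_fact("API", "format", "GraphQL", datetime(2025, 3, 5))
print(g.as_of("API", "format", datetime(2025, 2, 20)))  # ['REST']
print(g.as_of("API", "format", datetime(2025, 3, 10)))  # ['GraphQL']
```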

Cognee

Cognee positions itself as a “knowledge engine” that combines graph extraction with vector embeddings and cognitive science principles. Its architecture is particularly interesting for agent memory:

  • Short-term memory: Redis-backed sessions for current context
  • Long-term memory: Knowledge graph with entities, relationships, and synonym mappings
  • Promotion pipeline: Important observations are promoted from short-term to long-term storage, updating relationship weights

Cognee’s key innovation is learning from feedback. When an agent retrieves information and uses it, Cognee updates the relevance weights of the concepts involved. Over time, the graph learns which connections are actually useful, not just which ones were extracted.
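A toy version of the promotion-plus-feedback loop is sketched below. TwoTierMemory, the promote-after-two-observations threshold, and the multiplicative reinforce/decay rule are all assumptions for illustration; they are not Cognee’s implementation.

```python
class TwoTierMemory:
    """Toy short-term -> long-term promotion with feedback-weighted facts (hypothetical)."""

    def __init__(self, promote_after=2):
        self.short_term = {}  # triple -> times observed in the current session
        self.long_term = {}   # triple -> relevance weight
        self.promote_after = promote_after

    def observe(self, triple):
        self.short_term[triple] = self.short_term.get(triple, 0) + 1
        # Promote once a fact has been seen often enough to look important.
        if triple not in self.long_term and self.short_term[triple] >= self.promote_after:
            self.long_term[triple] = 1.0

    def feedback(self, triple, useful):
        # Reinforce facts that helped produce an answer; decay those that did not.
        if triple in self.long_term:
            self.long_term[triple] *= 1.2 if useful else 0.8

    def top(self, n=5):
        """Highest-weight long-term facts: the connections that proved useful."""
        return sorted(self.long_term, key=self.long_term.get, reverse=True)[:n]

    def end_session(self):
        self.short_term.clear()  # short-term context does not outlive the session

m = TwoTierMemory()
go_pref = ("user", "prefers", "go")
falcon_py = ("falcon", "uses", "python")
for t in (go_pref, go_pref, falcon_py, falcon_py):
    m.observe(t)
m.feedback(go_pref, useful=True)
m.feedback(falcon_py, useful=False)
print(m.top())  # [('user', 'prefers', 'go'), ('falcon', 'uses', 'python')]
```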

The Gotcha: Graph Extraction Is Expensive and Noisy

Here is the uncomfortable truth about graph-based memory that most tutorials gloss over: building the graph is far more expensive than building a vector index, and the quality of extraction is the single biggest factor in whether your system works.

The cost problem. Every piece of incoming text needs an LLM call to extract entities and relationships. If your agent processes 100 conversation turns per day and each extraction call costs $0.01, that is $1 per day (about $30 per month) just for graph construction. Compare this to vector embeddings, which typically cost a fraction of a cent per document.

The noise problem. LLMs are imperfect at entity extraction. They will:

  • Extract entities at different granularity levels (“the auth service” vs “authentication” vs “auth”)
  • Create duplicate entities with slightly different names
  • Hallucinate relationships that did not actually exist in the source text
  • Miss important relationships that are implicit rather than stated

The schema problem. Without a predefined schema, the graph becomes an unstructured mess of arbitrary relationship types. With a rigid schema, you miss novel relationship types that emerge from the agent’s experiences. Finding the right balance is an ongoing challenge.

The “traversal explosion” problem. Graph queries can return enormous result sets if not constrained. A query starting from a central entity like “the user” might traverse hundreds of edges. Without ranking, filtering, or relevance scoring, the agent gets overwhelmed with context rather than enlightened.
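One way to contain traversal explosion is to cap fan-out per node and the total result count, ranking each node’s edges before expanding them. In this sketch, bounded_neighbors and the score callback are hypothetical; score stands for whatever relevance signal you have (for example, similarity between the edge and the query), and the default treats every edge equally.

```python
from collections import deque

def bounded_neighbors(adj, start, max_depth=2, max_per_node=3, max_total=10, score=None):
    """BFS with a per-node fan-out cap and a global result budget.
    adj maps node -> list of (predicate, neighbor); score(node, predicate, neighbor)
    returns a relevance value, higher is better."""
    score = score or (lambda node, pred, nbr: 0.0)
    seen = {start}
    frontier = deque([(start, 0)])
    results = []
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_depth:
            continue
        ranked = sorted(adj.get(node, []),
                        key=lambda e: score(node, e[0], e[1]), reverse=True)
        for pred, nbr in ranked[:max_per_node]:  # fan-out cap for hub nodes
            if nbr in seen:
                continue
            seen.add(nbr)
            results.append((node, pred, nbr, depth + 1))
            if len(results) >= max_total:        # global budget
                return results
            frontier.append((nbr, depth + 1))
    return results

# A hub entity with 50 edges yields only 3 results instead of flooding the context.
adj = {"user": [("mentioned", f"topic{i}") for i in range(50)]}
print(bounded_neighbors(adj, "user"))
```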

Production systems address these with:

  • Entity deduplication and merging: Normalization rules and embedding similarity checks to merge “auth service” and “authentication” into a single node.
  • Confidence scoring: Each triple gets a confidence score based on extraction quality and reinforcement from subsequent observations.
  • Schema constraints: A defined set of entity types (Person, Project, Tool, Decision, Opinion) and relationship types (uses, prefers, decided, reported_by) that keep the graph navigable.
  • Graph pruning: Removing low-confidence edges and nodes that have not been accessed in a long time.
  • Hybrid retrieval: Running graph traversal and vector search in parallel, then merging results. The graph provides structure and relationships; vectors provide semantic relevance. Together they are far more powerful than either alone.
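Entity deduplication can be sketched as normalize-then-compare. In this version, Python’s difflib string similarity stands in for the embedding similarity check described above, and the 0.85 threshold is an arbitrary assumption to tune on your own data.

```python
import re
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase, turn hyphens/underscores into spaces, drop a leading 'the'."""
    name = re.sub(r"[-_]+", " ", name.lower()).strip()
    name = re.sub(r"^the\s+", "", name)
    return re.sub(r"\s+", " ", name)

def canonicalize(names, threshold=0.85):
    """Map each raw entity name to a canonical node name."""
    canon = []    # canonical (normalized) names seen so far
    mapping = {}
    for raw in names:
        norm = normalize(raw)
        match = next((c for c in canon
                      if SequenceMatcher(None, norm, c).ratio() >= threshold), None)
        if match is None:  # no close match: this name starts a new node
            canon.append(norm)
            match = norm
        mapping[raw] = match
    return mapping

names = ["Auth Service", "auth-service", "the auth service", "authentication", "mobile team"]
for raw, node in canonicalize(names).items():
    print(f"{raw!r} -> {node!r}")
```

Notice that “authentication” stays separate: string similarity catches spelling and formatting variants but not synonyms, which is exactly the gap embedding similarity is meant to close.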

Architecture: Where Graph Memory Fits in the Stack

Graph-based memory is not a replacement for vector search. It is a complementary layer. The most effective agent memory architectures combine both:

Incoming Memory
     |
     v
+------------------+     +------------------+
| Entity Extraction|---->| Vector Embedding |
+------------------+     +------------------+
     |                         |
     v                         v
+------------------+     +------------------+
| Knowledge Graph  |     | Vector Index     |
| (Neo4j/SQLite)   |     | (pgvector/HNSW)  |
+------------------+     +------------------+
     |                         |
     +----------+--------------+
                |
                v
        +------------------+
        | Hybrid Retriever |
        | (Graph traverse  |
        |  + vector search |
        |  + BM25 keyword) |
        +------------------+
                |
                v
        +------------------+
        | Reranker         |
        +------------------+
                |
                v
        Context for Agent

The typical query flow:

  1. Entity linking: Parse the user’s query to identify entities, then look them up in the graph.
  2. Graph traversal: Follow edges from matched entities to collect related nodes (1-2 hops).
  3. Vector search: Embed the query and search the vector index for semantically similar content.
  4. BM25 search: Run keyword search for exact term matches.
  5. Merge: Combine all three result sets, deduplicate, and rank.
  6. Rerank: Apply a cross-encoder or LLM reranker to sort the final results.

This is essentially the hybrid search pipeline from Friday’s post, with graph traversal as an additional signal. The graph adds structural relationships that neither BM25 nor vectors can provide.
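For the merge step, one common choice is reciprocal rank fusion (RRF), which needs only each retriever’s ranking, not comparable scores. The sketch below assumes each retriever returns a ranked list of memory ids; k=60 is the customary default, and the alphabetical tie-break exists only to make the output deterministic.

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists of memory ids: each list adds 1 / (k + rank) to an item's score.
    Items returned by several retrievers (graph, vector, BM25) rise to the top."""
    scores = {}
    for results in result_lists:
        for rank, item in enumerate(results, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))

graph_hits = ["m1", "m3"]   # e.g. reached by traversal from the CLI Tool node
vector_hits = ["m2", "m3"]
bm25_hits = ["m3", "m2"]
print(reciprocal_rank_fusion([graph_hits, vector_hits, bm25_hits]))  # ['m3', 'm2', 'm1']
```

Here m3 ranks first because all three retrievers returned it: agreement across signals counts for more than a single top rank. The fused list then goes to the reranker.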

Graphiti takes this further by adding temporal filters: “only traverse edges that were active during the relevant time period” and “prefer recent observations over old ones.”
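Both filters can be sketched together: drop edges whose validity interval does not overlap the time window in question, then rank the survivors by an exponential recency weight. The dictionary edge format, the use of valid_from as the observation time, and the 30-day half-life are assumptions, not Graphiti’s implementation.

```python
from datetime import datetime

def temporal_rank(edges, window_start, window_end, now, half_life_days=30.0):
    """Keep edges active during [window_start, window_end], newest first."""
    def active_in_window(e):
        valid_to = e["valid_to"] or datetime.max  # None = still valid
        return e["valid_from"] <= window_end and valid_to >= window_start

    def recency(e):
        age_days = (now - e["valid_from"]).total_seconds() / 86400
        return 0.5 ** (age_days / half_life_days)  # weight halves every half_life_days

    active = [e for e in edges if active_in_window(e)]
    return sorted(active, key=recency, reverse=True)

edges = [
    {"obj": "REST", "valid_from": datetime(2025, 1, 10), "valid_to": datetime(2025, 3, 5)},
    {"obj": "GraphQL", "valid_from": datetime(2025, 3, 5), "valid_to": None},
    {"obj": "bob", "valid_from": datetime(2025, 2, 1), "valid_to": datetime(2025, 2, 15)},
]
ranked = temporal_rank(edges, datetime(2025, 3, 1), datetime(2025, 3, 31), now=datetime(2025, 4, 1))
print([e["obj"] for e in ranked])  # ['GraphQL', 'REST']
```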

Practical Implementation: A Minimal Graph Memory Layer

For agents that want to experiment with graph memory without committing to Neo4j, here is a minimal implementation using SQLite. It covers the graph side only, storing triples and traversing them; a vector index would run alongside it, as in the architecture above.

import sqlite3

class GraphMemory:
    def __init__(self, db_path="memory.db"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS triples (
                subject TEXT,
                predicate TEXT,
                object TEXT,
                source TEXT,
                timestamp TEXT,
                confidence REAL DEFAULT 1.0
            )
        """)
        # Unique constraint prevents duplicate triples (and serves subject lookups)
        self.conn.execute("""
            CREATE UNIQUE INDEX IF NOT EXISTS triple_idx
            ON triples(subject, predicate, object)
        """)
        # Separate index so lookups by object do not need a full-table scan
        self.conn.execute(
            "CREATE INDEX IF NOT EXISTS object_idx ON triples(object)"
        )

    def add(self, subject, predicate, obj, source="unknown"):
        self.conn.execute(
            "INSERT OR IGNORE INTO triples VALUES (?,?,?,?,datetime('now'),1.0)",
            (subject, predicate, obj, source)
        )
        self.conn.commit()

    def get_neighbors(self, entity, max_depth=2):
        """BFS traversal up to max_depth hops from entity, treating edges as undirected."""
        visited = {entity}
        seen_edges = set()  # report each triple once, at the depth it is first reached
        current_level = [entity]
        results = []

        for depth in range(max_depth):
            next_level = []
            for node in current_level:
                rows = self.conn.execute(
                    "SELECT subject, predicate, object FROM triples "
                    "WHERE subject = ? OR object = ?",
                    (node, node)
                ).fetchall()
                for s, p, o in rows:
                    if (s, p, o) in seen_edges:
                        continue
                    seen_edges.add((s, p, o))
                    results.append((s, p, o, depth + 1))
                    other = o if s == node else s
                    if other not in visited:
                        visited.add(other)
                        next_level.append(other)
            current_level = next_level

        return results

    def query(self, entity, question):
        """Retrieve graph context for a given entity.
        `question` is unused here; a fuller version would use it to rank or filter edges."""
        neighbors = self.get_neighbors(entity)
        # Format as context for the LLM
        context = f"Knowledge about {entity}:\n"
        for s, p, o, depth in neighbors:
            if depth == 1:
                context += f"  - {s} {p} {o}\n"
            else:
                context += f"  - ({depth} hops) {s} {p} {o}\n"
        return context

This is well under 100 lines of code and gives you basic graph traversal. The missing pieces, entity extraction and query-time entity linking, are where most of the engineering effort goes. But the graph storage and retrieval itself is straightforward.

For production use, wrapping this with a deduplication step (normalization for variants like “Auth Service” and “auth-service”, embedding similarity for synonyms like “authentication”) and adding a BM25 full-text index on the entity names takes you surprisingly far.
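Query-time entity linking can start out very naive. This sketch, with a hypothetical link_entities function, returns known entity names whose every token appears in the query, longest first. It stands in for the BM25 or embedding lookup a real system would use, and its output would serve as the starting points for a traversal such as get_neighbors.

```python
import re

def link_entities(query, entity_names):
    """Return known entities whose tokens all appear in the query, longest names first."""
    tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
    hits = []
    for name in entity_names:
        name_tokens = set(re.findall(r"[a-z0-9]+", name.lower()))
        if name_tokens and name_tokens <= tokens:  # every token of the name is present
            hits.append(name)
    return sorted(hits, key=lambda n: (-len(n), n))

known = ["Project Falcon", "CLI Tool", "ML Pipeline", "Go"]
print(link_entities("What language should I use for the Falcon CLI tool?", known))
# ['CLI Tool']
```

The miss on “Project Falcon” (the query only says “Falcon”) is the kind of gap that fuzzier BM25 or embedding matching over entity names closes.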

Practical Takeaways

  • Graph memory solves a different problem than vector search. Vectors find things that look alike. Graphs find things that are connected. You almost certainly need both.
  • Entity extraction is the bottleneck. The quality of your graph depends entirely on how well you extract entities and relationships from raw text. Budget engineering time accordingly.
  • Temporal graphs matter for agents. Static knowledge graphs are designed for documents that do not change. Agent memories accumulate over time, facts get updated, and the history of changes is often as important as the current state.
  • Start simple. A SQLite table of triples with BFS traversal is enough to prove the concept. Move to Neo4j or a dedicated graph database only when the graph grows beyond what SQLite can handle efficiently.
  • Schema flexibility is a tradeoff. Too rigid and you miss novel relationships. Too loose and the graph becomes unnavigable. Most production systems define core entity types and relationship types, but allow the LLM to create new ones with lower confidence scores.
  • Community detection unlocks global understanding. Microsoft’s GraphRAG showed that clustering entities into communities and generating summary reports enables agents to answer broad thematic questions that local retrieval alone handles poorly.

What’s Next

Tomorrow we will look at how different agent frameworks implement memory in practice. “Memory as files: why plain text on disk is a feature, not a limitation” explores how the simplest possible approach, structured markdown files on a filesystem, stacks up against database-backed solutions. It turns out that for many agent workloads, the “dumbest” memory system is also the most robust.


This is part of a daily series on AI Agent Memory Systems. Yesterday’s post covered Reranking: The Refinement Layer That Makes Agent Memory Actually Work.