pgvector: When Your Postgres Database Becomes Your Agent Memory Store

You are building an agent that needs to remember things across sessions. You already have a PostgreSQL database storing conversations, facts, and user preferences. Now you need semantic search so the agent can find related memories by meaning, not just by keyword match. The obvious answer is to add a vector database: spin up ChromaDB, Qdrant, or Milvus, and pipe your embeddings there.

But now you have two databases. Two systems to monitor, back up, keep in sync, and scale. Your agent writes a memory to Postgres and then writes the embedding to a separate vector store. If one write fails, your data is inconsistent. If you need to filter by user ID before searching, you are doing a dance between two systems.

There is another path. pgvector is a PostgreSQL extension that adds vector similarity search directly inside the database you already run. For agent memory systems, it eliminates an entire class of infrastructure complexity. This is the story of how it works, when to use it, and the mistakes that will bite you.

The Core Idea: Vectors as First-Class Citizens

pgvector adds a vector column type to PostgreSQL. You decide the dimensionality based on your embedding model:

CREATE EXTENSION vector;

CREATE TABLE memories (
    id          BIGSERIAL PRIMARY KEY,
    agent_id    TEXT NOT NULL,
    content     TEXT NOT NULL,
    memory_type TEXT NOT NULL,  -- 'episodic', 'semantic', 'procedural'
    embedding   vector(1536),   -- matches your embedding model
    metadata    JSONB,
    created_at  TIMESTAMPTZ DEFAULT NOW()
);

That vector(1536) column stores a dense array of 1,536 floating-point numbers, one per dimension from your embedding model (OpenAI’s text-embedding-3-small, for instance, or the open-source all-MiniLM-L6-v2 at 384 dimensions). pgvector stores each element as a 4-byte float, so a 1,536-dimensional vector costs roughly 6 KB per row.

Once your embeddings are stored, querying is a single SQL statement:

SELECT id, content,
       1 - (embedding <=> '[0.23, -0.99, ...]') AS similarity
FROM memories
WHERE agent_id = 'claw-main'
ORDER BY embedding <=> '[0.23, -0.99, ...]'
LIMIT 10;

The <=> operator computes cosine distance. The <-> operator gives you L2 (Euclidean) distance, and <#> gives negative inner product. For most embedding models, cosine distance is the right choice.

That is the entire read path at its simplest. Embed the query, find the nearest vectors, return the content. No second database, no sync issues, no extra network hops.

Indexing: HNSW vs IVFFlat

Without an index, pgvector does exact nearest neighbor search: it computes the distance between your query vector and every row in the table. That is perfect recall but slow at scale. Above a few hundred thousand rows, you need an approximate index.

pgvector offers two: HNSW and IVFFlat.

HNSW (Hierarchical Navigable Small World)

HNSW builds a multi-layer graph where each node connects to its nearest neighbors. Queries start at a random entry point and greedily traverse the graph toward the query vector, hopping through layers of increasing granularity.

CREATE INDEX ON memories
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Two parameters matter:

m (default 16): Maximum connections per node per layer. Higher means better recall, bigger index, slower builds. Range 5-48 is reasonable.
ef_construction (default 64): Candidate list size during index construction. Higher means better graph quality but slower inserts.

At query time, you control the search quality with hnsw.ef_search:

BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT id, content
FROM memories
WHERE agent_id = 'claw-main'
ORDER BY embedding <=> '[query_vector]'
LIMIT 10;
COMMIT;

The default ef_search is 40, which is fast but may miss some results. For agent memory, where recall matters more than a few milliseconds of latency, I typically use 100-200.

HNSW is the better choice for most agent memory workloads. It does not require training data, it supports inserts without rebuilding, and its speed-recall tradeoff is superior to IVFFlat at scale.

IVFFlat (Inverted File Flat)

IVFFlat partitions your vectors into lists using k-means clustering, then searches only the lists closest to the query:

CREATE INDEX ON memories
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

The rule of thumb for lists: use rows / 1000 for tables under 1 million rows, and sqrt(rows) for larger tables. A critical gotcha: IVFFlat needs training data. Create the index after your table has data, or it will build empty clusters and recall will be terrible.

At query time:

SET ivfflat.probes = 10;  -- default is 1, too low for most cases

IVFFlat uses less memory than HNSW and builds faster, but its recall degrades more noticeably as the data distribution shifts from when the index was built. For a continuously growing memory store where new vectors arrive constantly, HNSW is the safer default.

Which to Pick

For agent memory:

Factor	HNSW	IVFFlat
Best for	Continuous inserts	Bulk-loaded data
Memory	Higher	Lower
Build speed	Slower	Faster
Recall at speed	Better	Adequate
Maintenance	Minimal	Needs reindex as data shifts

Most agent memory systems should start with HNSW.

The Real Power: Hybrid Search in a Single Query

Here is where pgvector truly shines for agent memory. PostgreSQL has had full-text search since before vector embeddings existed. The tsvector type and ts_rank function give you keyword matching. pgvector gives you semantic matching. You can run both in a single SQL statement.

This is not theoretical. A real-world multi-agent system documented their architecture: an OpenClaw cluster with 9 nodes and 20 agents, running PostgreSQL 15 with pgvector alongside Markdown-based memory files. They adopted “dual-query mode,” running semantic search and keyword search in parallel, then merging results. The effect was immediate: queries like “that discussion from last week” surfaced related sessions that grep-style search missed entirely.

The SQL for Reciprocal Rank Fusion (RRF) hybrid search looks like this:

WITH semantic_search AS (
    SELECT id, RANK() OVER (
        ORDER BY embedding <=> '[query_vector]'
    ) AS rank
    FROM memories
    WHERE agent_id = 'claw-main'
    ORDER BY embedding <=> '[query_vector]'
    LIMIT 20
),
keyword_search AS (
    SELECT id, RANK() OVER (
        ORDER BY ts_rank_cd(
            to_tsvector('english', content),
            plainto_tsquery('english', 'SSH timeout connection')
        ) DESC
    ) AS rank
    FROM memories
    WHERE agent_id = 'claw-main'
      AND to_tsvector('english', content) @@
          plainto_tsquery('english', 'SSH timeout connection')
    ORDER BY ts_rank_cd(
        to_tsvector('english', content),
        plainto_tsquery('english', 'SSH timeout connection')
    ) DESC
    LIMIT 20
)
SELECT COALESCE(s.id, k.id) AS id,
       COALESCE(1.0 / (60 + s.rank), 0.0) +
       COALESCE(1.0 / (60 + k.rank), 0.0) AS rrf_score
FROM semantic_search s
FULL OUTER JOIN keyword_search k USING (id)
ORDER BY rrf_score DESC
LIMIT 10;

Both searches run inside the same database, against the same rows, with the same transaction guarantees. If you need to filter by agent_id, memory_type, or a date range, you add a WHERE clause to both CTEs. No cross-database synchronization. No consistency anomalies.

You can also weight the two signals. If your agent users tend to search for exact error messages, boost the keyword signal:

COALESCE(0.7 / (60 + k.rank), 0.0) +   -- 70% keyword
COALESCE(0.3 / (60 + s.rank), 0.0)      -- 30% semantic

A Complete Agent Memory Schema

Here is a schema designed for a production agent memory system, incorporating lessons from real implementations:

-- Enable extensions
CREATE EXTENSION vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;  -- for trigram similarity

-- Core memories table
CREATE TABLE memories (
    id          BIGSERIAL PRIMARY KEY,
    agent_id    TEXT NOT NULL,
    user_id     TEXT,
    session_id  TEXT,
    content     TEXT NOT NULL,
    summary     TEXT,                      -- compressed version
    memory_type TEXT NOT NULL CHECK (
        memory_type IN ('episodic', 'semantic', 'procedural')
    ),
    importance  REAL DEFAULT 0.5,
    embedding   vector(1536),
    metadata    JSONB DEFAULT '{}',
    created_at  TIMESTAMPTZ DEFAULT NOW(),
    accessed_at TIMESTAMPTZ DEFAULT NOW(),
    access_count INT DEFAULT 0,
    tsv         TSVECTOR GENERATED ALWAYS AS (
        to_tsvector('english', coalesce(content, ''))
    ) STORED
);

-- HNSW index for vector search
CREATE INDEX idx_memories_embedding
ON memories USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- GIN index for full-text search
CREATE INDEX idx_memories_tsv
ON memories USING gin (tsv);

-- B-tree for common filters
CREATE INDEX idx_memories_agent ON memories (agent_id);
CREATE INDEX idx_memories_type ON memories (memory_type, agent_id);

-- Scoped search function with temporal decay
CREATE OR REPLACE FUNCTION recall_memories(
    p_query       TEXT,
    p_embedding   vector(1536),
    p_agent_id    TEXT,
    p_limit       INT DEFAULT 10,
    p_ef_search   INT DEFAULT 100
) RETURNS TABLE (
    id BIGINT,
    content TEXT,
    memory_type TEXT,
    rrf_score REAL,
    similarity REAL
) AS $$
BEGIN
    SET LOCAL hnsw.ef_search = p_ef_search;

    RETURN QUERY
    WITH semantic AS (
        SELECT m.id,
               RANK() OVER (
                   ORDER BY m.embedding <=> p_embedding
               ) AS rank,
               1 - (m.embedding <=> p_embedding) AS sim
        FROM memories m
        WHERE m.agent_id = p_agent_id
          AND m.embedding IS NOT NULL
        ORDER BY m.embedding <=> p_embedding
        LIMIT 20
    ),
    keyword AS (
        SELECT m.id,
               RANK() OVER (
                   ORDER BY ts_rank_cd(m.tsv,
                       plainto_tsquery('english', p_query)
                   ) DESC
               ) AS rank
        FROM memories m
        WHERE m.agent_id = p_agent_id
          AND m.tsv @@ plainto_tsquery('english', p_query)
        ORDER BY ts_rank_cd(m.tsv,
            plainto_tsquery('english', p_query)
        ) DESC
        LIMIT 20
    )
    SELECT COALESCE(s.id, k.id),
           m.content,
           m.memory_type,
           (COALESCE(1.0 / (60 + k.rank), 0.0) +
            COALESCE(1.0 / (60 + s.rank), 0.0))::REAL,
           COALESCE(s.sim, 0)::REAL
    FROM semantic s
    FULL OUTER JOIN keyword k USING (id)
    JOIN memories m ON m.id = COALESCE(s.id, k.id)
    ORDER BY (COALESCE(1.0 / (60 + k.rank), 0.0) +
              COALESCE(1.0 / (60 + s.rank), 0.0)) DESC
    LIMIT p_limit;
END;
$$ LANGUAGE plpgsql;

Call it from your agent code:

SELECT * FROM recall_memories(
    'SSH connection timeout debugging',
    '[0.1, -0.3, 0.8, ...]'::vector(1536),
    'claw-main',
    10
);

One function call. Hybrid search with RRF fusion. Filtered by agent. No external dependencies.

pgvector 0.8: Iterative Scans Fix the Filter Problem

There was a persistent gotcha with pgvector that version 0.8.0 addressed directly. When you filter with WHERE (by agent, user, date range), filtering happens after the index scan. If your ef_search is 40 and your filter matches 10% of rows, you get 4 results on average. For agent memory systems that always filter by agent_id, this was a real problem.

The fix is iterative index scans:

SET hnsw.iterative_scan = strict_order;

This tells pgvector to keep scanning the index until enough results pass the filter, up to hnsw.max_scan_tuples (default 20,000). Strict ordering guarantees results are returned in exact distance order. You can also use relaxed_order for slightly better recall at the cost of approximate ordering.

For agent memory, I recommend:

-- Set per-session or per-query
SET hnsw.iterative_scan = strict_order;
SET hnsw.max_scan_tuples = 50000;

This is especially important if you partition memories by agent or user, where a single agent’s memories might be a small fraction of total rows.

The Gotcha: ts_rank Is Not BM25

PostgreSQL’s built-in ts_rank function computes relevance scores using term frequency and document length, but it lacks inverse document frequency (IDF). It cannot distinguish between a rare, discriminating term and a common one. “pgvector” appearing in 5% of documents is scored the same as “the” appearing in 95%.

For simple keyword matching, ts_rank is fine. But if you want proper BM25 scoring with IDF normalization, you need an extension like ParadeDB’s pg_search, which adds a native BM25 index type to PostgreSQL. The hybrid search query structure stays the same; you just swap the keyword CTE to use pdb.score() instead of ts_rank_cd.

For many agent memory systems, the combination of pgvector semantic search with PostgreSQL’s built-in full-text search is sufficient. The semantic channel handles the meaning gap, while ts_rank handles exact keyword matching. The cases where you truly need BM25 scoring tend to be larger-scale document retrieval rather than per-agent memory recall.

When pgvector Is Not the Answer

pgvector is the right choice when:

You already run PostgreSQL and want to keep your stack simple.
Your memory store is under 10 million vectors.
You need ACID transactions between your data and your embeddings.
You want hybrid search without managing a second system.

It is the wrong choice when:

You need to search across billions of vectors with sub-10ms latency. Use Milvus or Qdrant at that scale.
Your primary workload is pure vector search with no relational queries. A purpose-built vector DB will be faster.
You need dynamic sharding across many nodes. PostgreSQL scales vertically well, but horizontal sharding requires Citus or PgDog.

For agent memory, the sweet spot is clear. Most agents manage thousands to low millions of memories, not billions. They always need relational filters (agent, user, session, type). They benefit enormously from hybrid search. And the operational simplicity of one database instead of two is hard to overstate.

Memory Patterns That Work

A few patterns that I have seen work well in production agent memory systems built on pgvector:

Temporal decay in SQL. Boost recent memories by incorporating age into your ranking:

ORDER BY (embedding <=> query_vector) *
    (1.0 + 0.001 * EXTRACT(EPOCH FROM NOW() - created_at) / 86400)

This applies a gentle penalty that grows with age, similar to the decay functions we discussed in the memory tiers and decay post.

Access-count boosting. Track how often each memory is retrieved and boost frequently accessed memories:

-- Update on each retrieval
UPDATE memories
SET access_count = access_count + 1,
    accessed_at = NOW()
WHERE id = ANY(retrieved_ids);

Then factor access_count into your ranking. Frequently recalled memories are probably important.

Half-precision for scale. If your embedding vectors are large and you have millions of rows, switch to halfvec for 50% storage savings:

-- Store at half precision
ALTER TABLE memories ALTER COLUMN embedding
    TYPE halfvec(1536);

-- Index at half precision (even if storing full precision)
CREATE INDEX ON memories
USING hnsw ((embedding::halfvec(1536)) halfvec_cosine_ops);

Structured facts alongside vectors. The multi-agent system I mentioned earlier uses a separate facts table with subject-predicate-object triples alongside the vector-searchable messages table. This is the pattern we covered in graph-based memory: vector search handles the “find related content” problem, while structured queries handle the “what do I know about X?” problem. Both live in the same PostgreSQL instance.

Practical Takeaways

Start with pgvector if you already run PostgreSQL. Adding a vector column and an HNSW index is a 10-minute operation. No new infrastructure required.
Use HNSW over IVFFlat for agent memory. It handles continuous inserts without retraining and provides better recall at speed.
Implement hybrid search with RRF fusion. Run semantic and keyword search in parallel, merge with 1/(k + rank). It handles the scale-mismatch problem between vector similarity scores and text relevance scores.
Enable iterative scans (hnsw.iterative_scan = strict_order) if you filter by agent, user, or type. The default behavior returns too few results when filters are applied.
Watch your ef_search setting. The default of 40 is too aggressive for memory retrieval. Use 100-200 for better recall.
Consider halfvec for storage savings when you cross the million-vector mark.
Remember that ts_rank is not BM25. If you need proper IDF-weighted scoring, look at ParadeDB’s pg_search extension.
Keep your relational data and vector data in the same database. The transaction guarantees alone are worth it.

What’s Next

pgvector solves the storage and retrieval problem, but it treats all vectors the same way. Some agent memories deserve higher precision than others. In the next post, we will look at embedding model selection: how different models affect retrieval quality, why dimension matters, and the emerging technique of Matryoshka embeddings that lets you trade precision for speed without re-embedding.

This is part of the AI Agent Memory Systems series. The previous post covered Mem0: The Managed Memory API.