PostgreSQL Deep dive: The visibility map — the 2-bit data structure that makes index-only scans possible
Every heap relation in PostgreSQL has a silent partner: a tiny file called the Visibility Map (VM). It stores exactly two bits per heap page, yet those two bits are the difference between an index scan that reads the index and the table, and an index-only scan that reads only the index. On a 100GB table, that can mean the difference between 100,000 random I/O operations and zero.
Today we’re looking at how the VM works, what sets and clears its bits, and why understanding it is essential for anyone who cares about query performance.
The Two Bits
The Visibility Map stores two bits per heap page:
| Bit | Name | Meaning |
|---|---|---|
| Bit 0 | All-visible | Every tuple on this page is visible to all active transactions. No dead tuples exist. |
| Bit 1 | All-frozen | Every tuple on this page has been frozen. No anti-wraparound vacuum will ever need to revisit this page. |
That’s it. Two bits. Stored in a separate fork file with a _vm suffix (e.g., 12345_vm alongside the main relation file 12345). The VM file is roughly 1/32768th the size of the main relation: each VM page holds 8192 bytes, each byte stores 4 pages (2 bits each), so one VM page covers 32,768 heap pages (256MB of heap data).
For a 100GB table, the VM is about 3MB. It fits entirely in memory. It almost always stays cached. This is why index-only scans work.
How All-Visible Enables Index-Only Scans
To understand why the VM matters, you need to understand the problem it solves.
In PostgreSQL, all indexes are secondary indexes. The index stores copies of indexed column values along with a CTID (a pointer to the heap page and line pointer where the full row lives). When a regular index scan finds a matching entry, it must then fetch the full heap tuple for two reasons:
- To get columns not in the index
- To verify MVCC visibility: is this tuple visible to my transaction snapshot?
For query (1), covering indexes with INCLUDE columns solve the problem by storing additional columns in the index. But for query (2), visibility information lives in the heap tuple header, not in the index. Every index scan should need to visit the heap.
The VM provides the escape hatch. When an index-only scan finds a candidate index entry, it checks the VM bit for the corresponding heap page:
- Bit set (all-visible): The row is guaranteed visible. Return the data directly from the index entry. No heap access needed.
- Bit not set: Fall back to a regular index scan. Visit the heap to check visibility.
The VM is ~10,000 times smaller than the heap it describes. Checking a VM bit is a single page read from a file that’s almost certainly cached in shared_buffers or the OS page cache. Comparing that to a random heap page read (which on a cold cache means disk I/O) is the performance difference between “fast” and “why is this query slow.”
How Bits Get Set: VACUUM Is the Only Writer
Visibility map bits are only set by VACUUM. No other operation sets them.
When VACUUM processes a heap page:
- It scans the page for dead tuples (tuples with xmax < the oldest active transaction)
- It removes dead tuples, compacts the page, and updates the line pointer array
- After cleanup, if every remaining tuple on the page is visible to all current and future transactions, VACUUM sets the all-visible bit
- If every tuple is also frozen (xmin is sufficiently old), VACUUM sets the all-frozen bit
This means the all-visible bit reflects the state of the page as of the last VACUUM. If VACUUM hasn’t run recently, pages that are actually all-visible won’t have their bits set.
How Bits Get Cleared: Any Data Modification
Visibility map bits are cleared by any data-modifying operation on a page. This is critical and often misunderstood.
The moment you do any of these, the all-visible bit for the affected page is cleared:
-- Any of these will clear the all-visible bit for the target page(s)
UPDATE users SET last_login = NOW() WHERE id = 42;
DELETE FROM events WHERE id = 100;
INSERT INTO events (data) VALUES ('...'); -- if page was all-visible, bit is cleared
This is because a modification invalidates the guarantee that “all tuples on this page are visible to all transactions.” An UPDATE creates a new tuple version. A DELETE marks a tuple as dead. Even an INSERT on a previously empty all-visible page means the page now contains a tuple that might not be visible to all transactions (the inserting transaction might still be active).
This clearing behavior has a surprising consequence: tables with frequent updates will have their all-visible bits cleared faster than VACUUM can set them. The VM becomes a patchwork of set and cleared bits, and index-only scans fall back to regular index scans for many pages.
The Gotcha: Index-Only Scans That Aren’t
This is the most common visibility map pitfall. You create a beautiful covering index, check your EXPLAIN output, and see “Index Only Scan.” You think you’re winning. But look closer at the BUFFERS output:
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM orders WHERE status = 'completed';
Index Only Scan using idx_orders_status on orders
(cost=0.42..12543.21 rows=500000 width=0)
(actual time=0.032..89.456 rows=487231 loops=1)
Buffers: shared read=154
Wait. shared read=154 on an index-only scan? That shouldn’t happen if the VM bits are all set. Those reads are heap page fetches for pages whose all-visible bit was not set.
Compare with a true index-only scan:
Index Only Scan using idx_orders_status on orders
(cost=0.42..12543.21 rows=500000 width=0)
(actual time=0.018..23.781 rows=487231 loops=1)
Buffers: shared hit=8
shared hit=8. Eight VM page reads, zero heap reads. That’s the difference between “sort of index-only” and “actually index-only.”
The diagnostic: add Heap Fetches to your observation. PostgreSQL reports the number of heap fetches that an index-only scan had to perform:
EXPLAIN (ANALYZE)
SELECT count(*) FROM orders WHERE status = 'completed';
If you see Heap Fetches: 0, the VM was fully effective. Every page was all-visible. If you see Heap Fetches: 15432, nearly 15,000 pages needed heap access because their VM bits weren’t set.
The Conservative Guarantee
The VM is intentionally conservative. When a bit is set, the condition is guaranteed true. When a bit is not set, the condition might or might not be true. The VM never falsely claims a page is all-visible when it isn’t.
This conservatism is what makes it safe for index-only scans. If the VM mistakenly said a page was all-visible when it actually contained dead tuples or invisible versions, the scan would return incorrect results. By only clearing bits eagerly and setting them conservatively (only after VACUUM has verified), PostgreSQL guarantees correctness.
The trade-off is that the VM is often less accurate than reality. Many pages that are actually all-visible won’t have their bits set yet because VACUUM hasn’t visited them since the last modification. This is why freshly updated tables see poor index-only scan performance even when the update rate is low.
All-Frozen: The Anti-Wraparound Bit
The all-frozen bit is the second bit in the VM, and it has a different job: skip pages during anti-wraparound vacuum.
When VACUUM runs in anti-wraparound mode (triggered when a table approaches autovacuum_freeze_max_age transactions since its last freeze), it must scan every page to freeze old tuple headers. This prevents transaction ID wraparound, which would make the database stop accepting writes.
If the all-frozen bit is set for a page, anti-wraparound vacuum can skip it entirely. This is a significant optimization for large tables with mostly-frozen data. A table with 90% of its pages all-frozen only needs to scan the remaining 10% during anti-wraparound vacuum.
The all-frozen bit implies all-visible (you can’t have a frozen tuple that isn’t visible), but the reverse is not true. A page can be all-visible without being all-frozen if the tuples are visible but their xmin values are recent enough to need freezing eventually.
The VM and HOT Updates
Heap-Only Tuple (HOT) updates are the VM’s best friend. A HOT update occurs when an UPDATE modifies only non-indexed columns and the new tuple version fits on the same page. The index doesn’t need a new entry. The old tuple is marked as dead, and the new version points back to the same index entry via the line pointer chain.
HOT updates matter for the VM because:
- They don’t clear index entries, so the index remains consistent
- They keep the update confined to a single page, so only that page’s VM bit is cleared
- VACUUM can clean up HOT chains and potentially re-set the all-visible bit in a single pass
Tables with many HOT-friendly updates (updates that only modify non-indexed columns) will see their VM bits recover quickly after VACUUM. Tables with updates that modify indexed columns will create new index entries across potentially different pages, clearing multiple VM bits and making recovery slower.
This is why fillfactor matters. A lower fillfactor reserves space on each page for HOT updates, reducing page splits and keeping updated tuples on the same page. If you have a table with frequent updates to a few columns, making those non-indexed and setting fillfactor=90 can dramatically improve VM bit recovery and index-only scan effectiveness.
The VM and Autovacuum Tuning
Understanding the VM changes how you think about autovacuum tuning. The question isn’t just “how often should vacuum run to clean up dead tuples?” It’s also “how quickly can vacuum restore the all-visible bits so my index-only scans work?”
Key relationships:
- More aggressive autovacuum means VM bits get set sooner after modifications, which means index-only scans degrade for less time.
- Higher
autovacuum_vacuum_cost_delayor lowerautovacuum_vacuum_cost_limit** means vacuum runs slower, taking longer to process pages and set VM bits. - Long-running transactions block VM bit setting. VACUUM cannot mark a page as all-visible if there’s any transaction that might still be able to see older tuple versions on that page. A transaction that’s been open for 6 hours prevents VACUUM from setting all-visible bits on any page modified since that transaction started.
That last point is insidious. A forgotten BEGIN without a COMMIT, an abandoned psql session, or a connection pooler holding a transaction open will silently prevent VM bit recovery across your entire database. Your index-only scans degrade, but there’s no error message. The vacuum logs don’t mention it directly.
Practical SQL: Inspecting the Visibility Map
The pg_visibility extension provides direct access to VM data:
CREATE EXTENSION IF NOT EXISTS pg_visibility;
Check VM bits for a table
SELECT
blkno,
all_visible,
all_frozen,
pd_all_visible -- what the page header thinks (may differ from VM)
FROM pg_visibility('orders')
ORDER BY blkno
LIMIT 50;
VM coverage summary
WITH vm AS (
SELECT
count(*) AS total_pages,
count(*) FILTER (WHERE all_visible) AS all_visible_pages,
count(*) FILTER (WHERE all_frozen) AS all_frozen_pages
FROM pg_visibility('orders')
)
SELECT
total_pages,
all_visible_pages,
all_frozen_pages,
round(100.0 * all_visible_pages / total_pages, 1) AS pct_all_visible,
round(100.0 * all_frozen_pages / total_pages, 1) AS pct_all_frozen
FROM vm;
For index-only scans to be effective, you want pct_all_visible to be high (ideally above 95%). If it’s below 50%, your index-only scans are spending most of their time doing heap fetches.
Find pages where VM doesn’t match reality
SELECT
blkno,
all_visible AS vm_says_all_visible,
pd_all_visible AS page_says_all_visible,
all_frozen AS vm_says_all_frozen,
pd_all_frozen AS page_says_all_frozen
FROM pg_visibility('orders', true) -- true = also check heap page headers
WHERE all_visible != pd_all_visible
OR all_frozen != pd_all_frozen
ORDER BY blkno
LIMIT 20;
This reveals pages where the VM is stale (the page is actually all-visible but the VM bit hasn’t been set by vacuum yet) or where the VM has been cleared by a recent modification.
Identify tables with poor VM coverage
SELECT
relname,
n_live_tup,
n_dead_tup,
round(100.0 * n_dead_tup / NULLIF(n_live_tup, 0), 1) AS dead_pct,
last_vacuum,
last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC
LIMIT 20;
Tables with high dead tuple counts and stale vacuum timestamps are prime candidates for VM bit recovery problems. Running VACUUM on these tables will set all-visible bits and immediately improve index-only scan performance.
Measure actual heap fetches in index-only scans
EXPLAIN (ANALYZE)
SELECT sum(amount) FROM orders WHERE status = 'completed';
Look for the Heap Fetches line in the output. Divide it by the total rows to get the fetch ratio. Anything above 5-10% suggests the VM is not keeping up with modifications.
Key Takeaways
- The VM stores two bits per heap page: all-visible (no dead tuples, everything visible) and all-frozen (no tuples need freezing). It lives in a
_vmfork file roughly 1/32768th the size of the heap. - The all-visible bit is what makes index-only scans work. When set, the query engine can return data from the index without visiting the heap. When not set, it falls back to a regular index scan.
- Only VACUUM sets VM bits. Any data-modifying operation (INSERT, UPDATE, DELETE) clears them. This means frequent updates prevent VM recovery and degrade index-only scan performance.
- The VM is conservative: a set bit is a guarantee, but an unset bit might or might not be accurate. Many pages are actually all-visible but haven’t had their bits set yet.
- Long-running transactions silently block VM bit setting across the entire database. Monitor for idle-in-transaction sessions.
- HOT updates are the VM’s best friend. Keep non-indexed columns updated separately, use appropriate fillfactor, and let HOT chains keep the damage contained to single pages.
- Use
pg_visibilityto inspect VM coverage. A healthy table should have >95% all-visible pages for index-only scans to be effective.
What’s Next
We’ve now covered the three per-relation fork files: the main heap, the Free Space Map, and the Visibility Map. Tomorrow we move to the fourth: the Initialization Fork, and then into WAL format and redo/undo architecture, the write-ahead logging system that makes crash recovery possible.
Previous: The Free Space Map — How Postgres Knows Where to Put Your Next Row