PostgreSQL Deep Dive

PostgreSQL Deep dive: The free space map — how Postgres knows where to put your next row

Every time PostgreSQL inserts a row, it needs to find a page with enough room to store it. For a table with 10 million pages, scanning every page to find one with 200 bytes free would be catastrophic. Instead, PostgreSQL consults a data structure called the Free Space Map (FSM), a binary tree that provides an O(log N) lookup for available space across every page in a relation.

The FSM is one of those infrastructure components that silently does its job until something goes wrong. Today we’re looking at how it works, what its limitations are, and why it used to be one of the most common sources of operational pain in PostgreSQL.

The Problem: Where Does the Next Row Go?

When you INSERT a row, PostgreSQL needs to find a heap page with enough contiguous free space. The page size is 8KB. After headers, line pointers, and existing tuples, the free space on a page might be anywhere from 0 to ~8000 bytes.

For each table, PostgreSQL maintains three separate “forks” on disk:

ForkSuffixPurpose
Main(none)The actual table data
FSM_fsmFree space tracking
Visibility Map_vmAll-visible and all-frozen tracking

If your table’s main file is 12345, you’ll also have 12345_fsm and 12345_vm in the same directory. Each fork has its own set of pages, managed independently.

The FSM is the structure that answers the question: “Given that I need to insert a tuple of size X, which page has at least X bytes of free space?”

How the FSM Works: A Binary Tree of Free Space Values

The FSM is organized as a tree of pages. Each FSM page contains a binary tree stored in a flat array, with one byte per node.

The Leaf Level

At the bottom level, each leaf node represents one heap page (or one lower-level FSM page). The value stored in the leaf node represents the amount of free space on that heap page, quantized to one of 256 possible values. Since the maximum free space on an 8KB page is roughly 8160 bytes, the quantization maps these to 0-255, where each step represents approximately 32 bytes of free space.

The quantization matters. PostgreSQL cannot distinguish between a page with 100 bytes free and a page with 131 bytes free. Both map to the same FSM value. One byte per page keeps the FSM compact, at the cost of some precision.

The Upper Levels

Internal nodes store the maximum of their two children’s values. This means the root of each FSM page’s tree contains the maximum free space available across all pages represented by that FSM page.

This aggregation propagates upward. The root of the entire FSM tree gives you the maximum free space available on any page in the relation, with a single byte lookup.

Searching the FSM

When PostgreSQL needs to insert a tuple of size N bytes:

  1. Start at the root FSM page
  2. If the root’s maximum value < the quantized threshold for N bytes, there’s no room. Extend the relation with a new page.
  3. Otherwise, traverse down: at each internal node, prefer the left child if it has enough space (preserves insertion order), otherwise go right
  4. At the leaf level, read the actual heap page’s header to get the precise free space (pd_lower and pd_upper in PageHeaderData)
  5. If the actual free space is insufficient (due to quantization rounding), try the next leaf, or extend the relation

The search is O(log N) where N is the number of heap pages, since the FSM tree has depth log₂(pages per FSM page) for the intra-page tree, plus the number of FSM pages in the inter-page tree.

Updating the FSM

The FSM is updated in two situations:

On INSERT: After placing a tuple on a heap page, the free space decreases. PostgreSQL updates the leaf node in the FSM and propagates the change upward through the internal nodes. These updates are WAL-logged to ensure crash recovery.

On VACUUM: After reclaiming space from dead tuples, the free space on the page increases. Vacuum updates the FSM entries for all pages it processed. This is how space freed by DELETE and UPDATE operations becomes available for new INSERTs.

The FSM is not updated on every single tuple change. It’s updated on INSERT (because we just read it to find the page) and on VACUUM (the bulk cleanup pass). This means that if you delete a lot of rows without running vacuum, the FSM will show those pages as full even though they have plenty of free space. The space is only reclaimed after vacuum runs and updates the FSM.

The Page Header: Where Free Space Lives

To understand the FSM, you need to understand how free space is tracked within a single page. Every heap page starts with a 24-byte PageHeaderData:

pd_lsn           8 bytes   WAL LSN for last change
pd_checksum      2 bytes   Page checksum (if enabled)
pd_flags         2 bytes   Flag bits
pd_lower         2 bytes   Offset to start of free space
pd_upper         2 bytes   Offset to end of free space
pd_special       2 bytes   Offset to special space (indexes)
pd_pagesize_version 2 bytes Page size and version
pd_prune_xid     4 bytes   Oldest unpruned XMAX

The free space on a page is the gap between pd_lower and pd_upper:

  • pd_lower points to the end of the line pointer array (ItemIds grow forward from after the header)
  • pd_upper points to the start of the tuple data area (tuples grow backward from the end of the page)
  • Free space = pd_upper - pd_lower

The FSM stores a quantized version of this value. When PostgreSQL actually tries to use a page, it reads pd_lower and pd_upper to get the exact figure.

The Gotcha: Before PostgreSQL 8.4, the FSM Was a Config Nightmare

If you’ve been using PostgreSQL since before 8.4 (released 2009), you might remember max_fsm_pages and max_fsm_relations. These were critical configuration parameters that caused endless headaches.

The old FSM was a single shared data structure, not per-relation. It lived in shared memory and was sized at startup by two parameters:

  • max_fsm_pages: maximum number of pages tracked across all relations (default: 204800)
  • max_fsm_relations: maximum number of relations tracked (default: 1000)

When the FSM filled up, it could no longer track free space for additional relations. The consequences were ugly:

  • INSERTs would skip the FSM entirely and extend the relation with new pages, even when existing pages had plenty of free space
  • Tables grew unnecessarily large
  • VACUUM couldn’t record reclaimed space, so vacuumed space was lost until the next major operation

Tuning these parameters was a guessing game. Too low and your tables bloated. Too high and you wasted shared memory (which mattered more when servers had 4GB RAM). And you had to restart PostgreSQL to change them.

The per-relation FSM, introduced in 8.4, eliminated this entirely. Each relation’s FSM grows and shrinks with the relation itself, stored as a separate fork file alongside the main data. No shared memory. No configuration parameters. No restarts.

If you ever see a tuning guide recommending max_fsm_pages, it’s at least 17 years out of date.

The Second Gotcha: FSM Staliness After Bulk Deletes

The FSM only reflects reality as of the last INSERT or VACUUM on each page. If you DELETE a million rows, the freed space doesn’t appear in the FSM until vacuum runs.

This matters for bulk operations. Consider this scenario:

-- Delete 50% of the rows
DELETE FROM events WHERE created_at < '2025-01-01';

-- Immediately insert new rows
INSERT INTO events (payload, created_at) VALUES (...);

After the DELETE, the heap pages have plenty of free space. But the FSM still shows them as nearly full, so the INSERTs extend the relation with new pages instead of reusing the freed space. Your table temporarily doubles in size.

Only after autovacuum runs (or you run VACUUM manually) will the FSM be updated to reflect the reclaimed space. Future INSERTs will then reuse those pages.

For bulk delete-then-reload patterns, run VACUUM between the DELETE and INSERT:

DELETE FROM events WHERE created_at < '2025-01-01';
VACUUM events;  -- Updates FSM with reclaimed space
INSERT INTO events (payload, created_at) VALUES (...);

Or better yet, if you’re replacing all data, use TRUNCATE + INSERT instead of DELETE + INSERT. TRUNCATE is instantaneous (it doesn’t scan the table) and starts fresh.

The Third Gotcha: FSM Accuracy and Copy Tables

The quantization of free space values means the FSM can slightly overestimate available space. When PostgreSQL reads the FSM, finds a candidate page, then checks the actual page header and discovers there isn’t quite enough room, it moves on to the next candidate.

In normal operation this is invisible. But during COPY or bulk inserts, PostgreSQL uses a “bulk insert” optimization that bypasses the FSM entirely. Instead, it extends the relation by appending new pages and filling them sequentially. This is faster than searching the FSM for existing pages, especially when inserting many rows at once.

This means that after a bulk insert, the FSM may be stale for many pages. The newly extended pages won’t have FSM entries until they’re explicitly updated. This is fine because the FSM will be updated on the next vacuum pass.

The Visibility Map Connection

The FSM doesn’t operate in isolation. It’s tightly coupled with the Visibility Map (VM), which we’ll cover in depth in a future post, but the connection matters here.

The VM tracks two bits per heap page:

  • All-visible: every tuple on the page is visible to all active transactions (no dead tuples)
  • All-frozen: every tuple on the page is frozen (no anti-wraparound vacuum needed)

When a page is all-visible, VACUUM can skip it entirely. But if VACUUM skips a page, it also doesn’t update the FSM for that page. The FSM might be less accurate for pages that have been all-visible for a long time and then had their all-visible bit cleared (e.g., by an UPDATE).

In practice this rarely causes problems, but it’s one reason why pg_freespacemap might show unexpected values on pages that were recently updated after a long period of stability.

Practical SQL: Inspecting the Free Space Map

The pg_freespacemap extension provides access to FSM data. First enable it:

CREATE EXTENSION IF NOT EXISTS pg_freespacemap;

Check free space on every page of a table

SELECT
    blkno,
    avail,
    pg_size_pretty(avail) AS free_space
FROM pg_freespace('events')
ORDER BY blkno
LIMIT 50;

This shows the FSM value for each page. Remember the values are quantized (step size ~32 bytes), so avail is approximate.

Find pages with the most free space

SELECT
    blkno,
    avail AS free_bytes_approx,
    pg_size_pretty(avail) AS free_space
FROM pg_freespace('events')
WHERE avail > 0
ORDER BY avail DESC
LIMIT 20;

Useful for understanding whether your table has fragmented free space (many pages with small amounts free) vs. concentrated free space (few pages with large amounts free).

Calculate FSM efficiency

WITH fsm AS (
    SELECT
        count(*) AS total_pages,
        count(*) FILTER (WHERE avail = 0) AS full_pages,
        count(*) FILTER (WHERE avail > 0 AND avail < 100) AS lightly_free,
        count(*) FILTER (WHERE avail >= 100) AS substantially_free,
        avg(avail)::int AS avg_free_bytes
    FROM pg_freespace('events')
)
SELECT
    total_pages,
    full_pages,
    lightly_free,
    substantially_free,
    round(100.0 * full_pages / total_pages, 1) AS pct_full,
    round(100.0 * substantially_free / total_pages, 1) AS pct_available,
    avg_free_bytes
FROM fsm;

A healthy table that receives regular inserts and vacuum will show moderate fragmentation: some pages full, some with varying amounts free. A table with many lightly-free pages and few substantially-free pages might benefit from VACUUM FULL or pg_repack to consolidate the free space.

Check FSM file size on disk

SELECT
    pg_relation_size('events') AS main_size,
    pg_relation_size('events', 'fsm') AS fsm_size,
    pg_relation_size('events', 'vm') AS vm_size,
    round(100.0 * pg_relation_size('events', 'fsm') /
          pg_relation_size('events'), 2) AS fsm_pct_of_main
FROM (
    SELECT 'events' AS events
) t;

The FSM file is typically 1/512th the size of the main relation file, because one FSM page (8KB) tracks the free space of 512 heap pages (each leaf node uses one byte, and a page holds ~8000 bytes). This ratio holds for most practical table sizes.

Compare FSM with actual page free space

The FSM can be stale. Here’s how to compare what the FSM thinks with reality:

-- Get the FSM's view
WITH fsm_data AS (
    SELECT blkno, avail AS fsm_free
    FROM pg_freespace('events')
    WHERE avail > 0
),
-- Get the actual free space from page headers
-- (requires pgstattuple or pageinspect)
actual_data AS (
    SELECT
        blkno,
        (upper - lower) AS actual_free
    FROM page_header(get_raw_page('events', generate_series(
        0, (SELECT pg_relation_size('events') / 8192 - 1)
    )))
)
SELECT
    f.blkno,
    f.fsm_free AS fsm_reports,
    a.actual_free AS actual_free,
    f.fsm_free - a.actual_free AS discrepancy
FROM fsm_data f
JOIN actual_data a ON f.blkno = a.blkno
WHERE abs(f.fsm_free - a.actual_free) > 100
ORDER BY discrepancy DESC
LIMIT 20;

This requires the pageinspect extension. Large discrepancies indicate pages where the FSM is stale and needs to be updated by VACUUM.

FSM and VACUUM: The Update Cycle

The relationship between the FSM and VACUUM is fundamental. Here’s the cycle:

  1. INSERT: reads FSM to find a page, inserts the tuple, updates the FSM leaf and propagates upward
  2. UPDATE/DELETE: marks tuples as dead but does NOT update the FSM. The free space on the page is still occupied by the dead tuple until vacuum reclaims it.
  3. VACUUM: scans pages, removes dead tuples, compacts the page (moving tuples to close gaps), updates pd_lower and pd_upper, then updates the FSM with the new free space values
  4. HOT prune: during a SELECT or other operation, PostgreSQL may opportunistically prune dead tuples from a page and update the FSM without a full vacuum. This is called heap-only tuple pruning, triggered by pd_prune_xid in the page header.

The pd_prune_xid field stores the oldest XMAX among unpruned tuples on the page. When a transaction with an XID older than pd_prune_xid attempts to read the page, it triggers a pruning pass. This is opportunistic vacuum that happens without any explicit VACUUM command.

Key Takeaways

  • The FSM is a per-relation binary tree stored in a separate fork file (_fsm). Each leaf node represents one heap page and stores a quantized free space value (0-255, step size ~32 bytes).
  • Internal nodes store the maximum of their children. This enables O(log N) lookup for “find me a page with at least X bytes free.”
  • The FSM is only updated on INSERT and VACUUM, not on DELETE or UPDATE. Bulk deletes leave the FSM stale until vacuum runs.
  • Before PostgreSQL 8.4, the FSM was a shared memory structure sized by max_fsm_pages and max_fsm_relations. Running out of FSM space caused silent table bloat. The per-relation design eliminated this entirely.
  • Bulk inserts (COPY) bypass the FSM entirely and extend the relation. The FSM is updated later by vacuum.
  • Use pg_freespacemap to inspect FSM data and diagnose fragmentation. The FSM file is typically ~0.2% of the main relation size.
  • After bulk deletes, run VACUUM before inserting to ensure freed space is available for reuse.

What’s Next

We’re staying in the storage layer. Tomorrow we’ll look at the Visibility Map (VM), the FSM’s partner structure that tracks all-visible and all-frozen pages. The VM is what makes index-only scans possible, and understanding it matters for tuning vacuum behavior and diagnosing why your queries might be doing more work than they need to.


Previous: TOAST — The Oversized-Attribute Storage Technique