PostgreSQL Deep Dive

PostgreSQL Deep Dive: Infomask and Hint Bits

Every row in PostgreSQL carries a 23-byte header. Buried inside that header are two 16-bit fields, t_infomask and t_infomask2, that together encode whether the row is visible, whether it is locked, whether it has been HOT-updated, how many columns it has, and whether any of its data lives in TOAST storage. These 32 bits are one of the most performance-critical data structures in the entire database. Get them wrong, and every query pays the cost.

The reason these bits exist is simple: speed. PostgreSQL uses MVCC, meaning every row carries the transaction ID of the transaction that created it (t_xmin) and the transaction ID of the transaction that deleted or locked it (t_xmax). To determine whether a row is visible to the current transaction, PostgreSQL would normally need to look up the status of those transaction IDs in the commit log (CLOG), which is stored in the pg_xact directory and cached in a small shared-memory buffer pool. Doing that for every row in every query would be prohibitively expensive.

Instead, PostgreSQL caches the result of that lookup directly in the tuple header. Once a transaction commits or aborts, the next backend that examines the tuple sets “hint bits” recording the outcome. Future queries can skip the CLOG lookup entirely and just check the bits.

The Tuple Header Layout

Before looking at the bits themselves, here is the full HeapTupleHeaderData structure as defined in src/include/access/htup_details.h:

struct HeapTupleHeaderData
{
    union {
        HeapTupleFields t_heap;     /* xmin, xmax, cid/xvac */
        DatumTupleFields t_datum;
    } t_choice;

    ItemPointerData t_ctid;         /* TID of this tuple, or of its newer version */

    uint16 t_infomask2;             /* number of attributes + flags */
    uint16 t_infomask;              /* various flag bits */
    uint8  t_hoff;                  /* header size including bitmap + padding */
    bits8  t_bits[FLEXIBLE_ARRAY_MEMBER]; /* NULL bitmap */
    /* USER DATA FOLLOWS */
};

The header is exactly 23 bytes before the NULL bitmap and alignment padding kick in. t_hoff tells you the total header size including the bitmap and any padding needed to align the user data to an 8-byte boundary.
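The t_hoff arithmetic can be sketched in a few lines of C. This is a minimal illustration, assuming a 64-bit build where MAXALIGN is 8. The helper names (expected_hoff and friends) are made up for this post and are not PostgreSQL functions.

```c
#include <stddef.h>
#include <stdbool.h>

/* Assumed values for a typical 64-bit build; not read from a real server. */
#define FIXED_HEADER_BYTES 23   /* offset of t_bits in HeapTupleHeaderData */
#define MAX_ALIGN          8    /* MAXALIGN on most 64-bit platforms */

/* Bytes needed for a NULL bitmap with one bit per column. */
static size_t null_bitmap_bytes(int natts)
{
    return (size_t)(natts + 7) / 8;
}

/* Round len up to the next multiple of MAX_ALIGN. */
static size_t max_align(size_t len)
{
    return (len + MAX_ALIGN - 1) & ~(size_t)(MAX_ALIGN - 1);
}

/* Hypothetical helper: expected t_hoff for a tuple with natts columns. */
static size_t expected_hoff(int natts, bool has_nulls)
{
    size_t len = FIXED_HEADER_BYTES;

    if (has_nulls)
        len += null_bitmap_bytes(natts);   /* bitmap exists only with HEAP_HASNULL */
    return max_align(len);
}
```

Under these assumptions, expected_hoff(8, true) is 24 because a one-byte bitmap fits in the padding byte, while expected_hoff(9, true) is 32 because the second bitmap byte pushes the header past the 24-byte boundary.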

t_infomask: The Main Flag Word

t_infomask is a 16-bit field with the following bit assignments. I have grouped them by function.

Storage Flags (bits 0-3)

| Bit | Name | Value | Meaning |
|-----|------|-------|---------|
| 0 | HEAP_HASNULL | 0x0001 | Row has at least one NULL attribute. The NULL bitmap in t_bits is present. |
| 1 | HEAP_HASVARWIDTH | 0x0002 | Row has at least one variable-width attribute (text, varchar, bytea, etc.). |
| 2 | HEAP_HASEXTERNAL | 0x0004 | At least one attribute is stored out of line in the TOAST table, so the row holds a TOAST pointer. Inline compression alone does not set this bit. |
| 3 | HEAP_HASOID_OLD | 0x0008 | Row includes the legacy system column OID. Unused in modern PostgreSQL (removed from user tables in PG 12). |

These four bits are set when the tuple is formed and do not change for the life of that tuple version. They tell the executor what kind of data it will find when it starts reading columns, which affects how quickly it can unpack the row.

Lock Flags (bits 4-7)

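| Bit | Name | Value | Meaning |
|-----|------|-------|---------|
| 4 | HEAP_XMAX_KEYSHR_LOCK | 0x0010 | t_xmax holds a FOR KEY SHARE locker. |
| 5 | HEAP_COMBOCID | 0x0020 | t_cid field is a combo command ID (used when multiple CIDs need packing). Not a lock flag; it just sits in this bit range. |
| 6 | HEAP_XMAX_EXCL_LOCK | 0x0040 | t_xmax holds an exclusive locker (FOR UPDATE or FOR NO KEY UPDATE). |
| 7 | HEAP_XMAX_LOCK_ONLY | 0x0080 | t_xmax, if valid, is only a locker, not a deleter or updater. |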

Apart from HEAP_COMBOCID, these bits encode row-level lock information about t_xmax. The key combination rules for a single (non-MultiXact) locker:

  • HEAP_XMAX_LOCK_ONLY + HEAP_XMAX_KEYSHR_LOCK = FOR KEY SHARE
  • HEAP_XMAX_LOCK_ONLY + HEAP_XMAX_SHR_LOCK (bits 4+6, both the key-share and exclusive bits set) = FOR SHARE
  • HEAP_XMAX_LOCK_ONLY + HEAP_XMAX_EXCL_LOCK = FOR NO KEY UPDATE
  • HEAP_XMAX_LOCK_ONLY + HEAP_XMAX_EXCL_LOCK, plus HEAP_KEYS_UPDATED in t_infomask2 = FOR UPDATE
  • HEAP_XMAX_LOCK_ONLY not set, and t_xmax is valid = the row was deleted or updated by t_xmax

The HEAP_LOCK_MASK macro covers bits 4 and 6 (0x0050), so masking with it isolates the lock strength. HEAP_XMAX_IS_LOCKED_ONLY() checks whether a tuple is merely locked (not deleted) by looking at these bits.
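To make the lock encoding concrete, here is a minimal C sketch that classifies what t_xmax means from the two infomask words. The flag values are the ones from htup_details.h; the XmaxKind enum and classify_xmax function are hypothetical names for this post. The sketch assumes t_xmax is non-zero and ignores the special handling PostgreSQL keeps for tuples carried over from pre-9.3 clusters.

```c
#include <stdint.h>

#define HEAP_XMAX_KEYSHR_LOCK 0x0010
#define HEAP_XMAX_EXCL_LOCK   0x0040
#define HEAP_XMAX_LOCK_ONLY   0x0080
#define HEAP_XMAX_INVALID     0x0800
#define HEAP_XMAX_IS_MULTI    0x1000
#define HEAP_XMAX_SHR_LOCK    (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
#define HEAP_LOCK_MASK        HEAP_XMAX_SHR_LOCK
#define HEAP_KEYS_UPDATED     0x2000   /* this one lives in t_infomask2 */

typedef enum {
    XMAX_NONE,              /* xmax is hinted invalid: no locker, no deleter */
    XMAX_MULTI,             /* MultiXactId: modes are stored per member */
    XMAX_UPDATE_OR_DELETE,  /* xmax is a real updater or deleter */
    LOCK_KEY_SHARE,
    LOCK_SHARE,
    LOCK_NO_KEY_UPDATE,
    LOCK_FOR_UPDATE,
    XMAX_UNKNOWN
} XmaxKind;

/* Hypothetical helper: interpret t_xmax for a single-transaction xmax. */
static XmaxKind classify_xmax(uint16_t infomask, uint16_t infomask2)
{
    if (infomask & HEAP_XMAX_INVALID)
        return XMAX_NONE;
    if (infomask & HEAP_XMAX_IS_MULTI)
        return XMAX_MULTI;
    if (!(infomask & HEAP_XMAX_LOCK_ONLY))
        return XMAX_UPDATE_OR_DELETE;

    switch (infomask & HEAP_LOCK_MASK) {
    case HEAP_XMAX_KEYSHR_LOCK:
        return LOCK_KEY_SHARE;
    case HEAP_XMAX_SHR_LOCK:
        return LOCK_SHARE;
    case HEAP_XMAX_EXCL_LOCK:
        /* The key/no-key distinction is kept in t_infomask2. */
        return (infomask2 & HEAP_KEYS_UPDATED) ? LOCK_FOR_UPDATE
                                               : LOCK_NO_KEY_UPDATE;
    default:
        return XMAX_UNKNOWN;
    }
}
```

For example, a t_infomask of 0x00C0 (exclusive + lock-only) classifies as FOR NO KEY UPDATE when t_infomask2 has no HEAP_KEYS_UPDATED bit, and as FOR UPDATE when it does.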

Visibility Hint Bits (bits 8-12)

| Bit | Name | Value | Meaning |
|-----|------|-------|---------|
| 8 | HEAP_XMIN_COMMITTED | 0x0100 | The transaction in t_xmin is known to have committed. Whether the row is visible still depends on the snapshot and on t_xmax. |
| 9 | HEAP_XMIN_INVALID | 0x0200 | The transaction in t_xmin aborted (or xmin is otherwise invalid). The row is dead. |
| 8+9 | HEAP_XMIN_FROZEN | 0x0300 | Both bits set. The row’s xmin is older than the freeze horizon. No transaction lookup needed. |
| 10 | HEAP_XMAX_COMMITTED | 0x0400 | The transaction in t_xmax has committed. If xmax is a deleter, the row is dead. |
| 11 | HEAP_XMAX_INVALID | 0x0800 | The transaction in t_xmax aborted or the xmax is not valid. The row was not actually deleted. |
| 12 | HEAP_XMAX_IS_MULTI | 0x1000 | t_xmax is a MultiXactId, not a regular transaction ID. Multiple lockers or a locker + updater share the xmax slot. |

These are the performance-critical bits. Here is how they work in practice.

When a row is first inserted, neither xmin hint bit is set (HEAP_XMAX_INVALID is set from the start, because there is no deleter yet). The first query that encounters the row after the inserting transaction has finished must look up t_xmin in the CLOG to determine whether that transaction committed. Once the answer is known, the backend sets either HEAP_XMIN_COMMITTED or HEAP_XMIN_INVALID on the tuple. Every subsequent query that reads this row can skip the CLOG lookup and just check the bit.

The same applies to t_xmax. When a row is deleted or updated, the deleting transaction’s ID is written to t_xmax. Future queries need to know whether that deletion committed. Once the CLOG confirms it, HEAP_XMAX_COMMITTED gets set. If the deleting transaction aborted, HEAP_XMAX_INVALID gets set instead, and the row is still alive.

HEAP_XMIN_FROZEN is special. It is not really a “committed” or “invalid” state. It means the xmin predates the freeze horizon, the row's insertion is visible to every snapshot, and the xmin will never be checked against the CLOG again. Once every tuple that references an old transaction ID has been frozen, the corresponding CLOG segments can be truncated. Freezing is the end result of the anti-wraparound vacuum process we covered earlier.
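The caching behaviour is easy to see in a toy model. The sketch below is not PostgreSQL code: toy_clog is a hypothetical four-entry stand-in for the commit log, and xmin_status is an invented helper. It only illustrates the order of checks (frozen first, because that pattern has both bits set) and the fact that a successful lookup leaves a hint behind. It deliberately skips snapshots and the WAL ordering rules that the real code must respect.

```c
#include <stdint.h>

#define HEAP_XMIN_COMMITTED 0x0100
#define HEAP_XMIN_INVALID   0x0200
#define HEAP_XMIN_FROZEN    (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID)

typedef enum {
    XMIN_IN_PROGRESS, XMIN_COMMITTED, XMIN_ABORTED, XMIN_FROZEN
} XminStatus;

/* Toy commit log indexed by xid (an assumption for this sketch). */
static const XminStatus toy_clog[] = {
    XMIN_IN_PROGRESS,   /* xid 0 */
    XMIN_COMMITTED,     /* xid 1 */
    XMIN_ABORTED,       /* xid 2 */
    XMIN_IN_PROGRESS    /* xid 3 */
};
static int toy_clog_lookups = 0;    /* counts the "expensive" lookups */

static XminStatus toy_clog_lookup(uint32_t xid)
{
    toy_clog_lookups++;
    if (xid >= sizeof(toy_clog) / sizeof(toy_clog[0]))
        return XMIN_IN_PROGRESS;
    return toy_clog[xid];
}

/* Resolve xmin, preferring hint bits and caching the CLOG answer. */
static XminStatus xmin_status(uint32_t xmin, uint16_t *infomask)
{
    uint16_t hints = *infomask & HEAP_XMIN_FROZEN;
    XminStatus status;

    if (hints == HEAP_XMIN_FROZEN)
        return XMIN_FROZEN;         /* both bits set: check this first */
    if (hints == HEAP_XMIN_COMMITTED)
        return XMIN_COMMITTED;
    if (hints == HEAP_XMIN_INVALID)
        return XMIN_ABORTED;

    status = toy_clog_lookup(xmin); /* no hint yet: pay for the lookup */
    if (status == XMIN_COMMITTED)
        *infomask |= HEAP_XMIN_COMMITTED;
    else if (status == XMIN_ABORTED)
        *infomask |= HEAP_XMIN_INVALID;
    return status;                  /* in-progress leaves no hint */
}
```

Calling xmin_status twice for the same tuple increments toy_clog_lookups only once: the second call is answered from the bit that the first call set.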

Status Bits (bits 13-15)

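| Bit | Name | Value | Meaning |
|-----|------|-------|---------|
| 13 | HEAP_UPDATED | 0x2000 | This tuple is the new version produced by an UPDATE, as opposed to a freshly inserted row. |
| 14 | HEAP_MOVED_OFF | 0x4000 | Row was moved off to another place by pre-9.0 VACUUM FULL (kept for binary upgrade). |
| 15 | HEAP_MOVED_IN | 0x8000 | Row was moved in from another place by pre-9.0 VACUUM FULL (kept for binary upgrade). |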

HEAP_UPDATED is set on the new tuple version that an UPDATE creates, not on the old version it replaces. Code that follows update chains can use it to tell whether a tuple has a predecessor.

HEAP_MOVED_OFF and HEAP_MOVED_IN are historical artifacts from before PostgreSQL 9.0, when VACUUM FULL physically moved rows. They still exist in the source code for binary upgrade compatibility but should not appear in a cluster that was initialized on 9.0 or later.
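If you only have a raw t_infomask value (from a hex dump, say), a small decoder is handy. The following C sketch turns the value into flag names using the bit assignments from the tables above. The function name infomask_to_string and the comma-separated output format are choices made for this post; pageinspect has its own decoder, covered further down.

```c
#include <stdint.h>
#include <stdio.h>

static const struct {
    uint16_t    bit;
    const char *name;
} infomask_names[] = {
    {0x0001, "HEAP_HASNULL"},
    {0x0002, "HEAP_HASVARWIDTH"},
    {0x0004, "HEAP_HASEXTERNAL"},
    {0x0008, "HEAP_HASOID_OLD"},
    {0x0010, "HEAP_XMAX_KEYSHR_LOCK"},
    {0x0020, "HEAP_COMBOCID"},
    {0x0040, "HEAP_XMAX_EXCL_LOCK"},
    {0x0080, "HEAP_XMAX_LOCK_ONLY"},
    {0x0100, "HEAP_XMIN_COMMITTED"},
    {0x0200, "HEAP_XMIN_INVALID"},
    {0x0400, "HEAP_XMAX_COMMITTED"},
    {0x0800, "HEAP_XMAX_INVALID"},
    {0x1000, "HEAP_XMAX_IS_MULTI"},
    {0x2000, "HEAP_UPDATED"},
    {0x4000, "HEAP_MOVED_OFF"},
    {0x8000, "HEAP_MOVED_IN"},
};

/* Write the names of all set bits into buf, comma-separated; returns buf. */
static char *infomask_to_string(uint16_t infomask, char *buf, size_t buflen)
{
    size_t used = 0;

    if (buflen == 0)
        return buf;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(infomask_names) / sizeof(infomask_names[0]); i++) {
        int n;

        if (!(infomask & infomask_names[i].bit))
            continue;
        n = snprintf(buf + used, buflen - used, "%s%s",
                     used ? "," : "", infomask_names[i].name);
        if (n < 0 || (size_t)n >= buflen - used) {
            buf[used] = '\0';   /* out of space: drop the partial name */
            break;
        }
        used += (size_t)n;
    }
    return buf;
}
```

A value of 0x0902, for instance, decodes to HEAP_HASVARWIDTH, HEAP_XMIN_COMMITTED and HEAP_XMAX_INVALID: a live row with a variable-width column whose inserter is known to have committed.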

t_infomask2: Attributes and HOT Flags

t_infomask2 is a second 16-bit field with a dual purpose.

| Bits | Name | Value | Meaning |
|------|------|-------|---------|
| 0-10 | HEAP_NATTS_MASK | 0x07FF | Number of attributes (columns) in this tuple. 11 bits, max 2047. |
| 11-12 | (unused) | 0x1800 | Currently available. |
| 13 | HEAP_KEYS_UPDATED | 0x2000 | Set on the old tuple: it was deleted, or updated in a way that changed key columns (columns of a unique index that a foreign key could reference). |
| 14 | HEAP_HOT_UPDATED | 0x4000 | Set on the old tuple: it was HOT-updated (Heap Only Tuple), and its successor is on the same page. |
| 15 | HEAP_ONLY_TUPLE | 0x8000 | This tuple is heap-only. It was created by a HOT update and has no index entries pointing to it. |
| 15 | HEAP_TUPLE_HAS_MATCH | 0x8000 | Reuse of bit 15 during hash joins. A temporary flag indicating the tuple has been matched. |

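The attribute count in the low 11 bits tells the executor how many columns are physically stored in this tuple. That count can be lower than the table's current column count: ALTER TABLE ... ADD COLUMN does not rewrite existing rows, so older tuples simply stop short and the missing columns read as NULL or as the column's stored default. The tuple also carries no per-column type or length information for fixed-width types, so the executor combines this count with the table schema (pg_attribute) to know where each column starts.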

HEAP_HOT_UPDATED and HEAP_ONLY_TUPLE are the HOT mechanism bits. When an UPDATE modifies only non-indexed columns and the new row fits on the same page, PostgreSQL creates a heap-only tuple. The old tuple gets HEAP_HOT_UPDATED set, and the new tuple gets HEAP_ONLY_TUPLE set. Index entries keep pointing at the root of the chain, and an index scan follows the HOT chain within the heap page, so the update needs no new index entries. We covered this in detail in the Visibility Map post.

HEAP_KEYS_UPDATED records how strong a row-level lock the UPDATE or DELETE that set t_xmax took. If the row was deleted or key columns were changed, the flag is set and the operation holds the equivalent of a FOR UPDATE lock (the strongest). Otherwise it holds FOR NO KEY UPDATE. This distinction exists so that SELECT FOR KEY SHARE (used by foreign key reference checks) does not get blocked by updates to non-key columns.
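Because t_infomask2 packs a counter and flags into one word, reading it is a matter of masking. Here is a minimal C sketch; the Infomask2Fields struct and decode_infomask2 function are hypothetical names, while the mask values are the ones from the table above.

```c
#include <stdint.h>
#include <stdbool.h>

#define HEAP_NATTS_MASK   0x07FF
#define HEAP_KEYS_UPDATED 0x2000
#define HEAP_HOT_UPDATED  0x4000
#define HEAP_ONLY_TUPLE   0x8000

typedef struct {
    int  natts;          /* attributes physically stored in the tuple */
    bool keys_updated;   /* deleted, or updated with key columns changed */
    bool hot_updated;    /* a heap-only successor exists on the same page */
    bool heap_only;      /* no index entry points directly at this tuple */
} Infomask2Fields;

/* Hypothetical helper: split t_infomask2 into its count and flags. */
static Infomask2Fields decode_infomask2(uint16_t infomask2)
{
    Infomask2Fields f;

    f.natts        = infomask2 & HEAP_NATTS_MASK;
    f.keys_updated = (infomask2 & HEAP_KEYS_UPDATED) != 0;
    f.hot_updated  = (infomask2 & HEAP_HOT_UPDATED) != 0;
    f.heap_only    = (infomask2 & HEAP_ONLY_TUPLE) != 0;
    return f;
}
```

In a three-column table, 0x4003 is the root of a HOT chain (updated, but still the index target), 0x8003 is the tail (heap-only, not updated again), and 0xC003 is a middle link that is both heap-only and HOT-updated.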

How Hint Bits Are Set: The SetHintBits Function

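Setting hint bits is not as simple as flipping a bit. The tuple lives on a shared buffer page, so the backend must hold a pin and a content lock on the buffer. A hint bit change is not normally WAL-logged by itself, but it must still respect WAL ordering: a page that claims a commit must never reach disk before the commit record does. When data checksums or wal_log_hints are enabled, the first hint bit change to a page after a checkpoint also writes a full-page image to WAL.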

The SetHintBits function in heapam_visibility.c handles this. The key rules:

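  1. Aborted transactions can always be marked. Setting HEAP_XMIN_INVALID or HEAP_XMAX_INVALID is always safe because an abort needs no durability guarantee. After a crash, any transaction without a durable commit record is treated as aborted anyway, so the hint can never contradict the recovered state.

  2. Committed transactions require careful ordering. Setting HEAP_XMIN_COMMITTED is only safe if the commit WAL record is guaranteed to be flushed to disk before the buffer page is flushed. Otherwise, a crash could leave a page claiming a tuple is committed while the commit record is lost. This matters with asynchronous commit, where a transaction can be reported as committed before its commit record is flushed. The function first checks whether WAL has already been flushed past the transaction’s commit LSN; if so, the hint can be set. If not, it compares the page LSN with the commit LSN. When the page LSN is at or beyond the commit LSN, the WAL-before-data rule already forces WAL to be flushed that far before the page can be written, so the hint is still safe. Otherwise the hint bit is deferred to a future examination.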

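  3. Only one backend sets hints on a page at a time. The BufferBeginSetHintBits mechanism ensures exclusive access to hint bit setting on a given page, preventing concurrent backends from corrupting the page. In releases that predate this interface, hint bits are set under a shared content lock, and concurrent setters are tolerated because another backend looking at the same tuple would OR in the same bits.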

  4. Hint bit setting is lazy and opportunistic. PostgreSQL does not go out of its way to set hint bits. They get set whenever a query happens to examine a tuple and notices the hint is missing. This means that after a large batch of inserts or updates, many tuples may not have hint bits set yet. The first queries to touch those rows will pay the cost of CLOG lookups and set the hints for future queries.

This lazy behaviour is why you sometimes see a spike in CLOG access after a bulk load or large transaction. The first queries after the load set all the hint bits, and subsequent queries are faster.
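The commit-versus-abort rules boil down to a short decision function. The C sketch below mirrors the shape of the check described above but is not the PostgreSQL source: can_set_hint is an invented name, and plain integers stand in for the LSNs the real code reads from the WAL subsystem and the buffer header. It also assumes, as the real code does, that relations whose pages are not protected by WAL (temporary or unlogged) need no ordering check.

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t Lsn;   /* stand-in for a WAL position */

/*
 * Decide whether a hint bit may be set right now.
 *
 * xact_committed       true for a commit hint, false for an abort hint
 * buffer_is_permanent  false for temporary or unlogged relations
 * commit_lsn           WAL position of the transaction's commit record
 * wal_flushed_upto     WAL position known to be durably on disk
 * page_lsn             LSN stamped on the buffer page
 */
static bool can_set_hint(bool xact_committed, bool buffer_is_permanent,
                         Lsn commit_lsn, Lsn wal_flushed_upto, Lsn page_lsn)
{
    if (!xact_committed)
        return true;    /* rule 1: abort hints are always safe */
    if (!buffer_is_permanent)
        return true;    /* page is not WAL-protected: nothing to order against */
    if (commit_lsn <= wal_flushed_upto)
        return true;    /* commit record is already durable */
    if (page_lsn >= commit_lsn)
        return true;    /* page cannot be written before WAL reaches page_lsn */
    return false;       /* rule 2: defer, a later reader will try again */
}
```

The only case that returns false is an asynchronously committed transaction whose commit record is still in WAL buffers, on a page that has not been modified since. That case is rare, which is why deferring the hint costs little in practice.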

The Gotcha: Hint Bits and Replica Consistency

Hint bits are set on the primary, but they do not reliably travel to replicas, because a hint bit change is not WAL-logged as an ordinary record. A hint-only change reaches a streaming replica only when the page is shipped as a full-page image. That happens for the first hint bit change to a page after a checkpoint when data checksums or wal_log_hints are enabled, or when some other logged change happens to include a full image of the page. Otherwise the replica sets hint bits on its own copy of the page when its own queries examine the tuple. Any page image that carries a committed hint is written to WAL after the commit record itself, so replay applies the commit first and the replica does not see HEAP_XMIN_COMMITTED ahead of the corresponding CLOG update.

Once set, hint bits are authoritative. The CLOG is only consulted when the hint bits are absent. But hint bits are still a form of caching, and a hint that never arrives is harmless: the next reader simply repeats the CLOG lookup and sets the bit again.

A more practical gotcha: on a freshly promoted replica, many tuples might have no hint bits even though the old primary had set them, because those hint-only changes never made it into WAL. The new primary has to repeat the CLOG lookups, which explains why the first queries on a freshly promoted replica can be slower than expected.

Diagnostic Queries

PostgreSQL provides the pageinspect extension for inspecting raw tuple headers. Install it with:

CREATE EXTENSION IF NOT EXISTS pageinspect;

Then inspect the infomask flags on any page:

SELECT t_ctid,
       t_xmin, t_xmax,
       t_infomask,
       t_infomask2,
       raw_flags,
       combined_flags
FROM heap_page_items(get_raw_page('your_table', 0)),
     LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2)
WHERE t_infomask IS NOT NULL
LIMIT 20;

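This returns human-readable flag names like HEAP_XMIN_COMMITTED, HEAP_HASNULL, HEAP_HASVARWIDTH, and so on. raw_flags lists the individual bits, and combined_flags lists multi-bit patterns such as HEAP_XMIN_FROZEN. heap_tuple_infomask_flags is available in PostgreSQL 13 and later.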

To count tuples without hint bits set (a measure of “cold” data that has not been examined since insert):

SELECT count(*) AS missing_xmin_hint
FROM heap_page_items(get_raw_page('your_table', 0)),
     LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2)
WHERE NOT ('HEAP_XMIN_COMMITTED' = ANY(raw_flags)
      OR 'HEAP_XMIN_INVALID' = ANY(raw_flags)
      OR 'HEAP_XMIN_FROZEN' = ANY(combined_flags));

Run this across all pages in a table to find how many rows are still paying the CLOG tax. After a bulk load, you will see most rows missing hints. After the first SELECT * or VACUUM, they should all be set, provided the loading transaction has finished and its commit record has been flushed.

To see the physical layout of a specific row including its header size:

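SELECT lp, t_ctid, t_hoff AS header_bytes,
       (t_infomask & 1) <> 0 AS has_null_bitmap  -- HEAP_HASNULL
FROM heap_page_items(get_raw_page('your_table', 0))
WHERE lp = 1;  -- line pointer 1; t_ctid may point at a newer version

A t_hoff of 24 is the minimum on 64-bit systems: the 23-byte fixed header is always rounded up to MAXALIGN (8 bytes). It does not by itself tell you whether a NULL bitmap is present, which is why the query checks HEAP_HASNULL instead. A bitmap for up to 8 columns is a single byte that fits into the padding, so t_hoff stays at 24. With 9 or more columns and at least one NULL, the bitmap needs two or more bytes and t_hoff grows to 32.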

Practical Impact

Understanding infomask bits matters for several real-world scenarios:

Post-insert performance. After a bulk COPY or multi-row INSERT, the first SELECT on the table will be slower than subsequent ones because it must look up CLOG entries and set hint bits. If you need predictable query performance right after a load, run a warm-up query or VACUUM (which touches every row and sets all hints).

Monitoring hint bit health. Tuples without xmin hint bits are a sign that the data has not been examined since insertion. This is normal for append-only workloads but can indicate under-vacuumed tables in update-heavy workloads.

HOT chain debugging. The combination of HEAP_HOT_UPDATED on the old tuple and HEAP_ONLY_TUPLE on the new tuple tells you whether HOT is working for your update pattern. If you see updates without these bits, the updates are creating new index entries (non-HOT), which means more index bloat over time.

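Lock diagnostics. The lock bits in t_infomask, together with HEAP_KEYS_UPDATED in t_infomask2, let you determine what kind of lock a single transaction holds on a row without consulting pg_locks. If you are debugging a blocking lock, looking at the raw tuple header tells you whether the blocker is a FOR UPDATE, FOR NO KEY UPDATE, FOR SHARE, or FOR KEY SHARE. When HEAP_XMAX_IS_MULTI is set, several transactions share t_xmax and the individual lock modes are stored with the MultiXact members instead.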

Key Takeaways

  • Every tuple carries 32 bits of metadata in t_infomask and t_infomask2 that control visibility, locking, storage, and HOT chain behaviour.
  • Hint bits (HEAP_XMIN_COMMITTED, HEAP_XMIN_INVALID, HEAP_XMAX_COMMITTED, HEAP_XMAX_INVALID) cache CLOG results to avoid a commit log lookup on every row read.
  • Hint bits are set lazily by whichever backend first examines a tuple after the relevant transaction completes. Committed hints require WAL ordering guarantees.
  • The lock bits (HEAP_XMAX_KEYSHR_LOCK, HEAP_XMAX_EXCL_LOCK, HEAP_XMAX_LOCK_ONLY), together with HEAP_KEYS_UPDATED in t_infomask2, encode row-level lock mode directly in the tuple header, which is why SELECT FOR UPDATE causes physical writes.
  • HEAP_XMIN_FROZEN (both committed + invalid bits set) marks rows old enough to skip all xmin transaction ID checks. This is the product of anti-wraparound vacuuming.
  • Use the pageinspect extension with heap_page_items and heap_tuple_infomask_flags to inspect raw infomask bits on any table.
  • After bulk loads, the first queries pay a CLOG tax while setting hint bits. Subsequent queries benefit from the cached hints.

What’s Next

We have now dissected the tuple header down to individual bits. Infomask is the bridge between MVCC theory and physical storage, and understanding it makes every other concurrency topic easier to reason about. Tomorrow we will look at something that builds directly on these bits: SERIALIZABLE isolation and predicate locking (SSI). PostgreSQL’s implementation of true serializability using SSI is one of the most sophisticated concurrency control mechanisms in any open-source database, and it relies on many of the same infomask patterns we covered today.


Previous in the series: Row-Level Locks: What SELECT FOR UPDATE/SHARE Actually Does