PostgreSQL Deep Dive

PostgreSQL Deep dive: Row-level locks

You run SELECT * FROM orders WHERE id = 42 FOR UPDATE and assume PostgreSQL is keeping track of that lock somewhere in memory. It is, sort of. But it is also writing to your heap pages. Every single row you lock gets a physical modification on disk. That SELECT you thought was read-only? It is not.

Row-level locks in PostgreSQL sit at the intersection of MVCC and traditional locking. They are the mechanism that prevents lost updates when two transactions want to modify the same row, and they are more nuanced than most people realise. There are four distinct lock modes, a fast-path optimisation that bypasses the shared lock table entirely, and a surprising amount of disk I/O hidden behind what looks like a simple read query.

The Four Row-Level Lock Modes

PostgreSQL provides four row-level lock modes, each acquired through the FOR clause on a SELECT statement. They form a strict hierarchy of exclusivity.

FOR UPDATE is the strongest. It locks the row exclusively. No other transaction can acquire any row-level lock on the same row, whether shared or exclusive. No UPDATE, no DELETE, no FOR SHARE, nothing. This is the lock mode that DELETE always acquires, and that UPDATE acquires when it modifies columns covered by a unique index (the columns that could be referenced by a foreign key).

FOR NO KEY UPDATE is slightly weaker. It behaves like FOR UPDATE in that it grants an exclusive lock, but it does not block SELECT FOR KEY SHARE. This is the lock that UPDATE acquires when it modifies columns not covered by any unique index. The distinction exists so that foreign key reference checks using FOR KEY SHARE do not get blocked by updates to non-key columns.

FOR SHARE acquires a shared lock on the row. Multiple transactions can hold FOR SHARE simultaneously, but none of them can modify the row. It blocks UPDATE, DELETE, FOR UPDATE, and FOR NO KEY UPDATE.

FOR KEY SHARE is the weakest. It blocks DELETE and UPDATE operations that would change the row’s key values, but allows everything else. This is the lock mode used internally when PostgreSQL checks foreign key references. It prevents a referenced row from being deleted or having its primary key changed, but does not block updates to non-key columns on that row.

Here is the conflict matrix:

Requested \ CurrentFOR KEY SHAREFOR SHAREFOR NO KEY UPDATEFOR UPDATE
FOR KEY SHAREX
FOR SHAREXX
FOR NO KEY UPDATEXXX
FOR UPDATEXXXX

A cell marked X means the requested lock conflicts with the currently held lock and must wait.

How Row Locks Are Actually Stored

PostgreSQL does not maintain a separate in-memory structure listing every locked row. Instead, row locks are embedded directly in the tuple header.

When you run SELECT FOR UPDATE, PostgreSQL sets two fields in the tuple’s HeapTupleHeaderData:

  • t_xmax is set to the locking transaction’s XID (the same field used to mark a row as deleted by an updating transaction).
  • t_infomask gets the HEAP_XMAX_LOCK_ONLY bit set. This tells PostgreSQL that the xmax value represents a lock holder, not a deletion. Additional bits in t_infomask2 encode which specific lock mode was taken.

This means locking a row is a physical write to the heap page. The page gets marked as dirty, WAL gets generated, and eventually the checkpoint or background writer flushes it to disk. A SELECT FOR UPDATE on a million rows will dirty a million heap pages and generate a corresponding amount of WAL. It is not a free operation.

If PostgreSQL kept row locks in a shared memory table, that table would need to be as large as the total number of locked rows across all transactions, which could be enormous. By storing locks in the tuple header, the locking mechanism piggybacks on the existing buffer management and WAL infrastructure. No separate cleanup is needed. When the transaction commits or rolls back, the tuple’s xmax is no longer active, and the lock is effectively released. Vacuum eventually clears it out.

There is no upper limit on how many rows a single transaction can lock. The documentation is explicit about this: “PostgreSQL doesn’t remember any information about modified rows in memory, so there is no limit on the number of rows locked at one time.” But the disk I/O cost is real.

Implicit Locks: UPDATE and DELETE

You do not need to write FOR UPDATE to acquire row-level locks. Data-modifying statements take them automatically.

When UPDATE modifies a row, it takes either FOR UPDATE or FOR NO KEY UPDATE depending on which columns change. If the update touches columns that are part of a unique index (and could therefore be the target of a foreign key reference), PostgreSQL takes the stronger FOR UPDATE lock. Otherwise it takes FOR NO KEY UPDATE. The distinction matters for concurrency: a FOR NO KEY UPDATE lock still allows SELECT FOR KEY SHARE, so foreign key lookups against non-key columns can proceed without blocking.

When DELETE removes a row, it always acquires a FOR UPDATE lock. This is stronger than necessary for some scenarios, but it is a safe default. A deleted row cannot be referenced by anyone, so the strongest lock is appropriate.

The Fast-Path Locking Optimisation

The shared lock table (LOCK table in shared memory) is a finite resource sized by max_locks_per_transaction * max_connections. If every FOR UPDATE created an entry in this table, a query locking millions of rows would exhaust it quickly.

PostgreSQL avoids this with fast-path locking. For row-level locks, the lock manager first checks whether the lock can be satisfied without entering the main lock table. If the row is not currently locked by another transaction, PostgreSQL stores the lock information entirely in the tuple header (the xmax/infomask approach described above) and skips the shared lock table entirely. No lock table entry is created.

The lock only enters the shared lock table when there is contention, meaning another transaction is already waiting for or holding a lock on that row. At that point, PostgreSQL creates a “heavyweight” lock entry in the shared lock table to manage the wait queue.

In practice, most row-level locks are fast-path. Contention is the exception, not the rule, and this optimisation keeps the lock table manageable even under heavy load.

FOR UPDATE Across Isolation Levels

How FOR UPDATE behaves depends heavily on your isolation level.

In READ COMMITTED (the default), FOR UPDATE re-evaluates the row after acquiring the lock. If the row was modified by another transaction that committed between the time the SELECT started and the lock was acquired, PostgreSQL re-checks the row against the query’s WHERE clause. If the row no longer matches, it is silently dropped from the result set. This means FOR UPDATE in READ COMMITTED can return fewer rows than a plain SELECT on the same data, because concurrent commits can cause rows to be re-checked and excluded.

In REPEATABLE READ, the behaviour is stricter. If a row that your transaction selected with FOR UPDATE has been modified by another committed transaction, PostgreSQL throws an error:

ERROR: could not serialize access due to concurrent update

This is not a deadlock. It is a serialization failure. Your application needs to catch this error and retry the transaction from the beginning. The rationale is that in REPEATABLE READ, your transaction operates on a snapshot from the start of the transaction. If the row has changed, continuing would violate the repeatable-read guarantee.

In SERIALIZABLE, the same error can occur, but it may also be detected earlier through the SSI (Serializable Snapshot Isolation) mechanism, before you even attempt to lock the row.

NOWAIT and SKIP LOCKED

Two modifiers change how FOR UPDATE handles contention.

NOWAIT makes the lock attempt fail immediately if the row is already locked by another transaction:

SELECT * FROM orders WHERE status = 'pending' FOR UPDATE NOWAIT;

If even one matching row is locked, the entire statement fails with:

ERROR: could not obtain lock on row in relation "orders"

This is useful for interactive applications where you want to give the user immediate feedback rather than making them wait.

SKIP LOCKED takes the opposite approach. If a row is locked, PostgreSQL simply skips it and moves on:

SELECT * FROM orders WHERE status = 'pending' FOR UPDATE SKIP LOCKED;

This returns only the rows that were successfully locked. Locked rows are silently excluded from the result set. This is the foundation of many job queue implementations in PostgreSQL, where multiple worker processes pull tasks from a table without blocking each other:

-- Worker grabs next available job
UPDATE jobs
SET worker_id = current_pid, started_at = now()
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'pending'
    ORDER BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING *;

Each worker gets a different row. No waiting, no deadlocks. This pattern is so common that it has become the standard advice for building job queues on PostgreSQL.

Deadlocks with Row-Level Locks

Row-level locks can deadlock even without explicit FOR UPDATE clauses. Consider two transactions:

Transaction A                      Transaction B
-----------                        -----------
UPDATE accounts SET balance        UPDATE accounts SET balance
  = balance + 100                    = balance + 100
  WHERE acctnum = 1;                  WHERE acctnum = 2;

UPDATE accounts SET balance        UPDATE accounts SET balance
  = balance - 100                    = balance - 100
  WHERE acctnum = 2;                  WHERE acctnum = 1;
  -- BLOCKED, waiting for B          -- BLOCKED, waiting for A

Transaction A holds a lock on row 1 and wants row 2. Transaction B holds a lock on row 2 and wants row 1. Deadlock.

PostgreSQL detects this through periodic deadlock checks (controlled by deadlock_timeout, default 1s). When detected, it aborts one of the transactions with:

ERROR: deadlock detected
DETAIL: Process 12345 waits for ShareLock on transaction 9876; blocked by process 67890.
Process 67890 waits for ShareLock on transaction 12345; blocked by process 12345.
HINT: See server log for query details.

The solution is deterministic ordering: always access rows in the same order across transactions (e.g., always by ascending primary key). If both transactions above updated account 1 first, then account 2, no deadlock would occur.

Diagnostic Queries

You can inspect currently held row locks through pg_locks:

-- All row-level locks currently held
SELECT locktype, relation::regclass, mode, pid, tuple
FROM pg_locks
WHERE locktype = 'tuple'
ORDER BY relation::regclass, tuple;

The tuple column shows the physical position of the locked row (block number and offset within the block).

To find sessions waiting for row locks:

SELECT blocked.pid AS blocked_pid,
       blocked.query AS blocked_query,
       blocking.pid AS blocking_pid,
       blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_locks bl ON bl.pid = blocked.pid AND bl.granted = false
JOIN pg_locks wl ON wl.locktype = bl.locktype
    AND wl.database IS NOT DISTINCT FROM bl.database
    AND wl.relation IS NOT DISTINCT FROM bl.relation
    AND wl.page IS NOT DISTINCT FROM bl.page
    AND wl.tuple IS NOT DISTINCT FROM bl.tuple
    AND wl.virtualxid IS NOT DISTINCT FROM bl.virtualxid
    AND wl.transactionid IS NOT DISTINCT FROM bl.transactionid
    AND wl.classid IS NOT DISTINCT FROM bl.classid
    AND wl.objid IS NOT DISTINCT FROM bl.objid
    AND wl.objsubid IS NOT DISTINCT FROM bl.objsubid
    AND wl.pid != bl.pid
JOIN pg_stat_activity blocking ON blocking.pid = wl.pid
WHERE bl.granted = false;

To see how long sessions have been waiting, enable lock wait logging:

-- Log when any session waits longer than 500ms for a lock
SET log_lock_waits = on;
SET deadlock_timeout = '500ms';

The Gotcha: SELECT FOR UPDATE Is Not a Read-Only Operation

The most common mistake is treating FOR UPDATE as a cheap operation. It is not. Every locked row gets written to, which means:

  1. Dirty pages in shared_buffers: Locked rows occupy buffer pool space until flushed.
  2. WAL generation: Each lock modification generates WAL records, increasing I/O and replication lag.
  3. Checkpoint amplification: More dirty pages mean slower checkpoints.

If you are locking large result sets, consider whether you really need all those locks. NOWAIT and SKIP LOCKED can help with contention, but they do not reduce the I/O cost of the locks themselves.

Another subtle issue: SELECT FOR UPDATE requires UPDATE privilege on the target table, not just SELECT. This catches people who set up read-only users and then wonder why their explicit locking queries fail.

Key Takeaways

  • Row-level locks are stored in the tuple header (xmax + infomask), not in a separate lock table. Locking a row is a physical write.
  • Four lock modes with a clear hierarchy: FOR KEY SHARE < FOR SHARE < FOR NO KEY UPDATE < FOR UPDATE.
  • UPDATE and DELETE acquire row locks automatically. The specific mode depends on whether key columns are modified.
  • Fast-path locking avoids the shared lock table for uncontended rows, keeping the lock table small.
  • FOR UPDATE behaves differently across isolation levels. In REPEATABLE READ, it throws serialization errors instead of re-evaluating rows.
  • SKIP LOCKED is the foundation of job queue patterns in PostgreSQL.
  • Deadlocks can occur with plain UPDATEs, not just explicit FOR UPDATE. Consistent row access ordering prevents them.
  • Monitor lock waits with pg_locks, log_lock_waits, and deadlock_timeout.

What’s Next

We have spent the past few days deep inside PostgreSQL’s storage format: TOAST, the Free Space Map, the Visibility Map, WAL format, and now the locking mechanism. Tomorrow we will pivot to a different kind of storage problem: how PostgreSQL manages the Transaction ID (XID) lifecycle and what happens when the counter wraps around. It is a commonly misunderstood failure mode in PostgreSQL, and it can bring your database to a complete halt if left unchecked.


Previous in the series: WAL Format and the Redo-Only Architecture