PostgreSQL Deep Dive

PostgreSQL Deep Dive: Prepared Transactions, max_prepared_transactions, and the Two-Phase Trap

You deploy a microservices architecture with a transaction coordinator. The coordinator uses PostgreSQL’s two-phase commit to atomically commit across two databases. One day the coordinator crashes mid-transaction — it sent PREPARE TRANSACTION to both databases but never sent COMMIT PREPARED. The prepared transaction sits there. For hours. For days. Vacuum is blocked. Table bloat accumulates. Nobody notices until the disk fills up.

This is the two-phase commit trap: a feature that exists for a narrow use case (distributed transaction coordination), quietly enabled by a single configuration parameter, that creates the same class of problems as idle-in-transaction connections — but worse, because prepared transactions survive server restarts and aren’t caught by any timeout.

Let’s break down how prepared transactions work, why max_prepared_transactions is the most dangerous default-zero setting in PostgreSQL, and what to do if you actually need two-phase commit.

What Two-Phase Commit Does

Two-phase commit (2PC) is a protocol for atomically committing a transaction across multiple resource managers — say, two PostgreSQL databases, or a PostgreSQL database and a message queue. The protocol has two phases:

Phase 1 — Prepare. The coordinator asks every participant: “can you commit?” Each participant writes enough state to durable storage to guarantee it can either commit or rollback, regardless of what happens next. In PostgreSQL, this is PREPARE TRANSACTION 'tx_123'.

Phase 2 — Commit or Rollback. The coordinator collects all responses. If every participant said “yes,” it sends COMMIT PREPARED 'tx_123' to each. If any said “no” (or timed out), it sends ROLLBACK PREPARED 'tx_123'.

The key property: once a participant has prepared, it must be able to commit or rollback — even if the server crashes and restarts. PostgreSQL guarantees this by writing the prepared transaction’s state to the pg_twophase/ directory on disk.

The Lifecycle of a Prepared Transaction

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- This transaction also affects another database
PREPARE TRANSACTION 'transfer_20260511_001';

-- The transaction is now dissociated from this session.
-- You can close the session entirely. The transaction persists.

-- Later, from any session, on this database:
COMMIT PREPARED 'transfer_20260511_001';

-- Or, if the coordinator decides to abort:
ROLLBACK PREPARED 'transfer_20260511_001';

After PREPARE TRANSACTION, the original session is free — it has no active transaction. The prepared transaction exists independently, tracked in shared memory and (if it survives a checkpoint) on disk in pg_twophase/. Any session can commit or roll it back using the global identifier.

The documentation is explicit about the intended audience:

“PREPARE TRANSACTION is not intended for use in applications or interactive sessions. Its purpose is to allow an external transaction manager to perform atomic global transactions across multiple databases or other transactional resources. Unless you’re writing a transaction manager, you probably shouldn’t be using PREPARE TRANSACTION.”

max_prepared_transactions: The Gatekeeper

The max_prepared_transactions parameter controls whether 2PC is available at all. Its default is zero, which means PREPARE TRANSACTION will error:

ERROR:  prepared transactions are disabled
HINT:  Set max_prepared_transactions to a nonzero value.

This is one of the few PostgreSQL parameters that:

  • Requires a server restart to change (not just pg_reload_conf())
  • Must be set on both primary and standby — if the standby has a lower value, queries will be rejected
  • Should be at least as large as max_connections if you use 2PC heavily, since every session could theoretically prepare a transaction

The zero default is intentional and protective. Most applications never need 2PC, and leaving it disabled prevents accidental prepared transactions from any source — including misconfigured middleware that uses XA transactions without you realizing.

Why Prepared Transactions Are Dangerous

Prepared transactions combine the worst properties of idle-in-transaction connections with crash-surviving persistence:

1. They hold locks across restarts. A prepared transaction holds every lock it acquired during its original execution. If it locked rows in a critical table, those locks persist through server restarts. No idle_in_transaction_session_timeout will save you — the documentation explicitly states that prepared transactions are not subject to this timeout.

2. They block vacuum. Like any open transaction, a prepared transaction has an xmin horizon that prevents vacuum from reclaiming dead tuples. But unlike a regular transaction, it can sit there for days or weeks after a coordinator crash, silently bloating every table in the database.

3. They survive crashes. This is by design — the whole point is durability across failures. But it means that restarting PostgreSQL doesn’t fix the problem. The prepared transaction reappears from pg_twophase/ during crash recovery and continues holding its resources.

4. They don’t expire. PostgreSQL has no built-in mechanism to automatically expire or rollback old prepared transactions. Once prepared, a transaction stays prepared until explicitly committed or rolled back. There is no TTL, no background cleanup, no safety net.

5. They’re invisible to most monitoring. They don’t show up as idle in transaction in pg_stat_activity — the original session is gone. They don’t trigger slow query logs. The only way to find them is to explicitly query pg_prepared_xacts.

The Real-World Failure Mode

Here’s how this typically goes wrong:

  1. An application server uses a Java EE app server with XA transactions (two databases + a JMS queue)
  2. The transaction coordinator sends PREPARE TRANSACTION to both PostgreSQL instances
  3. The coordinator crashes or loses network connectivity before sending COMMIT PREPARED
  4. Both PostgreSQL instances now have a prepared transaction holding locks and blocking vacuum
  5. Nobody notices for days because the application “works” — new transactions use new sessions
  6. Table bloat accumulates. Autovacuum runs but can’t reclaim dead tuples. Disk usage climbs.
  7. Someone runs pg_prepared_xacts during an incident and finds transactions from two weeks ago

The pg_twophase/ directory grows as prepared transactions accumulate. Each prepared transaction consumes a slot in the shared memory tracker (limited by max_prepared_transactions). If you hit the limit, new PREPARE TRANSACTION commands fail:

ERROR:  maximum number of prepared transactions reached
HINT:  Increase max_prepared_transactions (currently 10).

Diagnostic Queries

Find all prepared transactions:

SELECT
    gid,
    transaction,
    prepared,
    now() - prepared AS age,
    owner,
    database
FROM pg_prepared_xacts
ORDER BY prepared;

Find prepared transactions older than one hour:

SELECT
    gid,
    transaction,
    prepared,
    now() - prepared AS age,
    owner,
    database
FROM pg_prepared_xacts
WHERE prepared < now() - interval '1 hour'
ORDER BY prepared;

Check for locks held by prepared transactions:

SELECT
    px.gid,
    px.database,
    l.locktype,
    l.mode,
    l.granted,
    CASE
        WHEN l.relation IS NOT NULL THEN
            c.relname
        ELSE NULL
    END AS relation_name
FROM pg_prepared_xacts px
JOIN pg_locks l ON l.pid IS NULL
    AND l.transactionid = px.transaction::text::xid::text
LEFT JOIN pg_class c ON c.oid = l.relation
WHERE l.virtualxid IS NULL;

Check the pg_twophase/ directory size on disk:

-- From the OS:
-- ls -la $PGDATA/pg_twophase/
-- du -sh $PGDATA/pg_twophase/

Monitor prepared transaction count over time:

SELECT
    count(*) AS prepared_count,
    min(prepared) AS oldest,
    max(prepared) AS newest,
    now() - min(prepared) AS max_age
FROM pg_prepared_xacts;

Cleaning Up Orphaned Prepared Transactions

If you find orphaned prepared transactions, the fix is straightforward:

-- Check what the transaction is holding
SELECT * FROM pg_prepared_xacts WHERE gid = 'tx_20260511_001';

-- If you're sure it should be rolled back:
ROLLBACK PREPARED 'tx_20260511_001';

-- If you're sure it should be committed:
COMMIT PREPARED 'tx_20260511_001';

The hard part isn’t the cleanup — it’s knowing which ones are orphans versus which ones are legitimately waiting for a coordinator that’s temporarily down. This requires coordination with your application team and transaction coordinator logs.

Important: Never roll back a prepared transaction without confirming the coordinator’s intent. If the coordinator is still running and plans to commit, rolling back will create data inconsistency across your distributed systems.

If You Must Use 2PC

Some architectures genuinely need distributed transactions. If you’re one of them:

Set max_prepared_transactions carefully. Match it to your expected concurrent prepared transaction count, not to max_connections blindly. A high value wastes shared memory for slots you’ll never use.

# In postgresql.conf — requires restart
max_prepared_transactions = 50

Monitor pg_prepared_xacts proactively. Set up an alert:

-- Alert if any prepared transaction is older than 5 minutes
SELECT count(*) FROM pg_prepared_xacts
WHERE prepared < now() - interval '5 minutes';

Configure your coordinator with timeouts. Most transaction coordinators (Atomikos, Narayana, Bitronix) have a timeout for the window between PREPARE and COMMIT/ROLLBACK. Set it aggressively — 30 seconds to a few minutes, never hours.

Log prepared transaction events. Use log_statement = 'all' on databases involved in 2PC, or at minimum audit PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK PREPARED via a custom logging trigger.

Consider alternatives. The saga pattern, eventual consistency with idempotent operations, or outbox pattern with Debezium can often achieve the same business outcome without the operational risk of 2PC. PostgreSQL logical replication with conflict resolution is another option for multi-database consistency.

The Interaction With Recovery

When PostgreSQL starts up (after a crash or planned restart), recovery replays WAL forward from the last checkpoint. During this process:

  1. Any prepared transactions found in WAL are reconstructed in shared memory
  2. Any prepared transactions with state files in pg_twophase/ are loaded back
  3. The transactions resume their prepared state — locks and all

This means a prepared transaction from three months ago that was never cleaned up will reappear after every restart. I’ve seen production incidents where an ancient prepared transaction from a long-dead coordinator held an ACCESS EXCLUSIVE lock on a critical table, blocking all reads immediately after a failover.

The pg_twophase/ directory is the on-disk backing store. Short-lived prepared transactions (those prepared and committed between checkpoints) never hit disk — they live entirely in WAL and shared memory. But once a checkpoint happens while a transaction is in the prepared state, PostgreSQL writes a state file to pg_twophase/ to guarantee durability.

Key Takeaways

  • max_prepared_transactions defaults to zero for a reason — most applications should never enable it
  • Prepared transactions hold locks and block vacuum, just like idle-in-transaction connections, but they survive server restarts and aren’t subject to any timeout
  • The only way to find them is querying pg_prepared_xacts — they don’t appear in pg_stat_activity
  • If you enable 2PC, you must monitor pg_prepared_xacts with alerting on transaction age
  • Orphaned prepared transactions from crashed coordinators are the most common failure mode — have a cleanup runbook ready
  • Consider whether you actually need distributed atomicity, or whether eventual consistency with idempotent operations would serve your use case with far less operational risk

Previously: Idle in Transaction Connections: The Silent Killer