PostgreSQL Deep Dive: Replication Slots and Disk Fill-Up — The Silent Killer That Stops Your Primary

Your primary database just crashed. Not from a hardware failure, not from a bad query, not from running out of memory. It crashed because the disk filled up. And the disk filled up because a standby server went offline three weeks ago, and the replication slot it was using has been accumulating WAL files ever since, preventing PostgreSQL from recycling any of them.

This is one of the most common ways PostgreSQL clusters die in production. It’s not a bug. It’s the replication slot doing exactly what it was designed to do: guarantee that the standby can catch up when it reconnects. But when the standby never reconnects, that guarantee becomes a death sentence for your primary.

How Replication Slots Work

A replication slot is a named, persistent data structure on the primary that tracks the replication progress of a consumer — either a physical standby (streaming replication) or a logical decoder (CDC, Debezium, etc.).

The slot stores one critical piece of information: the restart_lsn, the LSN (Log Sequence Number) of the oldest WAL that the consumer may still need. As long as the slot exists, PostgreSQL will not recycle or remove any WAL segment holding data at or after that position. This is the fundamental guarantee of replication slots: the consumer will never miss WAL data because it was prematurely deleted.
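
To see which WAL file a slot is pinning, restart_lsn can be translated into a segment file name. A minimal sketch, assuming it runs on the primary (pg_walfile_name is not available during recovery) and that at least one slot has reserved WAL:

```sql
-- Map each slot's restart_lsn to the oldest WAL segment it still pins.
-- As far as this slot is concerned, only segments older than this one
-- are candidates for recycling.
SELECT
  slot_name,
  restart_lsn,
  pg_walfile_name(restart_lsn) AS oldest_pinned_segment
FROM pg_replication_slots
WHERE restart_lsn IS NOT NULL
ORDER BY restart_lsn;
```

The slot at the top of the result is the one holding back the most WAL.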

Without replication slots, PostgreSQL recycles WAL segment files based on checkpoint progress and wal_keep_size (the minimum amount of past WAL to keep in pg_wal/ for late-connecting standbys). A disconnected standby might reconnect and find that the WAL it needs has already been recycled, forcing a full reinitialization from a base backup. Replication slots eliminate this problem by preventing the recycling in the first place.

The problem is that this guarantee is unconditional. If the standby never reconnects, WAL accumulates forever.

Two Types of Slots, Same Risk

Physical Replication Slots

Used by streaming replication standbys. The standby connects, streams WAL, and the slot tracks how far behind it is. Simple and low-overhead.

-- Create a physical replication slot
SELECT pg_create_physical_replication_slot('standby_1');

-- On the standby, configure primary_slot_name
-- In postgresql.conf:
-- primary_slot_name = 'standby_1'

Logical Replication Slots

Used for logical decoding (pgoutput, wal2json, Debezium, etc.). The slot tracks which transactions have been decoded and sent. Logical slots also reserve WAL, but they have an additional concern: through catalog_xmin they prevent VACUUM from removing dead rows in the system catalogs that decoding may still need. A stuck logical slot can bloat your catalogs, not just your WAL directory.

-- Create a logical replication slot
SELECT pg_create_logical_replication_slot('cdc_slot', 'pgoutput');

Both types of slots prevent WAL recycling. Logical slots add the extra risk of catalog bloat from un-vacuumed dead tuples.

The Failure Mode: WAL Accumulation

Here’s what happens when a standby dies and its replication slot stays behind:

  1. The standby disconnects. The replication slot’s active column becomes false.
  2. The primary continues generating WAL at its normal rate — say 2 GB/day for a busy database.
  3. At each checkpoint, PostgreSQL checks which WAL segments can be recycled. Every segment at or after the slot’s restart_lsn is protected.
  4. The slot’s restart_lsn doesn’t advance because no consumer is reading WAL through this slot.
  5. WAL accumulates in pg_wal/. Day 1: 2 GB. Day 7: 14 GB. Day 21: 42 GB. Day 30: 60 GB.
  6. The disk fills up. PostgreSQL can’t write new WAL, hits a PANIC, and shuts down. It usually cannot be restarted until space is freed.

The time from standby failure to disk full depends on your write rate and available disk space. On a busy system with 200 GB of free space and 4 GB/day of WAL generation, you have roughly 50 days. On a system with tight disk provisioning, it could be less than a week.
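
The same estimate can be made from live numbers. This is a rough sketch, assuming PG14+ (for pg_stat_wal), a non-NULL stats_reset, and 200 GB of free space as a placeholder for your own figure:

```sql
-- Average WAL generation rate since the statistics were last reset,
-- and how many days that rate takes to consume 200 GB of free space.
WITH rate AS (
  SELECT wal_bytes
           / NULLIF(extract(epoch FROM now() - stats_reset), 0)
           * 86400 AS wal_bytes_per_day
  FROM pg_stat_wal
)
SELECT
  pg_size_pretty(wal_bytes_per_day) AS wal_per_day,
  round(200 * 1073741824::numeric / NULLIF(wal_bytes_per_day, 0), 1)
    AS days_until_full   -- 200 GB is an assumed placeholder
FROM rate;
```

The result is a long-run average, so a database with bursty writes can fill the disk sooner than it suggests.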

There are no errors, no warnings, and no automatic remediation by default. The primary just keeps going until it runs out of space.

Inspecting Replication Slots

SELECT
  slot_name,
  slot_type,
  plugin,
  active,
  active_pid,
  restart_lsn,
  pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) / 1024 / 1024 AS lag_mb,
  wal_status,
  inactive_since
FROM pg_replication_slots
ORDER BY lag_mb DESC;

Key columns to watch:

  • active: Is a consumer currently connected and streaming?
  • lag_mb: How much WAL is being retained by this slot? If active is false and this is growing, you have a problem.
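  • wal_status: The availability state of WAL files claimed by this slot (PG13+):
    • reserved: Retained WAL is within max_wal_size (normal)
    • extended: Retained WAL has exceeded max_wal_size but is still kept, by the slot or by wal_keep_size
    • unreserved: The slot no longer retains its required WAL, and some of it will be removed at the next checkpoint; the slot can still recover if the consumer catches up first
    • lost: Some required WAL has already been removed and the slot is no longer usable
  • inactive_since: When the slot last lost its active connection (PG17+; drop this column from the query on older versions)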

The Safety Valve: max_slot_wal_keep_size

Introduced in PostgreSQL 13, max_slot_wal_keep_size puts an upper limit on how much WAL a replication slot can retain:

# In postgresql.conf:
# Allow slots to retain up to 10 GB of WAL
max_slot_wal_keep_size = '10GB'

When a slot’s retained WAL exceeds this limit, PostgreSQL changes the slot’s wal_status to lost and invalidates the slot. The standby, when it reconnects, will find that the required WAL has been removed and will need to be reinitialized from a base backup.

Default is -1 (unlimited). This means unless you explicitly set this parameter, replication slots can fill your disk. Setting it is one of the first things you should do on any PostgreSQL instance using replication slots.
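
Once the limit is set, pg_replication_slots reports how much headroom each slot has left. A small sketch using the safe_wal_size column (PG13+):

```sql
-- safe_wal_size: bytes of WAL that can still be written before the slot
-- risks becoming 'lost'. It is NULL when max_slot_wal_keep_size = -1
-- or when the slot is already lost.
SELECT
  slot_name,
  active,
  wal_status,
  pg_size_pretty(safe_wal_size) AS headroom
FROM pg_replication_slots
ORDER BY safe_wal_size NULLS LAST;
```

Slots with the least headroom sort first, which makes this a convenient early-warning view.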

How much to set? Calculate based on your worst-case recovery window:

max_slot_wal_keep_size = WAL generation rate × maximum acceptable outage duration

If your database generates 2 GB/day of WAL and you want to tolerate a 3-day standby outage before requiring a base backup reinit, set max_slot_wal_keep_size = '6GB'. The tradeoff is clear: higher values give standbys more time to catch up, lower values protect your disk at the cost of more frequent base backup reinitializations.
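
The parameter takes effect on a configuration reload, so it can be applied without a restart. A sketch using the 6 GB figure from the example above, assuming a superuser session (ALTER SYSTEM cannot run inside a transaction block):

```sql
-- 2 GB/day of WAL x 3 days of tolerated outage = 6 GB
ALTER SYSTEM SET max_slot_wal_keep_size = '6GB';

-- Reload the configuration; no restart is required for this setting.
SELECT pg_reload_conf();

-- Confirm the running value.
SHOW max_slot_wal_keep_size;
```

ALTER SYSTEM writes to postgresql.auto.conf, which overrides postgresql.conf, so keep the two in sync if your configuration is managed by tooling.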

Interaction with wal_keep_size: wal_keep_size (default 0) sets a minimum amount of WAL to keep for any standby, slot or no slot. max_slot_wal_keep_size caps how much WAL slots can hold back. The two are not additive: the slot cap is applied first, then the wal_keep_size floor. If wal_keep_size = '1GB' and max_slot_wal_keep_size = '10GB', a slot can hold back up to 10 GB of WAL before being invalidated, and at least 1 GB is kept whether or not any slot needs it.

The Cleanup: idle_replication_slot_timeout

Introduced in PostgreSQL 18, idle_replication_slot_timeout automatically invalidates slots that have been inactive for too long:

# In postgresql.conf:
# Invalidate inactive slots after 7 days
idle_replication_slot_timeout = '7d'

This is the “forget about the dead standby after a week” setting. When a slot has been inactive (no connected consumer) for longer than this duration, PostgreSQL invalidates it at the next checkpoint. The WAL it was retaining becomes eligible for recycling.

Default is 0 (disabled). If you’re on PG18+, enable this as a safety net alongside max_slot_wal_keep_size.

Important caveat: The invalidation happens at checkpoint time, not at the exact timeout. If your checkpoint_timeout is 15 minutes, there could be a 15-minute lag between when the timeout was exceeded and when the slot is actually invalidated. In extreme cases, force a checkpoint with CHECKPOINT to trigger immediate cleanup.

Not applicable to all slots: This timeout doesn’t apply to slots that don’t reserve WAL, or to slots on a standby that are being synced from the primary. Check the synced column in pg_replication_slots: synced slots are always considered inactive because they don’t perform logical decoding themselves.
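
Before enabling the timeout, it can help to preview which slots it would affect. A sketch, assuming the 7-day threshold from the example above and a server new enough to have the inactive_since and synced columns (PG17+):

```sql
-- Inactive, WAL-reserving, non-synced slots that have been idle
-- for longer than 7 days.
SELECT
  slot_name,
  slot_type,
  inactive_since,
  now() - inactive_since AS idle_for
FROM pg_replication_slots
WHERE NOT active
  AND NOT synced
  AND restart_lsn IS NOT NULL
  AND inactive_since < now() - interval '7 days'
ORDER BY inactive_since;
```

Any slot listed here is a candidate for invalidation once the setting is enabled, so confirm that its consumer really is gone first.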

What Happens When WAL Is Removed

When max_slot_wal_keep_size is exceeded (or WAL is removed for any reason), the slot’s wal_status changes to lost:

slot_name | wal_status
standby_1 | lost

When the standby reconnects and tries to use this slot, the replication connection fails:

FATAL:  could not start WAL streaming: ERROR:  requested WAL segment 000000010000000A00000040 has already been removed

The standby must be reinitialized from a fresh base backup. This is the tradeoff you accepted by setting max_slot_wal_keep_size — you protect the primary at the cost of potentially needing to rebuild the standby.
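
To find slots that have already been invalidated, filter on wal_status. The invalidation_reason column used here is an assumption about your version: it exists on PG17+ and should be dropped from the query on older releases.

```sql
-- Slots whose required WAL is gone. On PG17+, invalidation_reason
-- explains why (for example 'wal_removed').
SELECT
  slot_name,
  slot_type,
  wal_status,
  invalidation_reason
FROM pg_replication_slots
WHERE wal_status = 'lost';
```

A lost slot cannot be revived. Drop it and create a fresh one as part of rebuilding the standby.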

Logical Slots and Table Bloat

Physical slots only retain WAL. Logical slots add a second risk: they hold back VACUUM on the system catalogs.

A logical slot’s catalog_xmin tracks the oldest transaction whose catalog changes the slot still needs for consistent decoding. VACUUM cannot remove catalog rows (in pg_class, pg_attribute, and so on, plus any tables marked as user catalog tables) deleted by a transaction at or after this catalog_xmin. If a logical slot is stuck (consumer disconnected, CDC pipeline down), dead catalog rows accumulate for as long as the slot fails to advance. Workloads with frequent DDL or heavy temporary-table use feel this first.

This is harder to detect than WAL accumulation because the bloat sits inside the catalogs, not in pg_wal/. The effect is similar: disk usage grows, and planning and catalog lookups slow down as the catalogs swell.

-- Check for logical slots blocking vacuum
SELECT
  slot_name,
  slot_type,
  plugin,
  active,
  catalog_xmin,
  age(catalog_xmin) AS catalog_xmin_age,
  inactive_since
FROM pg_replication_slots
WHERE slot_type = 'logical'
ORDER BY catalog_xmin_age DESC;

If catalog_xmin_age is high and the slot is inactive, your system catalogs are accumulating bloat. Fix it by either reconnecting the consumer or dropping the slot.
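
One place the effect of a stuck catalog_xmin shows up is in the dead-tuple counters of the system catalogs. A rough sketch, run in the database you are concerned about. The counters are statistics estimates, so treat them as a hint, not a measurement:

```sql
-- System catalogs with the most dead tuples awaiting vacuum.
SELECT
  relname,
  n_live_tup,
  n_dead_tup,
  last_autovacuum
FROM pg_stat_sys_tables
WHERE schemaname = 'pg_catalog'
ORDER BY n_dead_tup DESC
LIMIT 10;
```

If n_dead_tup stays high even though last_autovacuum is recent, something is holding the vacuum horizon back, and an inactive logical slot is a likely suspect.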

Creating Safe Replication Slots

Temporary Slots for Ephemeral Consumers

If you’re connecting a consumer that doesn’t need to survive restarts (a one-time data dump, a test logical replication setup), use a temporary replication slot:

-- Temporary physical slot (auto-dropped on session end)
-- Arguments: slot_name, immediately_reserve, temporary
SELECT pg_create_physical_replication_slot('temp_slot', false, true);

-- Temporary logical slot
SELECT pg_create_logical_replication_slot('temp_cdc', 'pgoutput', true);

Temporary slots are not written to disk and are automatically dropped when the session ends or on error. They cannot fill your disk across restarts.
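
Named arguments make the intent harder to get wrong, because the physical and logical functions take temporary in different-looking positions. A sketch with a hypothetical slot name, assuming wal_level is at least replica and a free slot is available:

```sql
-- Create a temporary physical slot using named arguments.
SELECT pg_create_physical_replication_slot(
  'temp_demo',                  -- hypothetical slot name
  immediately_reserve => true,  -- reserve WAL right away
  temporary => true             -- drop automatically at session end
);

-- The slot is listed with temporary = true until this session ends.
SELECT slot_name, temporary, restart_lsn
FROM pg_replication_slots
WHERE slot_name = 'temp_demo';
```

After the session disconnects, the second query returns no rows from any other session.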

Dropping Slots Safely

-- Drop an inactive physical slot
SELECT pg_drop_replication_slot('standby_1');

-- Drop an inactive logical slot
SELECT pg_drop_replication_slot('cdc_slot');

Only drop a slot if you’re certain the consumer is gone for good or you’re willing to reinitialize it. If the standby is in the process of reconnecting and you drop the slot, the standby will fail to find it and replication will break until you create a new slot and reconfigure.
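
A guarded form of the drop helps avoid removing a slot that a consumer has just reattached to. A minimal sketch, reusing the slot name from the example above:

```sql
-- Drop the slot only if no consumer is attached right now.
-- Returns one row if the slot was dropped, and zero rows if the slot
-- is active or does not exist.
SELECT slot_name, pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE slot_name = 'standby_1'
  AND NOT active;
```

PostgreSQL also refuses to drop a slot that is in use, so the filter mainly turns an error into an empty result that scripts can check.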

Renaming Slots

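PostgreSQL has no command that renames a slot. The ALTER_REPLICATION_SLOT replication-protocol command added in PostgreSQL 17 changes slot properties such as failover, not the name. To get the same effect, copy the slot under the new name (PG12+), point the consumer at the copy, then drop the original:

SELECT pg_copy_physical_replication_slot('standby_1', 'standby_prod_1');
-- After the standby's primary_slot_name has been switched to standby_prod_1:
SELECT pg_drop_replication_slot('standby_1');

Useful for operational cleanup without losing the slot’s position. Until the original is dropped, both slots hold WAL.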

Diagnostic Queries

1. Slots with significant WAL retention

SELECT
  slot_name,
  slot_type,
  active,
  round(
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) / 1024 / 1024, 1
  ) AS retained_wal_mb,
  wal_status,
  inactive_since,
  CASE
    WHEN active THEN 'streaming'
    WHEN wal_status = 'lost' THEN 'NEEDS REINIT'
    WHEN inactive_since IS NOT NULL THEN 'INACTIVE - investigate'
    ELSE 'unknown'
  END AS status
FROM pg_replication_slots
WHERE restart_lsn IS NOT NULL
ORDER BY retained_wal_mb DESC;

2. Physical disk usage of pg_wal directory

-- Requires pg_stat_file access
SELECT pg_size_pretty(sum(size)) AS pg_wal_size
FROM pg_ls_dir('pg_wal') AS f,
     pg_stat_file('pg_wal/' || f)
WHERE f ~ '^[0-9A-F]{24}$';

Compare this with your available disk space. If pg_wal is consuming more than 20-30% of your disk, investigate your slots.
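
On PostgreSQL 10 and later, pg_ls_waldir() returns the same information in one call. By default it is available to superusers and members of pg_monitor:

```sql
-- Count and total size of WAL segment files in pg_wal/.
SELECT
  count(*) AS segments,
  pg_size_pretty(sum(size)) AS pg_wal_size
FROM pg_ls_waldir()
WHERE name ~ '^[0-9A-F]{24}$';
```

The regular expression keeps only 24-character segment names, so history and backup label files are excluded from the total.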

3. Alert-ready query for problematic slots

-- Slots that are inactive AND retaining more than 1 GB of WAL
-- This is what you should alert on
SELECT
  slot_name,
  slot_type,
  inactive_since,
  round(
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) / 1024 / 1024, 1
  ) AS retained_wal_mb
FROM pg_replication_slots
WHERE NOT active
  AND restart_lsn IS NOT NULL
  AND pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) > 1073741824;

Set this as a monitoring check. If any rows are returned, someone needs to decide: reconnect the consumer, drop the slot, or accept the risk of disk fill-up.

The Gotcha: The Standby Was Never Coming Back

The most common cause of replication slot disk fill-up isn’t a network outage or a standby crash. It’s operational drift. Someone set up a standby for a migration and forgot to clean up the replication slot after the migration completed. Or a Debezium connector was deployed for a PoC and abandoned. Or a standby was decommissioned but nobody removed its slot.

Replication slots are persistent. They survive server restarts. They don’t auto-clean. They accumulate WAL silently until something breaks. Every replication slot is a potential disk bomb, and you need to know about each one.

Practical Configuration

Here’s a safe baseline configuration for any PostgreSQL instance using replication:

# postgresql.conf

# Limit how much WAL any single slot can retain
max_slot_wal_keep_size = '10GB'

# On PG18+, auto-invalidate inactive slots after 7 days
idle_replication_slot_timeout = '7d'

# Keep some WAL for late-connecting standbys even without slots
wal_keep_size = '1GB'

# Log replication commands (such as START_REPLICATION) in the server log
log_replication_commands = on

# Limit total number of slots to prevent accidental proliferation
max_replication_slots = 10
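
After editing the file and reloading, the running values can be checked from SQL. A short sketch; on versions that lack one of these parameters, its row is simply absent:

```sql
-- Show the effective value and origin of each replication-safety setting.
-- pending_restart = true means the new value needs a server restart.
SELECT name, setting, unit, source, pending_restart
FROM pg_settings
WHERE name IN (
  'max_slot_wal_keep_size',
  'idle_replication_slot_timeout',
  'wal_keep_size',
  'log_replication_commands',
  'max_replication_slots'
)
ORDER BY name;
```

max_replication_slots is the one setting in this list that needs a restart to change.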

Monitor pg_replication_slots and alert on:

  • Any inactive physical slot retaining more than 1 GB
  • Any logical slot inactive for more than 1 hour (its catalog_xmin has stopped advancing)
  • Any slot with wal_status = 'extended' (WAL retention exceeding max_wal_size)
  • pg_wal/ directory size exceeding 30% of available disk space
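
The slot-level checks in this list can be folded into one query. This is a sketch under stated assumptions: it runs on the primary, the server has inactive_since (PG17+), and the logical-slot rule is approximated by time since the slot went inactive. The pg_wal/ size check is separate and not covered here.

```sql
-- One row per slot that trips at least one alert condition.
SELECT *
FROM (
  SELECT
    slot_name,
    slot_type,
    wal_status,
    CASE
      WHEN wal_status = 'extended'
        THEN 'retention exceeds max_wal_size'
      WHEN NOT active AND slot_type = 'physical'
           AND pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) > 1073741824
        THEN 'inactive physical slot retaining > 1 GB'
      WHEN NOT active AND slot_type = 'logical'
           AND inactive_since < now() - interval '1 hour'
        THEN 'logical slot inactive > 1 hour'
    END AS alert
  FROM pg_replication_slots
) AS s
WHERE alert IS NOT NULL;
```

An empty result means no slot currently needs attention, which makes the query easy to wire into a check that alerts on any returned row.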

Key Takeaways

  • Replication slots prevent WAL recycling to guarantee standbys can catch up. This guarantee is unconditional — an inactive slot will accumulate WAL until the disk fills and the primary crashes.
  • Physical slots retain WAL. Logical slots retain WAL and, through catalog_xmin, prevent VACUUM from removing dead rows in the system catalogs, causing catalog bloat.
  • max_slot_wal_keep_size (PG13+, default -1 unlimited) is the primary safety valve. Set it based on your WAL generation rate and maximum acceptable standby outage window.
  • idle_replication_slot_timeout (PG18+) auto-invalidates inactive slots after a configurable duration. Set it as a second safety net.
  • The most common cause of disk fill-up from replication slots is operational drift — decommissioned standbys or abandoned CDC pipelines with orphaned slots.
  • Monitor pg_replication_slots proactively. Alert on inactive slots with high WAL retention. This is not a “nice to have” — it’s essential infrastructure.
  • Temporary replication slots (temporary set to true, the third argument of both pg_create_physical_replication_slot and pg_create_logical_replication_slot) auto-clean on session end. Use them for ephemeral consumers.

What’s Next

Tomorrow we’ll look at BRIN indexes — how a 1 MB block range index can sometimes replace a 1 GB B-tree, when physical data correlation makes them viable, and why they degrade silently when that correlation breaks down.

