PostgreSQL Deep Dive: Autovacuum Internals — How Postgres Decides When to Clean Up
If you have ever watched autovacuum ignore a bloated table while obsessively vacuuming a tiny one, you know that “autovacuum handles it” is not the whole story. Autovacuum has a specific decision engine with thresholds, scale factors, and cost limits — and when you do not understand how they interact, you get either too much vacuuming or not enough.
Yesterday we covered transaction ID wraparound and why vacuum must run. Today we are opening the hood on autovacuum itself: how it decides when to run, what to run on, and how to tune it for your actual workload instead of accepting the defaults.
The Architecture: Launcher, Workers, and the Nap Cycle
Autovacuum is not a single process. It is a small distributed system inside your Postgres cluster:
- **Autovacuum launcher** — a persistent daemon that wakes up every `autovacuum_naptime` (default: 1 minute). It checks which databases need attention and spawns worker processes.
- **Autovacuum workers** — each worker is assigned to one database. It scans the tables in that database and decides which ones need VACUUM or ANALYZE.
- **Worker limit** — at most `autovacuum_max_workers` (default: 3) workers can run simultaneously across the entire cluster. If you have 5 databases and 3 workers, two databases wait.
The launcher distributes workers across databases on a round-robin schedule. With autovacuum_naptime = 60s and 3 databases, the launcher tries to start a worker in each database every 20 seconds. But if all workers are busy, databases queue up.
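You can watch this in action: each worker shows up as its own backend in `pg_stat_activity` (the `backend_type` column is available in PostgreSQL 10+). A quick way to see what the workers are doing right now:

```sql
-- Show active autovacuum workers and what each one is processing
SELECT pid,
       datname,
       query,
       now() - xact_start AS running_for
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker';
```

If this always shows the same handful of large tables, you have found your worker-starvation culprit.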
The key gotcha
If several large tables become eligible for vacuum simultaneously, all workers can get stuck on those tables. Smaller tables — even if they desperately need vacuuming — will wait. This is a common source of “autovacuum isn’t running on my table” complaints. It is running — it is just busy elsewhere.
The Decision Engine: When Does a Table Get Vacuumed?
Autovacuum does not vacuum tables on a schedule. It uses a threshold formula based on how many rows have changed since the last vacuum. The formula has evolved over Postgres versions, but the core logic is:
vacuum threshold = MIN(
autovacuum_vacuum_max_threshold,
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
)
Where:
- `autovacuum_vacuum_threshold` = 50 (base threshold — minimum changes before vacuuming)
- `autovacuum_vacuum_scale_factor` = 0.2 (20% of the table must change)
- `autovacuum_vacuum_max_threshold` = 100,000,000 (upper cap, added in PostgreSQL 18)
- `reltuples` = estimated row count from `pg_class`
What this means for different table sizes
| Table Size | Scale Factor (0.2) | Changes Needed to Trigger Vacuum |
|---|---|---|
| 1,000 rows | 200 | 250 (50 base + 200) |
| 100,000 rows | 20,000 | 20,050 |
| 10 million rows | 2,000,000 | 2,000,050 |
| 1 billion rows | 200,000,000 | 200,000,050 |
For a billion-row table, you need 200 million dead tuples before autovacuum kicks in. That is a lot of bloat. This is the #1 reason large tables get out of hand — the default scale factor is too generous for big tables.
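You can compute each table's approximate trigger point yourself. This sketch applies the global defaults to `reltuples`; it deliberately ignores per-table `reloptions` overrides, so treat it as an estimate:

```sql
-- Dead tuples vs. the (global-default) vacuum trigger point, per table
SELECT s.relid::regclass AS table_name,
       s.n_dead_tup,
       round(current_setting('autovacuum_vacuum_threshold')::numeric
             + current_setting('autovacuum_vacuum_scale_factor')::numeric
               * c.reltuples) AS vacuum_trigger_at
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
ORDER BY s.n_dead_tup DESC
LIMIT 20;
```

Tables where `n_dead_tup` is approaching `vacuum_trigger_at` are next in line for a worker.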
The Insert Threshold: Append-Only Tables Get Love Too
Historically, autovacuum only triggered on UPDATEs and DELETEs (which create dead tuples). INSERT-only tables were ignored unless they needed a freeze. But freeze-only vacuums do not advance the visibility map, so index-only scans suffered.
PostgreSQL 13 introduced autovacuum_vacuum_insert_threshold and autovacuum_vacuum_insert_scale_factor to fix this. The insert threshold formula:
insert threshold = autovacuum_vacuum_insert_threshold +
autovacuum_vacuum_insert_scale_factor * reltuples *
(1 - relallfrozen / relpages)
The `(1 - relallfrozen / relpages)` multiplier scales the trigger by the *unfrozen* fraction of the table: the more of the table that is already frozen, the sooner an insert-triggered vacuum fires, since it only has to visit the unfrozen pages anyway. Defaults are autovacuum_vacuum_insert_threshold = 1000 and autovacuum_vacuum_insert_scale_factor = 0.2. (The `relallfrozen` term is a PostgreSQL 18 refinement; earlier versions scale by `reltuples` alone.)
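To see how far each append-only table is from its insert-triggered vacuum, `pg_stat_user_tables` tracks inserts since the last vacuum (the `n_ins_since_vacuum` column arrived alongside this feature in PostgreSQL 13):

```sql
-- Inserts accumulated since the last vacuum, per table
SELECT relid::regclass AS table_name,
       n_ins_since_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_ins_since_vacuum DESC
LIMIT 10;
```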
The ANALYZE Threshold: Keeping Statistics Fresh
ANALYZE has its own separate threshold:
analyze threshold = autovacuum_analyze_threshold +
autovacuum_analyze_scale_factor * reltuples
Defaults: autovacuum_analyze_threshold = 50, autovacuum_analyze_scale_factor = 0.1 (10%).
Notice that ANALYZE triggers at 10% change, while VACUUM triggers at 20%. In practice, autovacuum will ANALYZE a table roughly twice as often as it vacuums it — statistics stay fresher than cleanup. This is by design: stale statistics cause bad query plans immediately, while dead tuples only waste space and slow scans.
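The counter driving the ANALYZE decision is also visible: `n_mod_since_analyze` counts inserts, updates, and deletes since the last ANALYZE, so you can compare it against the 10% trigger directly:

```sql
-- Row modifications since the last ANALYZE, per table
SELECT relid::regclass AS table_name,
       n_mod_since_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_mod_since_analyze DESC
LIMIT 10;
```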
Reading pg_stat_user_tables: Your Autovacuum Dashboard
The cumulative statistics system tracks everything autovacuum cares about. Here is a comprehensive monitoring query:
SELECT
relid::regclass AS table_name,
pg_size_pretty(pg_total_relation_size(relid)) AS size,
n_live_tup,
n_dead_tup,
round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
last_vacuum,
last_autovacuum,
last_analyze,
last_autoanalyze,
vacuum_count,
autovacuum_count
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
Key columns to watch:
- `n_dead_tup` — dead tuples not yet reclaimed. High numbers = vacuum is behind.
- `dead_pct` — percentage of dead tuples. Anything above 10-20% on a busy table means autovacuum is not keeping up.
- `last_autovacuum` / `last_autoanalyze` — when did autovacuum last visit? NULL means never.
- `autovacuum_count` — how many times autovacuum has run on this table since stats reset.
Dead tuple estimation gotcha
The n_live_tup and n_dead_tup counters are estimates based on the cumulative statistics system. They are updated by each INSERT, UPDATE, and DELETE operation — but they are eventually consistent, not exact. If you just bulk-deleted 10 million rows, the counters might not reflect that immediately.
You can force a refresh:
-- ANALYZE recomputes reltuples and resets the dead-tuple estimate
-- as a side effect of sampling the table
ANALYZE my_table;
Tuning Autovacuum: The Practical Guide
Problem 1: Large table, too much bloat before vacuum
The default scale factor of 0.2 means a 500 million row table accumulates 100 million dead tuples before vacuuming. Fix it with a lower scale factor or absolute threshold:
-- Option A: Lower the scale factor for this specific table
ALTER TABLE large_events
SET (autovacuum_vacuum_scale_factor = 0.05); -- 5% instead of 20%
-- Option B: rely on an absolute threshold (zero the scale factor so it dominates)
ALTER TABLE large_events
SET (autovacuum_vacuum_scale_factor = 0,
     autovacuum_vacuum_threshold = 100000); -- vacuum every 100K changes
Option A is simpler. Option B gives you predictable behavior regardless of table growth. Some teams use both: a scale factor as a safety net and a threshold as the primary trigger.
Problem 2: Autovacuum is too slow / not keeping up
Autovacuum throttles itself using a cost-based vacuum delay mechanism. It assigns “cost” to each I/O operation:
- `vacuum_cost_page_hit` = 1 (page found in shared_buffers)
- `vacuum_cost_page_miss` = 2 (page read from disk or the OS cache; the default was 10 before PostgreSQL 14)
- `vacuum_cost_page_dirty` = 20 (page modified by vacuum)
When the accumulated cost exceeds autovacuum_vacuum_cost_limit (default: 200), the worker sleeps for autovacuum_vacuum_cost_delay (default: 2ms in Postgres 12+, 20ms in older versions).
At default settings, a worker accumulates cost until it hits 200, then sleeps for 2 ms. If every page it touches is dirtied (cost 20), that is only 10 pages per cycle — roughly 40 MB/s of write-side progress, and a tenth of that on pre-12 versions with the 20 ms delay. For a large table with many dirty pages, vacuums can take hours.
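To see what throttling is in effect on your cluster, all the relevant knobs share a naming pattern:

```sql
-- Current cost-based throttling settings
SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE '%vacuum_cost%'
ORDER BY name;
```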
Fix: increase the cost limit for specific tables:
-- 10x the default cost limit for this critical table
ALTER TABLE large_events
SET (autovacuum_vacuum_cost_limit = 2000);
Or globally:
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 1000;
-- Note: this affects all workers, and is "balanced" across them
Important: When multiple workers run, the global cost limit is divided among them. With 3 workers and a global limit of 1000, each worker gets ~333. But per-table cost limits (set via ALTER TABLE) are not subject to this balancing — a table with autovacuum_vacuum_cost_limit = 2000 gets all 2000 regardless of other workers.
Problem 3: Workers stuck on large tables, small tables starved
If you have 3 workers and one is stuck on a multi-GB table, the other two workers handle everything else. But if two large tables trigger simultaneously, only 1 worker remains for all other tables.
Fix: increase workers and set per-table throttling:
ALTER SYSTEM SET autovacuum_max_workers = 6;
-- Requires a server restart before PostgreSQL 18;
-- a reload alone only applies it on 18+
SELECT pg_reload_conf();
-- Give the large tables their own cost limits so they use workers efficiently
ALTER TABLE large_events SET (autovacuum_vacuum_cost_limit = 2000);
ALTER TABLE large_sessions SET (autovacuum_vacuum_cost_limit = 2000);
More workers = more parallelism, but each worker gets a smaller share of the global cost limit. Per-table cost limits bypass this sharing.
Problem 4: Autovacuum keeps getting canceled
Autovacuum holds a SHARE UPDATE EXCLUSIVE lock, which conflicts with most DDL. If something else is trying to ALTER TABLE, CREATE INDEX, or even ANALYZE the table, autovacuum gets interrupted — and it does not retry immediately. It waits for the next nap cycle.
Check canceled vacuums:
-- Look for "canceling autovacuum task" in the server logs,
-- or find tables autovacuum has never managed to finish:
SELECT relname, autovacuum_count, n_dead_tup
FROM pg_stat_user_tables
WHERE autovacuum_count = 0 AND n_dead_tup > 10000;
Fix: Avoid running ANALYZE or DDL on tables that autovacuum is actively processing. Check pg_stat_activity for autovacuum workers before running maintenance.
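One way to spot the conflict as it happens: `pg_blocking_pids()` (PostgreSQL 9.6+) shows which sessions are waiting on which. If your DDL session lists an autovacuum worker's PID as a blocker, that vacuum is about to be canceled:

```sql
-- Sessions currently blocked, and the PIDs blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       state,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```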
The Anti-Wraparound Override
One critical detail: when the age of a table’s relfrozenxid exceeds autovacuum_freeze_max_age (default: 200 million transactions), autovacuum forces an aggressive anti-wraparound vacuum even if autovacuum is disabled for that table. This vacuum:
- Cannot be interrupted by conflicting lock requests (unlike normal autovacuums)
- Scans every non-frozen page, not just pages with dead tuples
- Shows up in `pg_stat_activity` with `(to prevent wraparound)` appended to the query text
This is the safety net we covered yesterday. The tuning implication: if you set autovacuum_freeze_max_age high for large static tables, the forced vacuum will be very expensive when it finally runs. Consider running manual aggressive vacuums during off-peak hours instead.
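To see which tables are approaching the forced-vacuum ceiling, compare each table's `relfrozenxid` age against the limit:

```sql
-- Tables closest to triggering an anti-wraparound vacuum
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::bigint AS forced_at
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY age(c.relfrozenxid) DESC
LIMIT 10;
```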
Per-Table Tuning: The Complete Cheat Sheet
-- Accelerate vacuuming on a hot, large table
ALTER TABLE my_table SET (
autovacuum_vacuum_scale_factor = 0.02, -- vacuum at 2% changes
autovacuum_vacuum_threshold = 50000, -- base count added to the scale-factor portion
autovacuum_analyze_scale_factor = 0.01, -- analyze at 1% changes
autovacuum_vacuum_cost_limit = 2000, -- 10x default speed
autovacuum_vacuum_cost_delay = 1 -- minimal sleep (ms)
);
-- Slow down vacuuming on a low-priority logging table
ALTER TABLE audit_logs SET (
autovacuum_vacuum_scale_factor = 0.5, -- 50% changes before vacuum
autovacuum_vacuum_cost_limit = 100 -- slower vacuuming
);
-- Disable autovacuum entirely (NOT recommended except for very specific cases)
ALTER TABLE static_reference SET (
autovacuum_enabled = false
);
-- WARNING: anti-wraparound vacuum will still run regardless
To check current per-table settings:
SELECT
c.oid::regclass AS table_name,
reloptions
FROM pg_class c
WHERE c.relkind = 'r'
AND reloptions IS NOT NULL
ORDER BY c.oid::regclass::text;
Logging: See What Autovacuum Is Doing
Enable autovacuum logging to catch problems early:
-- Log all autovacuum runs that take longer than 1 second
ALTER SYSTEM SET log_autovacuum_min_duration = '1s';
SELECT pg_reload_conf();
The log output tells you exactly what happened:
LOG: automatic vacuum of table "mydb.public.events": index scans: 1
pages: 0 removed, 428721 remain, 12 skipped due to pins, 0 skipped frozen
tuples: 8400000 removed, 52341062 remain, 0 are dead but not yet removable
buffer usage: 432712 hits, 42851 misses, 12583 dirtied
avg read rate: 12.345 MB/s, avg write rate: 3.627 MB/s
system usage: CPU: user: 12.34 s, system: 3.45 s, elapsed: 45.67 s
Key fields to read:
- tuples removed — dead tuples reclaimed
- skipped due to pins — pages that were busy (high = contention)
- skipped frozen — pages already frozen (good — efficient)
- dirtied — pages modified by vacuum (setting hint bits, freezing)
- elapsed — total time. If this is growing over time, tune the cost limits.
Key Takeaways
- **Autovacuum is threshold-based, not schedule-based.** It runs when dead tuples exceed `threshold + scale_factor × reltuples`. For large tables, the default 20% scale factor means too much bloat before triggering.
- **Per-table tuning is essential.** Use `ALTER TABLE ... SET (autovacuum_vacuum_scale_factor = ...)` to make vacuuming more aggressive for your large, hot tables and less aggressive for small or static ones.
- **Cost-based throttling controls vacuum speed.** Default settings are intentionally conservative. Increase `autovacuum_vacuum_cost_limit` (globally or per-table) to make vacuums finish faster.
- **Workers are a shared resource.** With the default 3 workers, one stuck vacuum can starve other tables. Monitor `pg_stat_user_tables` for tables with high dead tuples and a `last_autovacuum` of NULL.
- **ANALYZE and VACUUM have separate thresholds.** ANALYZE triggers at 10% change (more sensitive than VACUUM’s 20%). Stale statistics cause bad plans; stale vacuum causes bloat. Both matter.
- **Logging is your best friend.** Set `log_autovacuum_min_duration = '1s'` (or even `0` for debugging) and review the logs. They tell you exactly which tables autovacuum visits and how long it takes.
What’s Next
Tomorrow we are going deeper into the vacuum family: VACUUM vs VACUUM ANALYZE vs VACUUM FULL — when each one is appropriate, why VACUUM FULL is almost always the wrong answer, and the alternative tools (pg_repack, pg_squeeze) that do the same job without locking your table.
This post is part of the PostgreSQL Deep Dive series, published weekdays at 9am AEST. Catch up on the full series or browse by PostgreSQL tag.