PostgreSQL Deep Dive

PostgreSQL Deep Dive: Planning Time Spikes — Why Your Queries Take Longer to Plan Than to Execute

A query that ran in 5ms yesterday now takes 3 seconds. You check the execution plan and it’s using a sequential scan instead of the index scan it used yesterday. Nothing changed in the query. Nothing changed in the schema. What happened?

The answer is almost always the same thing: the planner’s statistics went stale. After a bulk load, a major deletion, or a schema change, ANALYZE didn’t run (or didn’t run thoroughly enough), and the planner is working with row estimates that are off by 10x or 100x. With bad estimates, the planner makes bad choices — and bad plans are slow plans.

This is one of the most common operational gotchas in PostgreSQL: the plan changed not because the query changed, but because the data distribution changed and the planner doesn’t know about it yet.

How the Planner Uses Statistics

The query planner doesn’t look at your data when planning a query. It looks at statistics about your data — summarized approximations stored in two system catalogs:

pg_class — per-table and per-index statistics:

  • reltuples — estimated number of rows
  • relpages — estimated number of disk pages (8KB each)

These are updated by VACUUM, ANALYZE, and certain DDL commands like CREATE INDEX. They’re not updated on every write — that would be too expensive. Between updates, the planner scales reltuples based on the current physical table size relative to relpages, which gives a rough approximation.

pg_statistic — per-column statistics (visible through the pg_stats view):

  • n_distinct — estimated number of distinct values (negative values are fractions)
  • most_common_vals (MCV) — most frequently occurring values
  • most_common_freqs — frequency of each MCV (as a fraction)
  • histogram_bounds — bucket boundaries for value distribution
  • correlation — physical vs logical ordering (used for index scan cost estimation)

ANALYZE builds these statistics by sampling a random subset of rows from the table. The number of rows sampled is controlled by default_statistics_target (default 100), which means ANALYZE looks at 300 * default_statistics_target rows — typically 30,000 rows. For a table with 10 million rows, that’s a 0.3% sample.

The planner uses these statistics to estimate selectivity — what fraction of rows will match each WHERE clause condition. If the estimate is close to reality, the planner chooses well. If the estimate is wrong, the planner can choose catastrophically wrong plan.

When Statistics Go Stale

After Bulk Loads

-- Table has 10,000 rows, ANALYZE ran, statistics are fresh
-- reltuples ≈ 10,000, pg_stats shows accurate MCVs

-- Bulk insert 5 million rows
INSERT INTO orders SELECT ... FROM generate_series(1, 5000000);

-- reltuples still shows ≈ 10,000
-- pg_stats still reflects the old data distribution
-- The planner thinks this table is tiny

The planner estimates that SELECT * FROM orders WHERE status = 'pending' will return a handful of rows (based on the old statistics where pending was rare). It chooses an index scan. But with 5 million new rows, pending might now match 500,000 rows. The index scan performs 500,000 random I/O operations — catastrophically slower than a single sequential scan would have been.

Autovacuum will eventually trigger ANALYZE on this table, but the threshold is based on the percentage of rows changed. If autovacuum_analyze_scale_factor is 0.1 (10%), autovacuum won’t trigger until 1,000 rows change from the old baseline of 10,000 — completely missing the fact that 5 million rows were added in one shot.

After Major Deletions

-- Table has 100 million rows, statistics are fresh
DELETE FROM logs WHERE created_at < now() - interval '90 days';
-- Removed 80 million rows

-- reltuples still ≈ 100 million
-- pg_stats still reflects the old distribution
-- The planner overestimates row counts by 5x

Now the planner thinks scans will be expensive and might avoid using an index that would actually be very efficient on the remaining 20 million rows.

After Schema Changes

ALTER TABLE users ADD COLUMN region text;
UPDATE users SET region = CASE WHEN id % 10 = 0 THEN 'AU' ELSE 'US' END;
-- pg_stats has no statistics for the new column
-- The planner falls back to generic estimates

A new column starts with no statistics. The planner assumes uniform distribution and default selectivity (0.5% for equality, which is a wild guess). If the actual distribution is skewed (99% US, 1% AU), the plan can be wildly wrong.

The Default Statistics Target Problem

default_statistics_target controls how many MCV entries and histogram buckets ANALYZE stores. The default is 100, which means:

  • Up to 100 most common values tracked
  • Up to 100 histogram bucket boundaries
  • Sample size: 300 × 100 = 30,000 rows

For most workloads this is fine. But for columns with many distinct values and skewed distributions, 100 MCV entries might not be enough to capture the important values. A column with 10,000 distinct values where the top 200 values account for 80% of rows — the default 100 MCV entries will miss values 101-200, and the planner will treat them as “average” frequency.

-- Check current statistics target for a column
SELECT attname, stadistinct, stanumbers1
FROM pg_statistic
WHERE starelid = 'orders'::regclass AND staattnum = 5;

-- Increase statistics target for a specific column
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;

-- Re-analyze to collect more detailed stats
ANALYZE orders (status);

Higher targets mean more sample rows, more MCV entries, more histogram buckets — better estimates but slower ANALYZE and more space in pg_statistic. Most production systems leave the global default at 100 and tune specific columns individually.

The Correlation Trap

pg_stats.correlation measures how physically ordered a column’s values are relative to their logical order. It ranges from -1 to 1. A correlation near 1 means the column is physically sorted (like a serial or timestamp column). Near 0 means random order (like a UUID or shuffled column).

The planner uses correlation to estimate the cost of index scans. High correlation means the index scan will read pages roughly in order — fewer random I/O seeks, lower cost estimate. Low correlation means the index scan will jump around — many random seeks, higher cost estimate.

The problem: correlation is a point-in-time snapshot from the last ANALYZE. If you insert a lot of out-of-order data, the actual correlation drops but the stored value stays high. The planner underestimates the cost of the index scan and chooses it when a sequential scan would be faster.

-- Check correlation for indexed columns
SELECT
    schemaname,
    tablename,
    attname,
    correlation
FROM pg_stats
WHERE tablename = 'orders'
ORDER BY abs(correlation) DESC;

Diagnostic Queries

Find tables with stale statistics (last analyzed long ago relative to changes):

SELECT
    schemaname,
    relname AS table_name,
    n_live_tup AS estimated_rows,
    last_analyze,
    last_autoanalyze,
    now() - COALESCE(last_analyze, last_autoanalyze) AS stats_age,
    n_mod_since_analyze
FROM pg_stat_user_tables
WHERE n_mod_since_analyze > n_live_tup * 0.1
   OR (last_analyze IS NULL AND last_autoanalyze IS NULL)
ORDER BY n_mod_since_analyze DESC
LIMIT 20;

The n_mod_since_analyze column tracks how many rows have been modified since the last ANALYZE. When this is a large fraction of n_live_tup, your statistics are stale.

Compare estimated vs actual row counts:

SELECT
    relname,
    reltuples AS planner_estimate,
    pg_stat_get_live_tuples(c.oid) AS actual_live
FROM pg_class c
WHERE relkind = 'r'
AND relnamespace = 'public'::regnamespace
ORDER BY abs(reltuples - pg_stat_get_live_tuples(c.oid)) DESC
LIMIT 20;

Check if a specific query has a plan mismatch:

-- Run your query with EXPLAIN ANALYZE and look for:
-- "rows=1000 (actual rows=500000)" — planner estimated 1000, got 500000
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE status = 'pending' AND created_at > now() - interval '7 days';

The gap between the estimated rows and actual rows is your statistics quality signal. More than 10x discrepancy means the planner is working with bad information.

Find columns that might need higher statistics targets:

SELECT
    schemaname,
    tablename,
    attname,
    n_distinct,
    CASE
        WHEN n_distinct < 0 THEN round(-n_distinct * reltuples)
        ELSE n_distinct
    END AS estimated_distinct,
    correlation
FROM pg_stats
JOIN pg_class c ON c.relname = pg_stats.tablename
WHERE abs(correlation) < 0.1
AND n_distinct < -0.5
ORDER BY tablename, attname;

The Fix: When and How to Analyze

Manual ANALYZE after bulk operations:

-- After COPY, INSERT INTO ... SELECT, or any bulk operation
ANALYZE orders;

-- Target specific columns for faster analysis
ANALYZE orders (status, created_at, region);

Tune autovacuum analyze thresholds for busy tables:

-- Lower the threshold so autovacuum analyzes more frequently
ALTER TABLE orders SET (
    autovacuum_analyze_scale_factor = 0.02,
    autovacuum_analyze_threshold = 1000
);

The default autovacuum_analyze_scale_factor is 0.1 (10% of rows). For a 100-million-row table, that’s 10 million rows before autovacuum triggers ANALYZE. If your query performance is sensitive to statistics freshness, lower this for your critical tables.

Use per-column statistics targets wisely:

-- For columns with skewed distributions used in WHERE clauses
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;

-- For columns with uniform distribution (less critical)
ALTER TABLE orders ALTER COLUMN id SET STATISTICS 50;

For partitioned tables, analyze the parent:

-- This propagates to all partitions
ANALYZE measurement;

For temporary tables, remember to analyze manually:

CREATE TEMP TABLE session_results AS SELECT * FROM large_table WHERE ...;
-- Autovacuum does not run on temp tables!
ANALYZE session_results;

Temporary tables are not managed by autovacuum. If you create a temp table, load data, and then query it, you must run ANALYZE manually or the planner will have no statistics at all.

Key Takeaways

  • The planner doesn’t read your data — it reads about your data from pg_class and pg_statistic
  • Statistics are updated by ANALYZE, not by INSERT/UPDATE/DELETE, and only when autovacuum triggers it (or you run it manually)
  • After bulk loads, major deletions, or adding new columns, statistics can be stale for hours until autovacuum catches up
  • Stale statistics lead to bad row estimates, which lead to bad plan choices (wrong join order, wrong scan type, wrong join method)
  • Check n_mod_since_analyze in pg_stat_user_tables to find tables with stale statistics
  • Use EXPLAIN ANALYZE to spot the gap between estimated and actual rows — more than 10x means the planner is guessing wrong
  • Run ANALYZE manually after bulk operations, and lower autovacuum_analyze_scale_factor for tables where plan stability matters
  • Temporary tables require manual ANALYZE — autovacuum doesn’t cover them

Previously: pg_stat_statements query_id: Why Your Top Query Changes Between Versions