PostgreSQL Deep Dive: effective_cache_size — The Parameter That Changes Nothing Yet Changes Everything
You change effective_cache_size from 4GB to 24GB. You check pg_stat_bgwriter. Nothing changes. You check process memory with ps. Nothing changes. You look at shared memory segments with ipcs. Nothing changes.
But your queries just got 50x faster.
What happened?
The Parameter That Allocates Nothing
effective_cache_size is unique among PostgreSQL’s memory parameters. Unlike shared_buffers, work_mem, or maintenance_work_mem, it allocates zero bytes. It doesn’t reserve memory, it doesn’t create shared memory segments, and it doesn’t touch the OS page cache. It exists entirely inside the planner’s head.
Here’s what the PostgreSQL documentation says:
Sets the planner’s assumption about the effective size of the disk cache that is available to a single query. This is factored into estimates of the cost of using the index; a higher value makes it more likely index scans will be used, a lower value makes it more likely sequential scans will be used.
That’s it. It’s a hint. The planner assumes that when it needs to read a page from disk, there’s roughly effective_cache_size bytes of cache standing between it and the actual spindle (or SSD). The more cache the planner believes exists, the cheaper random I/O looks — and the more attractive index scans become.
Default value: 4GB. On a server with 64GB of RAM and 16GB of shared_buffers, that default is catastrophically wrong.
How the Planner Uses It: The Index Cost Model
When the planner considers an index scan, it estimates how many 8KB pages it will need to fetch. Then it asks: are those pages likely to be in cache, or will they require a physical disk read?
Here’s the logic, much simplified, from index_pages_fetched() in src/backend/optimizer/path/costsize.c:

if (total_relation_pages <= effective_cache_size)
    repeat_visits_to_a_page_are_cached = true
else
    repeat_visits_to_a_page_are_cached = false

When the planner believes pages stay cached, fetching the same heap page a second time costs essentially nothing. When it thinks pages get evicted, it applies the full random_page_cost (default 4.0) to each fetch.

The actual estimate is the Mackert-Lohman approximation (slightly simplified here). With T = pages in the table, N = tuples the scan will fetch, and b = effective_cache_size in 8KB pages:

if T <= b:
    pages_fetched = min(2*T*N / (2*T + N), T)
else if N <= 2*T*b / (2*T - b):
    pages_fetched = 2*T*N / (2*T + N)
else:
    pages_fetched = b + (N - 2*T*b / (2*T - b)) * (T - b) / T

io_cost ≈ random_page_cost * pages_fetched    -- uncorrelated scan

The key insight: the crossover point. While the relation fits within b, pages_fetched is capped at T, so no heap page is charged more than once. Once the relation outgrows b, the planner assumes already-visited pages have been evicted, and revisits are charged as fresh random reads.

This means effective_cache_size directly controls the threshold where the planner switches from “index scan is cheaper” to “just scan the whole table.”
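This crossover can be sketched in code. The function below follows the Mackert-Lohman estimate that index_pages_fetched() implements, simplified: it ignores the index-page and multi-table scaling the real code applies, so treat the numbers as illustrative only.

```python
def pages_fetched(tuples, table_pages, cache_pages):
    """Estimated physical page reads for an uncorrelated index scan
    fetching `tuples` rows from a table of `table_pages` 8KB pages,
    under a cache assumption of `cache_pages` pages."""
    T, N, b = float(table_pages), float(tuples), float(cache_pages)
    if T <= b:
        # Relation fits the cache assumption: no page is charged twice.
        return min(2 * T * N / (2 * T + N), T)
    lim = 2 * T * b / (2 * T - b)
    if N <= lim:
        return 2 * T * N / (2 * T + N)
    # Beyond this point, revisited pages are assumed evicted and re-read.
    return b + (N - lim) * (T - b) / T

table = 10_700_000                   # ~100GB table in 8KB pages
small = 4 * 1024**3 // 8192          # 4GB  -> 524,288 pages
large = 52 * 1024**3 // 8192         # 52GB -> 6,815,744 pages

for b in (small, large):
    print(f"cache={b:>9} pages -> estimated fetches "
          f"~{pages_fetched(2_000_000, table, b):,.0f}")
```

With the larger cache assumption the fetch estimate drops, so multiplying by random_page_cost yields a cheaper index path.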
The Plan Flip: A Concrete Example
Let’s say you have a 100GB table with an index on created_at. Your query:
SELECT * FROM orders
WHERE created_at BETWEEN '2026-04-01' AND '2026-04-07';
The planner estimates this will return ~1.5% of rows (1 week out of ~2 years). It needs to fetch approximately:
- Index pages: ~5,000
- Table pages: ~160,000 (1.5% of ~10.7M pages)
With effective_cache_size = 4GB (default)
Total pages needed: 165,000
effective_cache_size in pages: 524,288

The pages fit, on paper. But the relation itself is ~10.7M pages, far beyond a 4GB cache assumption, so the planner expects most of those fetches to be physical random reads and may estimate significant I/O:
Index Scan estimated cost: 165,000 × random_page_cost (4.0) = 660,000
Seq Scan estimated cost: 10,700,000 × seq_page_cost (1.0) = 10,700,000
Index scan wins comfortably at this selectivity. But what if it goes up to 5%?
Pages needed: ~535,000 + 5,000 = 540,000
That far exceeds what a 4GB cache assumption (524,288 pages) can hold!
The planner now assumes pages it already visited have been evicted and charges revisits as fresh random reads.
Index scan cost shoots up → Seq Scan chosen.
With effective_cache_size = 52GB (properly tuned)
effective_cache_size in pages: 6,815,744
Even at 10% selectivity (1,070,000 table pages), all pages fit within the planner’s cache assumption. The planner keeps using index scans for much higher selectivities, which is correct — because on a server with 64GB RAM and 16GB shared_buffers, you actually DO have ~48GB of OS cache available.
The plans flip. Queries that were doing full table scans start using indexes. The query that took 8 seconds now takes 200ms. No actual memory changed.
How to Set It Correctly
The formula is straightforward:
effective_cache_size = shared_buffers + (total_RAM - shared_buffers) × 0.75
Or more simply:
effective_cache_size ≈ 50-75% of total system RAM
Why not 100%? Because:
- The OS needs memory for other processes
- PostgreSQL needs memory for work_mem allocations, WAL buffers, and connection overhead
- The OS page cache isn’t exclusively used for PostgreSQL files
- Other processes (pg_dump, monitoring agents, backups) consume RAM
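The sizing rule is easy to sanity-check in code. This is a sketch of the article’s heuristics (25% of RAM for shared_buffers as the common starting point, the 0.75 factor from the formula); it lands slightly above the flat 75%-of-RAM shorthand because it never discounts shared_buffers, which is always available to PostgreSQL.

```python
def recommend(total_ram_gb, shared_buffers_gb=None):
    """Suggested (shared_buffers, effective_cache_size) in GB."""
    if shared_buffers_gb is None:
        shared_buffers_gb = total_ram_gb * 0.25   # rule-of-thumb default
    ecs_gb = shared_buffers_gb + (total_ram_gb - shared_buffers_gb) * 0.75
    return shared_buffers_gb, ecs_gb

for ram in (16, 64, 256):
    sb, ecs = recommend(ram)
    print(f"{ram:>3}GB RAM -> shared_buffers={sb:.0f}GB, "
          f"effective_cache_size={ecs:.0f}GB")
```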
Practical values by server size
| Total RAM | shared_buffers | effective_cache_size | Notes |
|---|---|---|---|
| 4 GB | 1 GB | 3 GB | Small dev server |
| 8 GB | 2 GB | 6 GB | Small production |
| 16 GB | 4 GB | 12 GB | Medium production |
| 32 GB | 8 GB | 24 GB | Standard production |
| 64 GB | 16 GB | 48 GB | Large production |
| 128 GB | 32 GB | 96 GB | Data warehouse |
| 256 GB | 64 GB | 192 GB | Large data warehouse |
For Linux, you can check actual OS cache:
# Total cache (includes non-PostgreSQL files)
free -h | grep Mem | awk '{print $6}'
# More precise: check PostgreSQL data directory size vs cache
# The OS caches pages from all files, not just PostgreSQL
cat /proc/meminfo | grep -E "^Cached:|^Buffers:"
What effective_cache_size Is NOT
Common misconceptions that will hurt you:
❌ “It reserves cache memory”
No. The OS manages its page cache independently. effective_cache_size is purely a planner cost parameter. Setting it to 100GB on a 4GB server doesn’t allocate anything — it just makes the planner naively optimistic.
❌ “It assumes cache persists between queries”
The planner assumes no cross-query cache persistence. Each query is costed independently. The effective_cache_size represents the cache available to a single query, not accumulated across the session.
This is why the setting matters: even though pages from your previous query are almost certainly still in the OS cache, the planner doesn’t count on it. It just says “given X GB of cache, what fraction of these pages are likely to be in it?”
❌ “Higher is always better”
Wrong. Set effective_cache_size higher than your actual RAM, and the planner will choose index scans for queries that should be sequential scans. On a table where you’re selecting 40% of rows, an index scan is catastrophically slower than a sequential scan — the random I/O pattern is devastating, even on SSDs.
-- Disaster scenario: effective_cache_size = 200GB on a 32GB server
-- Planner thinks index scan on 40% of a 50GB table is fine
-- Reality: 6.5M random page reads, most hitting disk
-- Time: 45 seconds instead of 3 seconds for seq scan
❌ “It only affects index scans”
It also influences:
- Bitmap scan decisions: whether to use a bitmap heap scan or fall back to a sequential scan
- Nested loop join costing: how expensive the planner thinks repeated inner index lookups will be
- Index-only scan costing: how heavily the heap pages the scan must still visit (per the visibility map) are charged
The Interaction With random_page_cost
effective_cache_size and random_page_cost are the two sides of the planner’s I/O cost coin:
- random_page_cost (default 4.0): how expensive a random page fetch is relative to a sequential one
- effective_cache_size: how many of those random fetches are likely cached
Together, they determine the effective cost of an index scan:
| Scenario | random_page_cost | effective_cache_size | Effect |
|---|---|---|---|
| HDD, small cache | 4.0 | 4GB | Planner avoids indexes aggressively |
| SSD, small cache | 1.5 | 4GB | Planner is slightly more index-friendly |
| SSD, large cache | 1.1-1.5 | 75% RAM | Planner strongly prefers indexes |
| All in RAM | 1.0 | 75% RAM | Random = sequential cost, indexes always win when selective |
The standard advice for SSD-based servers:
# postgresql.conf for SSD server with 64GB RAM
random_page_cost = 1.1           # SSDs make random I/O nearly as cheap as sequential
effective_cache_size = '48GB'    # ~75% of total RAM
shared_buffers = '16GB'          # 25% of total RAM
For HDD-based servers:
# postgresql.conf for HDD server with 64GB RAM
random_page_cost = 4.0           # HDD default, random I/O is genuinely expensive
effective_cache_size = '48GB'    # still ~75% of RAM, the cache is real
shared_buffers = '16GB'
Note: even on HDDs, a properly tuned effective_cache_size makes a huge difference because the OS cache absorbs most random reads for hot data.
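To see the random_page_cost side of the coin in isolation: ignoring CPU costs and caching entirely, an uncorrelated index scan touching a fraction f of a table’s pages costs roughly random_page_cost × f × pages, versus seq_page_cost × pages for the full scan. The break-even fraction is then just the ratio of the two constants (a sketch, not the planner’s actual formula):

```python
def break_even_fraction(random_page_cost, seq_page_cost=1.0):
    """Fraction of a table's pages above which a sequential scan is
    cheaper than random index fetches, in this simplified model."""
    return seq_page_cost / random_page_cost

for rpc in (4.0, 1.5, 1.1):
    print(f"random_page_cost={rpc}: seq scan wins above "
          f"~{break_even_fraction(rpc):.0%} of pages")
```

This is why lowering random_page_cost on SSDs pushes the planner toward indexes for much less selective queries.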
Proving It: Before and After
Here’s a reproducible experiment you can run on any PostgreSQL instance:
-- Create a test table with 10M rows
CREATE TABLE cache_test AS
SELECT
generate_series(1, 10000000) AS id,
(random() * 1000)::int AS value,
NOW() - (random() * interval '365 days') AS created_at;
ALTER TABLE cache_test ADD PRIMARY KEY (id);
CREATE INDEX idx_cache_test_created ON cache_test(created_at);
ANALYZE cache_test;
-- Check table size (roughly 650MB)
SELECT pg_size_pretty(pg_table_size('cache_test'));
Now test with different effective_cache_size values:
-- Session 1: default (4GB)
SET effective_cache_size = '4GB';
EXPLAIN (ANALYZE, BUFFERS, COSTS ON)
SELECT COUNT(*) FROM cache_test
WHERE created_at > '2026-03-01';
-- Session 2: properly tuned
SET effective_cache_size = '48GB';
EXPLAIN (ANALYZE, BUFFERS, COSTS ON)
SELECT COUNT(*) FROM cache_test
WHERE created_at > '2026-03-01';
On a server with ample RAM, you’ll see the estimated costs change dramatically even if the actual execution doesn’t change (because the pages are already cached). On a server where the table is larger than RAM, you’ll see actual plan changes — index scans flip to sequential scans or vice versa.
Monitoring: Are Your Plans Cache-Aware?
Check if the planner is making reasonable cache assumptions by comparing estimated vs actual costs:
-- Find queries where actual time vastly differs from estimated cost
-- (sign of misconfigured effective_cache_size or stale statistics)
SELECT
query,
calls,
round(total_exec_time::numeric, 2) AS total_ms,
round(mean_exec_time::numeric, 2) AS mean_ms,
round(mean_plan_time::numeric, 2) AS plan_ms,  -- requires pg_stat_statements.track_planning = on
rows
FROM pg_stat_statements
WHERE mean_exec_time > 100 -- only slow queries
ORDER BY mean_exec_time DESC
LIMIT 20;
Check your current effective_cache_size:
SELECT name, setting, unit, source
FROM pg_settings
WHERE name = 'effective_cache_size';
Check actual OS cache available on Linux:
# Total page cache in KB
cat /proc/meminfo | grep "^Cached:" | awk '{print $2}'
# Effective cache for PostgreSQL = shared_buffers + OS cache
# Remember: OS cache is shared with all processes, not just PostgreSQL
The Multi-Query Problem
effective_cache_size models cache available to a single query. But what about concurrent queries?
If you have 20 concurrent queries all scanning different parts of a 200GB table on a 64GB server, they’re competing for the same ~48GB of OS cache. The planner doesn’t know about this — it assumes each query has the full effective_cache_size available.
This means:
- On low-concurrency OLTP systems: effective_cache_size = 75% of RAM is accurate
- On high-concurrency systems hitting different tables: consider being more conservative (50-60% of RAM)
- On mixed workloads: use 75% as a starting point, and adjust if you see unexpected sequential scans on hot indexes
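The arithmetic behind that conservatism is trivial but worth making explicit (illustrative only; real cache sharing depends on access patterns, not an even split):

```python
def per_query_cache_gb(os_cache_gb, concurrent_scans):
    """Rough per-query share of the OS cache under an even split."""
    return os_cache_gb / max(1, concurrent_scans)

# 48GB of OS cache, 20 concurrent large scans over different data:
# each scan effectively sees a small fraction of what the planner
# assumed was available to it alone.
print(per_query_cache_gb(48, 20))
```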
You can verify by checking cache hit ratios during peak load:
-- Cache hit ratio during your busiest period
SELECT
schemaname,
relname,
round(
heap_blks_hit::numeric /
NULLIF(heap_blks_hit + heap_blks_read, 0) * 100,
2
) AS cache_hit_pct,
heap_blks_hit,
heap_blks_read
FROM pg_statio_user_tables
WHERE heap_blks_hit + heap_blks_read > 10000 -- only meaningful tables
ORDER BY heap_blks_read DESC
LIMIT 20;
If your hit ratio is above 99%, your effective_cache_size is probably fine or could be higher. If it’s below 95%, either the data doesn’t fit in cache (expected) or your cache setting is causing suboptimal plans.
The Formula for Cost Estimation
For those who want to understand the exact math, here’s how effective_cache_size flows through the index scan cost model:
Step 1: Estimate pages fetched
num_index_pages = ceil(num_index_tuples / tuples_per_page)
num_table_pages = ceil(num_tuples * selectivity / tuples_per_page)
correlation = correlation_between_index_and_table (from pg_stats)
Step 2: Adjust for cache
-- Simplification: assume a fraction of pages are cached, based on
-- effective_cache_size vs. total pages in the relation. (The real
-- code folds caching into the Mackert-Lohman fetch estimate instead.)
cache_fraction = LEAST(1.0, effective_cache_size / total_relation_pages)
Step 3: Cost the I/O
-- Cached pages: cost 0 (already in memory)
-- Uncached pages: cost random_page_cost (or seq_page_cost for sequential)
-- For correlated index scans (correlation ≈ 1.0):
-- Pages are fetched roughly sequentially → cheaper
-- For uncorrelated index scans (correlation ≈ 0.0):
-- Pages are fetched randomly → more expensive
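The three steps can be strung together numerically. This uses the simplified cache_fraction model from Step 2 (correlation ignored, names are illustrative), not the planner’s exact code:

```python
RANDOM_PAGE_COST = 4.0

def io_cost(pages_needed, total_relation_pages, ecs_pages):
    """Simplified: cached pages cost ~0, uncached ones random_page_cost."""
    cache_fraction = min(1.0, ecs_pages / total_relation_pages)
    uncached_pages = pages_needed * (1.0 - cache_fraction)
    return uncached_pages * RANDOM_PAGE_COST

relation = 10_700_000                 # pages in the 100GB example table
for ecs in (524_288, 6_815_744):      # 4GB vs 52GB, in 8KB pages
    print(f"ecs={ecs:>9} pages -> io_cost "
          f"~{io_cost(160_000, relation, ecs):,.0f}")
```

Even in this crude model, the bigger cache assumption slashes the estimated I/O cost of the same index scan.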
The correlation column in pg_stats is critical here:
-- Check index correlation for your tables
SELECT
attname AS column_name,
correlation
FROM pg_stats
WHERE tablename = 'your_table'
ORDER BY abs(correlation) ASC;
A correlation near 1.0 means the index order matches the physical table order (like a serial PK or a created_at column on an append-only table). A correlation near 0 means random physical access. The lower the correlation, the more effective_cache_size matters — because the planner needs to believe the scattered pages are cached.
Key Takeaways
- effective_cache_size allocates nothing. It’s a planner hint about how much disk cache (shared_buffers + OS page cache) is available. It has zero runtime effect.

- Set it to 50-75% of total RAM. The formula: shared_buffers + (RAM - shared_buffers) × 0.75. For a 64GB server with 16GB shared_buffers, that’s ~52GB.

- It controls the index vs sequential scan threshold. Higher values make the planner prefer indexes; lower values make it prefer sequential scans. Getting it wrong means either too many seq scans (too low) or wasteful random I/O (too high).

- Pair it with random_page_cost. On SSDs, set random_page_cost = 1.1 and effective_cache_size to ~75% of RAM. On HDDs, keep random_page_cost = 4.0 but still set effective_cache_size correctly.

- The planner assumes no cross-query caching. Each query is costed independently with the full effective_cache_size available. This is conservative but consistent.
What’s Next
Tomorrow we’ll look at maintenance_work_mem and autovacuum_work_mem — how PostgreSQL’s memory settings for maintenance operations directly control vacuum performance, and why the wrong values cause autovacuum to either crawl or eat all your RAM.
Got questions about effective_cache_size or planner cost tuning? Drop them in the comments below.