PostgreSQL Deep Dive: BRIN Indexes — How a 1 MB Index Can Replace a 1 GB B-Tree
You have a 500 GB table with a timestamp column. You create a B-tree index on it. The index is 12 GB. Every query that filters on a date range uses it, but the index is so large it barely fits in memory, and the maintenance cost is enormous — every INSERT updates a deep, wide B-tree.
Now imagine replacing that 12 GB B-tree with a 2 MB BRIN index that delivers the same query performance. That’s not hypothetical. That’s what BRIN indexes do when your data has physical correlation with the indexed column.
How BRIN Works: Summaries Instead of Pointers
A B-tree index stores a pointer for every row. 500 million rows = 500 million index entries = 12 GB of index data. Every row is individually addressable.
A BRIN index doesn’t store row pointers. Instead, it divides the table into block ranges — groups of physically adjacent pages — and stores a summary of the indexed column’s values within each range. The default pages_per_range is 128 pages, which is roughly 1 MB of table data per range.
For a minmax operator class (the most common), the summary is just two values: the minimum and maximum of the indexed column within that block range. One range summary = ~6 bytes. A 500 GB table with 128-page ranges produces roughly 62,500 range summaries = ~375 KB of index data.
That’s the core idea. Instead of indexing every row, BRIN indexes every megabyte of table data with a single min/max pair.
The Query Execution: Lossy Bitmap Scan
When you query with a BRIN index:
SELECT * FROM orders
WHERE order_date BETWEEN '2026-01-01' AND '2026-01-31';
PostgreSQL scans the BRIN index range summaries. For each range, it checks: does the [min, max] interval overlap with the query range?
- Range 42: min=2025-11-15, max=2025-12-31 → no overlap → skip the entire block range
- Range 43: min=2026-01-02, max=2026-01-28 → overlap → include all pages in this range
- Range 44: min=2026-01-15, max=2026-02-10 → overlap → include all pages in this range
- Range 45: min=2026-02-05, max=2026-03-01 → overlap → include all pages
- Range 46: min=2026-03-15, max=2026-04-01 → no overlap → skip
The matching ranges produce a bitmap of heap pages. PostgreSQL then does a bitmap heap scan — it reads only the pages from matching ranges and checks each row against the actual WHERE clause (the “recheck” step). Rows that don’t match are discarded.
This is why BRIN indexes are called lossy: they don’t precisely identify which rows match. They identify which block ranges might contain matching rows, and the executor filters at the row level. If a range’s min/max overlaps your query but the range contains mostly non-matching values, you read pages you didn’t need to. This is the fundamental tradeoff: tiny index, but potentially more heap pages read than a precise B-tree.
When BRIN Works: Physical Correlation
The key factor that determines whether BRIN is effective is physical correlation between the indexed column and the table’s physical layout. You can measure this with pg_stats.correlation:
SELECT
tablename,
attname,
correlation
FROM pg_stats
WHERE tablename = 'orders'
AND attname = 'order_date';
correlation ranges from -1 to 1:
- 1.0: Perfect physical correlation. Values increase monotonically with the physical row order. BRIN works beautifully.
- 0.0: No correlation. Values are randomly distributed. BRIN is useless — every range’s min/max will span the entire value domain.
- -1.0: Perfect negative correlation. Values decrease with physical order. BRIN still works (ranges are still narrow in value space).
For BRIN to be effective, you want correlation > 0.9 or so. The higher the better.
Common high-correlation patterns:
- Timestamp columns on append-only tables. Orders, logs, events, sensor data — if new rows always arrive at the end and are timestamped with the current time, the timestamp correlates perfectly with physical position.
- Serial/identity columns. Monotonically increasing PKs correlate with insertion order.
- Sequential IDs on time-series data. Device readings, trade records, audit logs.
Common low-correlation patterns where BRIN fails:
- Frequently updated columns. Updates scatter new tuple versions across the table.
- UUIDv4 primary keys. Random distribution destroys correlation.
- Columns modified by CLUSTER on a different key. CLUSTER reorders by its target index, potentially destroying correlation with other columns.
Operator Classes
BRIN supports several operator classes, each storing different summary types:
minmax (Linear Sort Order)
The default for scalar types. Stores min and max per range. Supports <, <=, =, >=, > operators.
Available for: int2, int4, int8, float4, float8, numeric, timestamp, timestamptz, date, time, timetz, interval, money, bpchar, text, varchar, bytea, bit, inet, cidr, macaddr, macaddr8, uuid, pg_lsn.
minmax-multi (Multiple Intervals)
PG14+ extension of minmax. Instead of one [min, max] interval, stores multiple intervals per range. This handles data with gaps or outliers — like a table of orders where most dates are in January but a few backdated entries from December exist in the same physical range.
CREATE INDEX idx_orders_date_mmm
ON orders USING brin (order_date minmax_multi_ops)
WITH (pages_per_range = 64);
The values_per_range parameter (default 32) controls how many min/max intervals are stored per range. More intervals = more precise filtering = larger index.
When to use: Tables where the correlation is high but not perfect, or where a few outlier values in each range would cause the standard minmax to overestimate the range.
bloom (Equality Queries)
PG17+ operator class. Stores a Bloom filter per range — a probabilistic data structure that can answer “does this range possibly contain value X?” with no false negatives but some false positives. Supports only = operator.
CREATE INDEX idx_users_email_bloom
ON users USING brin (email bloom_ops)
WITH (pages_per_range = 16, false_positive_rate = 0.01);
The false_positive_rate parameter controls the Bloom filter size. Lower rates = larger filters = fewer false positives = less unnecessary heap scanning. Default is 0.01 (1%).
When to use: Large tables where you need to look up specific values by equality, and the physical distribution means most ranges won’t contain the value. Not a replacement for B-tree on unique or near-unique columns — it’s for low-selectivity equality queries on large tables where a B-tree is too expensive to maintain.
inclusion (Geometric Types)
Stores a bounding value that contains all values in the range. For boxes: a bounding box. For ranges: a containing range. Supports overlap, containment, and adjacency operators (&&, @>, <@, <<, >>, etc.).
When to use: Geospatial data, range types, and other types where a bounding container is a useful summary.
Index Maintenance and Summarization
BRIN indexes behave differently from B-trees at write time:
Inserts into already-summarized ranges are cheap. The range’s min/max is simply updated (widened) to include the new value. This is O(1) — one index page update, regardless of how many rows are in the range.
Inserts that create a new, unsummarized range do NOT automatically create a summary. The new range remains unsummarized until:
- VACUUM runs on the table (manual or autovacuum)
autosummarizeis enabled and autovacuum runs- You manually call
brin_summarize_new_values()
This is a critical gotcha. If you bulk-load data into a new table, the BRIN index ranges covering the new data won’t be summarized until VACUUM runs. Queries will fall back to sequential scans for those ranges because there’s no summary to check.
-- Manually summarize after bulk load
SELECT brin_summarize_new_values('idx_orders_date');
-- Check summarization status
SELECT brin_summarize_range('idx_orders_date', block_number);
SELECT brin_desummarize_range('idx_orders_date', block_number);
autosummarize
CREATE INDEX idx_orders_date
ON orders USING brin (order_date)
WITH (pages_per_range = 128, autosummarize = true);
When enabled, autovacuum will summarize new ranges when it runs in the same database. This is not on by default because it adds work to autovacuum. Enable it if your table receives frequent inserts that create new page ranges.
Autosummarize is lazy. It sends a request to autovacuum when a new block range is first filled. The actual summarization happens the next time an autovacuum worker finishes in that database. If the request queue is full, the request is dropped and logged:
LOG: request for BRIN range summarization for index "idx_orders_date" page 128 was not recorded
The range stays unsummarized until the next manual VACUUM or explicit brin_summarize_new_values() call.
What BRIN Cannot Do
BRIN has important limitations compared to B-tree:
No uniqueness constraints. You cannot create a UNIQUE BRIN index. BRIN doesn’t check individual values.
No index-only scans. BRIN doesn’t store row data or references to specific rows. Every BRIN query requires a heap fetch. The visibility map doesn’t help because BRIN isn’t pointing at specific tuples.
No sorted output. A BRIN index cannot return rows in order. ORDER BY order_date with a BRIN index still requires an explicit sort.
No point lookups. BRIN is for range scans and equality scans across block ranges, not for finding a specific row. A B-tree can find one row in O(log N) page reads. BRIN finds a range that might contain the row, then scans all pages in that range.
Silent degradation. If your data’s physical correlation breaks down (due to updates, CLUSTER on a different column, or random inserts), BRIN’s min/max ranges widen until they span the entire value domain. At that point, every range matches every query, and the index scan becomes a full sequential scan — but you’re still paying the (small) cost of scanning the index. There’s no error, no warning, just slower queries.
Practical SQL: Testing Whether BRIN Is Right for Your Table
Step 1: Check correlation
SELECT
schemaname,
tablename,
attname,
n_distinct,
correlation
FROM pg_stats
WHERE correlation > 0.8
AND n_distinct > 100
ORDER BY correlation DESC
LIMIT 20;
This finds columns with strong physical correlation and reasonable cardinality.
Step 2: Compare query plans
-- Create both indexes
CREATE INDEX idx_test_btree ON your_table USING btree (your_column);
CREATE INDEX idx_test_brin ON your_table USING brin (your_column);
-- Compare plans
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM your_table
WHERE your_column BETWEEN 'value1' AND 'value2';
-- Force BRIN
SET enable_seqscan = off;
SET enable_bitmapscan = on;
SET enable_indexscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM your_table
WHERE your_column BETWEEN 'value1' AND 'value2';
Compare the actual rows read, heap blocks hit, and execution time. If BRIN reads significantly fewer blocks than a sequential scan and approaches B-tree performance, it’s a good fit.
Step 3: Compare index sizes
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_indexes
WHERE tablename = 'your_table'
AND indexname IN ('idx_test_btree', 'idx_test_brin');
Step 4: Check if CLUSTER fixes correlation
If correlation is low but the data should be correlated (time-series data that’s been updated or reorganized), CLUSTER can restore it:
-- Check correlation before
SELECT correlation FROM pg_stats WHERE tablename = 'orders' AND attname = 'order_date';
-- Cluster the table on the column
CLUSTER orders USING idx_orders_date_btree;
-- Check correlation after
ANALYZE orders;
SELECT correlation FROM pg_stats WHERE tablename = 'orders' AND attname = 'order_date';
CLUSTER rewrites the table in B-tree order. After clustering, the correlation should be close to 1.0, making BRIN viable.
pages_per_range Tuning
The default pages_per_range = 128 works well for most tables. But you can tune it:
Smaller values (16, 32): More range summaries = larger index = more precise filtering. Useful when your data has moderate correlation (0.7-0.9) and you need tighter ranges to avoid scanning too many non-matching pages.
Larger values (256, 512): Fewer range summaries = smaller index = less precise filtering. Useful when correlation is very high (>0.95) and the wider ranges still eliminate most of the table.
The index size formula: index_size ≈ (table_size / (pages_per_range × 8KB)) × summary_entry_size
For a 500 GB table with pages_per_range=128 and minmax (6 bytes per summary): 500 GB / 1 MB = 500,000 summaries × ~50 bytes per index tuple = ~25 MB. Still tiny compared to a B-tree.
Key Takeaways
- BRIN indexes store min/max summaries per block range (~1 MB of table data), not per-row pointers. A 500 GB table can have a 2 MB BRIN index where the equivalent B-tree is 12 GB.
- BRIN is only effective when the indexed column has high physical correlation with the table’s layout (
pg_stats.correlation > 0.9). Check correlation before creating a BRIN index. - BRIN queries use lossy bitmap scans — matching ranges are scanned, and non-matching rows are discarded at the heap level. The index is cheap but potentially less precise than a B-tree.
- New page ranges are not automatically summarized. Run
brin_summarize_new_values()after bulk loads, or enableautosummarize = truein the index options. - BRIN cannot enforce uniqueness, support index-only scans, return sorted results, or do efficient point lookups.
- The biggest risk is silent degradation: if correlation breaks down, BRIN ranges widen until every query matches every range, and you end up doing sequential scans with a small index overhead.
minmax-multi(PG14+) handles gaps and outliers by storing multiple intervals per range.bloom(PG17+) enables equality queries via Bloom filters.- Use
CLUSTERto restore physical correlation before switching to BRIN on a table that’s been fragmented by updates.
What’s Next
Tomorrow we’ll look at synchronous commit and durability trade-offs — how PostgreSQL’s synchronous_commit setting lets you choose between absolute durability and lower latency, and why the middle ground (remote_apply, remote_flush) might be exactly what your application needs.
Previous: Replication Slots and Disk Fill-Up — The Silent Killer That Stops Your Primary