PostgreSQL Deep Dive: pg_stat_statements query_id — Why Your Top Query Changes Between Versions

Your observability dashboard tracks the top 10 queries by total execution time using pg_stat_statements.queryid. You upgrade from PostgreSQL 15 to 16 and every queryid changes. Your historical baselines are gone. Your alerts can’t correlate old queries with new ones. The dashboard shows “new” queries that are actually the same queries you’ve been monitoring for months.

This is the queryid stability problem — and it’s not just major upgrades. The query identifier is a hash of PostgreSQL’s post-parse-analysis tree, which includes internal object OIDs, so it can change when you drop and recreate a function, when your search_path differs, or even when you run the same query on a different machine architecture.

Let’s break down how queryid is computed, when it changes, and how to build monitoring that survives these shifts.

How queryid Is Computed

pg_stat_statements groups queries into entries. Two queries land in the same entry if they have the same queryid — a hash computed over the post-parse-analysis representation of the query. This happens after PostgreSQL has:

Parsed the SQL text into a parse tree
Analyzed the tree — resolved table and column references to internal OIDs, determined types, expanded * into column lists
Normalized the tree — replaced literal constants with parameter symbols ($1, $2, etc.)

The hash is computed over this analyzed, normalized tree. So:

SELECT * FROM users WHERE id = 42;
SELECT * FROM users WHERE id = 99;

Both get the same queryid because after normalization they become SELECT * FROM users WHERE id = $1. The constant is stripped and the tree structure is hashed.

But here’s the critical detail: the hash includes the internal OIDs of referenced objects, not their names. The users table is represented by its pg_class.oid, not the string "users". This is what makes queryid fragile.

The History: How We Got Here

Pre-PG 14: pg_stat_statements computed its own queryid internally. The queryid column existed only in the pg_stat_statements view. Third-party tools like pg_stat_kcache and pg_qualstats also needed query identifiers, and each had its own computation method. No two tools agreed on the same ID for the same query.

PG 14: PostgreSQL introduced compute_query_id, a core GUC that computes queryid during query processing. When set to auto (the default), it activates automatically if pg_stat_statements is loaded. The queryid is now visible in pg_stat_activity, EXPLAIN VERBOSE, CSV logs, and log_line_prefix. This unified the ecosystem — one ID, visible everywhere.

# PG 14+ — usually left at auto
compute_query_id = auto   # on, off, or auto

PG 16: pg_stat_statements gained the ability to normalize constants in utility commands (DDL like CREATE TABLE, ALTER TABLE, etc.). Previously, ALTER TABLE foo SET (fillfactor = 90) and ALTER TABLE foo SET (fillfactor = 80) would get different queryid values because the constant wasn’t normalized. Now they share the same entry.

PG 17: I/O timing columns were renamed (blk_read_time → shared_blk_read_time, blk_write_time → shared_blk_write_time). The queryid computation itself didn’t change, but the view schema did — breaking any monitoring queries that referenced the old column names.

When queryid Changes

1. Major Version Upgrades

The documentation is explicit: “it is not safe to assume that queryid will be stable across major versions of PostgreSQL.” The internal representation of the parse tree can change between major versions. A SELECT * FROM users WHERE id = $1 might hash differently in PG 15 vs PG 16 because the tree structure evolved.

This is the most common way historical baselines break. Your monitoring system stores queryid = 123456789 with the label “users table lookup” and after the upgrade, the same query now hashes to 987654321. The dashboard shows a new query with no history.

2. Schema Changes (Drop and Recreate)

-- Query references function get_user_name(oid)
SELECT get_user_name(oid) FROM users;

-- Someone drops and recreates the function
DROP FUNCTION get_user_name(oid);
CREATE FUNCTION get_user_name(oid) RETURNS text AS ...;

-- Same query text, different function OID → different queryid
SELECT get_user_name(oid) FROM users;

The old and new get_user_name functions have different pg_proc.oid values. Since the hash includes these OIDs, the queryid changes even though the query text is identical.

Conversely, if you drop and recreate a table, two apparently-different queries might end up with the same queryid — the new table gets a new OID, but if the old and new queries happen to hash to the same value due to the OID collision pattern, they merge. This is rare but documented.

3. Different search_path

SET search_path = schema_a;
SELECT * FROM users;  -- Resolves to schema_a.users (OID 12345)

SET search_path = schema_b;
SELECT * FROM users;  -- Resolves to schema_b.users (OID 67890)

Same query text, different underlying objects, different queryid. This is correct behavior — they’re genuinely different queries — but it can confuse monitoring if you’re not aware of it.

4. Different Machine Architecture

The hash computation is sensitive to platform differences. A query running on an x86_64 primary and an ARM standby might produce different queryid values. The documentation notes: “The hashing process is also sensitive to differences in machine architecture and other facets of the platform.”

Physical replication preserves queryid (same system, same OIDs). Logical replication does not guarantee it.

5. Alias Differences

SELECT u.name FROM users u;
SELECT u.name FROM users usr;

These get different queryid values because the alias is part of the parse tree. Same underlying table, same columns, different alias — different hash. This is a common source of queryid proliferation in ORM-generated queries where aliases are auto-generated with random suffixes.

The Hash Collision Problem

queryid is a 64-bit hash. Collisions are rare but possible. The documentation warns: “there is a small chance of hash collisions causing unrelated queries to be merged into one entry.” When this happens, the query text shows whichever query was first to create the entry, and the statistics (total_time, calls, etc.) are aggregated across both queries.

Collisions can only happen within the same (userid, dbid) pair — different users or different databases can never collide.

If you see suspicious statistics — a query that appears to be a simple SELECT 1 but shows enormous execution time — consider that it might be a hash collision with a genuinely expensive query.

Practical Monitoring Queries

Find your top queries by total execution time:

SELECT
    queryid,
    left(query, 80) AS query_preview,
    calls,
    round(total_exec_time::numeric, 1) AS total_ms,
    round(mean_exec_time::numeric, 2) AS mean_ms,
    round((100.0 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

Track queryid changes across a schema migration:

-- Before migration: snapshot current queryids
CREATE TABLE queryid_snapshot AS
SELECT queryid, left(query, 100) AS query_preview, dbid, userid
FROM pg_stat_statements;

-- After migration: find queries whose queryid changed
SELECT
    old.queryid AS old_id,
    new.queryid AS new_id,
    old.query_preview
FROM queryid_snapshot old
JOIN pg_stat_statements new
    ON old.query_preview = left(new.query, 100)
    AND old.dbid = new.dbid
    AND old.userid = new.userid
    AND old.queryid != new.queryid;

Find queries that share the same text but different queryids (alias or search_path issues):

SELECT
    left(query, 60) AS query_prefix,
    array_agg(DISTINCT queryid) AS queryids,
    count(DISTINCT queryid) AS id_count
FROM pg_stat_statements
GROUP BY left(query, 60)
HAVING count(DISTINCT queryid) > 1
ORDER BY id_count DESC;

Check if compute_query_id is enabled:

SELECT name, setting, source
FROM pg_settings
WHERE name = 'compute_query_id';

Monitor entry deallocation (when pg_stat_statements.max is too low):

SELECT
    dealloc,
    stats_reset
FROM pg_stat_statements_info;

When entries are deallocated due to the pg_stat_statements.max limit, the oldest entries are evicted. New queries that get the same slot may show with unnormalized constant values in the query text, which creates a different queryid than the normalized version.

Building Version-Stable Monitoring

If you’re building an observability pipeline on top of pg_stat_statements, here’s how to survive queryid instability:

Use (queryid, dbid, userid) as a composite key. Never use queryid alone — it’s only unique within a (userid, dbid) pair.

Fingerprint queries by normalized text, not queryid. Normalize the query text yourself (strip constants, collapse whitespace) and use that as your stable identifier. Use queryid for join efficiency within a single collection interval, but don’t rely on it for historical correlation across versions.

Snapshot before upgrades. Before a major version upgrade, dump your pg_stat_statements data with query text and queryid. After the upgrade, use the query text to build a mapping from old queryid to new queryid.

Tag your queries. Use SQL comments to embed stable identifiers:

/* app=api endpoint=users-list */
SELECT * FROM users WHERE active = true LIMIT $1 OFFSET $2;

The comment survives normalization and gives you a stable tag that’s independent of queryid.

Set pg_stat_statements.max generously. The default is 5000. If your application generates many distinct query patterns, you’ll hit entry deallocation, which causes statistics loss and can change which queries appear in the view. A setting of 10000-50000 is more appropriate for production monitoring.

Key Takeaways

queryid is a hash of the post-parse-analysis tree including internal OIDs — it’s not a hash of query text
It changes across major PostgreSQL versions, across machine architectures, when objects are dropped/recreated, and with different search_path settings
Two queries with identical text can have different queryid values if they resolve to different underlying objects
Hash collisions are possible (64-bit space) — unrelated queries can merge into one entry
PG 14 introduced compute_query_id as a core GUC, unifying queryid across the ecosystem
PG 16 added constant normalization for utility commands (DDL), reducing unnecessary queryid proliferation
For stable monitoring: fingerprint by normalized query text, use SQL comments as tags, and snapshot queryid mappings before upgrades
Physical replication preserves queryid; logical replication does not

Previously: Prepared Transactions, max_prepared_transactions, and the Two-Phase Trap