PostgreSQL Deep Dive: Idle in Transaction Connections — The Silent Killer
Your database is slow. You check pg_stat_activity and see a wall of idle in transaction connections — sessions that ran a query, got their result, and then… sat there. The application moved on. The database didn’t.
These connections are the quietest disaster in PostgreSQL operations. They don’t show up in slow query logs. They don’t trigger alerts. But they hold locks, block vacuum, inflate your tables, and create cascading latency across every other session on the system.
Let’s break down exactly what happens when a connection goes idle inside a transaction, why the damage is so disproportionate, and what you can do about it.
What “idle in transaction” Actually Means
PostgreSQL tracks backend state in pg_stat_activity. The state column shows one of these values:
active -- currently executing a query
idle -- waiting for the next client command (no transaction)
idle in transaction -- in a transaction, but not executing anything
idle in transaction (aborted) -- same, but a previous statement errored
fastpath function call -- executing a fast-path function
The dangerous ones are idle in transaction and idle in transaction (aborted). Both mean the session has an open transaction block with no active query running. The application fetched its data and stopped sending commands — but never sent COMMIT or ROLLBACK.
The (aborted) variant is worse in one specific way: the session hit an error, PostgreSQL set the transaction to abort-only mode, but the application never rolled back. The connection sits there uselessly, still holding every resource the transaction acquired before the error.
Why It’s Dangerous: Three Cascading Problems
1. Lock Retention
Every lock acquired during the transaction stays held. That SELECT ... FOR UPDATE you ran three hours ago? Still locked. The ACCESS SHARE lock from a SELECT query? Still preventing ALTER TABLE and VACUUM FULL from acquiring their ACCESS EXCLUSIVE lock.
This creates invisible lock queues. Other sessions wait for locks that appear to be held by nobody — the holder is “idle” — and the wait builds up until something times out or someone pages you at 2 AM.
-- Find sessions blocking others
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocker.pid AS blocker_pid,
blocker.query AS blocker_query,
blocker.state AS blocker_state,
blocked.wait_event_type || '/' || blocked.wait_event AS wait_event
FROM pg_stat_activity blocked
JOIN pg_locks blocked_locks ON blocked.pid = blocked_locks.pid
JOIN pg_locks blocker_locks ON blocked_locks.locktype = blocker_locks.locktype
AND blocked_locks.database IS NOT DISTINCT FROM blocker_locks.database
AND blocked_locks.relation IS NOT DISTINCT FROM blocker_locks.relation
AND blocked_locks.page IS NOT DISTINCT FROM blocker_locks.page
AND blocked_locks.tuple IS NOT DISTINCT FROM blocker_locks.tuple
AND blocked_locks.classid IS NOT DISTINCT FROM blocker_locks.classid
AND blocked_locks.objid IS NOT DISTINCT FROM blocker_locks.objid
AND blocked_locks.objsubid IS NOT DISTINCT FROM blocker_locks.objsubid
AND blocked_locks.pid != blocker_locks.pid
JOIN pg_stat_activity blocker ON blocker_locks.pid = blocker.pid
WHERE NOT blocked_locks.granted;
2. Vacuum Blocking and Table Bloat
This is the insidious one. Vacuum cannot remove dead tuples that are visible to any running transaction. Every open transaction has an xmin horizon — the earliest transaction ID it might need to see. Any dead tuple with an xmax newer than the oldest open transaction’s xmin must be preserved.
An idle-in-transaction connection with an xmin from two hours ago means vacuum cannot reclaim any rows deleted in the last two hours on any table in the same database. Not just the tables the transaction touched. Any table.
The result: steady table bloat. Your tables grow, indexes grow, sequential scans get slower, and more pages need to be read from disk. The bloat feeds on itself — bigger tables mean slower queries, which mean longer transactions, which mean more bloat.
-- Find sessions with the oldest xmin horizons
SELECT
pid,
now() - xact_start AS transaction_duration,
state,
query,
backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY xact_start ASC
LIMIT 10;
If you see a session with a multi-hour transaction_duration and a state of idle in transaction, that’s your vacuum blocker.
3. Connection Exhaustion
PostgreSQL has a hard limit on connections (max_connections). Every idle-in-transaction session consumes one slot. Unlike genuinely idle connections (which can be recycled by a pooler), idle-in-transaction connections are sticky — the pooler can’t return them to the pool because the transaction state is session-local.
This is the scenario where you start getting FATAL: sorry, too many clients already errors even though your application isn’t doing anything unusual. The connections are just… stuck.
Where These Come From
The root cause is almost always in the application layer, not the database.
Missing commit/rollback in error paths. The most common pattern:
# Python example - the classic bug
conn = pool.getconn()
cur = conn.cursor()
cur.execute("BEGIN")
cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
# Some application code throws an exception here
# The exception is caught at a higher level
# But nobody rolls back the transaction
# Connection goes back to the pool inside a transaction
Connection pooler misconfiguration. PgBouncer in session pooling mode keeps the server connection tied to the client for the entire session. If the client forgets to commit, the server connection stays in transaction. In transaction pooling mode, PgBouncer manages transaction boundaries — but only if the client actually commits.
ORM lazy loading. An ORM opens a transaction at the start of a request, loads an entity, and then the request handler does something slow — calls an external API, processes a file, waits for a user response. The transaction sits open the entire time.
Interactive psql sessions. Someone opens psql, types BEGIN, runs a query, and then walks away to lunch. Three hours later, vacuum is completely blocked on that database. This happens more often than you’d think.
The Timeout Safety Nets
PostgreSQL provides several timeout parameters that act as safety nets.
idle_in_transaction_session_timeout
SET idle_in_transaction_session_timeout = '10min';
This terminates any session that has been idle inside a transaction for more than the specified duration. The session receives a FATAL: terminating connection due to idle-in-transaction timeout error and is disconnected.
This is the most targeted fix. It specifically targets the idle in transaction state without affecting long-running queries or genuinely idle connections. For most workloads, setting this to 10-30 minutes is a reasonable starting point.
-- Set globally (requires restart for some cases)
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
SELECT pg_reload_conf();
-- Or per-role for gradual rollout
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '5min';
statement_timeout
SET statement_timeout = '30s';
Aborts any single statement that runs longer than the specified duration. This doesn’t directly address idle-in-transaction — a session that’s idle isn’t running a statement — but it prevents queries from running indefinitely and contributing to lock contention.
The problem: statement_timeout applies per-statement, not per-transaction. A transaction can run 100 one-second statements and sit idle between them, and statement_timeout won’t catch it.
transaction_timeout (PostgreSQL 17+)
SET transaction_timeout = '5min';
New in PostgreSQL 17, this terminates any session whose transaction has been open longer than the specified duration, regardless of whether it’s idle or active. This is the nuclear option — it catches idle transactions, long-running queries, and everything in between.
The interaction rules are important: if transaction_timeout is shorter than or equal to idle_in_transaction_session_timeout or statement_timeout, the longer timeout is effectively ignored. So if you set transaction_timeout = 5min and idle_in_transaction_session_timeout = 10min, the transaction timeout fires first and the idle-in-transaction timeout never triggers.
The PostgreSQL documentation explicitly warns against setting transaction_timeout in postgresql.conf because it affects all sessions — including maintenance operations, long-running analytical queries, and anything else that legitimately needs time. Use it per-session or per-role instead.
lock_timeout
SET lock_timeout = '5s';
Aborts any statement that waits longer than the specified time to acquire a lock. This doesn’t prevent idle-in-transaction sessions, but it prevents other sessions from blocking indefinitely on the locks those idle sessions hold. It’s a defensive measure, not a cure.
idle_session_timeout
SET idle_session_timeout = '60min';
Terminates sessions that are idle outside a transaction. This is the least important timeout — idle sessions without transactions impose minimal cost on the server. It’s mainly useful for cleaning up abandoned connections from poorly-behaved clients.
The documentation warns against using this with connection poolers, since poolers expect long-lived idle connections.
The Timeout Decision Matrix
Problem Fix
───────────────────────────── ──────────────────────────────────
App forgets to commit idle_in_transaction_session_timeout
Queries running too long statement_timeout
Whole tx too long (PG17+) transaction_timeout (per-role)
Waiting for locks too long lock_timeout
Abandoned idle connections idle_session_timeout (optional)
For most production systems, the combination is:
ALTER ROLE app_role SET idle_in_transaction_session_timeout = '10min';
ALTER ROLE app_role SET statement_timeout = '30s';
Add lock_timeout = '5s' if you want to be more aggressive about preventing lock queues.
Diagnostic Queries
Find all idle-in-transaction connections:
SELECT
pid,
usename,
application_name,
client_addr,
now() - xact_start AS transaction_age,
now() - state_change AS idle_duration,
state,
left(query, 100) AS last_query,
backend_xmin
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
ORDER BY xact_start;
Measure vacuum blocking impact — how many dead tuples can’t be reclaimed:
SELECT
schemaname,
relname,
n_dead_tup,
last_autovacuum,
last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 20;
Check if autovacuum is being blocked by old snapshots:
SELECT
pid,
now() - xact_start AS age,
state,
backend_xmin,
query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY xact_start
LIMIT 5;
Find the oldest transaction on the system:
SELECT
pid,
now() - xact_start AS transaction_duration,
state,
usename,
application_name,
left(query, 80) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;
The Application-Level Fix
Timeouts are safety nets. The real fix is in the application:
Always use transaction blocks with explicit scope.
# Good: explicit begin/commit with error handling
try:
with conn.transaction():
conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
except Exception:
# Transaction automatically rolled back by context manager
logger.exception("Transfer failed")
Never hold transactions open across external calls.
# Bad: transaction held open during API call
with conn.transaction():
conn.execute("INSERT INTO orders (...) VALUES (...)")
order_id = conn.fetchone()['id']
# External API call inside transaction!
stripe.charge(customer_id, amount)
conn.execute("UPDATE orders SET charged = true WHERE id = %s", [order_id])
# Good: separate the transaction from the external call
conn.execute("INSERT INTO orders (...) VALUES (...) RETURNING id")
order_id = conn.fetchone()['id']
conn.commit() # Transaction done
# External call outside transaction
stripe_charge = stripe.charge(customer_id, amount)
# New transaction for the update
with conn.transaction():
conn.execute("UPDATE orders SET charged = true WHERE id = %s", [order_id])
Configure your pooler. If you’re using PgBouncer, transaction pooling mode (pool_mode = transaction) automatically disconnects server connections when a transaction completes, which prevents idle-in-transaction accumulation at the pooler level. But this comes with tradeoffs — session-level features like prepared statements, SET commands, and advisory locks don’t work across transactions.
The Subtle Interaction with Prepared Transactions
There’s a gap: idle_in_transaction_session_timeout and transaction_timeout do not apply to prepared transactions (those created with PREPARE TRANSACTION). Prepared transactions survive connection termination and even server restarts — they’re designed for two-phase commit. If your application uses two-phase commit and forgets to COMMIT PREPARED, those transactions will block vacuum forever.
Check for them with:
SELECT * FROM pg_prepared_xacts;
If max_prepared_transactions is zero (the default), this isn’t a concern — PREPARE TRANSACTION will error. But if it’s been enabled for any reason, this is a trap worth monitoring.
Key Takeaways
idle in transactionmeans a session has an open transaction but isn’t executing anything — the application forgot to commit or rollback- These sessions hold locks, block vacuum across the entire database, and consume connection slots
- The bloat is insidious: vacuum can’t reclaim dead tuples visible to any open transaction, not just the tables that transaction touched
- Set
idle_in_transaction_session_timeoutto 10-30 minutes as a safety net - The real fix is application discipline: never hold transactions open across external calls, always use scoped transaction blocks, and handle error paths with rollback
- In PostgreSQL 17+,
transaction_timeoutadds a harder cap on total transaction duration - Prepared transactions are exempt from these timeouts — monitor
pg_prepared_xactsseparately
Previously: tsvector, tsquery, and Why Your Full-Text Search Results Are Ranked Wrong