PostgreSQL Deep Dive: Synchronous Commit and Durability Trade-offs
Your application inserts an order. PostgreSQL returns “success.” The application tells the customer “your order is confirmed.” Two seconds later, the database server crashes. The order is gone. The customer got a confirmation email for an order that never existed.
This is the durability problem. PostgreSQL’s default behavior guarantees this can’t happen — but that guarantee comes at a cost: every COMMIT waits for a disk flush. On a busy system with a high-commit rate, that flush can be the bottleneck. The synchronous_commit setting is your knob for trading durability against latency, and understanding exactly what each level does (and doesn’t do) is critical for making the right choice.
What Happens During COMMIT
When a transaction commits, PostgreSQL writes the commit record to the WAL (Write-Ahead Log) buffer in shared memory. At this point the data is in RAM only — a server crash would lose it. To make the commit durable, the WAL must be flushed to disk.
The flush path is:
- WAL record written to WAL buffer (RAM)
- WAL buffer flushed to OS page cache (still RAM, but in kernel buffers)
- OS page cache flushed to disk (actual persistent storage — the
fsynccall)
Step 3 is the expensive one. A typical SSD can handle ~10,000-100,000 fsync calls per second. A high-write application doing 50,000 small transactions per second will bottleneck on WAL flushes. Not on CPU, not on network, not on query planning — on waiting for the disk to confirm the write.
The synchronous_commit Levels
PostgreSQL provides five levels of durability for COMMIT, controlled by the synchronous_commit parameter:
off (Fastest, Least Durable)
SET synchronous_commit = off;
The server returns “success” to the client as soon as the commit record is written to the WAL buffer in shared memory. It does NOT wait for any disk flush — not locally, not to any standby.
Risk: If the PostgreSQL server crashes (power failure, kernel panic, hardware failure), any transactions acknowledged to the client but not yet flushed to disk are lost. The maximum window of loss is approximately 3 × wal_writer_delay (default 3 × 200ms = 600ms).
Important: This does NOT cause database corruption. The database recovers to a consistent state — it just loses the most recent committed transactions. The effect is identical to those transactions having been aborted. MVCC ensures partial transactions are never visible.
Use when: Bulk data loading, ephemeral data (session state, caches), analytics staging tables, write-heavy workloads where occasional data loss is acceptable.
local (Default with no standbys)
SET synchronous_commit = local;
Waits for the commit record to be flushed to local disk before returning “success” to the client. Does NOT wait for any standby replication.
This is the same as on when synchronous_standby_names is empty (no synchronous standbys configured).
on (Default)
SET synchronous_commit = on;
Waits for the commit record to be flushed to local disk AND for the synchronous standby(s) to confirm they’ve received the commit record and flushed it to their disk.
This is the strongest durability guarantee when synchronous replication is configured. The transaction survives unless both the primary and all synchronous standbys suffer simultaneous disk corruption.
remote_write (Middle Ground)
SET synchronous_commit = remote_write;
Waits for the commit record to be flushed to local disk AND for the synchronous standby(s) to confirm they’ve written the commit record to their OS file system (page cache). Does NOT wait for the standby to fsync to disk.
Risk: If the standby’s operating system crashes (kernel panic, power loss to the standby server), the standby may lose the commit record. If the primary then fails and the standby is promoted, the transaction is lost.
Survives: PostgreSQL process crash on the standby (data is in OS page cache, which survives process-level crashes). Does NOT survive OS-level crash on the standby.
remote_apply (Strongest)
SET synchronous_commit = remote_apply;
Waits for the commit record to be flushed to local disk AND for the synchronous standby(s) to confirm they’ve received, flushed, AND replayed (applied) the commit. The transaction is visible to queries on the standby before the COMMIT returns to the client.
This is the slowest option but provides the strongest consistency guarantee: after commit, the data is visible on both the primary and the standby. Useful for read-your-writes consistency when reads are directed to the standby.
Performance Impact: The Numbers
The latency difference between levels is significant:
| Setting | Typical COMMIT latency | Risk window |
|---|---|---|
off | 0.1-0.5 ms | Up to 600ms of committed txns |
local | 1-5 ms | None locally |
remote_write | 2-10 ms | Standby OS crash |
on | 5-20 ms | Primary + all sync standbys |
remote_apply | 10-50 ms | Primary + all sync standbys |
These numbers vary enormously based on disk I/O speed, network latency to the standby, and standby replay speed. A fast NVMe primary with a standby over a 0.5ms LAN link will be at the lower end. A spinning disk primary with a standby over a WAN link will be at the upper end.
The key insight: the difference between off and on is typically 5-20× in COMMIT latency. For a workload doing 100,000 commits per second, that’s the difference between 10 seconds of total commit wait time and 100+ seconds.
Per-Transaction Control
synchronous_commit can be set at multiple levels:
-- Per transaction (most common approach)
BEGIN;
INSERT INTO orders ...;
-- This transaction is less critical, skip sync commit
SET LOCAL synchronous_commit = off;
COMMIT;
-- Per session
SET synchronous_commit = local;
-- Per database
ALTER DATABASE analytics SET synchronous_commit = off;
-- Per function
CREATE FUNCTION log_event(msg text)
RETURNS void AS $$
BEGIN
SET LOCAL synchronous_commit = off;
INSERT INTO event_log (message) VALUES (msg);
END;
$$ LANGUAGE plpgsql;
The SET LOCAL form is the most useful: it applies only to the current transaction and automatically resets afterward. This lets you have most transactions fully durable while choosing lower durability for specific workloads.
Common pattern: Use synchronous_commit = off for:
-- Session state / activity tracking
SET LOCAL synchronous_commit = off;
INSERT INTO user_activity (user_id, action, ts) VALUES (...);
-- Analytics event logging
SET LOCAL synchronous_commit = off;
INSERT INTO page_views (url, user_id, ts) VALUES (...);
-- Notification queues (fire-and-forget)
SET LOCAL synchronous_commit = off;
INSERT INTO notification_queue (user_id, type) VALUES (...);
synchronous_standby_names: Choosing Your Sync Partners
synchronous_commit levels beyond local only matter when synchronous_standby_names is configured. This parameter specifies which standby servers participate in synchronous replication.
First-N Syntax (Priority-Based)
-- In postgresql.conf on the primary
synchronous_standby_names = 'FIRST 1 (standby_1, standby_2)'
Transaction commits wait for the first available standby in priority order. If standby_1 is connected and streaming, it’s the synchronous standby. If it disconnects, standby_2 takes over immediately.
This is the most common configuration. You designate one primary synchronous standby and one or more failover candidates.
ANY-N Syntax (Quorum-Based)
synchronous_standby_names = 'ANY 1 (standby_1, standby_2, standby_3)'
Transaction commits wait for any N of the listed standbys to confirm. No priority ordering — any standby can fulfill the requirement.
This is useful for distributed setups where you want to survive the loss of specific standbys without explicit priority ordering. With ANY 2 (s1, s2, s3), commits wait for any two of three standbys to confirm.
Naming Convention
The standby name is the application_name from the standby’s primary_conninfo:
-- On the standby, in postgresql.conf or primary_conninfo:
primary_conninfo = 'host=primary port=5432 user=replicator application_name=standby_1'
Verify with pg_stat_replication:
SELECT
application_name,
state,
sync_state,
sync_priority,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn
FROM pg_stat_replication;
Key columns:
sync_state:sync(currently synchronous),potential(next in line),async(not in synchronous_standby_names)sent_lsn: Last WAL position sent to this standbywrite_lsn: Last WAL position written to standby’s OS cache (remote_writewaits for this)flush_lsn: Last WAL position flushed to standby’s disk (onwaits for this)replay_lsn: Last WAL position replayed on standby (remote_applywaits for this)
The Gotcha: synchronous_commit = off Is Not fsync = off
A common and dangerous confusion is equating synchronous_commit = off with fsync = off. They are fundamentally different:
synchronous_commit = off: PostgreSQL still flushes WAL to disk eventually (via the WAL writer background process). The only risk is losing the most recent committed transactions during a crash. No corruption.fsync = off: PostgreSQL does not flush WAL to disk at all. A crash can leave the data files and WAL in an inconsistent state. This can cause silent data corruption that is unrecoverable.
Never set fsync = off on any system where you care about your data. It exists for benchmarking only. synchronous_commit = off is the safe way to get similar performance benefits with an understood, bounded risk.
Group Commit: Why Multiple Transactions Pay One Flush
PostgreSQL’s group commit mechanism amortizes the cost of fsync across multiple concurrent transactions. When several transactions commit simultaneously:
- All commit records are written to the WAL buffer
- A single
fsyncflushes all of them to disk - All waiting clients are notified simultaneously
This means the effective cost per COMMIT is lower than the raw fsync latency. With 10 concurrent transactions each doing synchronous_commit = on, you might get 10 commits for the price of one fsync. The more concurrent commits, the better the amortization.
group_commit works across all synchronous_commit levels. With off, the WAL writer batches writes similarly — multiple commit records get flushed in one I/O operation.
Diagnostic Queries
1. Check current settings
SHOW synchronous_commit;
SHOW synchronous_standby_names;
2. Monitor replication sync state
SELECT
application_name,
state,
sync_state,
sync_priority,
pg_wal_lsn_diff(sent_lsn, replay_lsn) / 1024 / 1024 AS lag_mb,
round(
EXTRACT(EPOCH FROM (now() - reply_time))::numeric, 2
) AS reply_lag_seconds
FROM pg_stat_replication
ORDER BY sync_priority;
3. Check for commit wait events
On PG16+, you can see if transactions are waiting on WAL flush:
SELECT
pid,
wait_event_type,
wait_event,
query
FROM pg_stat_activity
WHERE wait_event IN ('WalWrite', 'SyncRep');
WalWrite means waiting for local WAL flush. SyncRep means waiting for synchronous standby confirmation. If you see many SyncRep waits, your standby network or replay speed is the bottleneck.
4. Measure COMMIT latency over time
SELECT
datname,
xact_commit,
xact_rollback,
round(
blk_write_time::numeric / NULLIF(xact_commit, 0), 3
) AS avg_write_time_ms
FROM pg_stat_database
WHERE datname NOT IN ('template0', 'template1')
ORDER BY avg_write_time_ms DESC;
Practical Configuration Examples
High-Performance Web Application
# postgresql.conf
# Default durability for most transactions
synchronous_commit = on
# One synchronous standby for HA
synchronous_standby_names = 'FIRST 1 (standby_1, standby_2)'
-- Application code: full durability for orders and payments
BEGIN;
INSERT INTO orders ...;
COMMIT; -- uses synchronous_commit = on (default)
-- But skip sync for analytics events
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO analytics_events ...;
COMMIT;
Analytics Pipeline
# postgresql.conf
# No synchronous standbys needed for analytics
synchronous_standby_names = ''
# Don't wait for local flush on bulk loads
synchronous_commit = off
# Or use local if you want crash safety without replication
# synchronous_commit = local
Financial System (Maximum Durability)
-- Application ensures data is visible on standby before confirming
BEGIN;
SET LOCAL synchronous_commit = remote_apply;
INSERT INTO transactions ...;
COMMIT;
Key Takeaways
synchronous_commithas five levels:off(no wait),local(local flush only),remote_write(local flush + standby OS write),on(local flush + standby disk flush),remote_apply(local flush + standby replay).offdoes NOT cause corruption — the worst case is losing the most recent ~600ms of committed transactions on crash. The database recovers to a consistent state.remote_writeandremote_applyonly have effect whensynchronous_standby_namesis configured. Without synchronous standbys,local,remote_write, andonall behave identically.- The performance difference between
offandonis typically 5-20× in COMMIT latency. For high-commit-rate workloads,SET LOCAL synchronous_commit = offon non-critical transactions is the easiest performance win. synchronous_commit = offis NOT the same asfsync = off. The former risks losing recent transactions. The latter risks silent data corruption. Never usefsync = offin production.- Group commit amortizes
fsynccost across concurrent transactions, reducing the per-COMMIT overhead. - Use
pg_stat_replicationto monitor standby sync states and identify whether synchronous replication is causing commit delays.
What’s Next
Tomorrow we’ll look at CTE materialization — why Common Table Expressions can be optimization fences, when PostgreSQL materializes them vs inlining them, and how a simple query restructuring can unlock dramatic performance improvements.
Previous: BRIN Indexes — How a 1 MB Index Can Replace a 1 GB B-Tree