╔════════════════════════════════════════╗
║ POSTGRESQL WAIT EVENTS COVERAGE GAPS ║
╚════════════════════════════════════════╝
Analysis and improvement proposals for PostgreSQL wait event instrumentation
PostgreSQL wait events provide insight into what a backend process is doing at any given moment. When a process is actively waiting on a resource (I/O, lock, network), PostgreSQL records the specific wait event. This allows monitoring tools to show where time is being spent.
Official Documentation: pg_stat_activity — PostgreSQL Documentation
Many monitoring tools (AWS RDS Performance Insights, PASH Viewer, etc.) visualize wait_event IS NULL as green "CPU" time:
┌────────────────────────────────┐
│ Active Sessions (sampled/s) │
│ │
│ ████████████ <- CPU (green) │
│ ▓▓▓▓▓▓ <- I/O (blue) │
│ ░░░░ <- Lock (orange) │
│ │
│ Green = NULL wait_event │
└────────────────────────────────┘
But NULL doesn't always mean "CPU working" — it can also mean "not instrumented yet."
When a backend is stuck waiting on network I/O, external authentication, or other blocking operations without wait event coverage, monitoring tools incorrectly show it as green "CPU" activity.
Best practice: Tools should visualize NULL as "CPU or unknown wait event" — PostgresAI monitoring follows this convention.
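The labeling convention above can be sketched in a few lines of Python. This is a minimal illustration, not any tool's actual code: it assumes samples are (wait_event_type, wait_event) pairs already read from pg_stat_activity for active sessions, and the names bucket_samples and NULL_LABEL are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# One sample per active session: (wait_event_type, wait_event).
# Both are None when the backend reports no wait event.
Sample = Tuple[Optional[str], Optional[str]]

# Hypothetical display label for NULL samples, following the convention above.
NULL_LABEL = "CPU or unknown wait event"


def bucket_samples(samples: Iterable[Sample]) -> Counter:
    """Count active-session samples per display bucket."""
    counts: Counter = Counter()
    for wait_event_type, wait_event in samples:
        if wait_event is None:
            # NULL is ambiguous: real CPU work or an uninstrumented wait.
            counts[NULL_LABEL] += 1
        else:
            counts[wait_event_type or "Other"] += 1
    return counts


# Hand-written sample data for illustration only.
samples = [
    (None, None),
    (None, None),
    ("IO", "DataFileRead"),
    ("Lock", "transactionid"),
    (None, None),
]
buckets = bucket_samples(samples)
for label, n in buckets.most_common():
    print(f"{label}: {n}")
```

The only design choice here is the label: the NULL bucket is never called plain "CPU", so a reader of the chart is reminded that part of it may be hidden waiting.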
Treating NULL as pure CPU leads to misdiagnosis during incidents. A connection storm caused by slow LDAP authentication appears as "high CPU load" when the backends are actually stuck in network/auth waits that lack instrumentation.
Every authentication method has gaps that block logins without visibility:
log_hostname=on
Impact: During auth storms, backends appear "busy with CPU" when they're actually waiting on external services.
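To make the effect concrete, here is a toy Python model of a sampler watching a backend that blocks on an external service. It is only an analogy: ToyBackend, waiting_on, and the event name AuthLdapWait are hypothetical and are not PostgreSQL identifiers. The context manager loosely mirrors the way PostgreSQL brackets a blocking call with pgstat_report_wait_start() and pgstat_report_wait_end().

```python
import threading
import time
from contextlib import contextmanager
from typing import Callable, List, Optional

# Hypothetical event name used only in this sketch.
HYPOTHETICAL_EVENT = "AuthLdapWait"


class ToyBackend:
    """Stand-in for the wait state a backend exposes to monitoring."""

    def __init__(self) -> None:
        self.wait_event: Optional[str] = None

    @contextmanager
    def waiting_on(self, event: str):
        # Publish the event for the duration of the blocking call.
        self.wait_event = event
        try:
            yield
        finally:
            self.wait_event = None


def slow_external_auth(seconds: float) -> None:
    """Pretend to block on an external service such as an LDAP server."""
    time.sleep(seconds)


def sample_during(backend: ToyBackend, work: Callable[[], None],
                  interval: float = 0.01) -> List[Optional[str]]:
    """Run work() in a thread and sample backend.wait_event until it ends."""
    observed: List[Optional[str]] = []
    worker = threading.Thread(target=work)
    worker.start()
    while worker.is_alive():
        observed.append(backend.wait_event)
        time.sleep(interval)
    worker.join()
    return observed


backend = ToyBackend()


def uninstrumented() -> None:
    slow_external_auth(0.2)


def instrumented() -> None:
    with backend.waiting_on(HYPOTHETICAL_EVENT):
        slow_external_auth(0.2)


before = sample_during(backend, uninstrumented)
after = sample_during(backend, instrumented)
print("uninstrumented samples:", set(before))
print("instrumented samples:  ", set(after))
```

In the uninstrumented run every sample is None, which a dashboard would paint as CPU even though the thread only sleeps. In the instrumented run the same sleep shows up under a named event.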
unlink() for table files, fsync() for signal files
fstat() operations in parallel query
Base backup compression (gzip, LZ4, Zstandard) was considered for wait event coverage.
Community consensus: CPU-bound operations should NOT be covered by wait events. Per pgsql-hackers discussion: "I vehemently oppose turning wait events into a poor emulation of a CPU profiler."
Status: Rejected — wait events are for waiting, not CPU work.
SCRAM-SHA-256 (PBKDF2), SQL hash functions, HMAC operations.
Community consensus: CPU-bound cryptographic operations are legitimate CPU work, not waits. Adding wait events everywhere would add overhead without providing useful waiting-state information.
Status: Rejected — use profiling tools for CPU analysis, not wait events.
2026-02-03 — COPY FROM/TO pipe/file/program (part of this gap analysis project)
COPY_FROM_READ: "Waiting to read data from a pipe, a file or a program during COPY FROM"
COPY_TO_WRITE: "Waiting to write data to a pipe, a file or a program during COPY TO"
Commit: e05a24c
2025-12-09 — Group Commit Delay
COMMIT_DELAY: "Waiting for commit delay before WAL flush"
Commit: 3cb5808b
2025-11-05 — LSN Waiting Infrastructure
WAIT_FOR_WAL_FLUSH: "Waiting for WAL flush to reach a target LSN on a primary"
WAIT_FOR_WAL_REPLAY: "Waiting for WAL replay to reach a target LSN on a standby"
WaitLSN: LWLock for Wait-for-LSN state
Commit: 3b4e53a0
2025-05-29 — Async I/O (io_uring)
AioUringCompletion: LWLock for AIO uring completion
Commit: c3623703
2023-10-13 — Checkpoint Delay
CHECKPOINT_DELAY_COMPLETE: "Waiting for a backend that blocks a checkpoint from completing"
CHECKPOINT_DELAY_START: "Waiting for a backend that blocks a checkpoint from starting"
Commit: 0013ba29
An additional challenge: logging itself can cause observer effects that confuse wait event analysis.
Example: log_min_duration_statement=0 with high TPS on slow disks
When logging all statements (duration=0), backends spend time writing to disk for every single query. This I/O lacks a wait event, so it appears as NULL → "CPU" in monitoring tools.
The very act of observing (logging) changes what's being observed, and the gap in instrumentation makes it invisible.
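A back-of-the-envelope calculation shows how large this hidden share can become. The sketch below assumes each statement costs a fixed execution time plus a fixed, uninstrumented log write. Both figures are hypothetical and chosen only for illustration, and hidden_log_io_share is an illustrative helper, not part of any tool.

```python
def hidden_log_io_share(query_ms: float, log_write_ms: float) -> float:
    """Fraction of a backend's active time spent in uninstrumented log writes."""
    total = query_ms + log_write_ms
    if total <= 0:
        raise ValueError("total time per statement must be positive")
    return log_write_ms / total


# Hypothetical per-statement timings, not measurements.
query_ms = 0.20       # time spent executing the statement
log_write_ms = 0.05   # time spent writing its log line

share = hidden_log_io_share(query_ms, log_write_ms)
print(f"{share:.0%} of active samples would be log I/O shown as NULL")
```

Under these assumptions, a sampling-based tool would attribute that whole share to the NULL bucket, so the slower the log device is relative to the queries, the more "CPU" the dashboard appears to show.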
This project is proposed for Google Summer of Code 2026:
→ Wait Event Coverage Improvements
Update: A potential candidate has already expressed interest — details TBD.
If you're a student interested in PostgreSQL internals and observability, this is a great opportunity to contribute to a critical area of database monitoring.