╔════════════════════════════════════════╗
║  POSTGRESQL WAIT EVENTS COVERAGE GAPS  ║
╚════════════════════════════════════════╝
        

Analysis and improvement proposals for PostgreSQL wait event instrumentation

────────────────────────────────────

What Are Wait Events?

PostgreSQL wait events show what a backend process is doing at a given moment. When a process is blocked waiting on a resource (I/O, a lock, the network), PostgreSQL records the specific wait event and exposes it in the wait_event_type and wait_event columns of pg_stat_activity. Monitoring tools sample these columns to show where time is being spent.

Official documentation: pg_stat_activity (PostgreSQL documentation)
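Tools that draw these charts commonly poll pg_stat_activity about once per second and count active backends per wait event. The sketch below assumes the rows have already been fetched with some PostgreSQL driver and arrive as dicts keyed by column name. The query uses real pg_stat_activity columns; the function name count_active_by_wait_event is hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping

# wait_event_type and wait_event exist in pg_stat_activity since PostgreSQL 9.6.
SAMPLE_QUERY = """
SELECT pid, backend_type, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE state = 'active'
"""


def count_active_by_wait_event(rows: Iterable[Mapping[str, object]]) -> Counter:
    """Count active backends in one snapshot, keyed by (wait_event_type, wait_event).

    A key of (None, None) means wait_event IS NULL: the backend was either
    running on CPU or blocked somewhere that has no wait event yet.
    """
    counts: Counter = Counter()
    for row in rows:
        if row.get("state") != "active":
            continue  # only sessions in state 'active' count as active sessions
        counts[(row.get("wait_event_type"), row.get("wait_event"))] += 1
    return counts


if __name__ == "__main__":
    snapshot = [
        {"pid": 101, "state": "active", "wait_event_type": "IO", "wait_event": "DataFileRead"},
        {"pid": 102, "state": "active", "wait_event_type": None, "wait_event": None},
        {"pid": 103, "state": "idle", "wait_event_type": "Client", "wait_event": "ClientRead"},
    ]
    print(count_active_by_wait_event(snapshot))
```

Keeping NULL as its own key, rather than relabeling it at collection time, leaves the interpretation to the chart layer.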

The "Green CPU" Problem

Many monitoring tools (AWS RDS Performance Insights, PASH Viewer, etc.) visualize wait_event IS NULL as green "CPU" time:

┌────────────────────────────────┐
│ Active Sessions (sampled/s)    │
│                                │
│ ████████████ <- CPU (green)    │
│ ▓▓▓▓▓▓       <- I/O (blue)     │
│ ░░░░         <- Lock (orange)  │
│                                │
│ Green = NULL wait_event        │
└────────────────────────────────┘
    

But NULL doesn't always mean "CPU working" — it can also mean "not instrumented yet."

────────────────────────────────────

The Real Problem

When a backend is stuck waiting on network I/O, external authentication, or other blocking operations without wait event coverage, monitoring tools incorrectly show it as green "CPU" activity.

This leads to misdiagnosis during incidents. A connection storm caused by slow LDAP authentication appears as "high CPU load" when the backends are actually blocked on network and authentication calls that lack instrumentation.

Best practice: tools should label NULL as "CPU or unknown wait event". PostgresAI monitoring follows this convention.
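The best-practice label can be sketched as a small mapping applied when averaging snapshots into average active sessions (AAS). The storm numbers are hypothetical (40 backends blocked in an uninstrumented LDAP call, 2 genuinely waiting on DataFileRead, over 10 samples); the function names are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

WaitKey = Tuple[Optional[str], Optional[str]]

NAIVE_NULL_LABEL = "CPU"
HONEST_NULL_LABEL = "CPU or unknown wait event"


def label(key: WaitKey, honest: bool = True) -> str:
    """Map a (wait_event_type, wait_event) pair to a chart label."""
    wait_event_type, wait_event = key
    if wait_event is None:
        # NULL wait_event: on CPU, or blocked in code without instrumentation.
        return HONEST_NULL_LABEL if honest else NAIVE_NULL_LABEL
    return f"{wait_event_type}:{wait_event}"


def average_active_sessions(snapshots: List[Counter], honest: bool = True) -> Dict[str, float]:
    """Average active sessions per label over equally spaced snapshots."""
    if not snapshots:
        return {}
    totals: Counter = Counter()
    for snapshot in snapshots:
        for key, n in snapshot.items():
            totals[label(key, honest)] += n
    return {name: total / len(snapshots) for name, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical auth storm, identical in each of 10 one-second samples.
    storm = [Counter({(None, None): 40, ("IO", "DataFileRead"): 2}) for _ in range(10)]
    print(average_active_sessions(storm, honest=False))  # {'CPU': 40.0, 'IO:DataFileRead': 2.0}
    print(average_active_sessions(storm, honest=True))
```

The data is the same in both runs; only the label changes, but the honest one tells an on-call engineer that the 40-session bar may not be CPU at all.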

────────────────────────────────────

Gap Categories

Authentication (Critical): 32 locations
I/O Operations (Required): 7 locations

Authentication (Critical — 32 locations)

Every authentication method has gaps where a login can block with no wait event reported.

Impact: During auth storms, backends appear "busy with CPU" when they're actually waiting on external services.
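Closing a gap usually means bracketing the blocking call with pgstat_report_wait_start() and pgstat_report_wait_end(), the PostgreSQL C functions that set and then clear a backend's single advertised wait-event slot. The Python model below only illustrates that pattern and why an uncovered call looks like CPU to a sampler; the Backend class, the event name "LdapBind" and the simulated slow bind are hypothetical, not actual PostgreSQL identifiers.

```python
import threading
import time
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


class Backend:
    """Toy stand-in for one backend's single shared wait-event slot."""

    def __init__(self) -> None:
        self.wait_event: Optional[str] = None  # None plays the role of NULL


@contextmanager
def report_wait(backend: Backend, event: str) -> Iterator[None]:
    """Model of pgstat_report_wait_start() / pgstat_report_wait_end()."""
    backend.wait_event = event  # advertise the wait before blocking
    try:
        yield
    finally:
        backend.wait_event = None  # clear the slot even if the call raises


def slow_ldap_bind(seconds: float) -> None:
    """Hypothetical blocking call to an external authentication service."""
    time.sleep(seconds)


def sample_during(backend: Backend, call: Callable[[], None]) -> Optional[str]:
    """Run `call` in a thread and read the wait-event slot while it blocks."""
    worker = threading.Thread(target=call)
    worker.start()
    time.sleep(0.1)  # give the worker time to enter the blocking call
    seen = backend.wait_event
    worker.join()
    return seen


if __name__ == "__main__":
    b = Backend()

    # Uninstrumented: the sampler sees None (NULL), which charts show as "CPU".
    print(sample_during(b, lambda: slow_ldap_bind(0.3)))

    # Instrumented: the sampler sees the (hypothetical) wait event name.
    def covered_bind() -> None:
        with report_wait(b, "LdapBind"):
            slow_ldap_bind(0.3)

    print(sample_during(b, covered_bind))
```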

I/O Operations (Required — 7 locations)

□ Compression ✗ NOT APPLICABLE

Base backup compression (gzip, LZ4, Zstandard) was considered for wait event coverage.

Community consensus: CPU-bound operations should NOT be covered by wait events. Per a pgsql-hackers discussion: "I vehemently oppose turning wait events into a poor emulation of a CPU profiler."

Status: Rejected — wait events are for waiting, not CPU work.

□ Cryptography ✗ NOT APPLICABLE

SCRAM-SHA-256 (PBKDF2), SQL hash functions, HMAC operations.

Community consensus: CPU-bound cryptographic operations are legitimate CPU work, not waits. Adding wait events everywhere would add overhead without providing useful waiting-state information.

Status: Rejected — use profiling tools for CPU analysis, not wait events.

────────────────────────────────────

Recently Added Wait Events

2026-02-03 — COPY FROM/TO pipe/file/program (part of this gap analysis project)

Commit: e05a24c

2025-12-09 — Group Commit Delay

Commit: 3cb5808b

2025-11-05 — LSN Waiting Infrastructure

Commit: 3b4e53a0

2025-05-29 — Async I/O (io_uring)

Commit: c3623703

2024-01-02 — WAL Summarizer

Commit: 5c430f9d

2023-10-13 — Checkpoint Delay

Commit: 0013ba29

────────────────────────────────────

Observer Effect

An additional challenge: logging itself can cause observer effects that confuse wait event analysis.

Example: log_min_duration_statement = 0 with high TPS on slow disks

When every statement is logged (log_min_duration_statement = 0), backends spend time writing a log entry for every single query. This I/O has no wait event, so it appears as NULL, and therefore as "CPU", in monitoring tools.

The very act of observing (logging) changes what's being observed, and the gap in instrumentation makes it invisible.
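How large can this invisible slice get? By Little's law, the average number of sessions busy in a step equals throughput times the time each execution spends in it. The sketch assumes the log write blocks the backend for a fixed time per statement; the TPS and latency figures are hypothetical.

```python
def hidden_active_sessions(tps: float, log_write_ms: float) -> float:
    """Little's law, L = lambda * W, with W given in milliseconds.

    Returns the average number of active sessions spent writing log entries,
    all of which a NULL-as-CPU chart would draw as green.
    """
    return tps * log_write_ms / 1000.0


if __name__ == "__main__":
    # Hypothetical: 20,000 statements/s, each paying 0.25 ms to emit its log line.
    print(hidden_active_sessions(20_000, 0.25))  # 5.0
```

Five extra "CPU" sessions on a busy host can easily be mistaken for a query regression rather than a logging side effect.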

────────────────────────────────────

GSoC 2026

This project is proposed for Google Summer of Code 2026:

Wait Event Coverage Improvements

Update: A potential candidate has already expressed interest — details TBD.

If you're a student interested in PostgreSQL internals and observability, this is a great opportunity to contribute to a critical area of database monitoring.

────────────────────────────────────

Resources