Merged
13 changes: 12 additions & 1 deletion Cargo.lock


2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "pg_doorman"
version = "3.5.1"
version = "3.5.2"
edition = "2021"
rust-version = "1.87.0"
license = "MIT"
32 changes: 32 additions & 0 deletions documentation/en/src/changelog.md
@@ -1,5 +1,37 @@
# Changelog

### 3.5.2 <small>Apr 21, 2026</small>

#### Semaphore permit leak on direct handoff

Each `return_object` handoff (delivering a connection to a waiting client via oneshot channel) permanently consumed one semaphore permit. After `max_size` handoffs the pool semaphore was fully drained, blocking all new `timeout_get` callers. The pool could not create connections and stabilized at whatever size it reached during cold start (typically 4-8 out of 40).

Root cause: `wrap_checkout` calls `permit.forget()`, and the handoff path in `return_object` skipped `add_permits(1)`. Now `return_object` restores the permit on both the handoff and idle-queue paths, and the compensating `add_permits(1)` in `pre_replace_one` has been removed as it is no longer needed.
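
The invariant can be sketched with a std-only model; `Pool` and both methods are illustrative stand-ins for the tokio-semaphore-based pool, not the actual pg_doorman code:

```rust
// Std-only model of the permit invariant; all names are illustrative.
struct Pool {
    permits: usize, // models the tokio::sync::Semaphore permit count
    idle: Vec<u32>, // idle connections (u32 stands in for a server conn)
}

impl Pool {
    // Models wrap_checkout: the acquired permit is forgotten
    // (`permit.forget()`), so every return path must restore it explicitly.
    fn checkout(&mut self) -> Option<u32> {
        if self.permits == 0 {
            return None; // semaphore drained: callers block in timeout_get
        }
        self.permits -= 1;
        self.idle.pop()
    }

    // Fixed return_object: the permit is restored on BOTH paths. Before
    // 3.5.2 the handoff arm skipped the restore, leaking one permit per
    // handoff until the semaphore was fully drained.
    fn return_object(&mut self, conn: u32, waiter: Option<&mut Option<u32>>) {
        match waiter {
            Some(slot) => *slot = Some(conn), // direct handoff to a waiter
            None => self.idle.push(conn),     // idle-queue path
        }
        self.permits += 1; // models add_permits(1)
    }
}
```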

#### Burst gate select race

The `tokio::select!` in the burst gate loop randomly picked among ready branches. When `sleep(5ms)` or `create_done` won over an already-delivered oneshot, the connection was silently dropped, inflating `slots.size` without a live server. Fixed with `biased;` (oneshot checked first) and a `try_recv` drain that pushes orphaned connections to idle without double-counting the permit.

#### Migration fixes

- **Client ID collision after migration.** The new process started its connection counter at 0, colliding with migrated client IDs. Now the counter advances past the highest migrated ID.

- **SCRAM passthrough state preserved.** The ClientKey from the first client's SCRAM handshake is serialized in the migration payload (v2 format, backward compatible). The new process skips the `ScramPending` fallback to `server_password`.
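
A minimal sketch of the counter fix; the function name and the slice of migrated client IDs are hypothetical, not pg_doorman's actual code:

```rust
// Start the new process's client-ID counter past the highest migrated ID
// (0 when nothing was migrated) so fresh connections cannot collide with
// migrated clients. Names here are illustrative.
fn next_client_id_start(migrated_ids: &[u64]) -> u64 {
    migrated_ids.iter().copied().max().map_or(0, |hi| hi + 1)
}
```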

#### Session mode statistics fix

`xact_time` percentiles in session mode showed the entire session duration instead of individual transaction time. Now recorded per-transaction at each `ReadyForQuery(Idle)`, matching transaction mode semantics.

`query_time` had the same accumulation bug: the timer was set once before the inner loop and never reset, so each subsequent query reported the cumulative session duration. Now reset per-query in session mode.
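
The accumulation bug and its fix can be modeled with an integer clock; all names are illustrative (the real code uses wall-clock timers, not ticks):

```rust
// Integer-clock model of the session-mode stats bug.
fn per_query_times_buggy(query_costs: &[u64]) -> Vec<u64> {
    let mut now = 0u64;      // fake monotonic clock (ticks)
    let start = now;         // BUG: timer captured once before the loop
    let mut recorded = Vec::new();
    for &cost in query_costs {
        now += cost;         // stand-in for executing the query
        recorded.push(now - start); // reports cumulative session duration
    }
    recorded
}

fn per_query_times_fixed(query_costs: &[u64]) -> Vec<u64> {
    let mut now = 0u64;
    let mut recorded = Vec::new();
    for &cost in query_costs {
        let start = now;     // FIX: timer reset at the top of each query
        now += cost;
        recorded.push(now - start); // reports this query's duration only
    }
    recorded
}
```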

#### Adaptive anticipation budget

Anticipation wait (formerly a fixed 300-500ms) now scales with real transaction latency: `xact_p99 × 2` with ±20% jitter, clamped to [5ms, 500ms]. The cold-start default is 100ms.

#### Diagnostic logging

Slow checkout warnings (>500ms) now include pool state: `size`, `avail`, `waiting`, `inflight`, `creates`, `gate_waits`, `antic_ok`, `antic_to`, `fallback`. Phase-specific warnings added for semaphore timeout, burst gate timeout, coordinator exhaustion, and create failure.

### 3.5.1 <small>Apr 20, 2026</small>

#### systemd Type=notify support
31 changes: 20 additions & 11 deletions documentation/en/src/tutorials/pool-pressure.md
@@ -163,14 +163,21 @@
dropped. `return_object` detects the dropped receiver (`send` returns `Err`).
This way timed-out waiters are cleaned up lazily without a separate
garbage-collection pass.

The deadline is adaptive: `min(query_wait_timeout - 500 ms, adaptive_cap)`
where `adaptive_cap` is derived from real transaction latency:

| Pool state | Budget | Example |
|------------|--------|---------|
| Cold start (no stats) | 100ms ± 20% jitter | 80-120ms |
| Steady state | xact_p99 × 2 ± 20% jitter | p99=0.7ms → 5ms (min); p99=50ms → 100ms |
| High latency | Capped at 500ms | p99=300ms → 500ms |
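
The budget computation in the table can be sketched as follows; the function, its signature, and the assumption that the caller draws `jitter` uniformly from [-0.2, 0.2] are illustrative, not pg_doorman's actual code:

```rust
use std::time::Duration;

// Constants mirror the clamp window and cold-start default described above.
const BUDGET_MIN: Duration = Duration::from_millis(5);
const BUDGET_MAX: Duration = Duration::from_millis(500);
const COLD_START: Duration = Duration::from_millis(100);

fn adaptive_cap(xact_p99: Option<Duration>, jitter: f64) -> Duration {
    let base = match xact_p99 {
        Some(p99) => p99 * 2, // steady state: twice the observed p99
        None => COLD_START,   // cold start: no transaction stats yet
    };
    // Apply +/-20% jitter so simultaneous waiters exit staggered, then
    // clamp to the [5 ms, 500 ms] window.
    base.mul_f64(1.0 + jitter).clamp(BUDGET_MIN, BUDGET_MAX)
}
```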

The budget is measured against a timestamp captured at the top of
`timeout_get`. Phase 1/2 semaphore wait consumes from the same budget,
so the cumulative wait across phases cannot exceed the caller's
`query_wait_timeout`.

The ±20% jitter prevents a **timeout cliff**: without it, N clients that
entered Phase 4 at the same instant all exit simultaneously and
stampede into the burst gate, creating N new backend connections for a
pool that needs far fewer. With jitter, clients exit in staggered
@@ -183,9 +190,11 @@
to the idle queue for recycling.
connects. If a slot is free, take it and call `connect()` against
PostgreSQL. If all slots are full, register a direct-handoff oneshot
waiter and also listen for `create_done` (another in-flight create
finishing). The `select!` uses `biased;` to always check the oneshot
first, preventing a race where `create_done` or the 5 ms backoff timer
wins and silently drops the delivered connection. If a connection
arrives via the oneshot channel, recycle it and return. Otherwise,
re-try the recycle and the gate after the wake.
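
The drain half of the fix can be illustrated std-only; the real loop is a `tokio::select! { biased; ... }` over the oneshot, `create_done`, and the 5 ms sleep, while here `handoff` (an mpsc receiver) stands in for the oneshot and all names are illustrative:

```rust
use std::sync::mpsc;

// Called when a wake source other than the oneshot won the race.
fn drain_after_other_wake(handoff: &mpsc::Receiver<u32>, idle: &mut Vec<u32>) {
    // Before looping back, drain the handoff channel: a connection may
    // have been delivered concurrently, and dropping it would inflate
    // slots.size without a live server behind it.
    if let Ok(conn) = handoff.try_recv() {
        idle.push(conn); // park the orphaned connection on the idle queue
    }
}
```

With `biased;`, the oneshot branch is polled first on every iteration, so a delivered connection is normally taken eagerly; the `try_recv` drain covers the window where another branch fired in the same wakeup.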

**Phase 6 — Backend connect.** Run `connect()`, authenticate, hand the
connection to the client. The burst slot is released automatically when