13 changes: 12 additions & 1 deletion Cargo.lock


2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "pg_doorman"
version = "3.5.1"
version = "3.5.2"
edition = "2021"
rust-version = "1.87.0"
license = "MIT"
30 changes: 30 additions & 0 deletions documentation/en/src/changelog.md
@@ -1,5 +1,35 @@
# Changelog

### 3.5.2 <small>Apr 21, 2026</small>

#### Semaphore permit leak on direct handoff

Each `return_object` handoff (delivering a connection to a waiting client via oneshot channel) permanently consumed one semaphore permit. After `max_size` handoffs the pool semaphore was fully drained, blocking all new `timeout_get` callers. The pool could not create connections and stabilized at whatever size it reached during cold start (typically 4-8 out of 40).

Root cause: `wrap_checkout` calls `permit.forget()`, and the handoff path in `return_object` skipped `add_permits(1)`. Now `return_object` restores the permit on both the handoff and idle-queue paths. The compensating `add_permits(1)` in `pre_replace_one` has been removed (no longer needed).
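
A std-only model of the permit accounting illustrates the leak and the fix (the `AtomicUsize` stands in for the tokio semaphore; `checkout` and `return_object` are simplified sketches, not the real signatures):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Stand-in for the pool semaphore: `permits` counts free slots.
struct Pool {
    permits: AtomicUsize,
}

impl Pool {
    /// Models `wrap_checkout`: acquire a permit and `forget()` it,
    /// so the permit is consumed rather than auto-released on drop.
    fn checkout(&self) -> bool {
        self.permits
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .is_ok()
    }

    /// Models `return_object`. Before the fix, the direct-handoff path
    /// skipped `add_permits(1)`, leaking one permit per handoff.
    fn return_object(&self, handoff: bool, fixed: bool) {
        if !handoff || fixed {
            self.permits.fetch_add(1, Ordering::Release); // add_permits(1)
        }
    }
}
```

After `max_size` handoffs the unfixed pool is fully drained; with the fix the permit count stays stable no matter how many handoffs occur.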

#### Burst gate select race

The `tokio::select!` in the burst gate loop randomly picked among ready branches. When `sleep(5ms)` or `create_done` won over an already-delivered oneshot, the connection was silently dropped, inflating `slots.size` without a live server. Fixed with `biased;` (oneshot checked first) and a `try_recv` drain that pushes orphaned connections to idle without double-counting the permit.
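
The drain can be modeled with a std channel (std `mpsc` stands in for the tokio oneshot; the function name is hypothetical):

```rust
use std::sync::mpsc;

/// After losing the select race, drain the handoff channel so a
/// connection that was delivered concurrently is parked in the idle
/// queue instead of being dropped (which would inflate `slots.size`).
fn drain_orphaned_handoff(rx: &mpsc::Receiver<u32>, idle: &mut Vec<u32>) {
    // try_recv is non-blocking: it only picks up an already-sent value.
    if let Ok(conn) = rx.try_recv() {
        idle.push(conn); // to idle, without touching the permit again
    }
}
```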

#### Migration fixes

- **Client ID collision after migration.** The new process started its connection counter at 0, colliding with migrated client IDs. Now the counter advances past the highest migrated ID.

- **SCRAM passthrough state preserved.** The ClientKey from the first client's SCRAM handshake is serialized in the migration payload (v2 format, backward compatible). The new process skips the `ScramPending` fallback to `server_password`.
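
The counter fix amounts to the following (a hedged sketch; the real code works on the migration payload, and the helper name is invented):

```rust
/// Advance the new process's connection-ID counter past every migrated
/// client ID so freshly assigned IDs cannot collide with migrated ones.
fn initial_client_id(migrated_ids: &[u32]) -> u32 {
    migrated_ids.iter().copied().max().map_or(0, |max| max + 1)
}
```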

#### Session mode statistics fix

`xact_time` percentiles in session mode showed the entire session duration instead of individual transaction time. Now recorded per-transaction at each `ReadyForQuery(Idle)`, matching transaction mode semantics.
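
The per-transaction timing can be sketched as follows (std `Instant` stands in for the `quanta::Instant` used by the source; the struct and method names are hypothetical):

```rust
use std::time::{Duration, Instant};

/// Session-mode transaction timer: armed when the server enters a
/// transaction, consumed when it returns to idle.
#[derive(Default)]
struct XactTimer {
    start: Option<Instant>,
}

impl XactTimer {
    /// Call on every ReadyForQuery; `status` is the transaction status
    /// byte. Returns the transaction duration when the transaction ends.
    fn on_ready_for_query(&mut self, status: u8) -> Option<Duration> {
        match status {
            // In transaction ('T') or failed transaction ('E'): arm once.
            b'T' | b'E' => {
                self.start.get_or_insert_with(Instant::now);
                None
            }
            // Idle ('I'): transaction ended, record per-transaction time.
            b'I' => self.start.take().map(|s| s.elapsed()),
            _ => None,
        }
    }
}
```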

#### Adaptive anticipation budget

Anticipation wait (formerly a fixed 300-500ms) scales with real transaction latency: `xact_p99 × 2 ± 20%` jitter, clamped to [5ms, 500ms]. Cold-start default: 100ms.
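
The budget computation can be sketched as follows (a std-only sketch; the constant and function names are hypothetical):

```rust
use std::time::Duration;

const MIN_BUDGET: Duration = Duration::from_millis(5);
const MAX_BUDGET: Duration = Duration::from_millis(500);
const COLD_START: Duration = Duration::from_millis(100);

/// `jitter` is a uniform draw from [-0.2, 0.2]; `xact_p99` is None
/// until the first transactions complete (cold start).
fn anticipation_budget(xact_p99: Option<Duration>, jitter: f64) -> Duration {
    let base = match xact_p99 {
        Some(p99) => p99 * 2,  // scale with observed latency
        None => COLD_START,    // cold-start default
    };
    base.mul_f64(1.0 + jitter).clamp(MIN_BUDGET, MAX_BUDGET)
}
```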

#### Diagnostic logging

Slow checkout warnings (>500ms) now include pool state: `size`, `avail`, `waiting`, `inflight`, `creates`, `gate_waits`, `antic_ok`, `antic_to`, `fallback`. Phase-specific warnings added for semaphore timeout, burst gate timeout, coordinator exhaustion, and create failure.

### 3.5.1 <small>Apr 20, 2026</small>

#### systemd Type=notify support
31 changes: 20 additions & 11 deletions documentation/en/src/tutorials/pool-pressure.md
@@ -163,14 +163,21 @@ dropped. `return_object` detects the dropped receiver (`send` returns
This way timed-out waiters are cleaned up lazily without a separate
garbage-collection pass.
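
The lazy cleanup can be modeled with a std channel (std `mpsc` stands in for the tokio oneshot; the function name is hypothetical):

```rust
use std::sync::mpsc;

/// Direct handoff with lazy waiter cleanup: if the waiter timed out and
/// dropped its receiver, `send` hands the connection back in `Err`, and
/// it is parked in the idle queue instead of being lost.
fn handoff_or_idle(tx: mpsc::Sender<u32>, conn: u32, idle: &mut Vec<u32>) {
    if let Err(mpsc::SendError(conn)) = tx.send(conn) {
        idle.push(conn); // waiter gone: recycle via the idle queue
    }
}
```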

The deadline is adaptive: `min(query_wait_timeout - 500 ms, adaptive_cap)`
where `adaptive_cap` is derived from real transaction latency:

| Pool state | Budget | Example |
|------------|--------|---------|
| Cold start (no stats) | 100ms ± 20% jitter | 80-120ms |
| Steady state | xact_p99 × 2 ± 20% jitter | p99=0.7ms → 5ms (min); p99=50ms → 100ms |
| High latency | Capped at 500ms | p99=300ms → 500ms |

The budget is measured against a timestamp captured at the top of
`timeout_get`. Phase 1/2 semaphore wait consumes from the same budget,
so the cumulative wait across phases cannot exceed the caller's
`query_wait_timeout`.
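
Putting the pieces together, the Phase 4 deadline can be sketched as follows (a std-only sketch with hypothetical names; `start` is the timestamp captured at the top of `timeout_get`):

```rust
use std::time::{Duration, Instant};

/// Remaining Phase 4 budget: the adaptive cap, further limited by what
/// is left of the caller's query_wait_timeout (minus a 500 ms safety
/// margin), minus whatever Phase 1/2 already consumed since `start`.
fn phase4_deadline(
    start: Instant,
    query_wait_timeout: Duration,
    adaptive_cap: Duration,
) -> Duration {
    let reserved = query_wait_timeout.saturating_sub(Duration::from_millis(500));
    let budget = reserved.min(adaptive_cap);
    // Phases 1/2 draw from the same budget.
    budget.saturating_sub(start.elapsed())
}
```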

The ±20% jitter prevents a **timeout cliff**: without it, N clients that
entered Phase 4 at the same instant all exit simultaneously and
stampede into the burst gate, creating N new backend connections for a
pool that needs far fewer. With jitter, clients exit in staggered
@@ -183,9 +190,11 @@ to the idle queue for recycling.
connects. If a slot is free, take it and call `connect()` against
PostgreSQL. If all slots are full, register a direct-handoff oneshot
waiter and also listen for `create_done` (another in-flight create
finishing). The `select!` uses `biased;` to always check the oneshot
first, preventing a race where `create_done` or the 5 ms backoff timer
wins and silently drops the delivered connection. If a connection
arrives via the oneshot channel, recycle it and return. Otherwise,
re-try the recycle and the gate after the wake.

**Phase 6 — Backend connect.** Run `connect()`, authenticate, hand the
connection to the client. The burst slot is released automatically when
51 changes: 34 additions & 17 deletions documentation/ru/pool-pressure.md
@@ -185,16 +185,22 @@ recycle 10 times in a tight `yield_now` loop (controlled
This way stale waiters are cleaned up lazily, without a separate
garbage-collection pass.

The deadline is adaptive: `min(query_wait_timeout - 500 ms, adaptive_cap)`,
where `adaptive_cap` is computed from real transaction latency:

| Pool state | Budget | Example |
|------------|--------|---------|
| Cold start (no stats) | 100ms ± 20% jitter | 80-120ms |
| Steady state | xact_p99 × 2 ± 20% jitter | p99=0.7ms → 5ms (min); p99=50ms → 100ms |
| High latency | Capped at 500ms | p99=300ms → 500ms |

Phase 1/2 semaphore wait draws from the same budget, so the cumulative
wait cannot take the client past its `query_wait_timeout`.

The ±20% jitter prevents a **timeout cliff**: without it, N clients
that entered Phase 4 at the same instant all exit simultaneously and
stampede into the burst gate, creating N new backend connections for a
pool that needs far fewer. With jitter, clients exit in staggered
batches: the first ones create connections, and by the time the last
ones exit, those connections have already been used and returned to
the idle queue for reuse.

@@ -207,12 +213,22 @@ gate, creating N new backend connections for a pool
default 2) tasks. If a slot is free, the task takes it and goes on to
call `connect()`. If all slots are busy, the task registers a oneshot
waiter for direct handoff and listens for `create_done` (another
task's create finishing). The `select!` uses `biased;` so the oneshot
is checked first; without it, `create_done` or the `sleep(5 ms)` timer
could win the race and lose an already-delivered connection. If a
connection arrives via the oneshot channel, it is recycled and
returned. Otherwise the task retries the recycle and the gate. This
limits precisely the birth rate of new backend connections within a
single pool, not the size of the pool itself.

**Adaptive burst gate timeout.** The burst gate loop is bounded by an
adaptive budget: `xact_p99 × 2 ± 20%` jitter (min 20 ms, max 500 ms).
If a task has spent longer than the budget in the loop, it stops
registering as a handoff waiter and proceeds to grab a burst gate slot
directly. Without this mechanism the pool could get stuck at
`warm_threshold` forever: clients kept receiving recycled connections
through the shared handoff queue, and `try_acquire` (creating a new
connection) was never reached. The `burst_gate_budget_exhausted`
counter tracks how often this fires.
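
The fallback decision can be sketched as follows (hypothetical names; the real loop interleaves recycle attempts with waiter registration):

```rust
use std::time::{Duration, Instant};

enum Step {
    RegisterWaiter,
    AcquireSlotDirectly,
}

/// Once a task has been in the burst-gate loop longer than its adaptive
/// budget, it stops registering as a handoff waiter and goes straight
/// for a burst-gate slot, so the pool can grow past warm_threshold.
fn next_step(loop_entered: Instant, budget: Duration) -> Step {
    if loop_entered.elapsed() > budget {
        Step::AcquireSlotDirectly // counted as burst_gate_budget_exhausted
    } else {
        Step::RegisterWaiter
    }
}
```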

**Phase 6 — backend connect.** Run `connect()`, authenticate, and hand
the connection to the client. The burst slot is released automatically when
@@ -1025,6 +1041,7 @@ eviction rotates connections between users
| `pg_doorman_pool_scaling{type="inflight_creates"}` | gauge | `user`, `database` | `inflight` from `SHOW POOL_SCALING` |
| `pg_doorman_pool_scaling_total{type="creates_started"}` | counter | `user`, `database` | `creates` |
| `pg_doorman_pool_scaling_total{type="burst_gate_waits"}` | counter | `user`, `database` | `gate_waits` |
| `pg_doorman_pool_scaling_total{type="burst_gate_budget_exhausted"}` | counter | `user`, `database` | `gate_budget_ex`: adaptive timeout fired, the client moved on to creating a connection |
| `pg_doorman_pool_scaling_total{type="anticipation_wakes_notify"}` | counter | `user`, `database` | `antic_notify` |
| `pg_doorman_pool_scaling_total{type="anticipation_wakes_timeout"}` | counter | `user`, `database` | `antic_timeout` |
| `pg_doorman_pool_scaling_total{type="create_fallback"}` | counter | `user`, `database` | `create_fallback` |
@@ -1299,7 +1316,7 @@ psql -h 127.0.0.1 -p 6432 -U admin pgdoorman \
```

Field positions in `awk` match the column order described above:
for `POOL_SCALING` this is `user|database|inflight|creates|gate_waits|gate_budget_ex|antic_notify|antic_timeout|create_fallback|replenish_def`,
for `POOL_COORDINATOR` this is `database|max_db_conn|current|reserve_size|reserve_used|evictions|reserve_acq|exhaustions`.

## Comparison with PgBouncer
2 changes: 2 additions & 0 deletions src/admin/show.rs
@@ -653,6 +653,7 @@ where
("inflight", DataType::Numeric),
("creates", DataType::Numeric),
("gate_waits", DataType::Numeric),
("gate_budget_ex", DataType::Numeric),
("antic_notify", DataType::Numeric),
("antic_timeout", DataType::Numeric),
("create_fallback", DataType::Numeric),
@@ -675,6 +676,7 @@ where
snapshot.inflight_creates.to_string(),
snapshot.creates_started.to_string(),
snapshot.burst_gate_waits.to_string(),
snapshot.burst_gate_budget_exhausted.to_string(),
snapshot.anticipation_wakes_notify.to_string(),
snapshot.anticipation_wakes_timeout.to_string(),
snapshot.create_fallback.to_string(),
13 changes: 10 additions & 3 deletions src/app/server.rs
@@ -815,7 +815,7 @@ async fn binary_upgrade_and_shutdown(
// restart the service when we exit.
if let Err(e) = sd_notify::notify(
false,
&[sd_notify::NotifyState::MainPid(child_pid.into())],
&[sd_notify::NotifyState::MainPid(child_pid)],
) {
warn!("sd_notify MAINPID failed: {e}. systemd may restart the service after old process exits.");
}
@@ -834,8 +834,15 @@ async fn binary_upgrade_and_shutdown(
let _ = MIGRATION_TX.set(tx);
// MIGRATION_IN_PROGRESS already set above (before SHUTDOWN)
let (shutdown_tx, shutdown_rx) = tokio::sync::oneshot::channel();
let sender_handle = tokio::spawn(migration_sender_task(migration_parent_fd, rx, shutdown_rx));
migration_handles = Some(MigrationHandles { shutdown_tx, sender_handle });
let sender_handle = tokio::spawn(migration_sender_task(
migration_parent_fd,
rx,
shutdown_rx,
));
migration_handles = Some(MigrationHandles {
shutdown_tx,
sender_handle,
});
info!("Client migration enabled");
}

5 changes: 5 additions & 0 deletions src/client/core.rs
@@ -358,6 +358,11 @@ pub struct Client<S, T> {
/// Connected to server
pub(crate) connected_to_server: bool,

/// Session mode: transaction start timestamp for per-transaction xact_time.
/// Set when server transitions into a transaction (ReadyForQuery 'T'/'E').
/// Consumed when transaction ends (ReadyForQuery 'I').
pub(crate) session_xact_start: Option<quanta::Instant>,

/// Name of the server pool for this client (This comes from the database name in the connection string)
pub(crate) pool_name: String,
