fix: interrupt mode compatibility for custom reactors #70
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Improves SPDK interrupt-mode behavior for deployments using custom reactors that block in fd_group_wait(), preventing busy-poller eventfds from keeping the reactor hot and ensuring bdev examine completion.
Changes:
- Register an interrupt handler for the iSCSI poll group poller to suppress the always-readable busy-poller eventfd in interrupt mode.
- Change `spdk_bdev_wait_for_examine()` from a busy poller (period=0) to a 1ms periodic poller to avoid spinning `fd_group_wait()` and to ensure examine completion.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| lib/iscsi/iscsi_subsystem.c | Registers interrupt for iSCSI poll group poller to avoid always-readable eventfd behavior in interrupt mode. |
| lib/bdev/bdev.c | Replaces busy poller with 1ms periodic poller and documents the interrupt-mode rationale. |
tiagolobocastro left a comment
Looks like CI is failing, though I'm not sure whether it's related to these changes. Could you please take a look?
I had Copilot review this, but it hasn't been very helpful.
The iscsi_poll_group_poll busy poller (period=0) did not call spdk_poller_register_interrupt(), leaving its auto-created eventfd permanently triggered in the thread's fd_group. This prevents spdk_fd_group_wait() from blocking when the reactor is in interrupt mode, defeating the purpose of interrupt-driven operation.

Add spdk_poller_register_interrupt(pg->poller, NULL, NULL) to clean up the default busy eventfd, matching the pattern already used by the NVMf transport, NVMf TCP acceptor, and NVMe bdev module.

Ref: openebs/mayastor#1745
Signed-off-by: Jeremias Reith <jr42@users.noreply.github.com>
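To illustrate why a permanently triggered eventfd keeps a blocking wait from ever sleeping, here is a minimal sketch using plain Linux epoll. It assumes the fd_group wait behaves like a level-triggered epoll wait; the function names `ready_now` and `busy_eventfd_demo` are hypothetical and nothing here is SPDK code.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Returns how many fds epoll reports as ready without waiting (timeout 0). */
int ready_now(int epfd)
{
	struct epoll_event ev;

	return epoll_wait(epfd, &ev, 1, 0);
}

/*
 * Hypothetical demo: registers an eventfd that is created already signalled
 * (counter = 1) and is never read, mimicking a busy poller's default eventfd.
 * ready_before accumulates the ready count of two consecutive waits;
 * ready_after is the ready count once the eventfd has been removed.
 */
void busy_eventfd_demo(int *ready_before, int *ready_after)
{
	struct epoll_event ev;
	int epfd, efd, rc;

	epfd = epoll_create1(0);
	efd = eventfd(1, EFD_NONBLOCK); /* counter starts at 1: readable */
	assert(epfd >= 0 && efd >= 0);

	memset(&ev, 0, sizeof(ev));
	ev.events = EPOLLIN;
	ev.data.fd = efd;
	rc = epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev);
	assert(rc == 0);

	/* Level-triggered and never drained: reported ready on every wait. */
	*ready_before = ready_now(epfd);
	*ready_before += ready_now(epfd);

	/* Removing the fd stands in for cleaning up the busy eventfd. */
	rc = epoll_ctl(epfd, EPOLL_CTL_DEL, efd, NULL);
	assert(rc == 0);
	(void)rc;
	*ready_after = ready_now(epfd);

	close(efd);
	close(epfd);
}
```

Calling `busy_eventfd_demo(&before, &after)` shows the eventfd reported ready on both waits while registered, and nothing ready once it is removed, which is the condition under which a blocking wait can actually sleep.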
f5f5f55 to e75bc17
Update SPDK revision to include iSCSI poll group interrupt registration and bdev wait_for_examine periodic poller fix, both required for interrupt mode with custom reactors.

Pin targets jr42/spdk@pr/interrupt-mode-fixes (approved but not yet merged upstream). When openebs/spdk#70 lands on openebs/v25.05.x-mayastor, flip owner back to "openebs" and update rev/sha256 to the merged commit.

Depends-On: openebs/spdk#70
Signed-off-by: Jeremias Reith <jr42@users.noreply.github.com>
Update spdk-rs submodule to include FdGroup wrapper and Thread interrupt mode API wrappers required by the reactor interrupt mode implementation.

spdk-rs also bumps its libspdk nix pin to pull in the iSCSI poll group interrupt registration and bdev wait_for_examine periodic poller fixes on the SPDK side (openebs/spdk#70), required for interrupt mode with custom reactors.

Depends-On: openebs/spdk-rs#105
Depends-On: openebs/spdk#70
Signed-off-by: Jeremias Reith <jr42@users.noreply.github.com>
e75bc17 to 6b99052
@jr42 could you take a look at the failure?
In interrupt mode a busy poller (period=0) installed by spdk_bdev_wait_for_examine() gets an always-readable eventfd that spins fd_group_wait(). Clearing it via spdk_poller_register_interrupt(ctx->poller, NULL, NULL) leaves the poller with no interrupt source so it never fires and bdev examine never completes -- causing spdk_bdev_unregister() to stall in REMOVING state and destroy to hang.

Fix by adding a fast path: when bdev_module_all_actions_completed() is already true at call time, defer the callback via a thread message instead of installing a poller. The message preserves the async contract (callback runs on the next thread poll, not during this call) and avoids the interrupt-mode spin entirely for the common synchronous-examine case. For genuinely asynchronous examine a 1 ms periodic poller is still used; its timerfd fires in both poll and interrupt modes.

Validated against the mayastor pytest harness (publish, rebuild, replica, nexus) and bdev_ut/bdev_ut_mt/part_ut unit tests.

Ref: openebs/mayastor#1745
Signed-off-by: Jeremias Reith <jr42@users.noreply.github.com>
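The fast path described above can be sketched as a toy model. This is only an illustration under stated assumptions: `toy_thread`, `toy_send_msg`, `toy_thread_poll`, `toy_wait_for_examine`, and `count_calls` are hypothetical names, the message queue is reduced to a single slot, and the periodic-poller fallback is recorded as a flag rather than implemented. It is not the code from this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*done_cb)(void *arg);

/* Hypothetical single-slot stand-in for a thread's message queue. */
struct toy_thread {
	done_cb pending_fn;
	void *pending_arg;
	bool periodic_poller_installed;
};

/* Stand-in for sending a message to the thread: it runs on the next poll. */
void toy_send_msg(struct toy_thread *t, done_cb fn, void *arg)
{
	assert(t->pending_fn == NULL);
	t->pending_fn = fn;
	t->pending_arg = arg;
}

/* One iteration of the thread's poll loop: deliver the queued message. */
void toy_thread_poll(struct toy_thread *t)
{
	if (t->pending_fn != NULL) {
		done_cb fn = t->pending_fn;

		t->pending_fn = NULL;
		fn(t->pending_arg);
	}
}

/*
 * Sketch of the fast path: if examine has already finished, defer the
 * callback as a message; otherwise fall back to a periodic poller
 * (only recorded as a flag in this toy model).
 */
void toy_wait_for_examine(struct toy_thread *t, bool all_actions_completed,
			  done_cb fn, void *arg)
{
	if (all_actions_completed) {
		toy_send_msg(t, fn, arg);
	} else {
		t->periodic_poller_installed = true;
	}
}

/* Example callback: counts how many times it ran. */
void count_calls(void *arg)
{
	(*(int *)arg)++;
}
```

The point of the deferral is that the callback never runs inside `toy_wait_for_examine()` itself: it fires exactly once on the next `toy_thread_poll()`, which mirrors the async contract the commit message describes.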
6b99052 to acbaa22
Confirmed, pushed in Went with a fast path in
1966: feat(reactor): add SPDK interrupt mode support r=tiagolobocastro a=jr42

## Summary

Opt-in SPDK interrupt mode for io-engine: reactors sleep in `fd_group_wait()` instead of busy-polling, reducing CPU from ~1000m per core to <300m when idle.

- Enable with `ENABLE_INTERRUPT_MODE=true` (default: off, backward compatible)
- Follows Longhorn v2 hybrid pattern (LEP 2025-07-21): epoll for NVMe-oF TCP targets, timerfd polling for NVMe initiators
- Includes pytest compose wiring for interrupt mode testing

### Implementation

Reactor changes (`reactor.rs`, +271 lines):

- New `ReactorState::Interrupt` with `fd_group_wait`-based event loop
- Reactor-level `FdGroup` nests all thread fd_groups for hierarchical mux
- Wakeup eventfd (`FD_TYPE_EVENTFD`, auto-drained) for Rust future delivery
- Cross-core wake on thread schedule to prevent multi-core init deadlock
- Late fd_group nesting in `add_incoming()` for dynamic thread assignment
- Clean shutdown path restoring poll mode

### Why interrupt mode instead of SPDK's dynamic scheduler (#1745)

Mayastor implements its own reactor loop (`reactor.rs`), bypassing SPDK's stock reactor entirely. SPDK's dynamic scheduler monitors thread busyness *within SPDK's reactor* -- since mayastor's reactor replaces it, the scheduler has nothing to observe or control.

Instead, we implement interrupt mode directly in the custom reactor, following the same pattern validated by Longhorn v2 (LEP 2025-07-21). This is simpler, more predictable, and doesn't require restructuring the reactor to use SPDK's scheduler infrastructure. The dynamic scheduler remains a future option if the reactor is ever migrated closer to SPDK's stock implementation.

Depends-On: openebs/spdk-rs#105
Depends-On: openebs/spdk#70
Closes: #1745

## Test plan

- [x] 73/86 pytest tests pass in single-core interrupt mode (13 failures are env-related, identical in poll mode)
- [x] Multi-core validated (2-core smoke test)
- [x] Production: 3-node cluster, 16 volumes, CPU ~3000m to ~463m (85% reduction)
- [x] Rolling restart validated: volumes auto-recover, nexuses redistribute
- [ ] CI pipeline passes

Co-authored-by: Jeremias Reith <jr42@users.noreply.github.com>
Summary
Two fixes for SPDK interrupt mode compatibility with custom reactors that block in fd_group_wait() instead of using SPDK's stock reactor loop:
- iSCSI poll group: register an interrupt handler for the poll group poller to suppress the always-readable busy-poller eventfd that prevents fd_group_wait() from blocking. Follows existing pattern in NVMf TCP, NVMe bdev, and AIO bdev modules.
- spdk_bdev_wait_for_examine(): replace the busy poller (period=0) with a 1ms periodic poller. A busy poller's eventfd spins fd_group_wait(); suppressing it via spdk_poller_register_interrupt(NULL, NULL) removes all interrupt sources, so the poller never fires, bdev examine never completes, and spdk_bdev_unregister() returns EBUSY, hanging destroy operations.

Both issues are specific to reactors that block in fd_group_wait() rather than using SPDK's stock reactor loop (which services busy pollers regardless of interrupt-mode state). SPDK's stock reactor and Longhorn v2 are unaffected.
Ref: openebs/mayastor#1745
Test plan