fix: interrupt mode compatibility for custom reactors #70

Merged
tiagolobocastro merged 2 commits into openebs:v25.05.x-mayastor from jr42:pr/interrupt-mode-fixes
Apr 21, 2026

@jr42 jr42 commented Apr 12, 2026

Summary

Two fixes for SPDK interrupt mode compatibility with custom reactors that
block in fd_group_wait() instead of using SPDK's stock reactor loop:

  • iSCSI poll group: Register an interrupt to suppress the always-readable
    eventfd that prevents fd_group_wait() from blocking. This follows the
    existing pattern in the NVMf TCP, NVMe bdev, and AIO bdev modules.
  • bdev wait_for_examine: Replace the busy poller (period=0) with a 1ms
    periodic poller. A busy poller's eventfd spins fd_group_wait(); suppressing
    it via spdk_poller_register_interrupt(poller, NULL, NULL) removes all
    interrupt sources, so the poller never fires. As a result, bdev examine
    never completes and spdk_bdev_unregister() returns EBUSY, hanging destroy
    operations.

Both issues are specific to reactors that block in fd_group_wait() rather
than using SPDK's stock reactor loop (which services busy pollers regardless
of interrupt-mode state). SPDK's stock reactor and Longhorn v2 are unaffected.
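
The spin described above can be reproduced outside SPDK. The following is a minimal Linux sketch, assuming that fd_group_wait() ultimately blocks in epoll_wait() on the registered descriptors. make_group(), wait_ready(), and drain() are hypothetical helper names for this illustration, not SPDK APIs. An eventfd whose counter is non-zero and is never read stays readable, so a level-triggered wait returns immediately on every call.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll set containing one eventfd whose
 * counter starts at `initial`. Returns the epoll fd, or -1 on failure. */
static int make_group(unsigned int initial, int *efd_out)
{
    assert(efd_out != NULL);

    int epfd = epoll_create1(0);
    int efd = eventfd(initial, EFD_NONBLOCK);
    struct epoll_event ev = { .events = EPOLLIN };

    ev.data.fd = efd;
    if (epfd < 0 || efd < 0 || epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) < 0) {
        if (efd >= 0) {
            close(efd);
        }
        if (epfd >= 0) {
            close(epfd);
        }
        return -1;
    }
    *efd_out = efd;
    return epfd;
}

/* Hypothetical stand-in for the blocking wait: returns the number of ready
 * descriptors. 0 means the wait timed out, i.e. the caller was able to sleep. */
static int wait_ready(int epfd, int timeout_ms)
{
    struct epoll_event ev;

    return epoll_wait(epfd, &ev, 1, timeout_ms);
}

/* Read the eventfd so its counter drops to zero and it stops being readable. */
static int drain(int efd)
{
    uint64_t value;

    return read(efd, &value, sizeof(value)) == (ssize_t)sizeof(value) ? 0 : -1;
}
```

With a group created by make_group(1, &efd), wait_ready() reports the eventfd as ready on every call until drain(efd) is invoked. After that the wait times out, which is the state a blocking reactor needs in order to sleep.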

Ref: openebs/mayastor#1745

Test plan

  • 73/86 mayastor pytest tests pass with interrupt mode enabled
  • Validated on 3-node production cluster (16 volumes, 10+ days uptime)
  • No regressions in poll mode (all tests identical with and without fix)

@tiagolobocastro tiagolobocastro requested a review from Copilot April 14, 2026 15:49

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Improves SPDK interrupt-mode behavior for deployments using custom reactors that block in fd_group_wait(), preventing busy-poller eventfds from keeping the reactor hot and ensuring bdev examine completion.

Changes:

  • Register an interrupt handler for the iSCSI poll group poller to suppress the always-readable busy-poller eventfd in interrupt mode.
  • Change spdk_bdev_wait_for_examine() from a busy poller (period=0) to a 1ms periodic poller to avoid spinning fd_group_wait() and to ensure examine completion.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| lib/iscsi/iscsi_subsystem.c | Registers interrupt for iSCSI poll group poller to avoid always-readable eventfd behavior in interrupt mode. |
| lib/bdev/bdev.c | Replaces busy poller with 1ms periodic poller and documents the interrupt-mode rationale. |


Comment thread lib/iscsi/iscsi_subsystem.c
Comment thread lib/bdev/bdev.c Outdated
Comment thread lib/bdev/bdev.c Outdated
Member

@tiagolobocastro tiagolobocastro left a comment


Looks like CI is failing, though I'm not sure whether it's related to these changes. Could you please take a look?

I had Copilot review this, but it hasn't been very helpful.

Comment thread lib/bdev/bdev.c
Comment thread lib/iscsi/iscsi_subsystem.c
The iscsi_poll_group_poll busy poller (period=0) did not call
spdk_poller_register_interrupt(), leaving its auto-created eventfd
permanently triggered in the thread's fd_group. This prevents
spdk_fd_group_wait() from blocking when the reactor is in interrupt
mode, defeating the purpose of interrupt-driven operation.

Add spdk_poller_register_interrupt(pg->poller, NULL, NULL) to clean
up the default busy eventfd, matching the pattern already used by
the NVMf transport, NVMf TCP acceptor, and NVMe bdev module.

Ref: openebs/mayastor#1745
Signed-off-by: Jeremias Reith <jr42@users.noreply.github.com>
@jr42 jr42 force-pushed the pr/interrupt-mode-fixes branch from f5f5f55 to e75bc17 Compare April 18, 2026 10:09
jr42 added a commit to jr42/spdk-rs that referenced this pull request Apr 18, 2026
Update SPDK revision to include iSCSI poll group interrupt
registration and bdev wait_for_examine periodic poller fix, both
required for interrupt mode with custom reactors.

Pin targets jr42/spdk@pr/interrupt-mode-fixes (approved but not yet
merged upstream). When openebs/spdk#70 lands on
openebs/v25.05.x-mayastor, flip owner back to "openebs" and update
rev/sha256 to the merged commit.

Depends-On: openebs/spdk#70
Signed-off-by: Jeremias Reith <jr42@users.noreply.github.com>
jr42 added a commit to jr42/mayastor that referenced this pull request Apr 18, 2026
Update spdk-rs submodule to include FdGroup wrapper and Thread
interrupt mode API wrappers required by the reactor interrupt mode
implementation. spdk-rs also bumps its libspdk nix pin to pull in
the iSCSI poll group interrupt registration and bdev
wait_for_examine periodic poller fixes on the SPDK side
(openebs/spdk#70), required for interrupt mode with custom
reactors.

Depends-On: openebs/spdk-rs#105
Depends-On: openebs/spdk#70
Signed-off-by: Jeremias Reith <jr42@users.noreply.github.com>
@jr42 jr42 force-pushed the pr/interrupt-mode-fixes branch from e75bc17 to 6b99052 Compare April 18, 2026 11:46
@tiagolobocastro
Member

@jr42 could you take a look at the failure?
I wonder if the "slower" examine is somehow breaking test assumptions

In interrupt mode a busy poller (period=0) installed by
spdk_bdev_wait_for_examine() gets an always-readable eventfd that
spins fd_group_wait(). Clearing it via
spdk_poller_register_interrupt(ctx->poller, NULL, NULL) leaves the
poller with no interrupt source so it never fires and bdev examine
never completes -- causing spdk_bdev_unregister() to stall in
REMOVING state and destroy to hang.

Fix by adding a fast path: when bdev_module_all_actions_completed()
is already true at call time, defer the callback via a thread message
instead of installing a poller. The message preserves the async
contract (callback runs on the next thread poll, not during this
call) and avoids the interrupt-mode spin entirely for the common
synchronous-examine case. For genuinely asynchronous examine a 1 ms
periodic poller is still used; its timerfd fires in both poll and
interrupt modes.

Validated against the mayastor pytest harness (publish, rebuild,
replica, nexus) and bdev_ut/bdev_ut_mt/part_ut unit tests.

Ref: openebs/mayastor#1745
Signed-off-by: Jeremias Reith <jr42@users.noreply.github.com>
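
To illustrate why a periodic poller is compatible with a blocking reactor, here is a minimal Linux sketch. It assumes the timed poller is backed by a timerfd, as the commit message above states, and that the wait is an epoll_wait(). make_timer_group() and wait_tick() are hypothetical names used only for this illustration. A timerfd is not readable between ticks, so the wait can block, and it becomes readable once the period elapses.

```c
#define _POSIX_C_SOURCE 200809L /* for CLOCK_MONOTONIC and struct itimerspec */

#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll set containing one periodic timerfd.
 * Returns the epoll fd, or -1 on failure. */
static int make_timer_group(long period_ms, int *tfd_out)
{
    assert(tfd_out != NULL && period_ms > 0);

    int epfd = epoll_create1(0);
    int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK);
    struct epoll_event ev = { .events = EPOLLIN };
    struct itimerspec its;

    memset(&its, 0, sizeof(its));
    its.it_value.tv_sec = period_ms / 1000;
    its.it_value.tv_nsec = (period_ms % 1000) * 1000000L;
    its.it_interval = its.it_value; /* re-arm with the same period */

    ev.data.fd = tfd;
    if (epfd < 0 || tfd < 0 ||
        timerfd_settime(tfd, 0, &its, NULL) < 0 ||
        epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev) < 0) {
        if (tfd >= 0) {
            close(tfd);
        }
        if (epfd >= 0) {
            close(epfd);
        }
        return -1;
    }
    *tfd_out = tfd;
    return epfd;
}

/* Wait up to timeout_ms for a tick. Returns the number of expirations
 * consumed, 0 if the wait timed out, or -1 on error. */
static int64_t wait_tick(int epfd, int tfd, int timeout_ms)
{
    struct epoll_event ev;
    uint64_t expirations = 0;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);

    if (n <= 0) {
        return n;
    }
    /* Reading the timerfd clears its readable state until the next tick. */
    if (read(tfd, &expirations, sizeof(expirations)) != (ssize_t)sizeof(expirations)) {
        return -1;
    }
    return (int64_t)expirations;
}
```

Passing period_ms = 1 mirrors the 1 ms cadence mentioned in the commit message. Unlike a never-drained eventfd, the descriptor is idle between ticks, so the wait blocks instead of spinning.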
@jr42 jr42 force-pushed the pr/interrupt-mode-fixes branch from 6b99052 to acbaa22 Compare April 20, 2026 22:17
jr42 added a commit to jr42/spdk-rs that referenced this pull request Apr 20, 2026
Author

jr42 commented Apr 20, 2026

Confirmed, pushed in acbaa2247. The 1 ms poller doesn't fire under the UT harness's mocked TSC, whereas the old busy poller did, so the temporary descriptor that spdk_bdev_register() opens stays open and later tests hit a use-after-free.

I went with a fast path in spdk_bdev_wait_for_examine() (deferred via a thread message when examine is already done) instead of sprinkling spdk_delay_us() across three UT files. It also skips the poller entirely in the common synchronous case, so there is no eventfd to spin in interrupt mode. bdev_ut, mt/bdev_ut, and part_ut are all green with ASan.
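
The control flow of that fast path can be sketched as a small single-threaded model. This is an illustration under stated assumptions, not the code merged in lib/bdev/bdev.c: struct sketch_thread, sketch_wait_for_examine(), sketch_thread_poll(), and count_cb() are hypothetical names, the "thread message" is modelled as a single slot, and poller timing is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*sketch_cb)(void *arg);

/* Hypothetical model of a thread: one message slot and one optional
 * periodic poller. None of these names are SPDK APIs. */
struct sketch_thread {
    sketch_cb msg_fn;          /* pending "thread message", if any */
    void *msg_arg;
    sketch_cb poller_fn;       /* callback owned by the periodic poller */
    void *poller_arg;
    uint64_t poller_period_us; /* 0 means no poller is registered */
};

/* Returns true when the fast path (deferred message) was taken. */
static bool sketch_wait_for_examine(struct sketch_thread *t, bool examine_done,
                                    sketch_cb cb, void *arg)
{
    assert(t != NULL && cb != NULL);

    if (examine_done) {
        /* Fast path: never call cb inline, queue it for the next poll so
         * the asynchronous contract is preserved. */
        t->msg_fn = cb;
        t->msg_arg = arg;
        return true;
    }
    /* Slow path: examine is still in flight, re-check every 1 ms. */
    t->poller_fn = cb;
    t->poller_arg = arg;
    t->poller_period_us = 1000;
    return false;
}

/* One iteration of the thread loop: deliver the queued message, then let
 * the periodic poller complete if examine has finished in the meantime.
 * Timer expiry is deliberately not modelled here. */
static void sketch_thread_poll(struct sketch_thread *t, bool examine_done)
{
    if (t->msg_fn != NULL) {
        sketch_cb fn = t->msg_fn;

        t->msg_fn = NULL;
        fn(t->msg_arg);
    }
    if (t->poller_period_us != 0 && examine_done) {
        t->poller_period_us = 0; /* unregister before firing */
        t->poller_fn(t->poller_arg);
    }
}

/* Example callback: counts how many times it was invoked. */
static void count_cb(void *arg)
{
    (*(int *)arg)++;
}
```

In this model the fast path registers no poller at all, so there is no descriptor that could keep a blocking wait awake, and the callback still runs only on the next poll iteration rather than inside the call.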

@tiagolobocastro tiagolobocastro merged commit 8c3189e into openebs:v25.05.x-mayastor Apr 21, 2026
5 checks passed
jr42 added a commit to jr42/spdk-rs that referenced this pull request Apr 21, 2026
bors-openebs-mayastor Bot pushed a commit to openebs/mayastor that referenced this pull request Apr 24, 2026
1966: feat(reactor): add SPDK interrupt mode support r=tiagolobocastro a=jr42

## Summary

Opt-in SPDK interrupt mode for io-engine: reactors sleep in
`fd_group_wait()` instead of busy-polling, reducing CPU from ~1000m
per core to <300m when idle.

- Enable with `ENABLE_INTERRUPT_MODE=true` (default: off, backward compatible)
- Follows Longhorn v2 hybrid pattern (LEP 2025-07-21): epoll for NVMe-oF
  TCP targets, timerfd polling for NVMe initiators
- Includes pytest compose wiring for interrupt mode testing

### Implementation

Reactor changes (`reactor.rs`, +271 lines):
- New `ReactorState::Interrupt` with `fd_group_wait`-based event loop
- Reactor-level `FdGroup` nests all thread fd_groups for hierarchical mux
- Wakeup eventfd (`FD_TYPE_EVENTFD`, auto-drained) for Rust future delivery
- Cross-core wake on thread schedule to prevent multi-core init deadlock
- Late fd_group nesting in `add_incoming()` for dynamic thread assignment
- Clean shutdown path restoring poll mode

### Why interrupt mode instead of SPDK's dynamic scheduler (#1745)

Mayastor implements its own reactor loop (`reactor.rs`), bypassing
SPDK's stock reactor entirely. SPDK's dynamic scheduler monitors thread
busyness *within SPDK's reactor* -- since mayastor's reactor replaces
it, the scheduler has nothing to observe or control.

Instead, we implement interrupt mode directly in the custom reactor,
following the same pattern validated by Longhorn v2 (LEP 2025-07-21).
This is simpler, more predictable, and doesn't require restructuring the
reactor to use SPDK's scheduler infrastructure. The dynamic scheduler
remains a future option if the reactor is ever migrated closer to SPDK's
stock implementation.

Depends-On: openebs/spdk-rs#105
Depends-On: openebs/spdk#70
Closes: #1745

## Test plan

- [x] 73/86 pytest tests pass in single-core interrupt mode
      (13 failures are env-related, identical in poll mode)
- [x] Multi-core validated (2-core smoke test)
- [x] Production: 3-node cluster, 16 volumes, CPU ~3000m to ~463m (85% reduction)
- [x] Rolling restart validated: volumes auto-recover, nexuses redistribute
- [ ] CI pipeline passes


Co-authored-by: Jeremias Reith <jr42@users.noreply.github.com>