[Link Event Damping] Add per-port link event damper with syslog and monitor-only mode #1936
DendroLabs wants to merge 3 commits into
Add PortLinkEventDamper class that implements RFC 2439 link event damping on a per-port basis in syncd. This is the core damping algorithm that intercepts SAI port state change notifications, applies penalty-based suppression logic, and either forwards or delays forwarding of events.

Key features beyond the original sonic-net#1334:
- Syslog on suppress/unsuppress transitions (LED_SUPPRESS, LED_UNSUPPRESS)
- Monitor-only mode (algorithm=aied-monitor) per RFC 7196
- Implements SelectableEventHandler interface for SelectablesTracker (sonic-net#1798)
- Public inspection methods replacing friend class pattern

All review feedback from @kcudnik on sonic-net#1334 addressed:
- Use std::max in updatePenalty()
- Remove comment on setInterval
- Join lines, remove unnecessary uint64 cast
- Move getCurrentTimeUsecs from header to cpp
- Split DampingStats into own header file
- Remove friend class usage
- Fix comment formatting

Supersedes: sonic-net#1334

Signed-off-by: DendroLabs <info@dendrolabs.com>
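For readers new to RFC 2439, the penalty arithmetic behind "penalty-based suppression" can be sketched as below. This is a minimal sketch assuming exponential decay with a configurable half-life; `DampingConfig`, `decayPenalty`, `penaltyCeiling`, and `penaltyAfterFlap` are hypothetical names, not the PR's API. Clamping with `std::min` at the ceiling is a choice made for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical config; field names are illustrative, not the PR's.
struct DampingConfig {
    uint32_t suppressThreshold;   // start suppressing above this penalty
    uint32_t reuseThreshold;      // stop suppressing at or below this penalty
    uint32_t halfLifeSec;         // time for the penalty to halve
    uint32_t maxSuppressTimeSec;  // upper bound on continuous suppression
    uint32_t flapPenalty;         // added for each DOWN event
};

// Exponential decay: penalty * 2^(-elapsed / halfLife).
double decayPenalty(double penalty, double elapsedSec, uint32_t halfLifeSec) {
    assert(halfLifeSec > 0);
    return penalty * std::pow(2.0, -elapsedSec / halfLifeSec);
}

// Ceiling: the largest penalty that still decays to the reuse threshold
// within maxSuppressTime, i.e. reuse * 2^(maxSuppress / halfLife).
double penaltyCeiling(const DampingConfig& c) {
    assert(c.halfLifeSec > 0);
    return c.reuseThreshold *
           std::pow(2.0, static_cast<double>(c.maxSuppressTimeSec) / c.halfLifeSec);
}

// Decay the previous penalty, add one flap penalty, clamp at the ceiling.
double penaltyAfterFlap(const DampingConfig& c, double penalty, double elapsedSec) {
    double decayed = decayPenalty(penalty, elapsedSec, c.halfLifeSec);
    return std::min(decayed + c.flapPenalty, penaltyCeiling(c));
}
```

With a 15 s half-life, 60 s max suppress time, and a reuse threshold of 1000, the ceiling works out to 1000 × 2^4 = 16000, so no amount of flapping can hold a port suppressed longer than 60 s after the last flap.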
Add comprehensive unit tests for PortLinkEventDamper covering:
- Setup with valid and disabled configs
- Penalty ceiling calculation (parameterized)
- Timer expiration with/without state advertisement
- Event forwarding on config-disabled ports
- UP/DOWN events with damping not active
- DOWN event activating damping
- UP/DOWN events suppressed while damping active
- Config update clearing active damping
- Monitor-only mode forwarding all events
- Monitor-only mode never activating damping

Uses TestablePortLinkEventDamper subclass with virtual getCurrentTimeUsecs() for deterministic time control, replacing the friend class / Peer pattern from the original sonic-net#1334.

Signed-off-by: DendroLabs <info@dendrolabs.com>
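The virtual-time pattern described above can be sketched roughly as follows. Only the name `getCurrentTimeUsecs()` comes from the PR; `Damper`, `TestableDamper`, `elapsedUsecs`, `setCurrentTimeUsecs`, and `advanceUsecs` are hypothetical, and reading `CLOCK_MONOTONIC` in production is an assumption about what the real clock source would be.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Hypothetical base class: production code reads a monotonic clock.
class Damper {
public:
    virtual ~Damper() = default;

    // Microseconds elapsed since `sinceUsecs`, using the overridable clock.
    uint64_t elapsedUsecs(uint64_t sinceUsecs) const {
        uint64_t now = getCurrentTimeUsecs();
        assert(now >= sinceUsecs);  // a monotonic clock never goes backwards
        return now - sinceUsecs;
    }

protected:
    virtual uint64_t getCurrentTimeUsecs() const {
        timespec ts{};
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return static_cast<uint64_t>(ts.tv_sec) * 1000000ULL +
               static_cast<uint64_t>(ts.tv_nsec) / 1000ULL;
    }
};

// Test double: the test advances time explicitly, so results are deterministic.
class TestableDamper : public Damper {
public:
    void setCurrentTimeUsecs(uint64_t t) { m_nowUsecs = t; }
    void advanceUsecs(uint64_t d) { m_nowUsecs += d; }

protected:
    uint64_t getCurrentTimeUsecs() const override { return m_nowUsecs; }

private:
    uint64_t m_nowUsecs = 0;
};
```

Because the override is protected rather than exposed through a friend declaration, the production class keeps its encapsulation while tests still control every timestamp the penalty math sees.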
/azp run

cc @kcudnik @mikeberesford, requesting review. This supersedes #1334 with all review feedback addressed, plus syslog and monitor-only mode. The flow diagram above addresses the notification lifecycle concern from the original PR.

Azure Pipelines successfully started running 1 pipeline(s).
Add missing #include <cinttypes> for PRIu64 format macro used in
SWSS_LOG calls at lines 128, 213, and 281.
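As a reminder of why the header matters: `PRIu64` is a macro from `<cinttypes>` that expands to the correct conversion specifier for `uint64_t` on the target platform. The sketch below uses `std::snprintf` as a stand-in for the SWSS_LOG macros, which also take printf-style format strings; `formatPenaltyLog` is a hypothetical helper.

```cpp
#include <cassert>
#include <cinttypes>  // PRIu64
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Format a 64-bit counter portably; PRIu64 expands to "lu" or "llu"
// depending on how uint64_t is defined on the platform.
std::string formatPenaltyLog(const char* port, uint64_t penalty) {
    char buf[128];
    int n = std::snprintf(buf, sizeof(buf), "%s: penalty=%" PRIu64, port, penalty);
    assert(n >= 0 && static_cast<std::size_t>(n) < sizeof(buf));  // no truncation
    return std::string(buf);
}
```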
Fix a bug where resetTimer(0) permanently suppresses a port. When a
DOWN event arrives while damping is active and the accumulated penalty
has decayed below the reuse threshold, timeToReachTargetValue returns 0.
timerfd_settime with it_value={0,0} disarms the timer per the Linux man
page, leaving m_dampingActive=true with no mechanism to clear it. The
fix checks whether the penalty is already at or below the reuse
threshold after updatePenalty() and clears damping immediately instead
of scheduling a zero-interval timer.
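The guard described in this fix can be sketched as follows, assuming exponential decay so the wait until reuse is halfLife × log2(penalty / reuse). `usecsUntilTarget` is a hypothetical stand-in for the PR's `timeToReachTargetValue`, and `onDownWhileDamped` and `Action` are likewise illustrative; only `timerfd_settime` and `itimerspec` are real Linux APIs.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <ctime>
#include <sys/timerfd.h>

// Time (usec) for `penalty` to decay to `target` with the given half-life:
// t = halfLife * log2(penalty / target); 0 if already at or below target.
uint64_t usecsUntilTarget(double penalty, double target, uint32_t halfLifeSec) {
    assert(target > 0 && halfLifeSec > 0);
    if (penalty <= target) return 0;
    double sec = halfLifeSec * std::log2(penalty / target);
    return static_cast<uint64_t>(std::ceil(sec * 1e6));  // > 0 when penalty > target
}

enum class Action { ClearDamping, ArmTimer };

// Never pass a zero interval to timerfd_settime: it_value = {0, 0} disarms
// the timer instead of firing it, which would leave damping stuck on.
Action onDownWhileDamped(double penalty, double reuse, uint32_t halfLifeSec,
                         int timerFd) {
    if (penalty <= reuse) {
        return Action::ClearDamping;  // caller clears damping immediately
    }
    uint64_t usecs = usecsUntilTarget(penalty, reuse, halfLifeSec);
    itimerspec spec{};
    spec.it_value.tv_sec = static_cast<time_t>(usecs / 1000000);
    spec.it_value.tv_nsec = static_cast<long>((usecs % 1000000) * 1000);
    if (timerFd >= 0) {
        timerfd_settime(timerFd, 0, &spec, nullptr);  // one-shot, non-zero
    }
    return Action::ArmTimer;
}
```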
Fix typo in test case name (MaxSuppresssTimeIsZero -> MaxSuppressTimeIsZero).
Signed-off-by: DendroLabs <info@dendrolabs.com>
/azp run

Azure Pipelines successfully started running 1 pipeline(s).
Closing this in favor of #1906, which merged on June 12 and lands the per-port link event damper in syncd. Thanks @prsunny for steering the consolidation and @sivat6 for getting it across the line. Glad the notes I left on the CI failures (the …

Two enhancements I had built here are not in #1906: syslog on suppress and unsuppress, and the RFC 7196 monitor-only mode. I think both are worth considering as follow-ups, and I have raised them, along with a couple of STATE_DB schema questions, on the CLI PR (sonic-net/sonic-utilities#4367) so they can be settled in one place. Thanks also @kcudnik and @mikeberesford for the earlier review attention on this line of work.
Summary
Supersedes #1334 by @Ashish1805. Adds the per-port link event damper class that implements RFC 2439 link dampening in syncd. This is the core damping algorithm — the component that decides whether to forward or suppress SAI port state change notifications based on penalty tracking.
All review feedback from #1334 has been addressed (see table below). Additionally, this PR adds two differentiating features that no vendor currently provides:
- Syslog on suppress/unsuppress: the #1 industry complaint (Cisco, Juniper, Arista forums) is that dampening activates silently. We emit LED_SUPPRESS (WARNING), LED_UNSUPPRESS (NOTICE), and LED_CLEAR (NOTICE) syslog messages on state transitions.
- Monitor-only mode (algorithm=aied-monitor): RFC 7196 recommends a "Calculate But Do Not Damp" mode. No vendor has implemented it. This allows operators to safely tune damping parameters in production without risking outages. Penalty is tracked and logged, but events are never suppressed.

Depends on: #1798 (SelectablesTracker, merged), #1935 (RedisInterface, open)
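The interaction between the two features can be sketched as a single decision function: transitions are logged in both modes, but in monitor-only mode the event is always forwarded. This is a minimal sketch in which `PortDampState`, `evaluate`, and the in-memory `log` vector (standing in for syslog) are hypothetical; LED_CLEAR is omitted because the description does not say which event triggers it.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Severity { Warning, Notice };
struct LogLine { Severity sev; std::string msg; };

// Hypothetical per-port state.
struct PortDampState {
    bool monitorOnly = false;    // models algorithm=aied-monitor
    bool wouldSuppress = false;  // penalty-derived state, tracked in both modes
    std::vector<LogLine> log;    // stand-in for syslog
};

// Re-evaluate after a penalty change; returns true if the event should be
// forwarded to the notification handler now.
bool evaluate(PortDampState& s, const std::string& port, double penalty,
              double suppressThreshold, double reuseThreshold) {
    assert(reuseThreshold < suppressThreshold);
    if (!s.wouldSuppress && penalty > suppressThreshold) {
        s.wouldSuppress = true;
        s.log.push_back({Severity::Warning,
                         "LED_SUPPRESS " + port + (s.monitorOnly ? " (monitor-only)" : "")});
    } else if (s.wouldSuppress && penalty <= reuseThreshold) {
        s.wouldSuppress = false;
        s.log.push_back({Severity::Notice, "LED_UNSUPPRESS " + port});
    }
    // Monitor-only: the penalty and transitions are logged, nothing is held back.
    return s.monitorOnly || !s.wouldSuppress;
}
```

Keeping the same state machine in both modes means an operator's monitor-only logs show exactly when real damping would have engaged, which is the point of tuning in production.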
Notification Lifecycle — Addressing @kcudnik's Design Concern
@kcudnik requested a flow diagram for the notification lifecycle. The damper does NOT fabricate new SAI state — it operates as a store-and-forward interceptor:
The "manual construction" at step 4 is simply de-queuing a stored event — every field originates from a real SAI notification received in step 1. This is the standard implementation pattern used by Cisco IOS-XR and Juniper JUNOS.
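The store-and-forward idea can be sketched as follows: while a port is damped, keep only the most recent real notification, and when damping clears, forward that stored event unchanged. `PortEvent` and `StoreAndForward` are hypothetical stand-ins, and re-advertising only when the stored oper state differs from what downstream last saw is an assumption made for the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical stand-in for a SAI port oper-status notification.
struct PortEvent {
    uint64_t portOid;
    bool operUp;
};

class StoreAndForward {
public:
    explicit StoreAndForward(std::function<void(const PortEvent&)> forward)
        : m_forward(std::move(forward)) {
        assert(m_forward && "forward callback required");
    }

    void onEvent(const PortEvent& ev, bool dampingActive) {
        if (dampingActive) {
            m_pending = ev;  // keep only the most recent real notification
            return;
        }
        advertise(ev);
    }

    // Called when the reuse timer fires and damping is cleared.
    void onDampingCleared() {
        // Forward the stored event unchanged, but only if it changes the
        // oper state downstream last saw.
        if (m_pending && (!m_lastAdvertised || m_lastAdvertised->operUp != m_pending->operUp)) {
            advertise(*m_pending);
        }
        m_pending.reset();
    }

private:
    void advertise(const PortEvent& ev) {
        m_lastAdvertised = ev;
        m_forward(ev);
    }

    std::function<void(const PortEvent&)> m_forward;
    std::optional<PortEvent> m_pending;
    std::optional<PortEvent> m_lastAdvertised;
};
```

No field of a forwarded event is synthesized: whatever reaches the callback is a copy of a notification that arrived through `onEvent`.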
Safety guarantees:
Review Feedback Addressed
- std::min (equivalent logic) in updatePenalty()
- getCurrentTimeUsecs() moved to .cpp, made virtual protected
- DampingStats split to DampingStats.h
- TestablePortLinkEventDamper subclass with virtual time
- MockNotificationHandler.h in separate file

Files Changed
New:
- syncd/DampingStats.h: statistics struct (split per review)
- syncd/PortLinkEventDamper.h: damper class header
- syncd/PortLinkEventDamper.cpp: damper implementation (~350 lines)
- unittest/syncd/MockNotificationHandler.h: GMock class
- unittest/syncd/TestPortLinkEventDamper.cpp: 19 test cases (~600 lines)

Modified:
- syncd/NotificationHandler.h/.cpp: add onPortStateChangePostLinkEventDamping() virtual callback
- syncd/Makefile.am: add PortLinkEventDamper.cpp
- unittest/syncd/Makefile.am: add test file

Test Plan
Unit tests cover:
🤖 Generated with Claude Code