Link event damping feature#1906
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@sivat6 , can you check the build failure? |
|
HLD PR - sonic-net/SONiC#1071 |
There was a problem hiding this comment.
Pull request overview
This PR implements the link event damping feature end-to-end in sairedis/syncd: it introduces Redis-extension PORT attributes to configure damping, applies the damping algorithm to port state change notifications in syncd, and adds a periodic timer to end suppression even without new link events.
Changes:
- Add Redis-extension PORT attributes and plumbing (libsai ⇄ syncd) for link-event damping algorithm/config.
- Apply damping during port state change notification processing, plus a timer thread to enforce max-suppress-time expiry.
- Emit per-port damping counters/state into a dedicated DB table for observability.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
unittest/lib/TestClientServerSai.cpp |
Adds unit tests for setting Redis PORT damping attributes and client-mode rejection. |
syncd/tests/TestSyncdLinkEventDamping.cpp |
Adds a syncd integration-style test for damping config set behavior. |
syncd/tests/Makefile.am |
Registers the new syncd damping test source file. |
syncd/Syncd.h |
Introduces per-port damping state structure, helper APIs, and thread/db members. |
syncd/Syncd.cpp |
Implements config handling, damping algorithm, timer thread, and stats publishing. |
syncd/NotificationProcessor.h |
Extends constructor to accept an optional “apply damping” callback. |
syncd/NotificationProcessor.cpp |
Filters port state change notifications by applying damping before sending. |
lib/sairediscommon.h |
Adds the Redis op name for damping config set. |
lib/Sai.cpp |
Bypasses metadata for Redis PORT extension attributes and routes them to RedisSai. |
lib/RedisRemoteSaiInterface.h |
Declares Redis PORT attribute detection and damping config setter helpers. |
lib/RedisRemoteSaiInterface.cpp |
Implements Redis PORT attribute set handling and op dispatch to syncd. |
lib/ClientSai.cpp |
Rejects Redis extension attrs (including Redis PORT attrs) in CLIENT mode. |
|
Please address copilot comments |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Addressed all copilot comments. |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Can you check the build failure? |
4bb4573 to
8f4866a
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
Signed-off-by: Sivakumar Thirukkanna Thevar <sthirukkanna@juniper.net>
Signed-off-by: Sivakumar Thirukkanna Thevar <sthirukkanna@juniper.net>
Signed-off-by: Sivakumar Thirukkanna Thevar <sthirukkanna@juniper.net>
Signed-off-by: Sivakumar Thirukkanna Thevar <sthirukkanna@juniper.net>
Signed-off-by: Sivakumar Thirukkanna Thevar <sthirukkanna@juniper.net>
Signed-off-by: Sivakumar Thirukkanna Thevar <sthirukkanna@juniper.net>
Signed-off-by: Sivakumar Thirukkanna Thevar <sthirukkanna@juniper.net>
Signed-off-by: Sivakumar Thirukkanna Thevar <sthirukkanna@juniper.net>
Signed-off-by: Sivakumar Thirukkanna Thevar <sthirukkanna@juniper.net>
…yncd/tests/TestSyncdLinkEventDamping.cpp Signed-off-by: Sivakumar Thirukkanna Thevar <sthirukkanna@juniper.net>
Signed-off-by: Sivakumar Thirukkanna Thevar <sthirukkanna@juniper.net>
Signed-off-by: Sivakumar Thirukkanna Thevar <sthirukkanna@juniper.net>
Signed-off-by: Sivakumar Thirukkanna Thevar <sthirukkanna@juniper.net>
Signed-off-by: Sivakumar Thirukkanna Thevar <sthirukkanna@juniper.net>
Signed-off-by: Sivakumar Thirukkanna Thevar <sthirukkanna@juniper.net>
46ac713 to
6fc0dd4
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Hi, there are workflow run(s) waiting for approval, you may be first-time contributor. I will notify maintainers to help approve once PR is approved. Thanks! ---Powered by SONiC BuildBot
|
|
/azpw retry |
|
Retrying failed(or canceled) jobs... |
|
Retrying failed(or canceled) stages in build 1136667: ✅Stage Test:
|
| /** | ||
| * @brief Link event damping configuration and state per port | ||
| */ | ||
| struct LinkEventDampingPortState |
There was a problem hiding this comment.
this should go to separate header
| * @return true if should suppress, false if should propagate | ||
| */ | ||
| bool applyAiedAlgorithm( | ||
| _In_ sai_object_id_t portVid, |
| // Damping exits when EITHER: | ||
| // 1. Time-based: damping_duration_ms >= max_suppress_time | ||
| // 2. Penalty-based: current_penalty < reuse_threshold (decay-based recovery) | ||
| if (state.is_damping_active) | ||
| { | ||
| // Check timeout - never suppress longer than max_suppress_time | ||
| // This is a hard timestamp-based limit to prevent infinite suppression | ||
| uint64_t damping_duration_ms = currentTimeMs - state.damping_start_time_ms; | ||
|
|
||
| if (damping_duration_ms >= state.aied_config.max_suppress_time) | ||
| { | ||
| // Store temporary strings to avoid dangling pointers | ||
| std::string physicalStatusStr = sai_serialize_port_oper_status(state.physical_status); | ||
| std::string advertisedStatusStr = sai_serialize_port_oper_status(state.advertised_status); | ||
| SWSS_LOG_NOTICE("Port VID %s exiting damped state: max suppress time (%u ms) " | ||
| "exceeded. Duration: %lu ms. Physical state: %s, Advertised state: %s", | ||
| portVidStr.c_str(), state.aied_config.max_suppress_time, | ||
| damping_duration_ms, physicalStatusStr.c_str(), | ||
| advertisedStatusStr.c_str()); | ||
| state.is_damping_active = false; | ||
| state.damping_start_time_ms = 0; // Reset timer when exiting damping | ||
|
|
||
| // Propagate last link event when penalty decays below reuse threshold | ||
| if (state.advertised_status != state.physical_status) | ||
| { | ||
| state.pending_state_sync = true; | ||
| SWSS_LOG_NOTICE("Port VID %s state mismatch detected on damping " | ||
| "exit (timeout): physical=%s, advertised=%s. " | ||
| "Marking for state sync on next notification.", | ||
| portVidStr.c_str(), physicalStatusStr.c_str(), | ||
| advertisedStatusStr.c_str()); | ||
| } | ||
| state.advertised_status = state.physical_status; | ||
|
|
||
| // Write updated state to STATE_DB after exiting damping | ||
| writeDampingCountersToStateDb(portVid, state); | ||
| } | ||
| // Check reuse threshold - exit if penalty decays below threshold | ||
| // Penalty decays based on last_decay_time tracking | ||
| else if (state.current_penalty < state.aied_config.reuse_threshold) | ||
| { | ||
| // Store temporary strings to avoid dangling pointers | ||
| std::string physicalStatusStr = sai_serialize_port_oper_status(state.physical_status); | ||
| std::string advertisedStatusStr = sai_serialize_port_oper_status(state.advertised_status); | ||
| SWSS_LOG_NOTICE("Port VID %s exiting damped state: penalty (%u) < " | ||
| "reuse_threshold (%u). Penalty decayed due to exponential decay " | ||
| "formula. Physical state: %s, Advertised state: %s", | ||
| portVidStr.c_str(), state.current_penalty, | ||
| state.aied_config.reuse_threshold, physicalStatusStr.c_str(), | ||
| advertisedStatusStr.c_str()); | ||
| state.is_damping_active = false; |
There was a problem hiding this comment.
code quality in this function and ohters really suffers, and its not keeping same standard as entire project missing spaces empty lines, just many blobs of code not readable, commnets before else if ..
Other changes:
Feature Functionality implementation was missing:
The existing implementation only defines structures and enums for configuration provisioning. The damping functional logic was not implemented. Implemented the damping algorithm as mentioned in the HLD and invoked during the link event handling.
Missing Damping Statistics
Damping-related counters mentioned in HLD are not present in STATS_DB. Added the damping stats (pre & post damping counters) to STATS_DB
Physical vs Operational State Mismatch
The damping algorithm is triggered during link up/down events, with timer and threshold checks handled there.
Once the maximum suppress time is reached, the actual status update occurs only during the next link event, causing a mismatch. Added the timer thread which periodically checks the damping timer and if the damping ends, it updates the physical link status.
Test Plan: Test plan is already in place (https://github.com/sivat6/sonic-mgmt/blob/master/docs/testplan/Link_event_damping.md)
Test Results: This test plan has 9 major test cases and there are about 54 total test cases (Including sub test cases). All these 54 test cases are executed.