Fix LAG going down after warm reboot with SONiC neighbors #17040
Fixes #16875.
Why I did it
On physical devices, when doing a warm reboot on an image that supports the teamd retry count feature, with neighbor devices that are also running SONiC and support the teamd retry count feature, the LAG will become operationally down. Specifically, while the LAG won't go into expired or defaulted state, the DUT undergoing warm reboot will send a LACP PDU indicating it's not ready to send or receive traffic. This is indicated by the state flags in the actor information TLV being none (i.e. no flags are set).
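To make "no flags are set" concrete, here is a minimal sketch of the actor state byte carried in a LACP PDU. The bit values follow IEEE 802.1AX; the enum names and the helper lacp_state_carries_traffic are hypothetical and are not teamd's own identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Actor/partner state bits as laid out by IEEE 802.1AX (LACP).
 * The identifier names are illustrative, not taken from teamd. */
enum lacp_state_bit {
    LACP_STATE_ACTIVITY        = 0x01,
    LACP_STATE_TIMEOUT         = 0x02,
    LACP_STATE_AGGREGATION     = 0x04,
    LACP_STATE_SYNCHRONIZATION = 0x08,
    LACP_STATE_COLLECTING      = 0x10,
    LACP_STATE_DISTRIBUTING    = 0x20,
    LACP_STATE_DEFAULTED       = 0x40,
    LACP_STATE_EXPIRED         = 0x80,
};

/* Hypothetical helper: true when the sender advertises that the port is
 * in sync and both collecting and distributing frames. */
static bool lacp_state_carries_traffic(uint8_t state)
{
    const uint8_t need = LACP_STATE_SYNCHRONIZATION |
                         LACP_STATE_COLLECTING |
                         LACP_STATE_DISTRIBUTING;
    return (state & need) == need;
}
```

A state byte of 0x00, as in the PDU described above, fails this check, so a partner reading it has no indication that the port can carry traffic.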
This LACP PDU was generated and sent from the DUT when teamd was initializing in warm-reboot mode. The purpose of this mode is to keep the LAG alive across warm reboots. To do this, the last PDU packet received from the partner is saved to disk before warm reboot. This packet is then read after the warm reboot happens and teamd starts up, and processed as if it was just received from the partner. This allows teamd to send a PDU packet as soon as possible, to avoid hitting the 90-second limit and risking the LAG going down. This also prevents teamd from sending a PDU packet to the partner without any partner information filled in and without any of the actor's state flags being set, like what would happen with a normal teamd startup after cold reboot; this would tell the partner that the LAG went down, and there would be a traffic disruption.
Now, with the teamd retry count feature, a new LACP packet version (version 0xf1) is used for LACP packets. As part of the HLD/implementation for this, if there's a change in LACP packet version received from the peer (i.e. the peer changed from sending 0x1 version PDU packets to 0xf1 version PDU packets, or vice-versa), then a PDU packet is immediately sent, as an acknowledgment that the version has changed. The "initial" version that is used by teamd for this purpose is 0x1, meaning it defaults to assuming 0x1 packets are being used. Because the last saved packet will almost certainly be version 0xf1 when this feature is enabled, this would be seen as a version change, and an acknowledgement packet would be sent.
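The version-change check described above can be sketched roughly as follows. This is an illustration only: struct port_ctx, port_ctx_init, port_ctx_rx_version, and the macro names are hypothetical, and only the version values 0x1 and 0xf1 and the default of 0x1 come from the description.

```c
#include <stdbool.h>
#include <stdint.h>

#define LACPDU_VERSION_STANDARD 0x01  /* default version assumed at startup */
#define LACPDU_VERSION_RETRY    0xf1  /* version used with the retry count feature */

/* Hypothetical per-port context; not teamd's actual structure. */
struct port_ctx {
    uint8_t partner_version;  /* version of the last PDU seen from the partner */
};

static void port_ctx_init(struct port_ctx *p)
{
    /* A freshly started port assumes the partner sends version 0x1 PDUs. */
    p->partner_version = LACPDU_VERSION_STANDARD;
}

/* Record the version of a received PDU. Returns true when it differs from
 * the previously recorded version, i.e. when an acknowledgement PDU is due. */
static bool port_ctx_rx_version(struct port_ctx *p, uint8_t rx_version)
{
    bool changed = (rx_version != p->partner_version);
    p->partner_version = rx_version;
    return changed;
}
```

With this model, replaying a saved 0xf1 packet into a freshly initialized port reports a version change on the very first packet, which is the situation that triggers the acknowledgement during warm-reboot startup.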
That alone isn't an issue. The issue comes from the fact that at the point this version check is done and the acknowledgement packet is generated and sent, the actor's own state flags haven't been fully set/initialized yet. The flags are set only after the state of the LAG (whether it's in active, expired, or defaulted state) is set within teamd, which happens in the lacp_port_set_state function. The actor state flags update happens in the lacp_port_actor_update function, which gets called both within lacp_port_set_state and additionally after lacp_port_set_state gets called (this function is idempotent, so it can be called multiple times without any negative impact). The lacp_port_set_state function gets called a couple lines after the version check is done, which is the problem. As a result, the partner thinks that the DUT is not ready to use the LAG, and so brings the LAG down, disrupting traffic.
This issue doesn't happen on KVM, likely because the datapath in KVM goes down during warm/fast-reboot, and the kernel interfaces aren't shown as being oper up until well after teamd starts. Teamd checks to see if the interface is oper up before reading and processing the saved PDU packet.
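The ordering problem on physical devices can be illustrated with a small simulation. This is not teamd code: struct sim_port and every sim_* function are hypothetical stand-ins, with sim_actor_update playing the role of lacp_port_actor_update, and 0x3d is just an example of a "ready" flag set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a LAG member port. */
struct sim_port {
    uint8_t actor_state;  /* flags that would go into the next PDU */
    uint8_t last_sent;    /* flags carried by the most recent PDU sent */
    int     sent_count;   /* number of PDUs sent so far */
};

#define SIM_READY_FLAGS 0x3d  /* example value: activity, aggregation, sync, collecting, distributing */

/* Stand-in for the actor update: idempotent, so repeated calls are harmless. */
static void sim_actor_update(struct sim_port *p)
{
    p->actor_state = SIM_READY_FLAGS;
}

/* Stand-in for transmitting a PDU with the current actor flags. */
static void sim_send_pdu(struct sim_port *p)
{
    p->last_sent = p->actor_state;
    p->sent_count++;
}

/* Problematic order: the acknowledgement goes out before the flags are set. */
static void sim_rx_ack_first(struct sim_port *p, bool need_ack)
{
    if (need_ack)
        sim_send_pdu(p);
    sim_actor_update(p);
}

/* Safer order: remember that an ack is due, send it after the update. */
static void sim_rx_ack_deferred(struct sim_port *p, bool need_ack)
{
    sim_actor_update(p);
    if (need_ack)
        sim_send_pdu(p);
}
```

Starting from a zeroed port, sim_rx_ack_first sends a PDU whose flags are 0x00, while sim_rx_ack_deferred sends one carrying the populated flags.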
Work item tracking
How I did it
Make sure that if an ack packet needs to be sent because the retry count has changed, it is sent after lacp_port_actor_update has been called.
How to verify it
Tested warm reboot on a Mellanox DUT with SONiC neighbors, and verified that the SONiC neighbors do not disable the port channel interface.
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)