Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bgpd: handling flapped updates considered as duplicates in route proc… #18274

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lsang6WIND
Copy link

…essing

When many routes are involved, clearing a neighbor and reconnecting causes some updates to arrive too quickly before bgpd finishes processing the route flush. I observed that updates are scheduled for withdrawal, but by the time bgpd re-establishes the connection and finishes receiving updates, it starts processing them. Some updates are considered duplicates because they use the same bgp_dest instance (which is destroyed once the withdrawal is processed), leading to route losses.

The fix is to check if a withdrawal advertisement is already scheduled when a duplicate update occurs. If so, the withdraw message is dropped.

…essing

When many routes are involved, clearing a neighbor and reconnecting
causes some updates to arrive too quickly before bgpd finishes
processing the route flush. I observed that updates are scheduled for
withdrawal, but by the time bgpd re-establishes the connection and
finishes receiving updates, it starts processing them. Some updates are
considered duplicates because they use the same bgp_dest instance (which
is destroyed once the withdrawal is processed), leading to route losses.

The fix is to check if a withdrawal advertisement is already scheduled
when a duplicate update occurs. If so, the withdraw message is dropped.

Signed-off-by: Loïc Sang <loic.sang@6wind.com>
@donaldsharp
Copy link
Member

if bgp has not finished the route flush it should not allow the transition out of clearing to start the next connection. How is that happening? And if it is happening then that is the actual bug that needs to be fixed.

Copy link
Member

@donaldsharp donaldsharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to get to the bottom of the proper way to handle this situation instead of doing this change.

@lsang6WIND
Copy link
Author

lsang6WIND commented Feb 27, 2025

if bgp has not finished the route flush it should not allow the transition out of clearing to start the next connection. How is that happening? And if it is happening then that is the actual bug that needs to be fixed.

The function bgp_clear_route_node (link: bgp_clear_route_node) deletes path information for a dest. It mainly calls bgp_rib_remove (link: bgp_rib_remove), which marks the path as invalid and schedules the bgp process.

Once all dests are scheduled for bgp processing, the peer status moves from clearing to starting the next connection. At this point, no routes are removed from the adj out rib.

@lsang6WIND
Copy link
Author

...And if it is happening then that is the actual bug that needs to be fixed.

The issue is that in my case, the update is detected as a duplicate mainly due to the SUBGRP_STATUS_FORCE_UPDATES flag (link: SUBGRP_STATUS_FORCE_UPDATES).

How can I tell the difference between a new update and a withdraw? Even if I manage to skip the duplicate detection, should I also clean the adj->adv and resend the update since the withdraw was dropped?

@donaldsharp
Copy link
Member

we should not be moving from clearing to any other state until all routes have been processsed for that peer.

@lsang6WIND
Copy link
Author

we should not be moving from clearing to any other state until all routes have been processsed for that peer.

I'm not sure if you fully understand the case I'm referring to. When I perform a clear bgp peer, transitioning out of the clearing state means that bgpd has correctly removed the adj in entries for that peer, but not yet the adj out (redistributions).

My understanding is that you’re suggesting we wait until bgpd has finished processing both the adj in and adj out tables before clearing the peer? If it is the case, then we would need to rework how bgp removes adj out. Perhaps we could attach a completion_func (like bgp_clear_node_complete) to bgp->process_queue when clearing a peer, and then detach it once the process is finished?

@donaldsharp
Copy link
Member

I understand perfectly fine. I am still standing by my assertion that we should not be coming out of clearing state for the bgp peer until we have finished outgoing updates from said clearing operation.

In any event I've asked some of our other BGP experts to take a look at this PR and see what their opinion is.

@mjstapp @ton31337 @louberger and @riw777 I would like your opinion on this PR and what BGP should be doing.

@ton31337
Copy link
Member

When many routes are involved, clearing a neighbor and reconnecting causes some updates to arrive too quickly before bgpd finishes processing the route flush.

How +- routes is "many routes"? And what is the exact procedure (clearing a neighbor and reconnecting) to replicate this? Just doing clear bgp ...?

The fix is to check if a withdrawal advertisement is already scheduled when a duplicate update occurs.

How do you check for that? I see you just flush the withdraw queue for that specific advertisement (duplicate).

--

Won't we have a reverse situation with this patch that we miss a withdraw and sit with a stale route at some specific edge case?

--

Also, I agree with Donald that we shouldn't exit the Clearing_Completed FSM state until we have some entries in the Adj-Rib-Out (not only by checking if the workqueue is empty).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants