-
Notifications
You must be signed in to change notification settings - Fork 557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Routeorch] Fix next hop group reference count in bulk operation #1501
Conversation
@@ -1392,11 +1411,6 @@ bool RouteOrch::addRoutePost(const RouteBulkContext& ctx, const NextHopGroupKey | |||
increaseNextHopRefCount(nextHops); | |||
|
|||
decreaseNextHopRefCount(it_route->second); | |||
if (it_route->second.getSize() > 1 | |||
&& m_syncdNextHopGroups[it_route->second].ref_count == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it easier to insert bulkNhgReducedRefCnt
here? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep old if-statement, I think it will keep the bulkNhgReducedRefCnt
in small size.
In reply to: 521673770 [](ancestors = 521673770)
*/ | ||
decreaseNextHopRefCount(it_route->second); | ||
if (it_route->second.getSize() > 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same #Closed
{ | ||
if (m_syncdNextHopGroups[*it_nhg].ref_count == 0) | ||
{ | ||
removeNextHopGroup(*it_nhg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removeNextHopGroup [](start = 16, length = 18)
Could you write a vs testcase to fail old code and pass new code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to add a vs test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A vs test case for this particular issue could be challenging form two aspects:
- It requires a specific bulk pattern to happen, but we do not have control over how the operations are partition into bulks. Constantly reproducing the issue in vs test could be a problem.
- The issue is transient, it will get resolved itself and the only trace would be the log messages of failed attempts of nexthop group removals. Basically, there is a limited sign of the issue ever happened.
I think it needs some thinking in developing this test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i agree with you. I think maybe we need to look into the googletest framework for such sequencing testing. the tests under tests/mock_tests
orchagent/routeorch.cpp
Outdated
@@ -22,6 +22,8 @@ extern CrmOrch *gCrmOrch; | |||
|
|||
const int routeorch_pri = 5; | |||
|
|||
std::set<NextHopGroupKey> bulkNhgReducedRefCnt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bulkNhgReducedRefCnt [](start = 26, length = 20)
Change to a RouteOrch member variable? #Closed
is this affecting 201911 as well? |
This should not affect 201911 since the bulk mode is not in 201911 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…ic-net#1501) What I did Remove next-hop groups after updating the reference counter for a bulk of routes instead of removing next-hop groups in the loop of updating the reference counter. Fix sonic-net/sonic-buildimage#5813 Why I did it The bulk route API has two loops of updating the reference counter: 1. update the sairedis reference counter 2. update the orchagent reference counter Before this commit, the removal of next-hop groups is triggered in the second loop of updating the orchagent reference counter when the reference counter decreases to zero. This may result in a reference counter mismatch between orchagent and sairedis since the sairedis reference counter has already included the operation of the whole bulk but the orchagent reference counter has not. Therefore, the removal of next-hop group may fail due to the mismatch in reference counter (e.g., there are some other routes point to the next-hop group but has not been counted in orchagent yet). To fix this problem, the next-hop group removal operation should be done after updating the reference counter of the whole bulk to make sure the reference counters sairedis and orchagent matches.
…route at same time. (sonic-net#1698) reference sonic-net#1501: remove overlay nexthop after updating the reference counter for a bulk of routes
… on Y cable (sonic-net#1501) Summary There is also support for new commands for muxcable/Y cable show muxcable hwmode switchmode sudo config muxcable hwmode setswitchmode manual/auto all which can basically set/display the switching modes on the cable to auto or manual using the cli What I did Also added these new commands admin@STR43-0101-0101-01LT0:~$ show muxcable hwmode switchmode Port Switching --------- ----------- Ethernet0 manual Ethernet4 manual admin@STR43-0101-0101-01LT0:/usr$ sudo config muxcable hwmode setswitchmode manual all Muxcable at port all will be changed to manual switching mode. Continue? [y/N]: y Success in switching mode on port Ethernet0 to manual Success in switching mode on port Ethernet4 to manual admin@STR43-0101-0101-01LT0:/usr$ sudo config muxcable hwmode setswitchmode auto all Muxcable at port all will be changed to auto switching mode. Continue? [y/N]: y Success in switching mode on port Ethernet0 to auto Success in switching mode on port Ethernet4 to auto Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
What I did
Remove next-hop groups after updating the reference counter for a bulk of routes instead of removing next-hop groups in the loop of updating the reference counter.
Fix sonic-net/sonic-buildimage#5813
Why I did it
The bulk route API has two loops of updating the reference counter:
Currently, the removal of next-hop groups is triggered in the second loop of updating the orchagent reference counter when the reference counter decreases to zero. This may result in a reference counter mismatch between orchagent and sairedis since the sairedis reference counter has already included the operation of the whole bulk but the orchagent reference counter has not. Therefore, the removal of next-hop group may fail due to the mismatch in reference counter (e.g., there are some other routes point to the next-hop group but has not been counted in orchagent yet).
To fix this problem, the next-hop group removal operation should be done after updating the reference counter of the whole bulk to make sure the reference counters sairedis and orchagent matches.
How I verified it
Details if related