Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid removing a VRF routing table when there are pending creation entries in gRouteBulker #3477

Merged
merged 6 commits into from
Jan 30, 2025

Conversation

stephenxs
Copy link
Collaborator

@stephenxs stephenxs commented Jan 20, 2025

What I did

Avoid removing a VRF routing table when there are pending creation entries in gRouteBulker

  1. Remove a VRF routing table when a routing entry is removed only if there is no pending creation entry in gRouteBulker
  2. Avoid uninitialized value SAI IP address/prefix structure

Why I did it

Fix issue: out of range exception can be thrown in addRoutePost due to non exist VRF

(gdb) bt
#0  0x00007f5791aedebc in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f5791a9efb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f5791a89472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f5791de0919 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f5791debe1a in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007f5791debe85 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f5791dec0d8 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007f5791de3240 in std::__throw_out_of_range(char const*) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00005594e856d956 in std::map<unsigned long, std::map<swss::IpPrefix, RouteNhg, std::less<swss::IpPrefix>, std::allocator<std::pair<swss::IpPrefix const, RouteNhg> > >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::map<swss::IpPrefix, RouteNhg, std::less<swss::IpPrefix>, std::allocator<std::pair<swss::IpPrefix const, RouteNhg> > > > > >::at (this=<optimized out>, __k=<optimized out>) at /usr/include/c++/12/bits/stl_map.h:551
#9  0x00005594e8564beb in RouteOrch::addRoutePost (this=this@entry=0x5594ea13e080, ctx=..., nextHops=...) at ./orchagent/routeorch.cpp:2145
#10 0x00005594e856b0b2 in RouteOrch::doTask (this=0x5594ea13e080, consumer=...) at ./orchagent/routeorch.cpp:1021
#11 0x00005594e85282d2 in Orch::doTask (this=0x5594ea13e080) at ./orchagent/orch.cpp:553
#12 0x00005594e851909a in OrchDaemon::start (this=this@entry=0x5594ea0a0950) at ./orchagent/orchdaemon.cpp:895
#13 0x00005594e8485632 in main (argc=<optimized out>, argv=<optimized out>) at ./orchagent/main.cpp:818

How I verified it

Unit (mock) test

Details if related

Originally, it cleaned up a VRF routing table whenever a prefix of the VRF was removed if

  1. there was no routing entry in the VRF routing table and
  2. the prefix was not pending creation in gRouteBulker

The motivation is to remove a VRF routing table if there is no routing entry in the VRF and no routing entry pending creation for that VRF. However, condition 2 does not guarantee that.

The ideal way of the 2nd condition is to check pending creation entries of a certain VRF, which we can not do.
So, we are using strict conditions here as the following:

  1. there is no routing entry in the VRF routing table and
  2. there is no pending creating routing entry in gRouteBulker regardless of which VRF it belongs to

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs stephenxs changed the title Set bytes to 0 before copying IPv4 addresses/masks to avoid uninitialized bytes Erase VRF routing table only if there is no pending creation entries in gRouteBulker Jan 21, 2025
@stephenxs stephenxs changed the title Erase VRF routing table only if there is no pending creation entries in gRouteBulker Avoid removing a VRF routing table when there are pending creation entries in gRouteBulker Jan 22, 2025
@stephenxs stephenxs force-pushed the fix-uninitialized-ipv4-bytes branch from 5a7da1f to 727bb0e Compare January 22, 2025 05:19
@mssonicbld
Copy link
Collaborator

/azp run

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@abdosi
Copy link
Contributor

abdosi commented Jan 22, 2025

thanks @stephenxs

@stephenxs stephenxs marked this pull request as ready for review January 23, 2025 07:08
@stephenxs stephenxs requested a review from prsunny as a code owner January 23, 2025 07:08
@stephenxs stephenxs force-pushed the fix-uninitialized-ipv4-bytes branch from 3b84c79 to 5ffa1f3 Compare January 23, 2025 07:08
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs
Copy link
Collaborator Author

Didn't find a failure but vstest returned non zero

@stephenxs
Copy link
Collaborator Author

/azpw run

@mssonicbld
Copy link
Collaborator

/AzurePipelines run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@stephenxs
Copy link
Collaborator Author

Almost all failures in Test vstest pipeline are also observed on other swss and sairedis PR checks but no failure is relevant to the PRs themself.
@abdosi @prsunny would you help to check the failures? thanks

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

…ized bytes

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs stephenxs force-pushed the fix-uninitialized-ipv4-bytes branch from 76dbbfe to 601e675 Compare January 28, 2025 23:59
@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@prsunny
Copy link
Collaborator

prsunny commented Jan 30, 2025

@abdosi , please check the cherry-pick labels. Looks like there is conflict and may have to raise separate PR

@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202405: #3493

@bingwang-ms
Copy link
Contributor

Cherry-pick to 202405 since PR #3477 was in 202405

@keboliu
Copy link
Collaborator

keboliu commented Feb 5, 2025

Hi @abdosi is there a separated PR for 202411 as it has conflict and can not be cherry-picked?

@stephenxs
Copy link
Collaborator Author

@abdosi , please check the cherry-pick labels. Looks like there is conflict and may have to raise separate PR

@prsunny @abdosi @keboliu
we have a conflict in cherry-picking the PR to 202411 because the dependency PR #3394 hasn't been in 202411.
Once we cherry-pick #3394, this PR can be cherry-picked cleanly.

stephens@ae9afa824b59:/sonic/src/sonic-swss$ git checkout origin/202411 
HEAD is now at 822310d4 Ignore failure in setPortPfcAsym if SAI_STATUS_NOT_SUPPORTED (#3471)
stephens@ae9afa824b59:/sonic/src/sonic-swss$ git cherry-pick 8465c1db68f44034adda55d86cbddfab244766e7
[detached HEAD f74afca9] Added change not to create ECMP Group in SAI and program the route if none of ECMP members are active/link-up (#3394)
 Author: abdosi <58047199+abdosi@users.noreply.github.com>
 Date: Wed Dec 18 23:09:21 2024 +0530
 3 files changed, 54 insertions(+), 26 deletions(-)
stephens@ae9afa824b59:/sonic/src/sonic-swss$ git cherry-pick df08d2d8851980217826d3756292223cc1a593d6
[detached HEAD 5d415c13] Avoid removing a VRF routing table when there are pending creation entries in gRouteBulker (#3477)
 Author: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
 Date: Fri Jan 31 01:46:29 2025 +0800
 3 files changed, 47 insertions(+), 5 deletions(-)
stephens@ae9afa824b59:/sonic/src/sonic-swss$ git log
commit 5d415c1385697887a02d844e82164d97c0517bdb (HEAD)
Author: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
Date:   Fri Jan 31 01:46:29 2025 +0800

    Avoid removing a VRF routing table when there are pending creation entries in gRouteBulker (#3477)
    
    * Set bytes to 0 before copying IPv4 addresses/masks to avoid uninitialized bytes
    * Avoid removing a VRF routing table when there are pending creation entries in gRouteBulker
    
    Remove a VRF routing table when a routing entry is removed only if there is no pending creation entry in gRouteBulker
    Avoid uninitialized value SAI IP address/prefix structure
    Why I did it
    
    Fix issue: out of range exception can be thrown in addRoutePost due to non exist VRF

commit f74afca96110725075f7f72a3973bea64570b0c2
Author: abdosi <58047199+abdosi@users.noreply.github.com>
Date:   Wed Dec 18 23:09:21 2024 +0530

    Added change not to create ECMP Group in SAI and program the route if none of ECMP members are active/link-up (#3394)
    
    What I did:
    Added change not to create ECMP Group in SAI and program the route if none of the ECMP members are active/link-up.
    Also do not program the Temp Route if Neigh is not active (Link Down)
    
    Also as part of this change if Route is not programmed and if we remove that route than decrement VRF Reference count in removeRoute as removeRoutePost will not be called in this case.
    
    Why I did:
    In scale setup of T2 it's possible all links can go down simultaneously which case we can get Route messages with all nexthops being in down state. In such case we might create empty Nexthop Group in SAI for the given route which causes not needed SAI call for Nexthop Group creation and also create traffic blackhole for the route where that route can still forward traffic via default route if eligible/applicable.
    
    Also in this case no point to add Temp Route if neighbor is link down.

commit 822310d4100fd7b9200b0692498af29ee46de669 (origin/202411)
Author: mssonicbld <79238446+mssonicbld@users.noreply.github.com>
Date:   Wed Jan 15 16:02:30 2025 +0800

    Ignore failure in setPortPfcAsym if SAI_STATUS_NOT_SUPPORTED (#3471)
    
    <!--
    Please make sure you have read and understood the contribution guildlines:
    https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md
    
    1. Make sure your commit includes a signature generted with `git commit -s`
    2. Make sure your commit title follows the correct format: [component]: description
    3. Make sure your commit message contains enough details about the change and related tests
    4. Make sure your pull request adds related reviewers, asignees, labels
    
    Please also provide the following information in this pull request:
    -->

@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202411: #3526

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants