[T2-Chassis][Route-Convergence]: route-convergence takes upto 10secs in Process crash(swss/syncd) scenarios #21586
Labels: P0 (Priority of the issue)
Comments
Traffic loss is observed while these processes are coming up. They start learning routes from their upstream neighbors and immediately advertise those routes to their downstream neighbors, so they start attracting traffic even before the routes are programmed in their ASICs.
This was referenced Jan 31, 2025
rlhui pushed a commit that referenced this issue Feb 11, 2025
mssonicbld added a commit to mssonicbld/sonic-buildimage-msft that referenced this issue Feb 11, 2025
#### Why I did it
Fixes issue: sonic-net/sonic-buildimage#21586

##### Work item tracking
- Microsoft ADO **31196012**:

#### How I did it
Run the TSA-TSB service upon swss/swss0/swss1/.. startup. If the service is already running, reset the TSA-TSB timer.

#### How to verify it
Ran the T2 process crash sonic-mgmt snappi test to verify the convergence.
Before fix: ~10 seconds
After fix: <10 ms

#### Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
- [ ] 202211
- [ ] 202305

#### Tested branch (Please provide the tested image version)
SONiC.20240532.04
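The start-or-reset behavior described under "How I did it" can be sketched as a small model. This is only an illustration, not the SONiC implementation: the class name `TsaTsbTimer`, its methods, and the 300-second duration are hypothetical, and the caller passes in the current time so the example stays deterministic.

```python
class TsaTsbTimer:
    """Hypothetical model of the start-or-reset logic; not SONiC code."""

    def __init__(self, duration_s):
        # duration_s is an assumed value; the source does not state the timer length.
        self.duration_s = duration_s
        self.deadline = None  # None means the service has never been started

    def on_swss_start(self, now):
        """Called whenever swss (or swss0, swss1, ...) starts up."""
        already_running = self.deadline is not None and now < self.deadline
        # In both cases the device stays in TSA for a full duration from now.
        self.deadline = now + self.duration_s
        return "reset" if already_running else "started"

    def in_tsa(self, now):
        """True while the timer has not expired, i.e. before TSB is applied."""
        return self.deadline is not None and now < self.deadline


timer = TsaTsbTimer(duration_s=300.0)
first = timer.on_swss_start(now=0.0)     # service not running yet
second = timer.on_swss_start(now=100.0)  # swss restarted while the timer was active
```

In this model the second swss start pushes the deadline out to 400 seconds, so the device would not return to normal forwarding state until a full timer interval after the most recent restart.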
mssonicbld added a commit to Azure/sonic-buildimage-msft that referenced this issue Feb 12, 2025
mssonicbld added a commit to mssonicbld/sonic-mgmt.msft that referenced this issue Feb 14, 2025
### Description of PR
Summary: TSA-TSB service testcases: adjust the testcases to adhere to the new behavior of config_reload

### Type of change
- [ ] Bug fix
- [ ] Testbed and Framework(new/improvement)
- [ ] New Test case
- [ ] Skipped for non-supported platforms
- [x] Test case improvement

### Back port request
- [ ] 202012
- [ ] 202205
- [ ] 202305
- [ ] 202311
- [x] 202405
- [x] 202411

### Approach
#### What is the motivation for this PR?
As a fix for the issue sonic-net/sonic-buildimage#21586, the TSA-TSB service is invoked upon swss bring-up (sonic-net/sonic-buildimage#21587). This affects config_reload behavior: after a config reload, the TSA-TSB service is restarted and the device stays in the TSA state until the timer expires. The testcases are adjusted to explicitly execute TSB so that the DUT is ready for the next testcase.

#### How did you do it?
Enhanced the config_reload API to optionally take an exec_tsb parameter. For the startup TSA-TSB and reliable TSA-TSB testcases, set this flag to True to explicitly execute TSB on the device after config reload.

#### How did you verify/test it?
Ran the tests on T2.

#### Any platform specific information?
NA

#### Supported testbed topology if it's a new test case?
NA
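The optional `exec_tsb` flag could look roughly like the sketch below. It is a simplified stand-in, not the sonic-mgmt `config_reload` helper: `config_reload_sketch` and `FakeDut` are hypothetical names, the real helper takes more parameters and waits for services to come up, and the use of the `config reload -y` and `TSB` commands here is an assumption about how the reload and the TSB step would be issued.

```python
class FakeDut:
    """Hypothetical stand-in for a DUT host object; it only records commands."""

    def __init__(self):
        self.commands = []

    def shell(self, cmd):
        self.commands.append(cmd)


def config_reload_sketch(duthost, exec_tsb=False):
    """Reload the config and, if requested, leave the TSA state explicitly.

    Simplified illustration only; only the exec_tsb flag comes from the PR text.
    """
    duthost.shell("config reload -y")
    if exec_tsb:
        # After reload the TSA-TSB service restarts and the device stays in
        # TSA until its timer expires, so the test issues TSB itself.
        duthost.shell("TSB")


default_dut = FakeDut()
config_reload_sketch(default_dut)

tsb_dut = FakeDut()
config_reload_sketch(tsb_dut, exec_tsb=True)
```

Keeping the flag off by default leaves existing callers unchanged; only the TSA-TSB testcases that need the DUT out of TSA immediately would opt in.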
This was referenced Feb 14, 2025
11 tasks
On a T2 chassis running the 202405 image, traffic takes up to 10 seconds to converge after a process crash. This needs to be optimized; ideally, we should achieve sub-second convergence in this scenario.
Testplan:
https://github.com/sonic-net/sonic-mgmt/blob/master/docs/testplan/Convergence%20measurement%20in%20data%20center%20networks.md#test-case--26
Number of prefixes:
60k (30k IPv4 + 30k IPv6) from each upstream neighbor
Number of Upstream Neighbors: 16
Testcase:
https://github.com/sonic-net/sonic-mgmt/blob/master/tests/snappi_tests/multidut/bgp/test_bgp_outbound_uplink_process_crash.py