Fix TSA-TSB race condition on multi-asic platforms (sonic-net#710)
<!--
 Please make sure you've read and understood our contributing guidelines:
 https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

 ** Make sure all your commits include a signature generated with `git commit -s` **

 If this is a bug fix, make sure your description includes "fixes #xxxx", or
 "closes #xxxx" or "resolves #xxxx"

 Please provide the following information:
-->

#### Why I did it
Fixes sonic-net#21816

##### Work item tracking
- Microsoft ADO **31499777**:

#### How I did it
The startup_tsa_tsb service now sets the STATE_DB `ALL_SERVICE_STATUS|tsa_tsb_service` flag first and only then configures TSA.
In addition, when tsa_ena is False (genuinely, or because of the race condition) and the service is already running, we explicitly call TSA again to ensure that all ASICs go to the TSA state.
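
The ordering described above can be sketched as follows. This is a minimal illustration and not the actual `startup_tsa_tsb.py` code: the function name `config_tsa_sketch`, its boolean parameters, and the injectable `run` callable are hypothetical, introduced only so the command order can be exercised without a SONiC device. The branch where the service is not yet running is left out because the diff does not show it in full.

```python
import subprocess
from typing import Callable, List

# Hypothetical type for the command runner; defaults to subprocess.check_output.
Runner = Callable[[List[str]], bytes]

# Command taken from the diff: marks the TSA-TSB service as running in STATE_DB.
SET_FLAG_CMD = [
    'sonic-db-cli', 'STATE_DB', 'HSET',
    'ALL_SERVICE_STATUS|tsa_tsb_service', 'running', 'OK',
]


def config_tsa_sketch(tsa_ena: bool,
                      service_running: bool,
                      run: Runner = subprocess.check_output) -> List[List[str]]:
    """Issue the commands in the order this change introduces.

    Returns the list of commands that were issued, in order.
    """
    issued: List[List[str]] = []

    def call(cmd: List[str]) -> None:
        issued.append(list(cmd))
        run(cmd)

    if tsa_ena:
        # Set the STATE_DB flag first, then configure TSA.
        call(SET_FLAG_CMD)
        call(['TSA'])
    elif service_running:
        # Service already running: run TSA again in case a previous
        # run did not complete on every ASIC.
        call(['TSA'])
    # The remaining branch (service not running) is not modelled here.
    return issued
```

Passing a fake `run` that only records its argument makes it possible to check the ordering in a unit test without touching Redis or the BGP configuration.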
#### How to verify it
Reboot the multi-asic linecard, or run config reload, and validate that all ASICs are in TSA state and that the TSA-TSB timer is running.

Tested following scenarios:
1. reboot multi-asic linecard
2. config reload
3. execute TSA while the service is running
4. TSA, config save and then config_reload
5. execute TSB while the service is running
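
A check along the lines of the scenarios above could be scripted roughly as below. This is a sketch under stated assumptions, not part of the change: the helper name `verify_tsa` is hypothetical, the namespaces are assumed to be named `asic0`, `asic1`, and so on, and the per-ASIC state is assumed to live in the `tsa_enabled` field of `BGP_DEVICE_GLOBAL|STATE` in CONFIG_DB. It does not check that the TSA-TSB timer is running.

```python
import subprocess
from typing import Callable, List, Tuple


def default_run(cmd: List[str]) -> str:
    """Run a command and return its stripped, decoded stdout."""
    return subprocess.check_output(cmd).decode().strip()


def verify_tsa(num_asics: int,
               run: Callable[[List[str]], str] = default_run
               ) -> Tuple[bool, List[str]]:
    """Return (service_flag_ok, namespaces_not_in_tsa)."""
    # Read back the flag that the startup service sets in STATE_DB.
    flag = run([
        'sonic-db-cli', 'STATE_DB', 'HGET',
        'ALL_SERVICE_STATUS|tsa_tsb_service', 'running',
    ])

    not_in_tsa: List[str] = []
    for idx in range(num_asics):
        namespace = 'asic{}'.format(idx)  # assumed namespace naming
        # Assumed table/field for the per-ASIC TSA state.
        state = run([
            'sonic-db-cli', '-n', namespace, 'CONFIG_DB', 'HGET',
            'BGP_DEVICE_GLOBAL|STATE', 'tsa_enabled',
        ])
        if state != 'true':
            not_in_tsa.append(namespace)

    return flag == 'OK', not_in_tsa
```

After a reboot or config reload, an empty second element together with a `True` first element would indicate that the flag is set and every ASIC reports TSA.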
<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
- [ ] 202211
- [ ] 202305

#### Tested branch (Please provide the tested image version)
20240532.08
<!--
- Please provide tested image version
- e.g.
- [x] 20201231.100
-->

- [ ] <!-- image version 1 -->
- [ ] <!-- image version 2 -->

#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this
pull request for inclusion in the changelog:
-->

<!--
 Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
-->

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)
mssonicbld authored Feb 26, 2025
1 parent 30dd1b2 commit 94eecda
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions files/scripts/startup_tsa_tsb.py
@@ -67,12 +67,12 @@ def config_tsa():
     num_asics = multi_asic.get_num_asics()
     tsa_ena = get_tsa_status(num_asics)
     if tsa_ena == True:
-        logger.log_info("Configuring TSA")
-        subprocess.check_output(['TSA']).strip()
         logger.log_info("Setting TSA-TSB service field in STATE_DB")
         subprocess.check_output([
             'sonic-db-cli', 'STATE_DB', 'HSET', 'ALL_SERVICE_STATUS|tsa_tsb_service', 'running', 'OK'
         ]).strip()
+        logger.log_info("Configuring TSA")
+        subprocess.check_output(['TSA']).strip()
     else:
         #check if tsa_tsb service is already running, restart the timer
         try:
@@ -84,6 +84,8 @@ def config_tsa():

         if startup_tsa_tsb_service_status == 'OK':
             logger.log_info("TSA-TSB service is already running, just restart the timer")
+            # execute TSA again: this is to overcome race condition where in its previous run, TSA configuration didnt complete on all asics
+            subprocess.check_output(['TSA']).strip()
             return True
         else:
             if num_asics > 1:
