[orchagent] Set ABRT signal in STATE_DB during a SAI failure #2556

Closed
wants to merge 18 commits into from

@vivekrnv vivekrnv commented Dec 5, 2022

Signed-off-by: Vivek Reddy Karri vkarri@nvidia.com

What I did

  • On a SAI programming failure, orchagent now sets the ORCH_ABRT_STATUS flag in STATE_DB. The flag is deleted when orchagent restarts.

Why I did it

How I verified it

Simulate a SAI failure:

Nov 21 18:14:59.131240 mtbc-sonic-01-2410 ERR swss#orchagent: :- create: create status: SAI_STATUS_INVALID_ATTRIBUTE_MAX
Nov 21 18:14:59.131240 mtbc-sonic-01-2410 ERR swss#orchagent: :- sflowCreateSession: Failed to create sample packet session with rate 512
Nov 21 18:14:59.131240 mtbc-sonic-01-2410 ERR swss#orchagent: :- handleCreate: Encountered failure in create operation, exiting orchagent, SAI API: SAI_API_SAMPLEPACKET, status: SAI_STATUS_INVALID_ATTRIBUTE_MAX

Check that STATE_DB is updated after the failure, and that the flag is cleared once orchagent is restarted:

root@mtbc-sonic-01-2410:/home/admin# sonic-db-cli STATE_DB GET ORCH_ABRT_STATUS
1
root@mtbc-sonic-01-2410:/home/admin#

root@mtbc-sonic-01-2410:/home/admin# sonic-db-cli STATE_DB GET ORCH_ABRT_STATUS

root@mtbc-sonic-01-2410:/home/admin#
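
The lifecycle checked above (flag present after the failure, gone after the restart) can be mirrored in a small stand-alone sketch. This is not the code from this PR and does not use the real swss `DBConnector`: `FakeStateDb` is a hypothetical in-memory stand-in, and the helper names `onSaiFailure`, `onOrchagentStart` and `abortWasSignalled` are invented for illustration. Only the key name `ORCH_ABRT_STATUS` and the value `1` come from the output above.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical in-memory stand-in for STATE_DB; not the swss DBConnector API.
class FakeStateDb {
public:
    void set(const std::string &key, const std::string &value) { data_[key] = value; }
    void del(const std::string &key) { data_.erase(key); }
    bool exists(const std::string &key) const { return data_.count(key) != 0; }
    // Returns an empty string when the key is absent, like the empty GET reply above.
    std::string get(const std::string &key) const {
        auto it = data_.find(key);
        return it == data_.end() ? std::string() : it->second;
    }
private:
    std::unordered_map<std::string, std::string> data_;
};

// Key name taken from the sonic-db-cli output above.
const std::string kOrchAbrtStatus = "ORCH_ABRT_STATUS";

// Assumed helper: what the failure path records before orchagent exits.
void onSaiFailure(FakeStateDb &db) { db.set(kOrchAbrtStatus, "1"); }

// Assumed helper: what startup does to clear a flag left by a previous run.
void onOrchagentStart(FakeStateDb &db) { db.del(kOrchAbrtStatus); }

// Assumed helper: how a process that depends on the flag might test it.
bool abortWasSignalled(const FakeStateDb &db) { return db.exists(kOrchAbrtStatus); }
```

Calling `onSaiFailure` and then `get` corresponds to the first `sonic-db-cli` query; calling `onOrchagentStart` and then `get` corresponds to the second.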

Details if related

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>

vivekrnv commented Dec 6, 2022

/azpw run Azure.sonic-swss

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-swss

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@@ -486,6 +486,9 @@ int main(int argc, char **argv)
DBConnector config_db("CONFIG_DB", 0);
DBConnector state_db("STATE_DB", 0);

/* Clears the ORCH_ABRT_STATUS flag in STATE_DB */
state_db.del(ORCH_ABRT);
Collaborator

Contributor Author

@vivekrnv vivekrnv Dec 6, 2022

That can be done, but clearing it here gives the processes that depend on this flag more time to act on it.

@@ -23,6 +23,15 @@ extern ofstream gRecordOfs;
extern bool gLogRotate;
extern string gRecordFile;

void notifyAbort(){
Collaborator

The API name does not match the functionality. This function both notifies and aborts, but the name suggests it only notifies. Please move the abort back to the original code and use this function only to notify, or rename it to something like notifyAndAbort.

Contributor Author

will update

Contributor Author

Handled
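
For illustration, the two options raised in this thread could look roughly like the sketch below. It is not the change that was pushed: `FlagStore` is a hypothetical stand-in for the STATE_DB connection, the key string is taken from the PR description, and the use of `std::abort()` for the exit path is an assumption.

```cpp
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical stand-in for the STATE_DB connection; the real code uses swss DBConnector.
using FlagStore = std::map<std::string, std::string>;

// Option 1: the function only records the abort status and returns.
// The caller keeps its own exit call, as in the original code.
void notifyAbort(FlagStore &stateDb) {
    stateDb["ORCH_ABRT_STATUS"] = "1";
}

// Option 2: one function does both, and the name says so.
// std::abort() is an assumption about how the process terminates.
[[noreturn]] void notifyAndAbort(FlagStore &stateDb) {
    notifyAbort(stateDb);
    std::abort();
}
```

Either way, a reader of the call site can tell from the name alone whether control returns to the caller.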

Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
@vivekrnv vivekrnv requested a review from prsunny December 9, 2022 00:24
vivekrnv and others added 2 commits December 9, 2022 19:48
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
@vivekrnv
Contributor Author

/azpw run Azure.sonic-swss

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-swss

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@vivekrnv vivekrnv closed this Jan 5, 2023