Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[xcvrd] Force cleanup of chassis global variable on deinit #193

Merged
merged 2 commits into from
Jun 16, 2021

Conversation

alexrallen
Copy link
Contributor

Description

I added a del directive for the global platform_chassis object in xcvrd on deinit to ensure that the pointer is properly cleaned up such that the __del__() method is successfully called on the chassis object which closes its connection to SAI.

Motivation and Context

On Mellanox platforms on 202012 we are seeing that in some cases xcvrd is maintaining the socket to the API past its deconstruction causing an error from SAI. This was traced by reproducing the issue and identifying that the pmon container was causing this issue where xcvrd is the only daemon within pmon that opens a SDK socket on Mellanox.

Error in log...

syncd#SDK: [SX_API_INTERNAL.ERR] Failed command read at communication channel: Connection reset by peer

How Has This Been Tested?

Change was uploaded to xcvrd on running pmon container on a switch running the latest 202012 build. xcvrd was then restarted with supervisord and we verified that we could no longer reproduce the bug.

Backporting

Backport to 202012 while this is a necessary fix in general we are seeing symptoms specifically on 202012

@lgtm-com
Copy link

lgtm-com bot commented Jun 10, 2021

This pull request introduces 1 alert when merging c017e51 into bf60a27 - view on LGTM.com

new alerts:

  • 1 for Unnecessary delete statement in function

liat-grozovik
liat-grozovik previously approved these changes Jun 13, 2021
@liat-grozovik
Copy link
Collaborator

regarding the new alert from LGTM @robocoder99 as the fix is to delete the pointers, not sure why LGTM raise a concern about this specific line which is the whole reason for this PR. Maybe something is missing. Can you please check it out?

@liat-grozovik
Copy link
Collaborator

@keboliu and @jleveque kindly reminder to review this enhancement .

@jleveque
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@liat-grozovik liat-grozovik merged commit eb8a223 into sonic-net:master Jun 16, 2021
qiluo-msft pushed a commit that referenced this pull request Jun 21, 2021
- Description
I added a del directive for the global platform_chassis object in xcvrd on deinit to ensure that the pointer is properly cleaned up such that the __del__() method is successfully called on the chassis object which closes its connection to SAI.

- Motivation and Context
On Mellanox platforms on 202012 we are seeing that in some cases xcvrd is maintaining the socket to the API past its deconstruction causing an error from SAI. This was traced by reproducing the issue and identifying that the pmon container was causing this issue where xcvrd is the only daemon within pmon that opens a SDK socket on Mellanox.

Error in log...

syncd#SDK: [SX_API_INTERNAL.ERR] Failed command read at communication channel: Connection reset by peer

- How Has This Been Tested?
Change was uploaded to xcvrd on running pmon container on a switch running the latest 202012 build. xcvrd was then restarted with supervisord and we verified that we could no longer reproduce the bug.
andywongarista pushed a commit to andywongarista/sonic-platform-daemons that referenced this pull request Jun 30, 2021
…#193)

- Description
I added a del directive for the global platform_chassis object in xcvrd on deinit to ensure that the pointer is properly cleaned up such that the __del__() method is successfully called on the chassis object which closes its connection to SAI.

- Motivation and Context
On Mellanox platforms on 202012 we are seeing that in some cases xcvrd is maintaining the socket to the API past its deconstruction causing an error from SAI. This was traced by reproducing the issue and identifying that the pmon container was causing this issue where xcvrd is the only daemon within pmon that opens a SDK socket on Mellanox.

Error in log...

syncd#SDK: [SX_API_INTERNAL.ERR] Failed command read at communication channel: Connection reset by peer

- How Has This Been Tested?
Change was uploaded to xcvrd on running pmon container on a switch running the latest 202012 build. xcvrd was then restarted with supervisord and we verified that we could no longer reproduce the bug.
vdahiya12 pushed a commit to vdahiya12/sonic-platform-daemons that referenced this pull request Apr 4, 2022
…rn (sonic-net#193)

Remove EEPROM cache file and use DB instead

1. Use visitor pattern accessing EEPROM data
2. Provide utility functions to access redis data base
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants