[memory_checker] Change log severity to WARNING in a case when the docker service is not running. #27

Closed

@vadymhlushko-mlnx vadymhlushko-mlnx commented Jul 26, 2023

Why I did it

To fix the logic introduced by [memory_checker] Do not check memory usage of containers which are not created #11129.
The following scenario can occur just before a reboot:

  1. The docker service stops.
  2. Very shortly afterwards, the monit service runs its container_memory_telemetry check (the one reported by root@sonic:/home/admin# monit status container_memory_telemetry).

In this scenario, the memory_checker script logs an error to the syslog:

ERR memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))'

However, this is actually expected behavior: when the docker service is stopped, its Unix socket is removed, which is why the FileNotFoundError(2, 'No such file or directory') exception appears in the syslog.
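The FileNotFoundError itself can be reproduced without docker. The following is a minimal sketch, assuming a Linux host; `try_connect` is a hypothetical helper and the socket path is a placeholder, neither is taken from memory_checker.

```python
import socket


def try_connect(path):
    """Try to connect to a Unix socket; return the OSError raised, or None on success."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return None
    except OSError as err:
        # A missing socket file surfaces as FileNotFoundError (errno 2, ENOENT).
        return err
    finally:
        sock.close()


if __name__ == "__main__":
    # Placeholder path that is assumed not to exist on the host.
    print(repr(try_connect("/nonexistent-dir/docker.sock")))
```

Connecting to a socket file that no longer exists fails with errno 2, which matches the exception text in the syslog message above.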

Work item tracking
  • Microsoft ADO (number only):

How I did it

Changed the log severity to WARNING and changed the return value.
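A rough sketch of what this change could look like, not the actual memory_checker code. The names `get_running_container_names` and `list_containers` are hypothetical, and returning an empty list on failure is an assumption (the description only says the return value was changed).

```python
import syslog


def get_running_container_names(list_containers, log=syslog.syslog):
    """Return the names of running containers, or [] if docker is unreachable.

    `list_containers` is a hypothetical callable standing in for the docker
    daemon query; it is expected to raise when the Unix socket is missing.
    `log` takes (priority, message) and defaults to syslog.syslog.
    """
    try:
        return list(list_containers())
    except Exception as err:  # the real script may catch a narrower exception type
        # A stopped docker service is expected shortly before a reboot, so
        # log at WARNING instead of ERR.
        log(syslog.LOG_WARNING,
            "Failed to retrieve the running container list from docker daemon! "
            "Error message is: '{}'".format(err))
        # Assumption: report "no running containers" instead of failing.
        return []


def is_container_running(name, list_containers, log=syslog.syslog):
    """Return True if `name` is running; otherwise log at INFO and return False."""
    if name in get_running_container_names(list_containers, log):
        return True
    log(syslog.LOG_INFO,
        "[memory_checker] Exits without checking memory usage since "
        "container '{}' is not running!".format(name))
    return False
```

With this shape, a missing docker socket produces one WARNING followed by one INFO message, and the caller can skip the memory check instead of treating the situation as an error.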

How to verify it

It is very hard to catch the exact moment described in the Why I did it section.
To check the logic:

  1. Change the Unix socket path in the /usr/bin/memory_checker file on the switch to a non-existent one.
  2. Run root@sonic:/home/admin# monit restart container_memory_telemetry
  3. Check the syslog for messages like these:
WARNING memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))'

INFO memory_checker: [memory_checker] Exits without checking memory usage since container 'telemetry' is not running!

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

…cker service is not running.

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>