-
Notifications
You must be signed in to change notification settings - Fork 770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pytest] Ignore specific syslog error messages when testing autorestart #2430
[pytest] Ignore specific syslog error messages when testing autorestart #2430
Conversation
…feature and do the postcheck to make sure all the critical processes are alive after testing. Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
# Sleep 20 seconds such that containers have a chance to start. | ||
time.sleep(20) | ||
processes_status = duthost.all_critical_process_status() | ||
for container_name, processes in processes_status.items(): | ||
if processes["status"] is False or len(processes["exited_critical_process"]) > 0: | ||
pytest.fail("Critical process(es) was not running after container '{}' was restarted.".format(container_name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please convert this block into pytest_assert(wait_until(...), failure message)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion! Fixed.
ignoreRegex = [ | ||
".*ERR monit.*", | ||
".*ERR swss#orchagent.*removeLag.*", | ||
".*ERR pmon#xcvrd.*initializeGlobalConfig.*", | ||
".*ERR pmon#thermalctld.*Caught exception while initializing thermal manager.*", | ||
".*ERR syncd#syncd.*driverEgressMemoryUpdate.*", | ||
".*ERR syncd#syncd.*brcm_sai*", | ||
".*ERR teamd#teamsyncd.*readData.*netlink reports an error=-33 on reading a netlink socket.*", | ||
".*WARNING syncd#syncd.*saiDiscover: skipping since it cause crash.*", | ||
".*ERR systemd.*Failed to start .* container*", | ||
".*ERR kernel.*PortChannel.*", | ||
".*ERR route_check.*", | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please split this regex array to one array for each service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great Suggestion! Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of splitting the regex into groups was so that we can ignore only a subset of messages for each container being tested. You split them into groups, but you then you simply add all the groups in unconditionally, which results in the same behavior as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jleveque The idea is to split the ignore regex into subgroups and each subgroup represents a container. When a container was tested for example syncd, this function will be called such that the messages in subgroup of syncd will be ignored, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Joe, Thanks for clarification!
pytest_assert(...) to wait containers are all restarted before doing the postcheck. Signed-off-by: Yong Zhao <yozhao@microsoft.com>
loganalyzer.ignore_regex.extend(teamd_ignoreRegex) | ||
loganalyzer.ignore_regex.extend(systemd_ignoreRegex) | ||
loganalyzer.ignore_regex.extend(kernel_ignoreRegex) | ||
loganalyzer.ignore_regex.extend(other_ignoreRegex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yozhao101 this is not exactly what I had in mind when asked for splitting. But I understand that you have to put all regex up because you are looping through all services and run test against them. you currently cannot really apply individual regex for individual service.
I'll approve this PR and let it go in. Please open a separate PR to parameterize the test so that the test will run for each service individually, that way you can apply ignore regex per service.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I misunderstood what you ware meaning in our meeting. I will create a separate PR to improve the ignore regex.
…rt (#2430) When pytest script was ran to test the autorestart feature, there will be error messages in the syslog which will cause the logAnalyzer to fail at the end of testing. In order to fix this issue, I examine all the error messages in the syslog during testing the autorestart feature and add them into whitelist. At the same time, the reason why we should ignore them are put in the comment of `ignore_expected_loganalyzer_exception(...)` function. After adding these error messages into whitelist, we need do the post check to see whether all the critical processes are alive after testing the autorestart feature. Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao yozhao@microsoft.com
Description of PR
Summary:
When pytest script was ran to test the autorestart feature, there will be error messages in the syslog which will cause
the logAnalyzer to fail at the end of testing. In order to fix this issue, I examine all the error messages in the syslog
during testing the autorestart feature and add them into whitelist. At the same time, the reason why we should ignore
them are put in the comment of
ignore_expected_loganalyzer_exception(...)
function.After adding these error messages into whitelist, we need do the post check to see whether all the critical processes
are alive after testing the autorestart feature.
Fixes # (issue)
Type of change
Approach
What is the motivation for this PR?
How did you do it?
I added two more functions in pytest script. One function is used to add the error messages which can be ignored into
whitelist. Another is used to do the post check to see whether all the critical processes are alive.
How did you verify/test it?
I tested this pytest script against a physical testbed:
str-dx010-acs-1.
Since currentlyacms
process in ACMS containerwas not running by default until the certificate are deployed, pytest script will fail when testing the ACMS container.
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation