[DNM] Debug freeze on CentOS 7 CI #2939
Closed
Not sure what to do here. Add more iterations? Increase the timeout?
kolyshkin force-pushed the debug-freeze branch 2 times, most recently from 8683ae4 to a59d45a (May 5, 2021 22:22)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
1. These tests can't be run in parallel since they check a global variable (mbaScEnabled).
2. findIntelRdtMountpointDir() relies on mbaScEnabled being initially set to the default value (false), and thus the test fails if run more than once:

> go test -count 2
> ...
> intelrdt_test.go:243: expected mbaScEnabled=false, got true
> --- FAIL: TestFindIntelRdtMountpointDir/Valid_mountinfo_with_MBA_Software_Controller_disabled (0.00s)

Fixes: 2c70d23
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
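To illustrate the second point, here is a minimal sketch of why a test that depends on a package-level flag breaks under `go test -count 2`, and how resetting the flag before each case makes it repeatable. Only `mbaScEnabled` comes from the commit message; `parseMountOptions` and `runCase` are hypothetical stand-ins, and the `mba_MBps` option string is assumed for illustration. This is not the actual runc test code.

```go
package main

import "fmt"

// mbaScEnabled mirrors the package-level flag named in the commit message.
var mbaScEnabled bool

// parseMountOptions is a hypothetical stand-in for the code under test:
// it only ever sets the global flag and never clears it.
func parseMountOptions(opts []string) {
	for _, o := range opts {
		if o == "mba_MBps" { // assumed option name, for illustration only
			mbaScEnabled = true
		}
	}
}

// runCase resets the global to its default before exercising the code,
// so the result does not depend on earlier cases or earlier runs.
func runCase(opts []string) bool {
	mbaScEnabled = false
	parseMountOptions(opts)
	return mbaScEnabled
}

func main() {
	// Two rounds imitate `go test -count 2`.
	for round := 0; round < 2; round++ {
		enabled := runCase([]string{"rw", "mba_MBps"})
		disabled := runCase([]string{"rw"})
		fmt.Println(round, enabled, disabled)
	}
}
```

Without the reset in `runCase`, the "disabled" case would observe the value left behind by the previous case, which matches the failure quoted above. Because the flag is shared, such cases also must not run in parallel.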
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin force-pushed the debug-freeze branch 5 times, most recently from ca1340e to 92c8cb1 (May 6, 2021 01:05)
500x each test (with and without systemd).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
I hate to keep adding those kludges, but lately TestFreeze (and TestSystemdFreeze) from libcontainer/integration fails a lot. The failure comes and goes, and is probably caused by a slow host allocated for the test, and a slow VM on top of it.

To remediate, add a small sleep on every 25th iteration in between asking the kernel to freeze and checking its status. In the worst case scenario (failure to freeze) this adds 0.4 ms to the duration of the call (nothing compared to the sleep after the temporary thaw).

It is hard to measure how this affects CI, but (with added debug prints) on a histogram of the number of retries I saw peaks at and after 25, 50, 75 etc., meaning this works.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
OK, the conclusion is that adding an occasional short delay between writing "frozen" and reading the status back helps in this case (a very slow system).
The fix is #2941.
GHA CI intermittently fails on CentOS 7 (#2907).
Trying to find out what to do about it.
This is complicated because the kind of macOS host GHA gives for the test is a lottery. In most cases it's good, but sometimes it's slow and buggy.