Add async/await support to TestKit and migrate tests to Async api to reduce racy failures #4072
Some other racy spec issues may be closed if this PR helps. |
I have issues with tests that are using |
Now some tests are timing out. Locally I have interesting behavior: tests from Under Well, I might roll changes back or create separate PR with only UPDATE: It does not matter what tests are running under |
Need to make sure that I do not use |
Now here are the failures:
I was just going to push update with Async api for |
Now failures changed. MNTR under Windows:
Full Framework:
.NET Core (Windows):
Interesting that again all random failures are only under Windows. |
So, it looks like this update does not completely resolve the racy test issues. The challenge of fixing all racy tests still exists, but this update should be helpful anyway. Let me remove |
Have also implemented async API for |
Have some thoughts on that...
Sounds like a good idea. |
Some thoughts on the racy unit tests - we've made huge efforts in the past to try to stamp these out, and heuristically there are a small number of things that can cause these flaky tests.
First, the flip rate report for all failed tests in our suite going back 30 days - https://dev.azure.com/dotnet/Akka.NET/_test/analytics?definitionId=84&contextType=build
You need to exclude the API approval specs, which dominate the top of this list, from the set of "racy" specs. These tests are doing their job when they fail - they're designed to alert the core development team when a public API change is being proposed so we don't accidentally sneak in a breaking change.
Other than those specs, we have the following classes of specs which fail often, in descending order: 1 -
What's killing us with our test suite is that it's large enough, concurrent enough, and deadline-sensitive enough that it doesn't take much to cause a single test to fail.
I'm a little reluctant to pull in these changes right away since they don't seem to address some of the underlying causes of the failures, but we do want the async API for testing and we want to use it. The problem we need to solve initially is getting the suite to pass.
I'm going to follow up after lunch here with a decision tree for debugging and fixing these racy tests. |
To make the async API available, I can make a separate PR that uses only the commits where I made API changes. And in this PR we can continue working on the racy tests... Or just close it and open a separate issue (if we do not already have a single one that aggregates the problem). |
That's a great idea - let's do that. |
cc @IgorFedchenko related: #3786 |
Yup, looks like after making a separate PR with the async API addition, I need to check out that issue and try to make some PRs for it. Also, it's very interesting how you fixed some of them. |
@IgorFedchenko added my decision tree to here: #3786 (comment) |
Thanks, will take a look at this. In the meantime, here is my PR with only the async API addition: #4075. |
@Aaronontheweb So, there is no point in this PR anymore, I guess? The core API updates were moved to a separate PR, and just updating most of the tests to the async API does not solve the problem. If so, feel free to close it. I will do further work on #3786 and that separate PR (if it requires any). |
@IgorFedchenko changing some of the tests to use the async API, piecemeal, might help - but we should look at the tests one at a time. We would have never learned that without this PR though, so this work is definitely not wasted and we may even need it again. You did a great job here. |
This PR is pretty big in terms of the number of changed files, but the core idea is very simple: we block the execution thread with `Thread.Sleep` each time `AwaitAssert` or `AwaitCondition` is used. This is what #3854 is about. When lots of tests in multiple assemblies run in parallel jobs on a single machine, each test sometimes blocks its execution thread. There may be moments when too many threads are sleeping, which could lead to thread starvation, a lack of free memory, and so on.
So the idea is that as long as Akka.NET tests use the blocking TestKit API, each test that uses it may become racy, because the OS slows down, `Thread.Sleep` takes much longer to wake up than requested, etc.
Migration to the async API should improve overall test performance considerably, because this is the same as migrating an ASP.NET Core server from blocking connection handling to async/await operations (in our case the tests are the clients that take resources from the system).
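To illustrate the difference described above, here is a minimal sketch of a polling helper in both styles. It is not the actual Akka.TestKit code: the `PollingSketch` class and its `AwaitConditionBlocking` / `AwaitConditionAsync` methods are hypothetical names made up for this example. The only point is that the blocking variant holds a thread inside `Thread.Sleep` between checks, while the async variant awaits `Task.Delay` and gives the thread back in the meantime.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helpers for illustration only; NOT the Akka.TestKit implementation.
public static class PollingSketch
{
    // Blocking variant: the calling thread is parked in Thread.Sleep between checks,
    // so it cannot do any other work while the test waits.
    public static bool AwaitConditionBlocking(Func<bool> condition, TimeSpan max, TimeSpan interval)
    {
        var stopwatch = Stopwatch.StartNew();
        while (!condition())
        {
            if (stopwatch.Elapsed >= max)
                return false; // deadline passed without the condition becoming true
            Thread.Sleep(interval);
        }
        return true;
    }

    // Async variant: Task.Delay is timer-based, so the thread is released
    // between checks and can run other work.
    public static async Task<bool> AwaitConditionAsync(Func<bool> condition, TimeSpan max, TimeSpan interval)
    {
        var stopwatch = Stopwatch.StartNew();
        while (!condition())
        {
            if (stopwatch.Elapsed >= max)
                return false; // deadline passed without the condition becoming true
            await Task.Delay(interval);
        }
        return true;
    }
}
```

In a test, the async variant would be awaited, for example `await PollingSketch.AwaitConditionAsync(() => received, TimeSpan.FromSeconds(3), TimeSpan.FromMilliseconds(50));` (where `received` is a hypothetical flag set by the code under test), so many tests waiting at the same time do not each pin a thread.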
To make the migration simpler, I marked the sync method versions as obsolete (with compiler error enabled) and then fixed all compiler errors in the solution. Well, almost all: there are MNTR tests that still use the sync API right now (I disabled the compiler error for `Obsolete` methods there), but I will update them later (and will add async spec handling to MNTR itself) if CI shows that there are fewer or no racy failures after the update.
Before merging this PR, I think I will remove the `Obsolete` attributes from the sync API methods, because there is nothing that wrong with them, and they are pretty handy sometimes. At the moment, they are marked as `Obsolete` for simpler detection.
Close #3854
Close #3774