Add async/await support to TestKit and migrate tests to Async api to reduce racy failures #4072

IgorFedchenko · 2019-11-29T16:09:59Z

This PR is pretty big from the changed files count point of view. But the core idea is very simple:
we are blocking execution thread with Thread.Sleep each time when AwaitAssert or AwaitCondition is used. This is what #3854 is about. When lots of tests in multiple assemblies are running in parallel jobs on single machine, each test sometimes blocks execution thread. And there might be moments when too much threads are sleeping, which could lead to thread starvation, leak of free memory and so on.

So the idea is that until all Akka.NET tests are using blocking testkit API, each test that uses blocking api may become racy, because OS is becoming slow, Thread.Sleep takes much more time to awake then requested, etc.

Migration to async API should increase overall test performance pretty much, because this is the same as migration of ASP.NET Core server from blocking connections handling to async/await operations (in our case tests are the clients that taking resources from the system).

To make migration simpler, I marked sync method versions as obsolete (with compiler error enabled), and then fixed all compiler errors in solution. Well, almost all - there are MNTR tests, that are still using sync API right now (disabled compiler error for Obsolete methods), but I will update them later (and will add async specs handling to MNTR itself) if CI will show that there are less or none of racy failures after update.

Before merging this PR, I think I will remove Obsolete attributes from sync API methods, because there is nothing that much wrong with them, and they are pretty handy sometimes. At the moment, they are marked as Obsolete for simpler detection.

Close #3854
Close #3774

IgorFedchenko · 2019-11-29T16:15:24Z

Some other racy spec issues may be closed if this PR will help.

IgorFedchenko · 2019-11-29T20:05:24Z

I have issues with tests that are using EventFilter locally. Will try to fix this - ideally, all tests should be green in this PR (at least non-MNTR tests for now).

IgorFedchenko · 2019-11-30T08:58:12Z

Now some tests are timing out. Locally I have interesting behavior: tests from Akka.Persistence.Tests namespace are running well under debugger (they are running one-by-one in that case), but when running without debugger in parallel (like 8-9 concurrent executions) they are blocking forever (until test runner will shut them down).

Under dev branch all is running fine of course.

Well, I might roll changes back or create separate PR with only Async API implementation and try to resolve each racy test separately... But still want to make this work as a whole.

UPDATE: It does not matter what tests are running under Akka.Persistence.Test - when there are 8+ parallel test executions, it is handing forever.

IgorFedchenko · 2019-11-30T09:59:46Z

Need to make sure that I do not use AwaitSomethingAsync(...).Wait() instead of AwaitSomethingMethod(...) - this may cause deadlocks. Or consider using ConfigureAwait(false) for this scenarios, but using sync version should be cleaner.

…bsolete

IgorFedchenko · 2019-11-30T11:30:01Z

Now here are the failures:

On unit tests (only under Windows, both Full and Core frameworks):

Akka.Remote.Tests.RemotingSpec.Ask_does_not_deadlock , failed due to timeout while waiting for Result of Ask() task.

MNTR tests:

Akka.Remote.Tests.MultiNode.TransportFailSpec.TransportFail_should_reconnect on the "first" node

I was just going to push update with Async api for IEventFilterApplier interface - let's see if/what tests will fail now.

IgorFedchenko · 2019-12-02T13:47:01Z

Now failures changed.

MNTR under Windows:

A_restarted_quarantined_system_should_not_crash_the_other_system (both nodes)

Full Framework:

Akka.Remote.Tests.RemotingSpec.Ask_does_not_deadlock

.NET Core (Windows):

Akka.Persistence.Tests.AtLeastOnceDeliverySpec.AtLeastOnceDelivery_must_warn_about_unconfirmed_messages

Interesting that again all random failures are only under Windows.

IgorFedchenko · 2019-12-02T13:52:22Z

So, looks like this update does not completely resolve racy tests issues.
Well, Ask_does_not_deadlock test for example is not affected by this changes at all (it did not use any of the updated methods from test kit).

So the challenge to fix all racy tests still exists, but this update should be helpful anyway.

Let me remove Obsolete attributes from sync method versions, and PR will be ready for review.

IgorFedchenko · 2019-12-02T13:58:22Z

Have also implemented async API for IEventFilterApplier, and added some usages in tests to verify that they are working.
Did not update all sync calls in tests - @Aaronontheweb let me know if you think this would be helpful.

Aaronontheweb · 2019-12-02T17:37:01Z

So the challenge to fix all racy tests still exists, but this update should be helpful anyway.

Have some thoughts on that...

Let me remove Obsolete attributes from sync method versions, and PR will be ready for review.

Sounds like a good idea.

Aaronontheweb · 2019-12-02T18:24:13Z

Some thoughts on the racy unit tests - we've made huge efforts in the past to try to stamp these out, and heuristically there are a small number of things that can cause these flaky tests.

First, the flip rate report for all failed tests in our suite going back 30 days - https://dev.azure.com/dotnet/Akka.NET/_test/analytics?definitionId=84&contextType=build

You need to exclude the API approval specs, which dominate the top of this list, from the set of "racy" specs. These tests are doing their job when they fail - they're designed to alert the core development team when a public API change is being proposed so we don't accidentally sneak in a breaking change.

Other than those specs, we have the following classes of specs which fail often, in descending order:

1 - ClusterClient MNTR specs, the absolute worst offender.
2 - Akka.Tests.IO.TcpIntegrationSpec.Should_fail_writing_when_buffer_is_filled and Akka.Remote.Tests.RemotingSpec.Ask_does_not_deadlock - two racy I/O related specs, take up the number two seat.
3. Akka.Streams.Tests.Dsl.FlowSplitWhenSpec.SplitWhen_must_fail_substream_if_materialized_twice
4. A TON of Akka.Cluster.Sharding specs, including MNTR and normal specs
5. A hodge-podge of Akka.IO, persistence, and streams specs
6. Some MNTR specs
7. A ton of Akka.Streams and AtLeastOnceDelivery actor specs that all have 1-2 failures each.

What's killing us with our test suite is that it's large enough, concurrent enough, and dead-line sensitive enough that it doesn't take much to cause a single test to fail.

I'm a little reluctant to pull in these changes right away since they don't seem to address some of the underlying causes of the failures, but we do want the async API for testing and we want to use it.

The problem we need to solve initially is getting the suite to pass. I'm going to follow-up after lunch here with a decision tree for debugging and fixing these racy tests.

IgorFedchenko · 2019-12-02T18:58:29Z

@Aaronontheweb

I'm a little reluctant to pull in these changes right away since they don't seem to address some of the underlying causes of the failures, but we do want the async API for testing and we want to use it.

To make async API available I can make separate PR that will just use those commits where I did API changes. And in this PR we can continue work on racy tests... Or just close it and start separate issue (if we do not have single one that aggregates the problem).

Aaronontheweb · 2019-12-02T19:26:08Z

To make async API available I can make separate PR that will just use those commits where I did API changes. And in this PR we can continue work on racy tests... Or just close it

That's a great idea - let's do that.

Aaronontheweb · 2019-12-02T19:56:05Z

cc @IgorFedchenko related: #3786

IgorFedchenko · 2019-12-02T20:08:15Z

cc @IgorFedchenko related: #3786

Yup, looks like after making separate PR with async API addition, need to check out that issue and try to make some PRs for it. Also, very interesting how did you fix some of them.

Aaronontheweb · 2019-12-02T20:21:51Z

@IgorFedchenko added my decision tree to here: #3786 (comment)

IgorFedchenko · 2019-12-02T20:30:03Z

@IgorFedchenko added my decision tree to here: #3786 (comment)

Thanks, will take a look at this.

In the meanwhile - here is my PR with only async API addition: #4075 .

IgorFedchenko · 2019-12-02T20:35:32Z

@Aaronontheweb So, there is no point in this PR anymore I guess? Core API updates were pushed to separate PR, and just updating most of the tests to async API does not solve the problem.

If so, feel free to close it. Will do farther work on #3786 and that separate PR (if requires any).

Aaronontheweb · 2019-12-02T20:38:24Z

@IgorFedchenko changing some of the tests to use the async API, piecemeal, might help - but we should look at the tests one at a time. We would have never learned that without this PR though, so this work is definitely not wasted and we may even need it again. You did a great job here.

IgorFedchenko added 14 commits November 28, 2019 17:45

Added async overloads for basic Await* methods

474d186

Marked sync version of test methods Obsolete

9b4fcf0

Updated Akka.TestKit and marked public API Obsolete

3246a85

Updated Akka.Remote.Tests

8285551

Updated Akka.Tests

9a385cb

Updated Akka.TesKit.Tests

aa59ab5

Updated Akka.Cluster.Tests

1f425c2

Updated Akk.Remote.TestKit

9d45f81

Updated Akka.Persistence.Tests

554e622

Updated several test projects

93de0eb

Updated all non-MNTR tests

848adc1

Updated Akka.Streams.Tests (lots of work here)

39be9aa

Updated Akka.Remote.Tests.MultiNode (need MNTR support for Task)

37d2506

Allow AwaitAssert and AwaitConditions to make MNTR compile

33cdbc0

Fixed missing task awaiting

852b245

Fixed deadlock

f50975d

IgorFedchenko added 2 commits November 30, 2019 14:17

Allow using Obsolete EventFilter expectation methods

11753c2

Added Async overloads to IEventFilterApplier + marked sync versions o…

a3817dc

…bsolete

Added test usages for IEventFilterApplier async API

76b9cb3

Removed Obsolete attributes from sync API methods

b6fbc2b

IgorFedchenko mentioned this pull request Dec 2, 2019

Added async API to Akka.TestKit #4075

Merged

Aaronontheweb closed this Dec 2, 2019

IgorFedchenko deleted the async-await-in-testkit branch December 16, 2019 18:39

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add async/await support to TestKit and migrate tests to Async api to reduce racy failures #4072

Add async/await support to TestKit and migrate tests to Async api to reduce racy failures #4072

IgorFedchenko commented Nov 29, 2019 •

edited

Loading

IgorFedchenko commented Nov 29, 2019

IgorFedchenko commented Nov 29, 2019

IgorFedchenko commented Nov 30, 2019 •

edited

Loading

IgorFedchenko commented Nov 30, 2019 •

edited

Loading

IgorFedchenko commented Nov 30, 2019

IgorFedchenko commented Dec 2, 2019

IgorFedchenko commented Dec 2, 2019

IgorFedchenko commented Dec 2, 2019

Aaronontheweb commented Dec 2, 2019

Aaronontheweb commented Dec 2, 2019

IgorFedchenko commented Dec 2, 2019

Aaronontheweb commented Dec 2, 2019

Aaronontheweb commented Dec 2, 2019

IgorFedchenko commented Dec 2, 2019

Aaronontheweb commented Dec 2, 2019

IgorFedchenko commented Dec 2, 2019

IgorFedchenko commented Dec 2, 2019

Aaronontheweb commented Dec 2, 2019

Add async/await support to TestKit and migrate tests to Async api to reduce racy failures #4072

Add async/await support to TestKit and migrate tests to Async api to reduce racy failures #4072

Conversation

IgorFedchenko commented Nov 29, 2019 • edited Loading

IgorFedchenko commented Nov 29, 2019

IgorFedchenko commented Nov 29, 2019

IgorFedchenko commented Nov 30, 2019 • edited Loading

IgorFedchenko commented Nov 30, 2019 • edited Loading

IgorFedchenko commented Nov 30, 2019

IgorFedchenko commented Dec 2, 2019

IgorFedchenko commented Dec 2, 2019

IgorFedchenko commented Dec 2, 2019

Aaronontheweb commented Dec 2, 2019

Aaronontheweb commented Dec 2, 2019

IgorFedchenko commented Dec 2, 2019

Aaronontheweb commented Dec 2, 2019

Aaronontheweb commented Dec 2, 2019

IgorFedchenko commented Dec 2, 2019

Aaronontheweb commented Dec 2, 2019

IgorFedchenko commented Dec 2, 2019

IgorFedchenko commented Dec 2, 2019

Aaronontheweb commented Dec 2, 2019

IgorFedchenko commented Nov 29, 2019 •

edited

Loading

IgorFedchenko commented Nov 30, 2019 •

edited

Loading

IgorFedchenko commented Nov 30, 2019 •

edited

Loading