Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix cluster startup race #5185

Merged
merged 7 commits into from
Aug 13, 2021
Merged

Conversation

Zetanova
Copy link
Contributor

@Zetanova Zetanova commented Aug 8, 2021

There is a race condition in ClusterDomainEventPublisher and ClusterCoreDaemon both used Cluster.Get(system) to get the cluster instance. If the ActorCells are activated before the Cluster constructer gets the answer in _clusterCore = GetClusterCoreRef().Result it will crash.

close #5184

@Aaronontheweb
Copy link
Member

This definitely introduced some bugs - we might not be able to safely do this. The the blocking wait in the constructor was intentional.

@Aaronontheweb
Copy link
Member

This definitely introduced some bugs - we might not be able to safely do this. The the blocking wait in the constructor was intentional.

Not to say that it's impossible to fix, but it may not be so straightforward

@Zetanova
Copy link
Contributor Author

Zetanova commented Aug 8, 2021

I didn't remove the lock.

The PR passes the Cluster instance to the ClusterCoreDaemon directly
To remove Cluster.Get(system) from the constructor.

@Zetanova
Copy link
Contributor Author

Zetanova commented Aug 8, 2021

I only tested it with RemotePingPong and it worked.

@Aaronontheweb
Copy link
Member

I didn't remove the lock.

The PR passes the Cluster instance to the ClusterCoreDaemon directly
To remove Cluster.Get(system) from the constructor.

Got it - I misunderstood. I'll take a closer look at this today.

@to11mtm
Copy link
Member

to11mtm commented Aug 11, 2021

@Zetanova I think you need to fix this line in ClusterDomainEventPublisherSpec.cs to get tests to pass:

//in public ClusterDomainEventPublisherSpec() : base(Config)
_publisher = Sys.ActorOf(Props.Create<ClusterDomainEventPublisher>()); //~line 66

Props.Create without a functor tries to look for a default constructor; due to your changes there is none here.

@Zetanova
Copy link
Contributor Author

@to11mtm thx

Why DDataStressTest.csproj failed to build at .net core unit test (linux) ?

@to11mtm
Copy link
Member

to11mtm commented Aug 13, 2021

@to11mtm thx

Happy to help!

Why DDataStressTest.csproj failed to build at .net core unit test (linux) ?

Not entirely sure on that one, although I know that @Aaronontheweb has been working through some challenges on LMDB native dlls, might be related so tagging him.

@Aaronontheweb
Copy link
Member

@Zetanova the DDataStressTest is a project we can possibly remove - we used it to help reproduce some data stemming from #5022

Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -583,7 +583,7 @@ public override int GetHashCode()
/// <summary>
/// Gets a reference to the cluster core daemon.
/// </summary>
internal class GetClusterCoreRef
internal class GetClusterCoreRef: INoSerializationVerificationNeeded
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't start up Akka.Cluster when using custom task inlining dispatcher
3 participants