Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

akka.cluster.sharding - storage api update #4629

Closed
wants to merge 74 commits into from

Conversation

zbynek001
Copy link
Contributor

@zbynek001 zbynek001 commented Nov 12, 2020

Another big one ^^
This MR is updating sharding to current scala version, including unit tests and multinode tests.

What's still missing compared to scala:

  • ExternalShardAllocation
  • RemoveInternalClusterShardingData app

There were quite many changes related to the storage api change, so basically some parts were completely rewritten. Especially Shard and ShardCoordinator.

Also includes some pre-requirements to make some part easier, especially MessageBuffer and Dropped deadletter. Not sure if you'll like it.

Tests are now using sqlite persistence provider
Tests are now using shared inmem persistence providers for both journal & snapshots to avoid issues with files

In theory, everything "should" be backwards compatible

Additionally included in this pr:

@zbynek001 zbynek001 force-pushed the sharding-update2 branch 6 times, most recently from dafef2f to 05a658c Compare November 14, 2020 21:57
@zbynek001 zbynek001 changed the title [WIP] akka.cluster.sharding - storage api update akka.cluster.sharding - storage api update Nov 18, 2020
@zbynek001
Copy link
Contributor Author

zbynek001 commented Nov 18, 2020

This should be ready to review.
Lots of changes, but it brings sharding up-to date with latest scala version.
Persistent mode was split into two parts:

  • CoordinatorStore
  • ShardStore

which can use different persistent mode configured via state-store-mode & remember-entities-store.
Possible combinations:

state-store-mode remember-entities-store CoordinatorStore mode ShardStore mode
persistence (default) - (ignored) persistence persistence
ddata ddata ddata ddata
ddata eventsourced (new) ddata persistence

There should be no breaking changes from user perspective. Only some internal messages/objects were moved.
There should be no change in the PersistentId behavior and default persistent configuration (akka.cluster.sharding.state-store-mode)

@Aaronontheweb Aaronontheweb self-requested a review November 18, 2020 17:42
* make use of the existing logging of dead letter
  also for UnhandledMessage

Supress ActorSelectionMessage with DeadLetterSuppression (migrated from akka/akka#28341)
* for example the Cluster InitJoin message is marked with DeadLetterSuppression
  but was anyway logged because sent with actorSelection
* for other WrappedMessage than ActorSelectionMessage we shouldn't unwrap and publish
  the inner in SuppressedDeadLetter because that might loose some information
* therefore those are silenced in the DeadLetterListener instead

Better deadLetter logging of wrapped messages (migrated from akka/akka#28253)
Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made it through 38/104 files - going to work on the rest tomorrow. Liking what I'm seeing so far but it's only been a review of the tests.


namespace Akka.Cluster.Sharding.Tests
{
public class ClusterShardCoordinatorDowningSpecConfig : MultiNodeClusterShardingConfig
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Liking the common base class for testing sharding. Makes a lot of sense that we add that.

}
# don't leak ddata state across runs
akka.cluster.sharding.distributed-data.durable.keys = []
akka.persistence.journal.leveldb-shared.store.native = off
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a JVM setting? Pretty sure we don't have any LevelDb support.

EnterBarrier("received failed stats from timed out shards vs empty");
}

private void Querying_cluster_sharding_must_return_shard_state_of_sharding_regions_if_one_or_more_shards_timeout_versus_all_as_empty()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice - I've run into this issue before with pbm cluster-sharding region stats -r {regionName} before; glad this has been fixed.

""");
}

akka.persistence.snapshot-store.plugin = ""akka.persistence.memory-snapshot-store-shared""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}

[Fact]
public void ClusterShardingMessageSerializer_must_serialize_ShardRegionQuery()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - all of the new tests in this class look fine, but I haven't checked the underlying serializer impl itself which is where I know @Arkatufus raised some points a while back.

@Aaronontheweb
Copy link
Member

Akka.Cluster.Sharding.Tests.RememberEntitiesShardIdExtractorChangeSpec.Sharding_with_remember_entities_enabled_should_allow_a_change_to_the_shard_id_extractor

some of these specs are failing - we'll investigate.

Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

65/104

@Aaronontheweb Aaronontheweb modified the milestones: 1.4.33, 1.4.34 Feb 14, 2022
Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

70/104

private readonly bool _rememberEntities;
private sealed class RememberEntitiesStoreStopped
{
public static RememberEntitiesStoreStopped Instance = new RememberEntitiesStoreStopped();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to make this readonly (nitpick)

replicator,
majorityMinCap,
rememberEntitiesStoreProvider))
.WithDeploy(Deploy.Local);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'll make sure that we don't end up trying to serialize this actor when serialize-creators=on - good catch

private bool _terminating = false;
private readonly UniqueAddress _selfUniqueAddress;
private readonly LWWRegisterKey<ShardCoordinator.CoordinatorState> _coordinatorStateKey;
private ImmutableHashSet<(IActorRef, GetShardHome)> _getShardHomeRequests = ImmutableHashSet<(IActorRef, GetShardHome)>.Empty;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does GetShardHome contain a "by value" GetHashCode implementation? Otherwise this might be problematic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we do

public override int GetHashCode()
{
unchecked
{
return Shard?.GetHashCode() ?? 0;
}
}

case GetFailure m when m.Key.Equals(_coordinatorStateKey):
_initialStateRetries++;
var template =
"{0}: The ShardCoordinator was unable to get an initial state within 'waiting-for-state-timeout': {1} millis (retrying). Has ClusterSharding been started on all nodes?";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compiler might do this on the fly anyway but might as well declare a const here


namespace Akka.Cluster.Sharding.External
{
public class ClientTimeoutException : Exception
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add some XML-DOC comments here to explain how / why this error is raised

/// <summary>
/// API May Change
/// </summary>
public sealed class ExternalShardAllocation : IExtension
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public sealed class ExternalShardAllocation : IExtension
[ApiMayChange]
public sealed class ExternalShardAllocation : IExtension

using System;
using System.Collections.Concurrent;
using Akka.Actor;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
using Akka.Annotations;

}

/// <summary>
/// uses a string primitive types are optimized in ddata to not serialize every entity separately
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

return new LWWDictionaryKey<ShardId, string>($"external-sharding-{typeName}");
}

private class DDataStateActor : ActorBase, IWithUnboundedStash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd appreciate an internal comment here explaining this actor's role in the design, as that may not be obvious to the next person who has to work on this

{
using ShardId = String;

public class ExternalShardAllocationStrategy : IStartableAllocationStrategy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be sealed so we don't end up with multiple GetShardLocation implementations?

@Aaronontheweb
Copy link
Member

Still going through this file by file, but I've left another review

Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the Akka.Annotations usage here

{
using ShardId = String;

public sealed class ShardLocations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@Aaronontheweb Aaronontheweb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to double check the protobuf compatibility, but otherwise this looks fine - I'm on the fence about doing this in v1.4. Might need to pull it in as part of v1.5.

{
_system = system;
_typeName = typeName;
_log = Logging.GetLogger(system, GetType());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might not be a bad idea to enrich this logger name with the _typeName

using ShardId = String;

/// <summary>
/// Only intended for testing, not an extension point.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thank you

_replicator.Tell(Dsl.Get(_allShardsKey, _readMajority));
}

protected override bool Receive(object message)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

/// By splitting the elements over 5 keys we can support 10000 entities per shard.
/// The Gossip message size of 5 ORSet with 2000 ids is around 200 KiB.
/// This is by intention not configurable because it's important to have the same
/// configuration on each node.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to benchmark our DData payload size to make sure that this is true in Akka.NET as well

WrittenMigrationMarker = writtenMigrationMarker;
}

public IImmutableSet<string> Shards { get; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably also be typed ShardId

{
return new ShardsQueryResult<T>(
failed: ps.Where(i => i.task.IsCanceled || i.task.IsFaulted).Select(i => i.shard).ToImmutableHashSet(),
responses: ps.Where(i => !i.task.IsCanceled && !i.task.IsFaulted).Select(i => i.task.Result).ToImmutableList(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering about the Task.Result call here - I'll need to use the IDE for this, but I'd prefer it if we just passed the results of the task into this method rather than the Task themselves. Difficult to tell from the PR review dialog.

@@ -21,6 +21,28 @@ namespace Akka.TestKit.Xunit2
/// </summary>
public class TestKit : TestKitBase , IDisposable
{
private class PrefixedOutput : ITestOutputHelper
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great idea.

# When 'remember-entities' is enabled and the state store mode is ddata this controls
# how the remembered entities and shards are stored. Possible values are "eventsourced" and "ddata"
# Default is ddata for backwards compatibility.
remember-entities-store = "ddata"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense

src/protobuf/ClusterShardingMessages.proto Show resolved Hide resolved
}

message ShardRegionWithAddress{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know enough about Protobuf here to say for certain if renaming a .proto file is a breaking change or not. I'm going to need to do some manual testing with these changes anyway.

@Aaronontheweb Aaronontheweb modified the milestones: 1.4.34, 1.4.35 Mar 7, 2022
@Aaronontheweb Aaronontheweb modified the milestones: 1.4.35, 1.4.36 Mar 18, 2022
@Aaronontheweb
Copy link
Member

Merged via #5857

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants