Akka.DistributedData: memory leak when recovering events from LMDB data store #5022

Closed
Aaronontheweb opened this issue May 19, 2021 · 6 comments · Fixed by #5056
Labels
akka-ddata, akka-ddata-durable (LMDB implementation for persisting durable data), confirmed bug

@Aaronontheweb
Member

Version: Akka.NET v1.4.19

Reproduction: https://github.com/andyfurnival/ddata

Memory consumption grows steadily in a two-node cluster where both nodes recover events from their own LMDB data stores:

[screenshot omitted]

The drivers of memory consumption appear to be non-stop gossip between the replicators:

[screenshot omitted]

I think an equality check must be failing somewhere in this system, causing the same objects to be gossiped over and over again. This sample doesn't produce any new events beyond what has been deserialized from LMDB, and yet memory grows continuously.

Aaronontheweb added the confirmed bug, akka-ddata, and akka-ddata-durable (LMDB implementation for persisting durable data) labels May 19, 2021
Aaronontheweb added this to the 1.4.21 milestone May 19, 2021
@Aaronontheweb
Member Author

Based on the logs, it looks like the system is continuously trying to prune the data envelopes loaded from storage:

[DEBUG][5/19/2021 8:07:20 PM][Thread 0007][akka.tcp://test@desktop-1go6og2:18001/user/ddataReplicator] Initiating pruning of UniqueAddress: (akka.tcp://test@eng07-bet-mgmt-7.engineering.nonprod.sharpgaming.net:4256, 326986514) with data Event-75016
[DEBUG][5/19/2021 8:07:20 PM][Thread 0007][akka.tcp://test@desktop-1go6og2:18001/user/ddataReplicator] Initiating pruning of UniqueAddress: (akka.tcp://test@eng07-bet-mgmt-4.engineering.nonprod.sharpgaming.net:4256, 501230574) with data Event-75016
[DEBUG][5/19/2021 8:07:20 PM][Thread 0007][akka.tcp://test@desktop-1go6og2:18001/user/ddataReplicator] Initiating pruning of UniqueAddress: (akka.tcp://test@eng07-bet-mgmt-7.engineering.nonprod.sharpgaming.net:4256, 636684864) with data Event-75016
[DEBUG][5/19/2021 8:07:20 PM][Thread 0007][akka.tcp://test@desktop-1go6og2:18001/user/ddataReplicator] Initiating pruning of UniqueAddress: (akka.tcp://test@eng07-bet-mgmt-3.engineering.nonprod.sharpgaming.net:4256, 837357637) with data Event-75016
[DEBUG][5/19/2021 8:07:20 PM][Thread 0007][akka.tcp://test@desktop-1go6og2:18001/user/ddataReplicator] Initiating pruning of UniqueAddress: (akka.tcp://test@eng07-bet-mgmt-4.engineering.nonprod.sharpgaming.net:4256, 32551767) with data Event-74994
[DEBUG][5/19/2021 8:07:20 PM][Thread 0007][akka.tcp://test@desktop-1go6og2:18001/user/ddataReplicator] Initiating pruning of UniqueAddress: (akka.tcp://test@eng07-bet-mgmt-7.engineering.nonprod.sharpgaming.net:4256, 326986514) with data Event-74994
[DEBUG][5/19/2021 8:07:20 PM][Thread 0007][akka.tcp://test@desktop-1go6og2:18001/user/ddataReplicator] Initiating pruning of UniqueAddress: (akka.tcp://test@eng07-bet-mgmt-4.engineering.nonprod.sharpgaming.net:4256, 501230574) with data Event-74994

This might be because the node we originally recovered this data from no longer exists (indeed, that would likely be the case after any ActorSystem restart, since the Uid on the UniqueAddress will change) - I'll start by looking there.

@Aaronontheweb
Member Author

_log.Debug("Initiating pruning of {0} with data {1}", removed, key);
SetData(key, newEnvelope);

This is where the log statements are coming from, but it looks like the pruning is happening over and over again without ever succeeding - so the issue might be that persisting the pruned envelope doesn't overwrite the original.

@Aaronontheweb
Member Author

Also worth noting that we're attempting to prune many different unique addresses for the same Address in those logs - the Uids are different but it's all originally from the same node.
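
To make that observation easier to check, here is a minimal sketch that groups the Address/Uid pairs from the DEBUG lines quoted earlier in this thread by address. `NodeIncarnation` and `PruningLogAnalysis` are hypothetical names introduced for illustration; `NodeIncarnation` is a simplified stand-in for a UniqueAddress (an address plus a per-incarnation Uid), not the Akka.NET type itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Builds the full address string used in the quoted log lines.
static string Host(int n) =>
    $"akka.tcp://test@eng07-bet-mgmt-{n}.engineering.nonprod.sharpgaming.net:4256";

// Address/Uid pairs copied from the seven DEBUG lines quoted in this thread.
var pruned = new[]
{
    new NodeIncarnation(Host(7), 326986514),
    new NodeIncarnation(Host(4), 501230574),
    new NodeIncarnation(Host(7), 636684864),
    new NodeIncarnation(Host(3), 837357637),
    new NodeIncarnation(Host(4), 32551767),
    new NodeIncarnation(Host(7), 326986514),
    new NodeIncarnation(Host(4), 501230574),
};

foreach (var (address, uids) in PruningLogAnalysis.UidsByAddress(pruned))
    Console.WriteLine($"{address}: {uids.Count} incarnation(s) [{string.Join(", ", uids)}]");

// Hypothetical model type: same Address with a different Uid is a different value,
// which mirrors how a restarted node shows up as a new UniqueAddress.
public sealed record NodeIncarnation(string Address, int Uid);

public static class PruningLogAnalysis
{
    // Groups distinct incarnations by address so repeated restarts of one node stand out.
    public static Dictionary<string, List<int>> UidsByAddress(IEnumerable<NodeIncarnation> nodes) =>
        nodes
            .Distinct()
            .GroupBy(n => n.Address)
            .ToDictionary(
                g => g.Key,
                g => g.Select(n => n.Uid).OrderBy(uid => uid).ToList());
}
```

In this short excerpt alone, the mgmt-7 and mgmt-4 addresses each appear with two different Uids, so every restart of the same node leaves behind another incarnation that the replicator tries to prune.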

@andyfurnival
Contributor

Are the improvements made so far available in a nightly build I can try?

@ismaelhamed
Member

Here's how to get the nightly builds: https://getakka.net/community/getting-access-to-nightly-builds.html . Anything merged into dev should be available the same night.

@Arkatufus
Contributor

Arkatufus commented May 27, 2021

Initial observations:

  • Memory usage spiked in a cluster running DistributedData

Possible causes:

  • Memory leaks or non-optimized memory allocation. (not the cause)
    • String allocation from several places
      • MurmurHash.
      • Old style string.Format.
      • Address constructor
    • Byte array allocation during serialization.
  • Gossip and Status aren't sent with the proper address UID (not the cause)
    • Make from and to UID in gossip and status constructor mandatory
    • Change Replicator to use UniqueAddress exclusively
  • Possible faulty VersionVector Comparer logic (not the cause)

Further observations:

  • Multiple gossips are sent right after pruning is initialized and the removed path count is above 0
  • Gossips were retained in the mailbox, bloating memory usage
  • Problem is compounded in a system with a lot of pruning, since all of the pruning data are also serialized with each gossip round
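
As a rough back-of-the-envelope model of the last two observations, the sketch below shows how the terms multiply. All three input numbers are hypothetical, and the model assumes the worst case where every gossip message carries pruning state for every key; `GossipCost` is an illustrative helper, not part of Akka.NET.

```csharp
using System;

// Hypothetical inputs, chosen only to show how the terms multiply.
const int durableKeys = 1_000;      // keys recovered from LMDB
const int removedIncarnations = 5;  // stale UniqueAddresses being pruned per key
const int queuedGossips = 50;       // gossip messages waiting in the replicator mailbox

long perGossip = GossipCost.PruningEntriesPerGossip(durableKeys, removedIncarnations);
long retained = GossipCost.PruningEntriesRetained(perGossip, queuedGossips);

Console.WriteLine($"pruning entries per gossip: {perGossip}");
Console.WriteLine($"pruning entries held in mailbox: {retained}");

public static class GossipCost
{
    // Worst case assumed here: each gossip serializes one pruning entry per key per removed node.
    public static long PruningEntriesPerGossip(int keys, int removedPerKey) =>
        (long)keys * removedPerKey;

    // Every gossip retained in the mailbox holds its own copy of those entries.
    public static long PruningEntriesRetained(long perGossip, int queued) =>
        perGossip * queued;
}
```

With these made-up inputs, each gossip carries 5,000 pruning entries and the mailbox holds 250,000, which illustrates why a backlog of gossips grows memory quickly once many stale incarnations are being pruned.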
