Memory Leak With Go-Libp2p #2841
Just to be clear, you did not see this issue on a prior version of go-libp2p? Which version was that?
Are you running […]? If I try to repro this issue without doing […]. However, if I call […]. Here's my test code:
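A minimal churn-style repro along these lines (a hypothetical sketch, not the original test code; the iteration counts and addresses are illustrative) dials and closes QUIC connections in a loop while watching the heap:

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"time"

	"github.com/libp2p/go-libp2p"
	"github.com/libp2p/go-libp2p/core/peer"
	libp2pquic "github.com/libp2p/go-libp2p/p2p/transport/quic"
)

func main() {
	ctx := context.Background()

	// Listener side: a host that only speaks QUIC.
	server, err := libp2p.New(
		libp2p.Transport(libp2pquic.NewTransport),
		libp2p.ListenAddrStrings("/ip4/127.0.0.1/udp/0/quic-v1"),
	)
	if err != nil {
		panic(err)
	}
	serverInfo := peer.AddrInfo{ID: server.ID(), Addrs: server.Addrs()}

	// Dialer side: connect and disconnect repeatedly, watching the heap.
	client, err := libp2p.New(
		libp2p.Transport(libp2pquic.NewTransport),
		libp2p.NoListenAddrs,
	)
	if err != nil {
		panic(err)
	}

	var m runtime.MemStats
	for i := 0; i < 1000; i++ {
		if err := client.Connect(ctx, serverInfo); err != nil {
			panic(err)
		}
		if err := client.Network().ClosePeer(server.ID()); err != nil {
			panic(err)
		}
		if i%100 == 0 {
			// Force a GC before sampling so only live objects are counted.
			runtime.GC()
			runtime.ReadMemStats(&m)
			fmt.Printf("iteration %d: HeapInuse=%d bytes\n", i, m.HeapInuse)
			time.Sleep(time.Second)
		}
	}
}
```

If connection state were leaking, HeapInuse would keep climbing across iterations even though every connection is closed before the next dial.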
Are you sure it's a memory leak? How many QUIC connections are you handling? An easy way to check is using the Swarm Prometheus dashboard.
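For reference, wiring the go-libp2p metrics into a scrapeable endpoint for that dashboard might look roughly like this (a sketch assuming a recent go-libp2p release where metrics are enabled by default; the registry setup and listen address are illustrative, not Prysm's actual configuration):

```go
package main

import (
	"log"
	"net/http"

	"github.com/libp2p/go-libp2p"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Use a dedicated registry so the libp2p swarm metrics are easy to scrape in isolation.
	reg := prometheus.NewRegistry()

	// go-libp2p registers its metrics (including swarm connection counters)
	// against the registerer passed here.
	h, err := libp2p.New(libp2p.PrometheusRegisterer(reg))
	if err != nil {
		log.Fatal(err)
	}
	defer h.Close()

	// Serve /metrics so Prometheus (and the Swarm dashboard) can scrape it.
	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	log.Fatal(http.ListenAndServe("localhost:2112", nil))
}
```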
Yep, we have updated from v0.34 to v0.35.
@marten-seemann It does very much appear to be on our end: my live node has again seen its heap grow by about 50 MB since yesterday. At any one time there are only 15-20 active QUIC connections on my node, and it has a maximum peer count of 70.
Swarm metrics from my node:
v0.34 and v0.35 are very similar. There was just a small breaking change. I wouldn't expect it to behave differently.
Could you share repro steps? Maybe I can try with Kubo. Do you have a sense of the connection churn per hour? e.g. could you share the metrics of a node an hour apart (or even 4 measurements 15 min apart)? I wonder what would happen if you disconnected the node from the internet for a bit (1 minute?) and ran […]
I can share the metrics in a bit. For Kubo, just running this with both the TCP and QUIC transports enabled for an extended period of time should show something. These are the options we currently use: https://github.com/prysmaticlabs/prysm/blob/develop/beacon-chain/p2p/options.go#L63
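For context, a stripped-down illustration of a host with both transports enabled (a sketch only; the linked Prysm options include many more settings, and the listen addresses here are placeholders):

```go
package main

import (
	"log"

	"github.com/libp2p/go-libp2p"
	libp2pquic "github.com/libp2p/go-libp2p/p2p/transport/quic"
	"github.com/libp2p/go-libp2p/p2p/transport/tcp"
)

func main() {
	// Enable both the TCP and QUIC transports, in the spirit of the Prysm
	// options linked above. The ports are placeholders, not Prysm's config.
	h, err := libp2p.New(
		libp2p.Transport(tcp.NewTCPTransport),
		libp2p.Transport(libp2pquic.NewTransport),
		libp2p.ListenAddrStrings(
			"/ip4/0.0.0.0/tcp/13000",
			"/ip4/0.0.0.0/udp/13000/quic-v1",
		),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer h.Close()
	log.Println("listening on", h.Addrs())
}
```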
Some connection metrics over 1 hour at 15 min intervals:
In Prysm, we have support for running nodes with both the QUIC transport and the TCP transport. Recently we updated our version to v0.35.0. This has been working for the most part; however, while profiling long-running nodes we have come across what appears to be a memory leak. Over the course of a few weeks the heap occupied by a Prysm node has grown rather than stayed a constant size. If you look at the heap profiles below, you can see the large amount of memory allocated by the respective QUIC and TCP dialers.
The two screenshots above capture the heap occupied by QUIC connections. If you take a look, a sizeable amount is occupied by the pre-setup step of QUIC connections:
I am not very familiar with the internals of quic-go, but it appears these streams/connections are not being appropriately garbage collected by go-libp2p? You have old connections somehow still living on the heap.
Finally, we did a profile differential for the same node between its current state and its state a few days ago:
With this heatmap it becomes obvious where the memory leak is coming from; it does appear the new release has introduced either a regression or a new bug for old connections.
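For anyone wanting to reproduce this kind of comparison, one generic way to collect heap differentials from a long-running Go node is to expose the standard pprof endpoints and diff two snapshots taken days apart (a sketch using only the standard library; the port is a placeholder and this is not Prysm's actual debug setup):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Expose pprof so heap snapshots can be taken from the running node, e.g.:
	//   curl -o day1.heap http://localhost:6060/debug/pprof/heap
	//   curl -o day3.heap http://localhost:6060/debug/pprof/heap   (a few days later)
	//   go tool pprof -diff_base day1.heap day3.heap
	// Anything that keeps growing between the snapshots (such as old QUIC
	// connection state) stands out immediately in the diff.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	select {} // the rest of the node would run here
}
```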
Version Information