Testnet crash under load #3692
I will be stating something obvious here: it seems we have a memory leak. 🤔
Memory did not increase linearly; it exploded suddenly, like last time.
Logs of servers 1,2,3,4: https://we.tl/t-hFT8JtAM7I
Good news: deadlock detection was running on all servers at log level Warn.
Notes:
Would it be wrong to think about it this way?
I am not yet well versed in the codebase, so could someone help me validate this hypothesis?
From the graph, it seems that something abnormal happens in network traffic, and 5 minutes later the node crashes.
It seems that an abnormal amount of data is received and propagated through the network. This data fills all available RAM and causes a spike in CPU usage. This is a basic scenario that should be handled, so perhaps a protection mechanism did not work as expected?
@dr-chain @aoudiamoncef In this case, CPU usage goes up before any abnormal network activity.
@damip Interesting. Is this observation consistent across all nodes, i.e. do we see the same results on every node? If the node in the screenshot created one of the last blocks, then we can potentially narrow the scope of our investigation.
We have not reproduced this behavior since TEST.21. Memory is now stable, except for #3803.
2023-03-13T00:00:01.249244Z (including all info from system logs, dmesg, etc.), probably a problem with logrotate. Can't find when the crash happened, and no info about deadlock detection.
2023-03-19T19:46:53.928940Z (log level info, deadlock detection on) => OOM, no deadlock
2023-03-19T19:41:52.370952Z (log level info, deadlock detection on) => OOM, no deadlock
2023-03-19T19:33:38.922144Z (log level info, deadlock detection on) => OOM, no deadlock
Job-log
Findings so far
Open questions
Answered questions:
PRs included in testnet 20
sha256_hash
ABI #3498
Less-likelies?
Data
Logs of servers 1,2,3,4
OOM message from testnet3: