
[BUG] Major memory leak on 1.2.4/1.2.5 #8307

Closed
denisprovost opened this issue Sep 2, 2021 · 31 comments
Labels
bug Something isn't working

Comments

@denisprovost

Description
After chia synced and started farming, memory usage kept increasing: roughly +10% over 3 hours.

[Screenshot: Capture]

After a few hours:
[Screenshot: Capture3]

After stopping the chia process:
[Screenshot: Capture4]

OS: Ubuntu 21.04 on a Pi 4 (full node + harvester)
RAM: 8 GB
Chia version: 1.2.4/1.2.5

@denisprovost denisprovost added the bug Something isn't working label Sep 2, 2021
@denisprovost
Author

Right now I'm going back to 1.2.3 to check whether the problem is with my system or the chia process. I will know in a few hours.

@denisprovost denisprovost changed the title [BUG] Memory leak? [BUG] Memory leak on 1.24/1.2.5? Sep 2, 2021
@denisprovost denisprovost changed the title [BUG] Memory leak on 1.24/1.2.5? [BUG] Memory leak on 1.2.4/1.2.5? Sep 2, 2021
@ALTracer

ALTracer commented Sep 2, 2021

I noticed a memory leak in chia_full_node, too. One of the four processes allocated a whopping 5350 MiB, as opposed to 600 MiB.

OS: Gentoo 17.1 stable amd64 on custom desktop
RAM: 16 GiB, minus 2 GiB reserved by the AMD 3400G APU, plus zram
Chia version: 1.2.5

Will post debug-enabled existing logs on request.
[Screenshots: munin-memory graph, glances output]
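
For anyone trying to pin down which of the chia processes is actually holding the memory, a minimal Python sketch along these lines can help. It assumes the third-party psutil package is installed; the "chia" name filter is only an illustration, not something from the chia-blockchain code.

# list resident memory (RSS) per chia-related process, largest first
import psutil

def chia_processes():
    # yield (pid, name, rss_bytes) for every process whose name contains "chia"
    for proc in psutil.process_iter(["pid", "name", "memory_info"]):
        name = proc.info["name"] or ""
        mem = proc.info["memory_info"]
        if "chia" in name and mem is not None:
            yield proc.info["pid"], name, mem.rss

if __name__ == "__main__":
    for pid, name, rss in sorted(chia_processes(), key=lambda p: p[2], reverse=True):
        print(f"{pid:>7}  {name:<20}  {rss / 1024 ** 2:8.1f} MiB")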

@denisprovost
Author

OK, glad to see that the problem isn't just on my end.

@denisprovost denisprovost changed the title [BUG] Memory leak on 1.2.4/1.2.5? [BUG] Major memory leak on 1.2.4/1.2.5? Sep 2, 2021
@avsync

avsync commented Sep 2, 2021

I'm seeing this leak too on 1.2.4 and 1.2.5!

@erickoh

erickoh commented Sep 2, 2021

I've been having this memory leak issue on 1.1.7 for the past week too. I'm on Ubuntu 20.04.
Currently I am seeing a chia_full_node process taking up 2.5 GB of resident (RES) memory.
Memory usage continues to creep up until the process eventually crashes and becomes defunct.

I tried upgrading some of my full nodes to 1.2.5, but I'm still getting these memory leaks and the eventual crash.

@Rigidity
Contributor

Rigidity commented Sep 2, 2021

Can confirm this memory leak on Ubuntu on all versions I have used, 1.2.3-1.2.5

@denisprovost
Author

Has it been reported before?

@emlowe
Contributor

emlowe commented Sep 2, 2021

We are actively researching this

@denisprovost
Author

denisprovost commented Sep 2, 2021

I have moved back to 1.2.3 to check: same issue.

[Screenshot: Capture7]

@Jacek-ghub

@emlowe

We are actively researching this

Uhm, what is the ETA?

@Rigidity
Contributor

Rigidity commented Sep 3, 2021

@emlowe

We are actively researching this

Uhm, what is the ETA?

I would imagine an issue like this is a priority, so whenever they can get a hot fix out.

@Jacek-ghub

Jacek-ghub commented Sep 3, 2021

I would imagine an issue like this is a priority

I would also like to imagine that this would be the case.

However, if no ETA is provided (for any issue, not just this one), then it is just BS to get people focused on something else. Sorry, but the way these issues are being handled is not how a software company runs.

@Rigidity
Contributor

Rigidity commented Sep 3, 2021

I would imagine an issue like this is a priority

I would also like to imagine that this would be the case.

However, if no ETA is provided (for any issue, not just this one), then it is just BS to get people focused on something else. Sorry, but the way these issues are being handled is not how a software company runs.

You're welcome to help fix the issue yourself by submitting a pull request to this repo, but otherwise you'll likely have to wait until the next patch version. They have said that they found what the problem is and are actively working on fixing it. If that's not good enough, I don't know what is. Why would they not solve a problem with their blockchain just to keep people "focused on something else," when that would be detrimental to the network? Have patience...

@denisprovost denisprovost changed the title [BUG] Major memory leak on 1.2.4/1.2.5? [BUG] Major memory leak on 1.2.4/1.2.5 Sep 3, 2021
@denisprovost
Author

That this memory leak has existed for a few versions is not reassuring. But yes, let's give them time to find a solution and a patch :)

@Jacek-ghub

Jacek-ghub commented Sep 3, 2021

You're welcome to help fix the issue yourself

Works both ways. Should I say that I am waiting for you to join the pull-request effort and will work with you on that? What would be the point?

We are both customers of the Chia company. We purchased drives and plotters and do everything possible to help the ecosystem. There is no need to point fingers at people to do stuff when the company is not doing its part.

As @denisprovost stated, "That this memory leak has existed for a few versions is not reassuring." Where was the QA that let 1.2.4 go out the door? Where was the QA on the rushed, broken 1.2.5 release? Why are we treated like alpha testers? So, do you really think that 1.2.6 will be "the working one"?

Again, if all that you said is true, then what is the problem with saying "we will hopefully have it by Monday (or whatever makes sense)"? That is how an ETA works: you provide some timeline so people can calm down and schedule their time. I have had other issues where the person was just muddying the water to get past other people's reports.

Again, providing an ETA is not a big deal; it doesn't compromise anything, it just lets people manage their time better. Nothing more than that. Otherwise, it just hurts the ecosystem that we all try to support.

@Jacek-ghub

I am on Windows, and I don't think the problem exists on Windows. I run a full node 24/7 and don't see any crashes. Although, I am still on 1.2.3 (I saw the problems other people had with 1.2.4, then the rushed 1.2.5 that was no better than 1.2.4, and decided to wait).

It looks like the problem is more related to your setup (libs) than to the OS or the Chia version. I would guess there would be more people in this thread if the issue affected a particular Ubuntu version, or some specific Chia version(s).

Is it possible that all of you ran some recent OS/library updates (and got, for instance, new Python libs) that are causing these issues across all Chia releases?

@denisprovost
Author

denisprovost commented Sep 3, 2021

It's a fairly classic answer. You can imagine that before posting I checked, ran tests, etc., and brought results. I sincerely hope the Chia team will not look at the problem from that angle. I am not the only one with problems like this. It may be my system, but it may not be.

A stable system does not drift on its own.

#3366
#3209
#2055

Announced as fixed, but no ;)

@Rigidity
Contributor

Rigidity commented Sep 3, 2021

This memory leak has happened to everyone I know who uses Ubuntu.

@djails

djails commented Sep 3, 2021

This memory leak has happened to everyone I know who uses Ubuntu.

I'll add my +1, it's happening to me as well.

@Jacek-ghub

@denisprovost

you can imagine that before posting I checked

Don't take me wrong, I am on your side. I am not trying to say that you didn't check/test, or to dismiss your results. The issue is real, but unless your setup is in full debug mode, it is really hard to enable, read, and understand the relevant debug logs.

Again, you are the main person pushing this issue, so it was really not my intention to imply that I doubt you or want to dismiss it. That is basically the main reason I asked for the ETA (they can get QA to immediately run regression tests on different platforms, and get the engineer in charge to focus on the most promising one; basically, within a few hours they should know the offending part and be able to provide projected milestones).

One component that is actually not under our control is the UI, which potentially runs updated scripts/libs every other day or so (a common practice with JS code, which Electron uses). Although, if that were the case, it would potentially affect other platforms as well; or maybe not, as those libs may have platform-specific issues. (Also, I haven't rebooted my full node for several weeks, so no network-resident scripts got updated for me.)

@Rigidity

This memory leak has happened to everyone I know who uses Ubuntu.

Well, how many people do you know, since when have they been affected, and can they also chime in? The more info you can provide, the easier it is to narrow the scope.

@ALTracer runs Gentoo, so that means it is not strictly Debian/Ubuntu related.

@erickoh is still running v1.1.7, but based on what he wrote, he started having issues just a couple of weeks ago. That would imply that the issue is newer than his Chia version, or rather independent of the Chia version. This is potentially the strongest evidence pointing to some modified libs (again, coming either through updates or through Electron network scripts).

Anyway, the issue is already well stated, and the Chia engineering team will have a busy weekend. So we should lay off for a while.

@emlowe
Contributor

emlowe commented Sep 3, 2021

We are actively testing what we believe to be a fix

@emlowe
Contributor

emlowe commented Sep 3, 2021

Until we confirm a fix, this is preliminary information:

We believe this affects all versions on all platforms.
The problem started "recently" because it is related to how the node handles "compact VDFs" that are generated by Bluebox Timelords. We recently started generating a large number of compact VDFs on mainnet by aggressively deploying Bluebox Timelords in AWS. These compact VDFs get gossiped around the network, and nodes take them and replace their non-compacted versions with the new ones. The sheer number of such messages was causing this issue.

We are currently duplicating this on testnet7 so we can verify
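
As a toy illustration of that failure mode only (this is not chia-blockchain code, and not the fix in PR #8315; all names below are hypothetical): if gossiped compact-VDF messages are buffered in memory faster than the node can process them and the buffer has no bound, resident memory grows without limit, whereas capping the backlog keeps it flat.

# illustration only: bounded vs. unbounded buffering of gossiped messages
from collections import deque
from typing import Optional

MAX_PENDING = 1000  # hypothetical cap on queued compact-VDF messages

class CompactVdfQueue:
    def __init__(self, max_pending: int = MAX_PENDING) -> None:
        self.pending: deque = deque()
        self.max_pending = max_pending
        self.dropped = 0

    def on_gossip(self, message: bytes) -> None:
        # The leaky variant would append unconditionally, so the deque grows
        # as long as peers gossip faster than messages are processed.
        # The bounded variant stops accepting once the backlog is full.
        if len(self.pending) >= self.max_pending:
            self.dropped += 1
            return
        self.pending.append(message)

    def process_one(self) -> Optional[bytes]:
        # pop the oldest message, or None if the backlog is empty
        return self.pending.popleft() if self.pending else None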

@emlowe
Contributor

emlowe commented Sep 3, 2021

We believe PR #8315 fixes this issue, for those that want (and understand how) to try the patch.
We continue to test on testnet7.
I don't have an ETA for when a build will be available (outside the PR builds).

@erickoh

erickoh commented Sep 4, 2021

Thanks, I have not experienced this memory leak problem at all over the past 24 hours

@emlowe
Contributor

emlowe commented Sep 4, 2021

After running on testnet for about 10 hours, we are pretty confident PR #8315 fixes this issue. I don't think we will rush out a release this weekend, though. Since we have stopped generating the majority of compact VDFs on mainnet, we believe the problem has largely stopped on mainnet as well (it may depend somewhat on which peers you connect to and how many compact VDFs are getting passed around, but not many are being newly generated right now).

@denisprovost
Author

denisprovost commented Sep 4, 2021

Thanks for your quick reaction. I have downgraded to 1.2.3 like many people, and like many people I will wait for a release. Please release only when you are sure the problem is fixed.

@avsync

avsync commented Sep 4, 2021

Downgrading to 1.2.3 offers no benefit over 1.2.5. Read a few posts up: the issue was Blueboxes flooding nodes of all versions. For me, since the Blueboxes were shut down, both 1.2.5 and 1.2.5 with PR #8315 show similar, normal memory usage, but it's hard to say whether the issue is resolved since the cause has been removed. The testnet findings seem to wrap it up, though.

@denisprovost
Author

denisprovost commented Sep 4, 2021

Yes, but with 1.2.3, even if the problem is present (as I indicated above), my Raspberry Pi handles it better and does not crash after 6 hours of farming. Memory usage is high (65%) but stays stable at that level, which is not the case on 1.2.4/1.2.5, where it ends up at 100% and eventually crashes.

Restarting my system on the Raspberry Pi means 40 minutes off the farm (which I find huge; 'SSL context Connect call failed 127.0.0.1' for 15 minutes). I can't afford to reboot too often, so I'll leave it to those with a quick reboot and resync time to validate the patch.

I'll stay on 1.2.3 and wait for a release in which the problem is really fixed; for the moment it is a 'test' patch, which can only be used to confirm that it fixes the problem. If you have noticed that it fixes the problem for you, it is on the right track, but for now, in my case, the safest option is to wait for a real release that fixes the problem, which is what the majority of people are doing.

But once again, thank you to the Chia team for their work and quick reaction!

@denisprovost
Author

denisprovost commented Sep 5, 2021

This morning my harvester no longer wanted to stay connected, even after a restart, so I rebooted the system and took the opportunity to switch to 1.2.5 (again) + PR #8315.

[Screenshot: Capture]

I confirm that on my side this solves the major memory leak: about 40% of memory used, with only minor variations, over the past 8 hours.

[Screenshot: Capture]

@erickoh

erickoh commented Sep 6, 2021

Now my full node is stable, but my farmer process keeps crashing.
Not sure if it is a related problem.
This is on 1.2.5 on Ubuntu 20.04.

dmesg | grep -i memory
[25056.304479] out_of_memory.part.0+0x1df/0x3d0
[25056.304481] out_of_memory+0x6d/0xd0
[25056.304644] Tasks state (memory values in pages):
[25056.304759] Out of memory: Killed process 1746 (chia_farmer) total-vm:1198528kB, anon-rss:922096kB, file-rss:2568kB, shmem-rss:0kB, UID:0 pgtables:2112kB oom_score_adj:0
[35438.326053] out_of_memory.part.0+0x1df/0x3d0
[35438.326054] out_of_memory+0x6d/0xd0
[35438.326162] Tasks state (memory values in pages):
[35438.326251] Out of memory: Killed process 8621 (chia_farmer) total-vm:1101484kB, anon-rss:852012kB, file-rss:2836kB, shmem-rss:0kB, UID:0 pgtables:1892kB oom_score_adj:0

@denisprovost
Author

denisprovost commented Sep 6, 2021

It all depends on whether the memory filling up is linked to a gradual increase in RAM usage. Does this happen after several hours, or very quickly after a restart of the chia process?
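
One rough way to tell is to log the farmer's resident memory over time: steady growth over hours points to a leak, while a quick jump right after restart points elsewhere. A minimal Python sketch, assuming the third-party psutil package is installed and that the process name chia_farmer matches the OOM log above:

# log chia_farmer resident memory once a minute
import time
import psutil

def farmer_rss_mib() -> float:
    # sum resident memory of all processes named "chia_farmer", in MiB
    total = 0
    for proc in psutil.process_iter(["name", "memory_info"]):
        if proc.info["name"] == "chia_farmer" and proc.info["memory_info"] is not None:
            total += proc.info["memory_info"].rss
    return total / 1024 ** 2

if __name__ == "__main__":
    while True:
        print(f"{time.strftime('%H:%M:%S')}  chia_farmer RSS: {farmer_rss_mib():.1f} MiB", flush=True)
        time.sleep(60)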


9 participants