High resource usage over time (memory leak?) #3446
Comments
Had to restart it again. Could only last 3 days. |
s7evink/fetch-auth-events fixed it |
nvm lmao, it was fine for like 6 days and now its no again |
No idea if zRAM is a setup that should be supported but without a memory profile it will be difficult to tell what’s going on. |
it is zram, yeah. edit : its only growing :') actually, since its clearly majorly growing over a day, im down to set up reporting tomorrow and report the day after tmrw : ) ill do that |
You just need a single memory profile captured when the memory usage is high. That should contain enough info and the files are small. Having profiling enabled has next to no runtime cost so it’s fine to have it switched on for a long time. |
oh nice, okay then, ill enable the profiler tomorrow since for today i consider myself done and want the rest of the ill send out a memory profile capture in 1-3 days in this thread :) |
ok since it was just 1 environment variable i enabled it now, thought maybe its more complicated but nope, ill post the thing when the resource usage is high :) |
@neilalexander okay i got the heap report. idk how safe it is to send out heap reports, ive heard that its generally acceptable, but for now ill stick to web ui screenshots. it is at 2.5 gigs of ram usage atm according to systemd, or was at the time when i took the report, it dropped to 2.2 now ( ?? )
had to restart dendrite 2 days ago since it was refusing to write to postgresql database and both sending and receiving messages was slow :) if you need the actual report in full lmk. phony.run is 64% somehow, something about sql dbs, im using postgres if it matters. ive now disabled pprof and restarted dendrite, if you need more info i can provide it, but for now i think this will be okay |
iirc @jjj333-p has experienced a similar issue with dendrite, may have extra input ? |
Looks like it's struggling to process events. Can you try switching your instance to #3447 and see if it improves after a couple hours? |
the pr or the s7evink/fetch-auth-events branch ? im already on the branch if thats what youre asking, i can try the pr i guess
edit : oh the pr is literally a merge request to merge the branch into main
edit 1 : and yes the branch drastically improves the performance, not the resource bug though, it mightve helped it partially at least though since it is not as brutal anymore |
i face this same issue, i can say that the fetch-auth-events pr did help a lot but its still not fixed. i also notice that after about a week of uptime it will be near oom on my system (full 4gb of ram + zram), and then i reboot and both the cpu and ram usage are way down (10-15% avg cpu, only 1-2gb of ram used). i also have all the caching i can disabled in the config. idk how helpful to this i can be though sorry |
@TruncatedDinoSour Can you attach the full profile? |
sure |
ftr if it wasnt clear this is gzipped xD as in
i had to gzip it cuz github cried about the raw one |
istg i cant even run it for 3 days without it crying in agony ugh |
I saw it was gzipped yeah, it looks a lot like the profile is just showing the server is crunching some extremely complex room state. Are you a member of any very large or complex rooms? |
no and i dont think anyone on the hs really is |
Any interesting log entries happening at the time? Particularly |
not that i see from recent logs, some media stuff but other than that nothing worth of concern i believe |
turns out 1 user was a part of the matrix.org HQ maybe thats why |
Try using the admin API |
dont worry i am using both evac and purge right now its taking some time but its working regardless, the resource usage only has been a problem recently when that account has been there since the start of the homeserver so idk maybe itll help ? idk, well see i guess :D |
If you "need" to host huge rooms on a system without much RAM you might benefit from disabling the in-memory cache:
The cost is somewhat more work for postgres. |
i did a while ago alrd |
Background information
go version: go version go1.23.0 linux/amd64
Description
Steps to reproduce
This is its resource usage only 2 days after its most recent restart. It only creeps up over time until I restart it:
It's weird.