iowait and memory consumption are skyrocketing regularly #9271
After ~45 min the GetDiffRange operation vanished from the Monitoring web UI, but both memory consumption and iowait remained equally high. I saw myself forced to kill Gitea after an additional 15 minutes.
Killing the gitea service immediately freed up 4.5 GiB of memory.
There is a bug about that.
Will do.
I think it's unlikely to be fixed by that - the architecture of GetDiffRange keeps the whole diff in memory (probably multiple copies of it, too). If you have a huge diff you're going to break Gitea.
We can also cache the diff sections to a cache server or to files.
The issue is still happening regularly even with v1.10.1, but I haven't always seen the "GetDiffRange" operation showing up on the Monitoring page, so GetDiffRange isn't necessarily the cause of this issue. The Gitea logs only show a huge number of MySQL queries, and I wasn't able to get anything out of the PPROF data (I'm not a Go developer).
So that may be some cron tasks.
Hmm, excessive Perm checks are a possible cause - see the comments on #8540? There's another possible issue - testPatch was previously reading the whole patch into memory. That should stop now.
> So that may be some cron tasks.

Judging from the (non-?)periodic timing at which the issue appears, it is at least **none** of the tasks with a periodicity of 10 min or 24 h listed in the Monitoring tab.

> Hmm, excessive Perm checks are a possible cause - see the comments on #8540?

I don't have any repos with an excessively large number of owners, which seems to be the precondition for this issue.

> There's another possible issue - testPatch was previously reading the whole patch into memory. That should stop now.

Under which circumstances does `testPatch` get triggered?
Changes to PRs would cause that.
Do you have the repo indexer on? Maybe that could be a cause?
I didn't modify any PRs in the meantime and do not have the indexer enabled (assuming it's disabled by default). git-lfs is enabled, though. Server Uptime (taken from the admin dashboard): […]
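For reference, the repository indexer is controlled by `REPO_INDEXER_ENABLED` in the `[indexer]` section of `app.ini` and is off by default; a quick way to check (the config path below is only an example) is:

```sh
# Print the [indexer] section of the Gitea configuration; if
# REPO_INDEXER_ENABLED is missing or false, the repository indexer is disabled.
grep -A 5 '^\[indexer\]' /etc/gitea/app.ini
```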
I think in fact that is the memory that Go used.
@lunny No, according to htop
If the PPROF files would be useful, I can provide you with them; I was too dumb to analyse them myself.
Any ideas on what I can do to figure out what the memory is even used for?
OK, so on master I recently made a change that means PR patches are no longer stored in memory - that could have been to blame. The other places that are currently poorly written are diff and blame, as these both read the whole thing into memory. I've never used the pprof output, so I'd be starting from scratch with that.
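For reference, one way to read that profiling data (a sketch assuming `ENABLE_PPROF = true` in the `[server]` section, which serves net/http/pprof on 127.0.0.1:6060, and a local Go toolchain; the file path below is only an example):

```sh
# Summarise which call sites hold the live heap of the running Gitea instance.
go tool pprof -top http://127.0.0.1:6060/debug/pprof/heap

# Profile files written to PPROF_DATA_PATH (file names vary by version) can be
# inspected the same way.
go tool pprof -top /var/lib/gitea/data/tmp/pprof/memprofile_000001
```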
I really doubt the PR-patch change is to blame, as my repos have nearly no PRs. Once I find the time, I'll try disabling individual repos; maybe I can pin the issue down to a single repo.
I'd like to disable individual repos to see whether one specific repo causes this issue.
If you're using Linux, maybe you can try the following command to check which processes are using which repos at a given time:
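The command itself was not preserved in this thread; a plausible Linux equivalent (an assumption, with an example repository path) would be:

```sh
# List processes that currently hold files open under the repository root.
# lsof +D recurses into the directory, which can be slow on large trees.
sudo lsof +D /var/lib/gitea/repositories
```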
The above command requires root privileges. On Windows you'd need Sysinternals' Process Monitor (an official Microsoft tool).
I deleted all repositories but one, and now the memory issue has vanished. Unfortunately I am only able to do this on my production instance. So if there is a more convenient way of disabling and restoring repos than […]
This issue has been automatically marked as stale because it has not had recent activity. I am here to help clear issues left open even if solved or waiting for more insight. This issue will be closed if no further activity occurs during the next 2 weeks. If the issue is still valid just add a comment to keep it alive. Thank you for your contributions.
This is still an issue, but unfortunately I haven't been able to pin it down further. The current memory stats of my Gitea 1.11.5 instance: according to htop, […]. From the admin dashboard: Server Uptime […]
This issue has been automatically marked as stale because it has not had recent activity. I am here to help clear issues left open even if solved or waiting for more insight. This issue will be closed if no further activity occurs during the next 2 weeks. If the issue is still valid just add a comment to keep it alive. Thank you for your contributions.
I think I'm experiencing the same issue. I'm using the Docker image with version […]. I see that one of my repositories has some huge diffs, so I'll try removing the repository from Gitea and report back if that changes anything. Down below you can see how I get weird spikes in memory usage, caused by Gitea. When the Docker container is stopped, memory recovers to somewhere between 30% and 35%. Also note that on this instance, LFS is disabled. Edit: After about two days the issue has not come back. My conclusion is that repositories with huge diffs are a no-no on Gitea for now.
@eikendev What do you categorise as huge diffs, and how can I check whether one of my repos falls into that category? Gitea is still acting up, eating all my memory and swap:

$ ps auxm | grep gitea
gitea 8630 1.4 82.6 9303548 3336128 ? - Jul25 52:16 /nix/store/404wfnlg9dvlzphd955zlqfclsaa31aj-gitea-1.11.8-bin/bin/gitea web
gitea - 0.1 - - - - Ssl Jul25 5:26 -
gitea - 0.2 - - - - Ssl Jul25 8:56 -
gitea - 0.0 - - - - Ssl Jul25 0:00 -
gitea - 0.1 - - - - Ssl Jul25 6:26 -
gitea - 0.0 - - - - Ssl Jul25 2:42 -
gitea - 0.0 - - - - Ssl Jul25 0:00 -
gitea - 0.0 - - - - Ssl Jul25 0:00 -
gitea - 0.2 - - - - Ssl Jul25 7:14 -
gitea - 0.0 - - - - Ssl Jul25 3:24 -
gitea - 0.1 - - - - Dsl Jul25 5:38 -
gitea - 0.2 - - - - Ssl Jul25 9:07 -
gitea - 0.0 - - - - Ssl Jul25 0:20 -
gitea - 0.1 - - - - Ssl Jul26 2:19 -
gitea - 0.6 - - - - Dsl 00:35 0:39 -
gitea 30107 0.2 0.0 0 0 ? - 00:01 0:20 [git] <defunct>
gitea - 0.2 - - - - Z 00:01 0:20 -
gitea 30119 0.0 0.0 0 0 ? - 00:03 0:06 [git] <defunct>
gitea - 0.0 - - - - Z 00:03 0:06 -

It also leaves defunct git processes behind. Metrics from the admin dashboard: the memory + swap usage seems to be roughly equal to "Memory Obtained": […]
For comparison, after a restart, Memory Obtained is back at 137 MiB:

$ ps auxm | grep gitea
gitea 1063 2.2 4.8 1550104 194100 ? - 02:27 0:04 /nix/store/404wfnlg9dvlzphd955zlqfclsaa31aj-gitea-1.11.8-bin/bin/gitea web
gitea - 0.4 - - - - Ssl 02:27 0:00 -
gitea - 0.1 - - - - Ssl 02:27 0:00 -
gitea - 0.0 - - - - Ssl 02:27 0:00 -
gitea - 0.2 - - - - Ssl 02:27 0:00 -
gitea - 0.0 - - - - Ssl 02:27 0:00 -
gitea - 0.0 - - - - Ssl 02:27 0:00 -
gitea - 0.3 - - - - Ssl 02:27 0:00 -
gitea - 0.1 - - - - Ssl 02:27 0:00 -
gitea - 0.3 - - - - Ssl 02:27 0:00 -
gitea - 0.5 - - - - Ssl 02:28 0:01 -
gitea-1.11.8
Hmm... I've got a few ideas here:
When we run git gc, git will attempt to repack things. This process can take a lot of memory and would cause a regular spike. It's potentially worth running git gc on the command line for the repositories in turn. It may be that we have incorrect options, or you need to set the pack configuration options in your global git configuration. If these git gc processes die they can go defunct - but you should see some information on the Processes page when this is happening.
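A sketch of what that could look like (repository path and limit values are illustrative, not from this thread):

```sh
# Run git gc by hand on one bare repository, as the gitea user, to see whether
# the repack step is what causes the memory/iowait spike.
sudo -u gitea git -C /var/lib/gitea/repositories/someuser/somerepo.git gc

# Pack options that bound repack memory, set in the gitea user's global git
# configuration.
sudo -u gitea git config --global pack.windowMemory 100m
sudo -u gitea git config --global pack.packSizeLimit 100m
sudo -u gitea git config --global pack.threads 1
```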
I've already tried to bisect the repos to find the responsible one, but the process for doing so is quite bothersome: […] Is there any better way?
It was a repository where a file of over 100 MB was added - something that actually belongs in LFS. I guess my specific case can be detected by looking for large files inside the repository. About the general case, maybe […]
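A rough way to spot such files (a sketch, not from this thread; the repository path is an example) is to list the largest blobs in the bare repository:

```sh
# Print the 20 largest blobs (size in bytes, then path), largest first.
git -C /var/lib/gitea/repositories/someuser/somerepo.git rev-list --objects --all \
  | git -C /var/lib/gitea/repositories/someuser/somerepo.git cat-file \
      --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | awk '$1 == "blob" { print $3, $4 }' \
  | sort -rn \
  | head -n 20
```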
@eikendev That's bad - do you know if it was when the file was pushed or when someone looked at the diff page?
Ooh, interesting - I just pushed a file with no newlines and memory shot up.
In the graph I posted earlier I have neither pushed anything nor loaded any huge diff. I looked at the repo at some points, but not at the diffs. When memory is freed in the graph, this indicates that I manually restarted the server. To me this seems more like the product of a scheduled task. I'm not sure right now whether my issue and @schmittlauch's are related.
Do you see what the exact git command (with arguments) is that is left dangling?
@lafriks Currently there is no dangling git command, after having restarted Gitea.
OK, I think this is going to be related to go-git issues. I therefore think I'm going to pull the confirmed tag from this to see if 1.14 still has this issue.
I think I might have discovered the underlying issue here - if git cat-file is called on a broken git repository it will hang until stdin is closed instead of fatalling immediately. I think therefore #17991 will fix this.
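For anyone wanting to check for that symptom, a sketch (assuming Linux with /proc) for finding dangling cat-file processes and the repositories they run in:

```sh
# Show long-lived git cat-file processes left behind by the hang...
ps -eo pid,etime,args | grep '[g]it cat-file'

# ...and the working directory (i.e. the repository) of each one.
for p in $(pgrep -f 'git cat-file'); do
  echo "$p: $(readlink "/proc/$p/cwd")"
done
```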
This issue might even be resolved already, but I cannot say for sure, as I moved to a new installation and cleaned up some old repos, and thus might have removed such a broken repo coincidentally. So I suggest indeed closing this once #17991 gets merged, unless a new instance of these symptoms comes up.
…and other fixes (#17991)

This PR contains multiple fixes. The most important of which is:

* Prevent hang in git cat-file if the repository is not a valid repository

Unfortunately it appears that if git cat-file is run in an invalid repository it will hang until stdin is closed. This will result in deadlocked /pulls pages and dangling git cat-file calls if a broken repository is tried to be reviewed or pulls exist for a broken repository.

Fix #14734
Fix #9271
Fix #16113

Otherwise there are a few small other fixes included which this PR was initially intending to fix:

* Fix panic on partial compares due to missing PullRequestWorkInProgressPrefixes
* Fix links on pulls pages due to regression from #17551 - by making most /issues routes match /pulls too - Fix #17983
* Fix links on feeds pages due to another regression from #17551, but also fix issue with syncing tags - Fix #17943
* Add missing locale entries for oauth group claims
* Prevent NPEs if ColorFormat is called on nil users, repos or teams
Description
As written earlier in the Discourse forum, it regularly happens that Gitea makes my server's iowait and memory consumption skyrocket, causing issues for all other services running within the same VM. Usually only killing the whole gitea service makes the server recover.

While I couldn't find any direct cause by parsing the logs or the PPROF data, I just discovered in the admin web interface (Monitoring tab) that the GetDiffRange operation seems to be responsible. It seems to get stuck at a single repo of just 138 MiB in size, clogging I/O and making Gitea consume more and more memory.

`iotop` shows that Gitea creates a lot of I/O, although I'm not sure whether swapping operations are attributed to the causing process as well. After this issue first occurred, I limited the gitea systemd service to 600 MiB, and Gitea is currently using all of it. But apparently this limit isn't working, as the memory + swap usage increased by more than 3 GiB during the GetDiffRange operation.
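A sketch of how such a limit can be checked (the unit name is an assumption; MemoryLimit= applies to cgroup v1, MemoryMax=/MemorySwapMax= to cgroup v2):

```sh
# Show the memory limits systemd actually applies to the service, plus its
# currently accounted usage.
systemctl show gitea.service -p MemoryLimit -p MemoryMax -p MemoryCurrent
```

Note that a plain memory cap does not bound swap usage; on cgroup v2 that additionally needs MemorySwapMax=, which could explain memory + swap growing well past the configured 600 MiB.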
What does this GetDiffRange operation do? I wasn't able to find information on that.
Luckily I was able to allocate an additional 1 GiB of memory to my VM, giving me the chance to let the operation run. So far it has been running for more than 30 minutes, though.
Any ideas about that operation, why it's getting stuck or any additional debug information I could provide?