Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gops hangs if there are too many Go processes to examine #123

Closed
siebenmann opened this issue Aug 31, 2020 · 17 comments · Fixed by #126
Closed

Gops hangs if there are too many Go processes to examine #123

siebenmann opened this issue Aug 31, 2020 · 17 comments · Fixed by #126
Assignees

Comments

@siebenmann
Copy link

If you have more than ten (I believe) Go processes, gops will hang in goprocess/gp.go's FindAll() function. This happens because the 'for .. range pss { ...}' loop attempts to obtain a limitCh token before starting the goroutine, but the goroutines only release their token after they have sent their reply to the found channel. Since the found channel is unbuffered and is only read from by the main code after the 'for .. range pss {}' loop finishes, this deadlocks if the number of goroutines would ever be limited.

I think either the entire 'for .. range pss { ... }' loop needs to be in a goroutine so that it doesn't block reading from the found channel, or the limitCh token must be released by the code before sending to found. The former change seems easier and I think it's correct.

@dbraley
Copy link

dbraley commented Sep 3, 2020

This is impacting me as well. Was able to get around it temporarily by upping the magic number to 100 without a problem, though that's obviously not the right fix.

@malisetti
Copy link

malisetti commented Sep 4, 2020

@siebenmann @dbraley could you please check if this PR fixes this issue.

@siebenmann
Copy link
Author

Purely on inspection I believe that this will fix the issue I'm experiencing, however I believe it will also reintroduce issue #118. Since limitCh is now sent to in a new goroutine that has no other purpose, there's nothing to stop a flood of goroutines calling isGo() and using too many file descriptors at once.

@malisetti
Copy link

malisetti commented Sep 4, 2020

@siebenmann thanks for the insight. I have modified the code to use wg instead of limitCh.

@yarcat
Copy link
Contributor

yarcat commented Sep 5, 2020

The change introduced above (see #125) moves the whole for-loop into a go routine (also suggested in the issue description). Ignoring the whitespace changes (git diff -b) is far the best way to understand the diff.

@josharian
Copy link

It seems to me the simplest fix would be to make found buffered, with size concurrencyProcesses.

@yarcat
Copy link
Contributor

yarcat commented Sep 6, 2020

I personally prefer to avoid buffered channels, but yeah -- it's possible as well.
UPD: See the comment below, where I explain why this actually isn't gonna work with concurrencyProcesses buffer size.

@yarcat
Copy link
Contributor

yarcat commented Sep 6, 2020

Actually, did not initially notice that concurrencyProcess buffer limit proposed by @josharian. Imho it isn't gonna help. Imagine that we have more than concurrencyProcess Go processes. After the first batch of concurrencyProcess workers we will fill the output buffer, and will get to exactly the same situation as we are in right now. The only correct way in this case is to create a buffer of future len(results), which is unknown, and the nearest known value would be len(pss).

@josharian
Copy link

josharian commented Sep 6, 2020

But by that point, receiving from found will have commenced. In any case, there's no need to speculate. That's what tests are for. :)

@yarcat
Copy link
Contributor

yarcat commented Sep 6, 2020

But reading from found happens after the main for-loop. It will never get there, because it will get stuck in the same way it gets stuck now.

@josharian
Copy link

I see now. Using len(pss) would work.

@yarcat
Copy link
Contributor

yarcat commented Sep 6, 2020

@josharian I've refactored my cl and made it testable.
UPD: Removed "Please try with len(pss) -- it will hang." Of course it will work. Sorry seems I have problems finishing reading sentences today (-;

@yarcat
Copy link
Contributor

yarcat commented Sep 6, 2020

@josharian Added a fix-123-buffered branch. Please take a look and feel free to try it out yourself.
UPD: And yes, surely it would work with len(pss) as we've agreed earlier.

@yarcat
Copy link
Contributor

yarcat commented Sep 6, 2020

Oups, sorry, forgot to push the changes. Now it's there.

@yarcat
Copy link
Contributor

yarcat commented Sep 6, 2020

Also added a version, which creates less go-routines, and doesn't use buffered channels. Please take a look https://github.com/yarcat/gops/blob/fix-123-no-buffers/goprocess/gp.go (the branch is still based on a PR #125)

@rakyll rakyll self-assigned this Sep 7, 2020
rakyll added a commit to rakyll/gops that referenced this issue Sep 7, 2020
Limit the number of concurrent workers by starting a fixed number of goroutines and providing work by an input channel.

Fixes google#123
@rakyll
Copy link
Member

rakyll commented Sep 7, 2020

Since this is impacting many users, instead of leaving comments on #124, I sent out #126.

rakyll added a commit that referenced this issue Sep 7, 2020
Limit the number of concurrent workers by starting a fixed number of goroutines and providing work by an input channel.

Fixes #123
yarcat added a commit to yarcat/gops that referenced this issue Sep 7, 2020
The tests verify google#123 fix and prevent future regressions.
tklauser pushed a commit that referenced this issue Sep 10, 2020
The test verifies that #123 is fixed and prevents future regressions.
@dbraley
Copy link

dbraley commented Sep 11, 2020

Sorry for the delay! Can confirm this fixed the problem for me as well, thanks @rakyll, @yarcat, @mseshachalam,
& @josharian !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants