Add option to purge largest worker #3251
Objections:
Pro:
I do not see a clean way to accomplish that with existing hooks, but both this and the existing …

Linking related issue for easier discovery: #2422
The documentation and this issue description have no mention of memory leaks, and memory leaks are not what is being addressed by this feature. Granted, the phrase "the size of the oversized workers would also creep up" (only from the issue description) would fit a memory leak, but it was actually talking about something quite different. I will look at the documentation again to try to make it less likely for people to think that memory leaks are being discussed.

These causes of process memory growth are indeed quite common in a huge variety of systems I've worked on over the decades. Although here there is work done to pre-load lots of the components before fork()ing, a system as deep as Django has a huge body of potential code that might be loaded, and not all of it will get fully pre-loaded (and there are good reasons for that). Even if you managed to do that, when the need for a particularly large set of data pops up, memory usage will increase and (on many systems) is unlikely to go all the way back down even if free() is called for most of that memory. Memory arenas get fragmented (and a lot of systems reasonably don't add the layer of indirection and locks needed to aggressively defragment arenas and return pages to the OS). There can also be good reasons not to free() every buffer when it is likely just going to be needed again. And there will be tons of places where little caches get populated to make the next iteration faster, and those are not going to be free()d even if that particular path doesn't get exercised again over the next hour.

So, if you have a system that has a few huge customers and a ton of not-huge customers (and similar patterns for a lot of other types of objects), and you take the time to pay close attention to how memory is being used, in my experience it is indeed pretty common to see the rare expensive request cause one worker to grow in memory consumption and stay there. Though, in my experience, it isn't so common for people to notice. So I was not particularly surprised to see that pattern in the Python system I was brought in to work on. And I would think we would want gunicorn to be an appropriate tool even for services that deal with a wide variety of types of requests and thus have code paths that only get executed infrequently.

I have a live example of a system where memory usage is decreased and made more flat by restarting at most 1 worker every 5 minutes (per pod). That's cheap, and the impact on scalability in our case was huge. The right choice of timing and floor is going to be different for other systems. If you are running gunicorn in VMs instead of in pods, this is unlikely to be useful to you. And I'm sure many systems aren't going to have enough variability in request complexity/size and optional code paths to make something like this of much value. But I also think this much exposition is not appropriate in the documentation for a relatively simple group of options.
You can't accurately use the word "waste" unless the mildly expensive thing has no value. The periodic behavior is fully optional and so has no appreciable cost if you don't go to the work of enabling it (I doubt you could even measure the aggregate impact of the added check that a variable is zero). And you would only enable it if it had value in your case.
Those features are documented in other places. And we already use threads and keep-alives and timeouts. If you have more specific recommendations of particularly promising alternatives for this situation that should be called out (now that you understand that we aren't talking about memory leaks), please elaborate.
And I covered why it is not useful here. Perhaps we should update the CONTRIBUTING documentation, adding to "We are always thrilled to receive pull requests" a mention that we also love to tell you that your code is probably garbage? :)

I can see how max_requests could be useful for limping along until you get a memory leak fixed. We use max_requests, but for a different reason. Leaving a complex system running for a very long time increases the risk of running into one of those complex bugs that haven't been fixed because they are nearly impossible to trigger, but the millions of monkeys with typewriters producing requests in Production can eventually manage it. So I've several times seen services get into states where stuff is just weird and stays that way until the misbehaving component is restarted. I've actually done the work to diagnose several of those, and the problems always ended up being not a bug in the application code but broken behavior in a base component (like Python itself). So triggering some kind of eventual restart is a good idea for a resilient system, even though the potential problem is pretty rare (though perhaps not as rare as many people think).
One idea would be to have WorkerTmp.notify() write self.nr to the temporary file (which would also update the file timestamp) and make that number available to the configured prune function. Then provide an example config with a function that returns that number and that sets the prune floor to what max_requests used to be set to. Simultaneous restarts would then no longer be a risk. That feels to me like it would handle most of the use cases of max_requests. For a transition period, max_requests could be deprecated but implemented using the new mechanics (max_requests would not be allowed if you configured your own prune function).

People could also combine max_requests-type restarts with "largest worker" restarts using some simple math to produce a combined score. For example, adding the prune floor to the process size only when the number of requests is larger than your desired maximum would prune both the largest workers and those exceeding max requests, still just one at a time. And having the approximate number of requests handled by each worker available to the Arbiter could have other interesting applications. But I've gone on long enough.
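A rough sketch of that idea follows. WorkerTmp, notify(), and self.nr do exist in gunicorn, but the extra parameter, the file contents, and the combined-score function below are assumptions about how the proposal might look, not code from the attached PR.

```python
import os

# Sketch of the proposed WorkerTmp.notify() change (method body shown out of
# context): write the worker's request count into the temporary file, which
# also refreshes the timestamp the arbiter already watches.
def notify(self, nr=0):
    fd = self._tmp.fileno()            # assumes the existing temp-file handle
    os.lseek(fd, 0, os.SEEK_SET)
    os.write(fd, b"%d\n" % nr)

# Example of folding a max_requests-style limit into a single prune score
# (names and signature are hypothetical).
MAX_REQUESTS = 10_000
PRUNE_FLOOR = 300 * 1024               # what max_requests used to guard against

def prune_score(worker_size, worker_requests):
    score = worker_size
    if worker_requests > MAX_REQUESTS:
        # Push this worker above the floor so it is restarted next,
        # still only one worker at a time.
        score += PRUNE_FLOOR
    return score
```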
Thanks for the detailed explanation, much appreciated.
This is the sort of explanation I was missing from the docstring. Sacrificing cache-powered performance in one application to the benefit of another is a perfectly valid reason, one that does not need to involve any sort of generally undesirable memory consumption.
The interaction between the various ways to trigger a worker restart and long-running requests is badly documented across the board, so I am requesting that newly introduced behavior tries really hard to set the expectations right. Such as promising that long-running requests will be interrupted after waiting for
That is the case I am most afraid of, actually. Business incentives already do not align well with getting bugs in Python (or Gunicorn!) thoroughly diagnosed and resolved. It would be wise if both the existing feature, and your newly suggested extensions, limit their amplifying effect on that situation with very carefully worded usage instructions.
By what metric? Are you using some simple tool/command, simple enough that other users could run it on their systems to determine to what degree they are affected / to what degree their applications' memory overhead might improve by simply applying your PR right away?
In that sense people have been limited to one predefined metric (# of requests), and you are offering a versatile way to add more. Other than the existing jitter mechanism not netting you sufficiently predictable behavior w.r.t. simultaneous restarts, do you see a blocker on why your work cannot be introduced as a strict superset of the existing mechanism? That would mostly nullify my "adding another" objection.
In getting Gunicorn running in Kubernetes and tuning resource specifications, I found that the memory consumption of each pod would just grow over time.
In the pull request I'll attach to this issue, I implemented 3 new config options that allowed me to make the memory consumption quickly flatten out and stay stable (and lower) over long periods with Production load. Example configuration:
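The example configuration itself is not reproduced here; the following is a hedged sketch of what such a config might look like. Only prune_floor is named in this issue; prune_seconds and prune_measure are placeholder names, not necessarily the options the PR actually introduces.

```python
# gunicorn.conf.py -- illustrative sketch only, not the PR's actual option names.
import resource

workers = 8

# Restart at most one worker per interval (placeholder name).
prune_seconds = 300                # one restart every 5 minutes at most

# Skip restarts entirely while every worker is below this size.
prune_floor = 300 * 1024           # ~300 MB, expressed in KiB to match ru_maxrss

def prune_measure(worker):
    # Size metric used to pick the "largest" worker; here, peak resident set
    # size in KiB (Linux ru_maxrss units). The worker argument is part of the
    # assumed signature and is unused in this simple example.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
```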
The crux of the problem is that relatively rare requests are much larger (or generate much larger responses or load a lot more resources when computing the response) than typical requests and cause Python to allocate quite a bit more memory and hold on to that memory until the process is killed. Over time, the number of oversized workers would grow and the size of the oversized workers would also creep up, causing pod memory consumption to just keep growing.
This is somewhat counteracted by max_requests, but not very effectively. Setting max_requests low enough to make a dent in this growth can make process restarts so frequent during busy times that you are likely to get overlapping restarts reducing your capacity (and spending more CPU doing all that restarting). And then it is still not effective when traffic volume drops during slow times of day.
Having each worker kill itself when it gets too big risks unbounded and uncoordinated restarting which could be a disaster.
So I added a time-based controlled restart of just the worker that is currently the largest, using whatever measure you prefer or that best fits how your code behaves.
The time-based approach works well in Kubernetes because when traffic ramps up, the number of pods increases and the rate of workers being restarted also goes up proportionally. And yet you never have two workers in the same pod restarting even close to the same time, so no significant dips in capacity.
And the prune_floor allows you to stop doing any restarts during periods when they aren't really needed.
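To make the mechanism concrete, here is a minimal sketch of the selection step, not the code from the attached PR: assuming the arbiter has a list of workers, a user-supplied size measure, a floor, and an interval, it restarts at most one worker per interval, and only when the largest worker exceeds the floor.

```python
import time

# Minimal sketch of the arbiter-side check (hypothetical, not the PR's code).
# `measure(worker)` is the user-configured size metric; `restart(worker)`
# stands in for gunicorn's graceful worker shutdown.
class Pruner:
    def __init__(self, measure, prune_floor, prune_seconds):
        self.measure = measure
        self.prune_floor = prune_floor
        self.prune_seconds = prune_seconds
        self.last_prune = time.monotonic()

    def maybe_prune(self, workers, restart):
        now = time.monotonic()
        if now - self.last_prune < self.prune_seconds or not workers:
            return None                      # at most one restart per interval
        largest = max(workers, key=self.measure)
        if self.measure(largest) <= self.prune_floor:
            return None                      # nothing oversized; skip this round
        self.last_prune = now
        restart(largest)                     # restart only the single largest worker
        return largest
```

Because the interval is fixed per pod, scaling out to more pods proportionally increases the total restart rate while still never restarting two workers in the same pod at the same time, which is the behavior described above.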