
Add socket backlog metric #2407

Open
wants to merge 4 commits into master

Conversation

@raags commented Aug 20, 2020

If all the workers are busy or max connections are reached, new connections will queue in the socket backlog, which defaults to 2048. The gunicorn.backlog metric provides visibility into this queue and gives an idea of concurrency and worker saturation. However, this is only available on Linux platforms.

This also adds a distinction between the timer and histogram statsd metric types; although they are often treated the same, the distinction can matter, because in this case the histogram is not a timer: https://github.com/b/statsd_spec#timers
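
For illustration, the difference is only the statsd type suffix on the wire. A minimal sketch (not part of this PR's diff), assuming a plain UDP statsd agent on the default port:

    import socket

    # Sketch of the statsd wire format: timers are reported with the "ms"
    # type, histograms with "h", even if many backends aggregate both the
    # same way.
    statsd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    addr = ("127.0.0.1", 8125)  # assumed local statsd agent

    # timer: a duration in milliseconds
    statsd.sendto(b"gunicorn.request.duration:123|ms", addr)
    # histogram: an arbitrary sampled value, e.g. the socket backlog depth
    statsd.sendto(b"gunicorn.backlog:42|h", addr)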

Another point to note: on Linux the backlog is also limited by net.core.somaxconn, which is 128 by default. I am not sure if that is the case on other platforms as well. Would it then make sense to reduce the default backlog from 2048?
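
A quick way to see the effective cap on Linux (a standalone sketch, not gunicorn code), since listen() silently clamps the requested backlog to net.core.somaxconn:

    # Linux-only sketch: compare gunicorn's configured backlog with the kernel cap.
    with open("/proc/sys/net/core/somaxconn") as f:
        somaxconn = int(f.read())

    configured_backlog = 2048  # gunicorn's default --backlog value
    print("effective backlog:", min(configured_backlog, somaxconn))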

Partially Fixes: #2057

@benoitc benoitc self-assigned this Jan 17, 2021
@hleb-albau

desired feature

@vgrebenschikov

I'll also vote for that feature

tilgovi previously approved these changes Feb 16, 2021

@tilgovi (Collaborator) left a comment

I think this change looks good. How do others feel about it?

@benoitc (Owner) commented Feb 16, 2021 via email

@tilgovi (Collaborator) commented Feb 16, 2021

I don't think this can be done by recording accepting requests. The application cannot know how many requests are in the backlog without the OS telling it. This is happening in the arbiter, so once per second at the most, I think. I doubt that making a syscall to get a number from an OS struct is expensive, but it would be great if someone could chime in who knows a bit more about how this works than I do.

@benoitc (Owner) commented Feb 16, 2021 via email

@tilgovi (Collaborator) commented Feb 16, 2021

Well 1 accepted request is done by 1 worker. You can sum it all.

That tells you how many requests are in progress, not how many are waiting in the OS backlog.

I think I have to remove my approval anyway, though. I don't know if this information is available. I can't find any documentation about tcp_unacked and why that gives any measure of the socket backlog.

@tilgovi tilgovi self-requested a review February 16, 2021 23:13
@tilgovi tilgovi dismissed their stale review February 16, 2021 23:14

need more info on tcp_unacked

@raags (Author) commented Mar 14, 2021

I think I have to remove my approval anyway, though. I don't know if this information is available. I can't find any documentation about tcp_unacked and why that gives any measure of the socket backlog.

The documentation is definitely scarce on this. I can provide a few references:

  1. The uwsgi server implements an alarm mechanism based on the socket backlog, and it calculates the backlog in the same way:

     uwsgi_sock->queue = (uint64_t) ti.tcpi_unacked;
    
  2. This is the relevant kernel code, which also has a comment alluding to its meaning.

  3. This article explains how backlog works in Linux well.

Also, the net.core.somaxconn value, which defaults to 128, was changed to 4096 in Linux 5.4. By default, gunicorn sets the backlog to 2048, but on older kernels this will get silently truncated to 128. I think the documentation should note this somewhere.
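
For anyone who wants to see what this boils down to, here is a minimal standalone sketch of reading the listen-queue depth on Linux via TCP_INFO; the field layout follows struct tcp_info in include/uapi/linux/tcp.h, and the PR's actual change lives in gunicorn/sock.py, so treat this as an approximation:

    import socket
    import struct
    import sys

    def get_listen_backlog(sock):
        """Best-effort count of connections waiting in the listen queue.

        On Linux, for a listening TCP socket the kernel reuses the
        tcpi_unacked field of struct tcp_info to report the current
        backlog length. Returns -1 where this is not available.
        """
        if not sys.platform.startswith("linux"):
            return -1
        # struct tcp_info starts with 8 one-byte fields followed by
        # 32-bit counters; tcpi_unacked is the 5th counter (unpack index 12).
        fmt = "B" * 8 + "I" * 24
        try:
            raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO,
                                  struct.calcsize(fmt))
            return struct.unpack(fmt, raw)[12]
        except (OSError, struct.error):
            return -1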

@tilgovi (Collaborator) commented Mar 20, 2021

This looks correct. Thank you for doing the research. I think I was confused because the name suggests un-ACK'd messages, but it seems the kernel is re-using this field to have a different meaning for listening sockets than for connected sockets.

I doubt the performance of a single system call per listener should worry us in the arbiter loop. That loop is not tight. It sleeps often.

@dnlserrano

It'd be awesome to have this work out of the box and exposed as a metric in gunicorn. Any idea on when we can expect it to land in master/a release? Thanks in advance, and thanks for all the work on gunicorn! 🤗

@patrickmariglia

Certainly a very desired feature.

Would love to know if this can be included in a release.

@benoitc (Owner) commented Sep 25, 2021

@patrickmariglia why do you need it? Can you elaborate ?

I still don’t really see the point of such a metric. It is not really operational, since you can’t change the value without restarting, and you know it will fail due to a socket error anyway. The question is: what do you do with this alarm?

Keeping counters of accepted, errored, and released requests may be more useful to understand the pressure. After all, the client should normally expect to fail and retry. Servers have limited resources that can’t be scaled indefinitely. We should add it imo.

I think it is acceptable as an option, but this feature must be cross-platform and standard. Why does it target Linux specifically? How do we make it cross-platform?

@benoitc (Owner) commented Sep 25, 2021

edited comment above.

@vgrebenschikov

@patrickmariglia why do you need it? Can you elaborate ?

I still don’t really see the point of such a metric. It is not really operational, since you can’t change the value without restarting, and you know it will fail due to a socket error anyway. The question is: what do you do with this alarm?

Well, the backlog metric is the number of connections waiting to be answered, so if that metric is above 0 it is worth checking/fixing the number of workers/threads.

What I would desire even more than the backlog is the number of active/spare threads. In fact the two are connected: once the number of busy threads reaches the maximum (workers x threads), the backlog starts to grow.

So, if you can monitor the number of spare threads, you can forecast (more or less) when you will no longer have enough threads to process all parallel incoming connections.

@patrickmariglia commented Sep 27, 2021

@patrickmariglia why do you need it? Can you elaborate ?

I still don’t really see the point of such a metric. It is not really operational, since you can’t change the value without restarting, and you know it will fail due to a socket error anyway. The question is: what do you do with this alarm?

Keeping counters of accepted, errored, and released requests may be more useful to understand the pressure. After all, the client should normally expect to fail and retry. Servers have limited resources that can’t be scaled indefinitely. We should add it imo.

I think it is acceptable as an option, but this feature must be cross-platform and standard. Why does it target Linux specifically? How do we make it cross-platform?

@vgrebenschikov explains my use case very well: I am trying to calculate or approximate worker saturation. The socket backlog, or rather the number of waiting requests, can be a good proxy metric for worker saturation in situations where CPU is not a sufficient metric. So if this value even occasionally rises above 0, it could be an indicator that the number of workers needs to be increased, or that more replicas need to be spun up (if you are in a k8s environment).

If there is another way to determine saturation, such as the number of spare workers as @vgrebenschikov also points out, that would be equally useful. I had thought this was possible with the gunicorn.workers metric using a Datadog integration (datadog source reference). The metric is documented as being tagged by state: idle or working; however, after talking to their support it seems that this will not work in a k8s environment.

@israelbgf

I still don’t really see the point of such a metric. It is not really operational, since you can’t change the value without restarting, and you know it will fail due to a socket error anyway. The question is: what do you do with this alarm?

Scaling up pods in a k8s environment with this metric could be a use case. Saturation means that there aren't enough workers to handle the requests.

@matthew-walters

I've found the gunicorn.backlog metric from this forked version of Gunicorn to be extremely useful when diagnosing issues running Gunicorn apps in Kubernetes. For example, it explained why some simple liveness and readiness probes were failing: the requests timed out while still in the Gunicorn backlog.

Also, it can explain discrepancies when our ingress in Kubernetes says a request to a Gunicorn app took 10 seconds, but Gunicorn says it took 1 second. In this example, the request spent 9 seconds in the Gunicorn backlog and only after that, when it was picked up by a Gunicorn worker, did Gunicorn start counting the request duration.

Would really appreciate this PR getting merged.

@flovilmart

@benoitc, if I understand correctly, you would be willing to merge this feature if it were available on all platforms and not only Linux. Or do you have other concerns, or an alternate implementation you'd rather see merged?

This metric is useful for us in understanding stalls in request processing as @matthew-walters and @israelbgf point out.

@StasEvseev

Very desired feature!

@tilgovi (Collaborator) commented May 8, 2022

I still think this makes sense. I could imagine someone using this to decide when to scale up an autoscaling deployment. It does give a sense of whether the workers are able to process requests as fast as they arrive or not. Its utility may vary by worker type, but that's okay.

I don't see the harm in the feature, unless we are worried about the extra overhead. If we are, we can make it optional, no?

@MatthieuToulemont

I agree, this feature would be very useful!

@vishalkuo

For folks that need this behavior, I think you can implement it locally by creating a when_ready hook with the same behavior as this PR: basically spawn a background monitor (similar to the pattern we see here) and periodically look at server.LISTENERS.
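
A rough sketch of that workaround in a gunicorn config file, assuming you bring your own backlog reader (for example the TCP_INFO helper sketched earlier in this thread), since released gunicorn does not expose sock.get_backlog():

    # gunicorn.conf.py -- sketch of the when_ready workaround, not this PR's code
    import threading
    import time

    from myapp.metrics import get_listen_backlog  # hypothetical helper module

    def when_ready(server):
        # "server" is the arbiter; server.LISTENERS holds the listener
        # sockets, each wrapping a real socket object exposed as .sock.
        def monitor():
            while True:
                backlog = sum(get_listen_backlog(listener.sock)
                              for listener in server.LISTENERS)
                if backlog >= 0:
                    server.log.info("socket backlog: %d", backlog)
                    # or push the value to statsd / your metrics pipeline here
                time.sleep(1)

        threading.Thread(target=monitor, daemon=True).start()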

@robotadam

I was looking to expose saturation metrics for a service my team's responsible for, and this metric combined with a metric on the # of workers that are actively handling a request would be an ideal combination. With the latter we would have an effective metric of saturation — e.g. at a moment in time, 4/5 workers are busy then we are at 80% saturation. This PR would then give us the impact of saturation — that is if we hit 5/5 workers a small backlog size would not be an issue, but a large one would tell us how severe the issue is.

As of right now I can look at the difference in response times between the downstream load balancer and the gunicorn application server, but it's not ideal.

@vladyslav-bezpalko commented Jul 22, 2022

There's a lack of metrics for scaling gunicorn horizontally in a k8s environment right now. This PR would definitely solve that!

@sitaram-manatal

Hello good folks,
A thank you to all the people involved in committing and reviewing the code.
As this PR is approved, is there any foreseeable timeline for it to be merged and released?
We are considering the backlog size as a potential metric for horizontal autoscaling and would love to give this a try.

@wildcardops

This would be a very helpful metric for a scaling issue that we are running into. As such, I'm also interested in the timeline on the release.

@beaugunderson

A data point regarding safety: we've been running this in production for the last 60 days across ~500 instances of gunicorn without any issues (master + this PR on top), and it has helped us scale those instances a few times.

@Sharathmk99

I see the PR is approved. Is it possible to include it in the next version? We are looking forward to this metric. Thanks.

@yassinebelmamoun commented Aug 7, 2024

Any updates or visibility regarding the merge/release timeline?

This will be very helpful for a lot of folks looking to auto scale.

Big thanks to all contributors 🙏

@beaugunderson

Still running this in production without issues, would love if it was merged and released. 👍

@benoitc (Owner) left a comment

Thank you for the PR and the feedback.

Let's say I am still not convinced this metric is useful compared to its impact on the arbiter. The backlog is set statically once, so it's easy to extrapolate its usage from the average number of connections, or it can be checked by comparing the number of requests landing on the proxy with the number of running or accepted requests in gunicorn. That would reduce the contention there, especially since this requires decoding a value.

That said, I understand that people may want to get it, so I suggest the following changes (see inline comments):

  1. only trigger this metric on systems that support it
  2. possibly make this metric configurable.

This would let the system perform as usual when it can. Can you make these changes? As for 2, this will be easier soon with the new OpenTelemetry backend, but for now it can be passed using the settings module.
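
As a sketch of what those two changes could amount to on the arbiter side (the option name below is hypothetical, not an existing gunicorn setting, and get_backlog() is the method this PR adds to the listener sockets):

    # Hedged sketch of the requested behaviour, not the PR's actual diff.
    # "statsd_backlog_metric" is a hypothetical option name.
    def emit_backlog_metric(arbiter):
        if not getattr(arbiter.cfg, "statsd_backlog_metric", False):
            return  # 2. metric disabled via configuration
        backlog = sum(sock.get_backlog() or 0 for sock in arbiter.LISTENERS)
        # 1. only report a value on systems that can provide it;
        # get_backlog() is assumed to return -1 where it cannot.
        if backlog >= 0:
            arbiter.log.gauge("gunicorn.backlog", backlog)  # the PR reports it as a statsd histogram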

raags added 3 commits August 14, 2024 12:57
If all the workers are busy or max connections is reached, new
connections will queue in the socket backlog, which defaults to 2048.
The `gunicorn.backlog` metric provides visibility into this queue and
gives an idea of concurrency and worker saturation.

This also adds a distinction between the `timer` and `histogram` statsd
metric types; although they are often treated the same, the distinction
matters here because a histogram is not a timer: https://github.com/b/statsd_spec#timers
Fix failing lint tests
@raags (Author) commented Aug 14, 2024

@benoitc I've updated the PR with the changes requested, please review.

@benoitc (Owner) left a comment

The latest changes look good to me, thanks for them! Can you look at my comment? Either way I think it's good for merging.

@benoitc (Owner) left a comment

Thanks for the update. Sorry, I missed the backlog check in my previous review. Can you fix it as well? Also, look at the failing CI tests.

backlog = sum(sock.get_backlog() or 0
              for sock in self.LISTENERS)

if backlog:
@benoitc (Owner)

This will always be true here, even when backlog is set to -1.
I think the correct test is `if backlog >= 0`.

@raags (Author) Aug 14, 2024

Ah yes, I missed this - pushing the fix

@raags raags force-pushed the master branch 3 times, most recently from 02389e4 to cf861a2 Compare August 14, 2024 13:32
@raags (Author) commented Aug 14, 2024

Hi @benoitc, the CI has passed.

@Barsoomx commented Aug 28, 2024

Won't this break if the connection is already submitted to the ThreadPoolExecutor in a gthread worker? Connections are submitted to the ThreadPoolExecutor up to the worker_connections amount. Is it possible to also pull this data?

If not, I think it's worth specifying in the docs.

@raags (Author) commented Aug 31, 2024

@Barsoomx I didn't understand, can you elaborate? The backlog is read from the listening sockets and isn't dependent on the number of worker connections.

@Barsoomx

@Barsoomx I didn't understand, can you elaborate? The backlog is read from the listening sockets and isn't dependent on the number of worker connections.

I've tried to implement this locally with the gthread worker class and it doesn't show the correct backlog count. I suspect the reason is the keepalive + worker_connections logic (sockets are ACKed).

What happens with a sync worker: connections stay in the TCP backlog until a worker can pick them up.

What happens with a threaded worker: connections are enqueued into a thread pool and kept alive until a thread can process them, up to the worker_connections count.

In that case (the threaded worker) the metric has no value, and the real "worker backlog" is the number of connections that exceed the thread count for each worker:

len(worker.futures) - cfg.threads (for the threaded worker only)
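
A rough sketch of that approximation as a helper; worker.futures is a gthread implementation detail rather than a public API, so treat this as a hack:

    # Hypothetical helper for the gthread worker only: connections handed to
    # the ThreadPoolExecutor beyond what the configured threads can run at once.
    def gthread_queue_depth(worker, cfg):
        return max(len(worker.futures) - cfg.threads, 0)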

@matthew-walters

Come on folks, it's been over 4 years now.
Can we just limit the scope of this to the sync worker and get it merged?

Successfully merging this pull request may close these issues:

Worker level metrics