Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add extra statsd metrics for normal and/or abnormal worker exits #2126

Closed
dnlserrano opened this issue Oct 7, 2019 · 6 comments
Closed

Add extra statsd metrics for normal and/or abnormal worker exits #2126

dnlserrano opened this issue Oct 7, 2019 · 6 comments

Comments

@dnlserrano
Copy link

dnlserrano commented Oct 7, 2019

Hey and thanks for all the hard work on gunicorn! ❤️

Would it be possible to have the metric for gunicorn.workers be brought back to INFO level instead of DEBUG? I'm very much interested in understanding the variations in number of workers my web server goes through while serving requests. Particularly, I'm interested in understanding when a given worker has been killed.

Furthermore, I think this is a good monitoring practice that shouldn't be kept around just during debugging, hence my suggestion of increasing the log level for it.

Is there a better way to achieve this?

If the proposed change is considered interesting, I may take a stab at it. It should be just a matter of changing info to debug. 🤔

Thanks in advance! 🙇

@tilgovi
Copy link
Collaborator

tilgovi commented Oct 11, 2019

My understanding of that change is that the metrics should be sent regardless of enabled log level.

Are you asking to log a message when the number of workers changes, or are you finding that StatsD metrics for gunicorn.workers are not sent?

@tilgovi
Copy link
Collaborator

tilgovi commented Oct 11, 2019

Particularly, I'm interested in understanding when a given worker has been killed.

There is an INFO level log when a worker exits.

@dnlserrano
Copy link
Author

My understanding of that change is that the metrics should be sent regardless of enabled log level.

Weird... you seem to be correct, but...

(...) or are you finding that StatsD metrics for gunicorn.workers are not sent?

Yes. 😐 I think I misinterpreted the code. It seems like you only log (and emit the metric) after you've spawned workers as needed, which is a bit puzzling to me (I haven't had time to look into commit history, which might explain this decision - sorry).

Would it make more sense to log the number of workers before spawning needed workers? In this way, if some worker segfaulted and was re-spawned by the main master loop, I could get metrics before the process of restoring them kicks in.

There is an INFO level log when a worker exits.

Yes, indeed. These are useful when sporadically looking at logs. But it'd be ideal if this was mirrored as a metric that I can track as part of my monitoring dashboard (e.g., in Datadog) without having to parse the logs using some external process (in the administrative sense of the word).

@tilgovi
Copy link
Collaborator

tilgovi commented Oct 14, 2019

It seems like you only log (and emit the metric) after you've spawned workers as needed, which is a bit puzzling to me (I haven't had time to look into commit history, which might explain this decision - sorry).

Ahh, yes. I don't recall the history, but it could be that the intention was to just log the number of configured workers. This number can change during a configuration reload or by using the TTIN and TTOU signals.

Would it make more sense to log the number of workers before spawning needed workers?

Maybe it would make sense to have extra guages for normal and/or abnormal worker exits.

@dnlserrano dnlserrano changed the title Bring back INFO level metric collection for changes in number of workers Add extra statsd metrics for normal and/or abnormal worker exits Oct 14, 2019
@dnlserrano
Copy link
Author

Maybe it would make sense to have extra guages for normal and/or abnormal worker exits.

Indeed. I've changed the issue title to reflect that, and I'll try to take a look at a possible PR for it soon. Thanks for the feedback @tilgovi! 🙇

@dnlserrano
Copy link
Author

In hindsight, I think what I need is #2407, so closing this in detriment of that PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants