Guidance for clustered processes #82
Comments
I don't think it's up to prom-client to give any recommendation here, since it depends on how and where the application is deployed, what type of application it is, and so on. I would rather give no recommendation than one that only fits certain conditions. However, if someone builds a library on top of prom-client that acts as a master process and collects metrics from other Node processes, I'll happily link to it!
@goofballLogic I use Prometheus in an environment like the one you describe and have no problems with it. Prometheus has all the aggregation functions you need for that. In your case, just use sum for the "active requests".
@disjunction, using "sum" does not resolve this problem, because you only ever receive partial metrics. If you have, e.g., 4 workers on a 4-core machine, count requests on each worker, and then expose a /metrics endpoint, each call to this endpoint is serviced by one of those 4 workers and retrieves only the metrics for that one worker (only ~25% of requests processed). How did you work around this problem?
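To make the partial-metrics problem concrete, here is a minimal simulation of the scenario above. It is a sketch only: plain objects stand in for the 4 workers, no real processes or HTTP servers are started, and every name in it (`dispatch`, `scrape`, `WORKER_COUNT`) is hypothetical rather than part of prom-client or Node's cluster module.

```javascript
// Simulation only: plain objects stand in for cluster workers.
// All names here are hypothetical and exist just for illustration.
const WORKER_COUNT = 4;

// Each simulated worker keeps its own private request counter.
const workers = Array.from({ length: WORKER_COUNT }, (_, id) => ({ id, requests: 0 }));

let next = 0;
// Round-robin dispatch: hand each incoming connection to the next worker in turn.
function dispatch() {
  const worker = workers[next];
  next = (next + 1) % WORKER_COUNT;
  return worker;
}

// Serve 1000 application requests; each one increments only the
// counter of the worker that handled it.
for (let i = 0; i < 1000; i++) {
  dispatch().requests += 1;
}

// A scrape is just another request, so it is answered by a single
// worker, which can report only its own counter.
function scrape() {
  return dispatch().requests;
}

const trueTotal = workers.reduce((sum, w) => sum + w.requests, 0);
const scraped = scrape();
console.log(`true total: ${trueTotal}, one scrape reports: ${scraped}`);
```

With an even round-robin split, a single scrape sees 250 of the 1000 requests, which matches the ~25% figure above.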
@siimon we're talking about the default way to run a Node.js application in production here. Initially I followed the directions in the readme, which work with a single process, e.g. in development. Once we deployed to staging, it became apparent that the default mechanism can't work for the deployment configuration recommended by Node.js. Would you be opposed to me at least documenting the problem and outlining how to work around it using the Pushgateway?
@goofballLogic sorry, I misunderstood about the workers. As an idea: sum should work as long as you provide a unique additional label for each worker. But I agree that it becomes an ugly workaround.
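A rough sketch of the per-worker label idea, assuming each worker tags its samples with its own id (in a real cluster this could come from `cluster.worker.id`). The helper `renderSample`, the metric name `http_requests_total`, and the values are hypothetical; the output follows the Prometheus text exposition format, not any prom-client API.

```javascript
// Hypothetical helper: render one sample in the Prometheus text
// exposition format, labelled with the id of the worker that produced it.
function renderSample(name, value, workerId) {
  return `${name}{worker="${workerId}"} ${value}`;
}

// Hypothetical metric name and values for two workers.
const lines = [
  renderSample('http_requests_total', 250, 1),
  renderSample('http_requests_total', 240, 2),
];
console.log(lines.join('\n'));
```

Because each worker then produces its own series, a Prometheus-side sum over the `worker` label can combine them. Each scrape still reaches only one worker, though, so the individual series update at irregular intervals, which is part of what makes this an awkward workaround.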
@goofballLogic I'm not sure about that. As I said, it depends on several things: not only how you built it, but also what your infrastructure looks like, etc. If I were to recommend anything, it would be to build something on top of prom-client that aggregates from all child processes, rather than letting all of them push metrics through the Pushgateway. Using push like that is not really the Prometheus way, and it would feel bad to recommend it.
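One way the aggregation step could look, as a sketch under stated assumptions rather than anything prom-client provides: each worker reports a plain `{ metricName: value }` snapshot of its counters (for example via `process.send()`, with the master listening through `worker.on('message')`), and the master sums them before answering a scrape. The function `aggregateCounters` and the snapshot shape are hypothetical.

```javascript
// Hypothetical snapshot shape: { metricName: counterValue } per worker.
// How snapshots reach the master (e.g. over the cluster IPC channel)
// is left out; this only shows the merge step.
function aggregateCounters(snapshots) {
  const totals = {};
  for (const snapshot of snapshots) {
    for (const [name, value] of Object.entries(snapshot)) {
      // Summing is appropriate for counters; gauges or histograms
      // may need a different merge rule.
      totals[name] = (totals[name] || 0) + value;
    }
  }
  return totals;
}

// Hypothetical snapshots from three workers.
const totals = aggregateCounters([
  { http_requests_total: 250, errors_total: 3 },
  { http_requests_total: 240 },
  { http_requests_total: 260, errors_total: 1 },
]);
console.log(totals);
```

The master would then expose the merged totals on /metrics, so a scrape sees the whole application regardless of which process answers it.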
Yeah we've discussed it amongst our architecture group internally and feel similarly that falling back on the gateway is sort of against the Prometheus philosophy.
We're looking at the possibility of clustering Docker containers instead of the normal Node.js clustering as a workaround.
I've also raised the possibility of adding a dedicated worker for Prometheus metrics interacting with other processes via sockets. But that feels like it's working against the design of prom-client - to make everything easy for single processes.
Another option is for us to design a new client library that plays nice with node's process model.
The way we have it set up is running on different machines (VMs in legacy and Docker in the new K8S infra). Prometheus then knows about every instance and decorates every metric with a hostname or pod name, depending on the architecture. This might not be viable for you if you have to use Node's own clustering, though.
There is currently no guidance provided for clustered processes (the default mechanism for running an app on multi-core servers). https://nodejs.org/api/cluster.html#cluster_cluster
By default, a clustered process, operating in round-robin fashion, will only serve metrics local to the process which handled a particular scrape request from Prometheus. This makes default metrics like "active requests" meaningless.
Possible solutions include: