
Guidance for clustered processes #82

Closed
goofballLogic opened this issue Mar 20, 2017 · 8 comments

Comments

@goofballLogic
Contributor

There is currently no guidance provided for clustered processes (the default mechanism for running a Node.js app on multi-core servers). https://nodejs.org/api/cluster.html#cluster_cluster

By default, a clustered app (with connections distributed round-robin across workers) will only serve metrics local to the worker that handled a particular scrape request from Prometheus. This makes default metrics like "active requests" meaningless.

Possible solutions include:

  1. Recommend that users "push" metrics rather than relying on the default pull mechanism. If this is the solution, the documentation should do a better job of showing how to set up a regular push of the default metrics via the Pushgateway.
  2. Provide a mechanism for collecting metrics from multiple child processes, e.g. by feeding them back to the master process, or via a socket to a dedicated process which can handle all requests for metrics (see the sketch below).
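
For illustration, a minimal sketch of the second option, assuming plain Node cluster IPC and that `register.metrics()` yields the serialized metrics for the current process (a string in prom-client versions of this era, a Promise in later ones); the port and message names here are hypothetical:

```js
const cluster = require('cluster');
const http = require('http');
const os = require('os');
const client = require('prom-client');

if (cluster.isMaster) {
  const workers = [];
  for (let i = 0; i < os.cpus().length; i++) {
    workers.push(cluster.fork());
  }

  // Dedicated metrics server in the master: ask every worker for its
  // metrics over IPC and concatenate the replies. Naive concatenation
  // yields duplicate series, so a real implementation would merge them
  // or add a per-worker label.
  http.createServer((req, res) => {
    if (req.url !== '/metrics') {
      res.writeHead(404);
      return res.end();
    }
    const replies = [];
    let pending = workers.length;
    workers.forEach((worker) => {
      const onMessage = (msg) => {
        if (msg && msg.type === 'metrics') {
          worker.removeListener('message', onMessage);
          replies.push(msg.payload);
          if (--pending === 0) {
            res.writeHead(200, { 'Content-Type': 'text/plain' });
            res.end(replies.join('\n'));
          }
        }
      };
      worker.on('message', onMessage);
      worker.send({ type: 'getMetrics' });
    });
  }).listen(9100);
} else {
  client.collectDefaultMetrics();

  // Each worker serializes its own registry when the master asks for it.
  process.on('message', (msg) => {
    if (msg && msg.type === 'getMetrics') {
      Promise.resolve(client.register.metrics()).then((payload) => {
        process.send({ type: 'metrics', payload });
      });
    }
  });

  // ... the worker's normal HTTP server would be started here
}
```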
@siimon
Owner

siimon commented Mar 21, 2017

I don't think it's up to prom-client to give any recommendations on this, since it depends on how and where the app runs, what type of application it is, etc. I'd rather give no recommendation than give one that only fits certain conditions.

However, if someone builds a library on top of prom-client that acts as a master process and collects metrics from other node processes, I'll happily link to it!

siimon closed this as completed Mar 21, 2017
@disjunction

@goofballLogic I use Prometheus in an environment like the one you describe and have no problems with it. You have all the aggregation functions in Prometheus for that. In your case, just use sum for the "active requests".

@goofballLogic
Contributor Author

goofballLogic commented Mar 21, 2017

@disjunction, using "sum" does not help resolve this problem because you are only ever receiving partial metrics.

If you have e.g. 4 workers on a 4-core machine, and you count requests on each worker, and you then expose a /metrics endpoint, each call to this endpoint will be serviced by one of those 4 workers and will only retrieve the metrics for that one worker (only ~25% of the requests processed).

How did you work around this problem?

@goofballLogic
Contributor Author

@siimon we're talking about the default way to run a Node.js application in production here. Initially I followed the directions in the readme, which work when using a single process, e.g. in development. Once we deployed to staging, it became apparent that the default mechanism can't work for the deployment configuration recommended by Node.js.

Would you be opposed to me at least documenting the problem and outlining how to work around it using the Pushgateway?
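
For context, a minimal sketch of what such a Pushgateway workaround could look like, assuming prom-client's Pushgateway class with the callback-style pushAdd of that era (later versions return a Promise); the gateway address, job name, and interval are hypothetical. Each worker pushes its own registry on a timer, grouped by its pid so the workers don't overwrite each other:

```js
const client = require('prom-client');

client.collectDefaultMetrics();

// Hypothetical gateway address; in practice this comes from configuration.
const gateway = new client.Pushgateway('http://pushgateway.example:9091');

setInterval(() => {
  gateway.pushAdd(
    { jobName: 'my_app', groupings: { worker: String(process.pid) } },
    (err) => {
      if (err) {
        console.error('metrics push failed', err);
      }
    }
  );
}, 15000);
```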

@disjunction

@goofballLogic sorry, I misunderstood about the workers. As an idea: sum should work as long as you provide a unique additional label for each worker. But I agree it becomes an ugly workaround then.
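
For what it's worth, a minimal sketch of that labelling idea, using a hypothetical active_requests gauge and the object-style constructor from current prom-client versions; as noted above it stays a partial fix, since each scrape still only reaches the worker that happened to answer it:

```js
const client = require('prom-client');

// Each worker tags its series with its own pid so that a Prometheus
// query such as sum(active_requests) can add the workers together.
const activeRequests = new client.Gauge({
  name: 'active_requests',
  help: 'Requests currently in flight in this worker',
  labelNames: ['worker'],
});

// e.g. in the request handler:
activeRequests.labels(String(process.pid)).inc();
// ... and the matching .dec() when the response finishes
```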

@siimon
Owner

siimon commented Mar 21, 2017

@goofballLogic I'm not sure about that. As I said, it depends on certain things, not only how you built the app but also what your infrastructure looks like, etc.

If I were to recommend anything, it would be to build something on top of prom-client that aggregates metrics from all child processes, rather than letting all of them push metrics through the Pushgateway. Using push like that is not really the Prometheus way, and it would feel wrong to recommend it.

@goofballLogic
Contributor Author

goofballLogic commented Mar 21, 2017 via email

@SimenB
Collaborator

SimenB commented Mar 22, 2017

The way we have it set up is running on different machines (VMs in the legacy setup and Docker in the new K8S infra). Prometheus then knows about every instance and decorates every metric with a hostname or pod name, depending on the architecture. Might not be viable for you if you have to use Node's own clustering, though.
