
Guidance for clustered processes #82

Closed
goofballLogic opened this issue Mar 20, 2017 · 8 comments

Comments

@goofballLogic
Contributor

There is currently no guidance provided for clustered processes (the default mechanism for running a Node.js app on multi-core servers). https://nodejs.org/api/cluster.html#cluster_cluster

By default, a clustered app (with connections distributed round-robin across workers) will only serve metrics local to the worker that handled a particular scrape request from Prometheus. This makes default metrics like "active requests" meaningless.

Possible solutions include:

  1. Recommend that users "push" metrics rather than relying on the default pull mechanism. If this is the solution, the documentation should do a better job of showing how to set up a regular push of the default metrics via the Pushgateway.
  2. Provide a mechanism for collecting metrics from multiple child processes, e.g. by feeding them back to the master process, or via a socket to a dedicated process which can handle all requests for metrics (see the sketch below).
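
For illustration, a minimal sketch of the second option, assuming plain Node cluster IPC and that `register.metrics()` yields the serialized metrics for the current process (a string in prom-client versions of this era, a Promise in later ones); the port and message names here are hypothetical:

```js
const cluster = require('cluster');
const http = require('http');
const os = require('os');
const client = require('prom-client');

if (cluster.isMaster) {
  const workers = [];
  for (let i = 0; i < os.cpus().length; i++) {
    workers.push(cluster.fork());
  }

  // Dedicated metrics server in the master: ask every worker for its
  // metrics over IPC and concatenate the replies. Naive concatenation
  // yields duplicate series, so a real implementation would merge them
  // or add a per-worker label.
  http.createServer((req, res) => {
    if (req.url !== '/metrics') {
      res.writeHead(404);
      return res.end();
    }
    const replies = [];
    let pending = workers.length;
    workers.forEach((worker) => {
      const onMessage = (msg) => {
        if (msg && msg.type === 'metrics') {
          worker.removeListener('message', onMessage);
          replies.push(msg.payload);
          if (--pending === 0) {
            res.writeHead(200, { 'Content-Type': 'text/plain' });
            res.end(replies.join('\n'));
          }
        }
      };
      worker.on('message', onMessage);
      worker.send({ type: 'getMetrics' });
    });
  }).listen(9100);
} else {
  client.collectDefaultMetrics();

  // Each worker serializes its own registry when the master asks for it.
  process.on('message', (msg) => {
    if (msg && msg.type === 'getMetrics') {
      Promise.resolve(client.register.metrics()).then((payload) => {
        process.send({ type: 'metrics', payload });
      });
    }
  });

  // ... the worker's normal HTTP server would be started here
}
```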
@siimon
Owner

siimon commented Mar 21, 2017

I don't think it's up to prom-client to give any recommendations on this, since it depends on how and where the app runs, what type of application it is, etc. I'd rather give no recommendation than give one that only fits certain conditions.

However, if someone builds a library on top of prom-client that acts as a master process and collects metrics from other node processes, I'll happily link to it!

siimon closed this as completed Mar 21, 2017
@disjunction

@goofballLogic I use Prometheus in an environment like the one you describe and have no problems with it. You have all the aggregation functions in Prometheus for that. In your case, just use sum for the "active requests".

@goofballLogic
Contributor Author

goofballLogic commented Mar 21, 2017

@disjunction, using "sum" does not help resolve this problem because you are only ever receiving partial metrics.

If you have e.g. 4 workers on a 4-core machine, and you count requests on each worker, and you then expose a /metrics endpoint, each call to this endpoint will be serviced by one of those 4 workers and will only retrieve the metrics for that one worker (only ~25% of the requests processed).

How did you work around this problem?

@goofballLogic
Contributor Author

@siimon we're talking about the default way to run a Node.js application in production here. Initially I followed the directions in the readme, which work when using a single process, e.g. in development. Once we deployed to staging, it became apparent that the default mechanism can't work for the deployment configuration recommended by Node.js.

Would you be opposed to me at least documenting the problem and outlining how to work around it using the Pushgateway?
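
For context, a minimal sketch of what such a Pushgateway workaround could look like, assuming prom-client's Pushgateway class with the callback-style pushAdd of that era (later versions return a Promise); the gateway address, job name, and interval are hypothetical. Each worker pushes its own registry on a timer, grouped by its pid so the workers don't overwrite each other:

```js
const client = require('prom-client');

client.collectDefaultMetrics();

// Hypothetical gateway address; in practice this comes from configuration.
const gateway = new client.Pushgateway('http://pushgateway.example:9091');

setInterval(() => {
  gateway.pushAdd(
    { jobName: 'my_app', groupings: { worker: String(process.pid) } },
    (err) => {
      if (err) {
        console.error('metrics push failed', err);
      }
    }
  );
}, 15000);
```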

@disjunction

@goofballLogic sorry, I misunderstood about the workers. As an idea: sum should work as long as you provide a unique additional label for each worker. But I agree it becomes an ugly workaround then.
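
For what it's worth, a minimal sketch of that labelling idea, using a hypothetical active_requests gauge and the object-style constructor from current prom-client versions; as noted above it stays a partial fix, since each scrape still only reaches the worker that happened to answer it:

```js
const client = require('prom-client');

// Each worker tags its series with its own pid so that a Prometheus
// query such as sum(active_requests) can add the workers together.
const activeRequests = new client.Gauge({
  name: 'active_requests',
  help: 'Requests currently in flight in this worker',
  labelNames: ['worker'],
});

// e.g. in the request handler:
activeRequests.labels(String(process.pid)).inc();
// ... and the matching .dec() when the response finishes
```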

@siimon
Owner

siimon commented Mar 21, 2017

@goofballLogic I'm not sure about that. As I said, it depends on certain things, not only how you built the app but also what your infrastructure looks like, etc.

If I were to recommend anything, it would be to build something on top of prom-client that aggregates metrics from all child processes, rather than letting all of them push metrics through the Pushgateway. Using push like that is not really the Prometheus way, and it would feel wrong to recommend it.

@goofballLogic
Contributor Author

goofballLogic commented Mar 21, 2017 via email

@SimenB
Collaborator

SimenB commented Mar 22, 2017

The way we have it set up is running on different machines (VMs in the legacy setup and Docker in the new K8S infra). Prometheus then knows about every instance and decorates every metric with a hostname or pod name, depending on the architecture. Might not be viable for you if you have to use Node's own clustering, though.
