Add prometheus metrics to internal controller #132
Conversation
/assign @DirectXMan12
Thanks for the PR!
So, my general comment would be:
Is there a good reason to not just have a global or semi-global controller-runtime registry that people can use? At the very least, I think it's probably better here to just plumb through a Prometheus registry that things can get a handle to/get injected, and then register metrics against that.
Is there a compelling use case for the extra abstraction layers on top of that?
@DirectXMan12 I'm not following your last comment, sorry. Thanks for getting back quickly though! I've added a more in-depth description to the top of this PR which describes the changes I've made; perhaps we could work from that to coordinate our understanding of the problem?
I think the registry within the Manager is semi-global, is it not? Any Controller added to the Manager will have its metrics registered to this instance of the Manager, right? Do you mean to make it an exported global within some package, so users would import that package and register to it?
I believe the PR already solves that; the Manager's registry has two points of access,
Which abstraction are you talking about? I'm lost at this point 😅
The extra abstraction that I'm talking about is the last two bullet points. I don't see much of a reason not to just have a global controller-runtime prometheus registry (or use the default, maybe?), and if people want to add new metrics, they can just import
@DirectXMan12 Got ya! I added this extra abstraction after discussing with a colleague about globals vs non-globals. We just went for our preference, which is to avoid globals as much as possible, but I'm happy to modify the PR to make a global.
The reason I'm suggesting not to use the default prometheus registry is in case you have multiple packages all setting up metrics handling capabilities within one binary. For instance, we plan to run a webhook alongside one of our controllers which sets up its own metrics endpoint using the global prometheus registry (part of the framework). I'm not personally a fan, as at that point both the webhook and the controllers would serve the same set of metrics if they both used the global registry. I don't know if that's a common problem, but to me, not using the default prometheus registry seems cleaner. Happy to meet in the middle with a controller-runtime registry global if that's what you think is the best approach?
Yeah, I agree with that part :-)
The generally accepted pattern for prometheus is using globals of some variety, AFAIK, so a controller-runtime-global registry might be the best approach. Then, if people want to serve it off the default endpoint, you should just be able to use
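To make that concrete, here is a minimal sketch of what a controller-runtime-global registry could look like; the package path and variable name are only illustrative, not an agreed API:

```go
// Package metrics holds a prometheus registry shared by all
// controller-runtime components.
package metrics

import "github.com/prometheus/client_golang/prometheus"

// Registry is the controller-runtime-global registry. Controllers and user
// code register their collectors here rather than against the prometheus
// default registry, so the controller metrics endpoint only serves
// controller-related metrics.
var Registry = prometheus.NewRegistry()
```

Serving it off the default endpoint would then just mean handing this registry to promhttp.HandlerFor, instead of using promhttp.Handler(), which is tied to the default registry.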
@DirectXMan12 I've added a couple of commits. The first rewrites the metrics registry to be a global, as we have previously discussed. The second adds some tests and modifies the internal metrics to make sure that you can actually register multiple controllers with one manager. The way I was doing it before, you'd only ever be able to add one, so I've added test cases for this and made sure it all works with multiple controller instances.
ultimately, it's still unclear to me why we want all the extra structs/interfaces/per-controller-metrics-objects here. Can you elaborate a bit? It's not like the client-go ones, which get used to skip metrics collection or to avoid pulling in a Prometheus dependency...
pkg/internal/controller/metrics.go
Outdated
import "github.com/prometheus/client_golang/prometheus" | ||
|
||
// Metrics holds the prometheus metrics used internally by the controller | ||
type Metrics struct { |
it's unclear to me why we're not just defining these somewhere as is the normal Prometheus pattern.
and then later on doing
ctrlmetrics.QueueLength.WithLabelValues(...).Observe
I was considering the fact that each controller runs in a separate goroutine; perhaps it is better to have one instance of each metric per controller to reduce cross-thread communication.
Happy to make the change as you've suggested if you're confident that writing to prometheus metrics is thread-safe? I wasn't sure whether they were or not, so went for what I thought was the safer option.
yeah, Observe and friends are threadsafe:
All exported functions and methods are safe to be used concurrently unless specified otherwise.
https://godoc.org/github.com/prometheus/client_golang/prometheus
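To spell out the pattern being suggested, a sketch along these lines, assuming the shared Registry above is exposed from an importable metrics package (the import path and metric names here are illustrative only):

```go
package ctrlmetrics

import (
	"github.com/prometheus/client_golang/prometheus"

	// Illustrative path for the shared registry sketched earlier.
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
	// QueueLength reports the current workqueue depth per controller;
	// updated with WithLabelValues(name).Set(...).
	QueueLength = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "controller_runtime_reconcile_queue_length",
		Help: "Length of reconcile queue per controller",
	}, []string{"controller"})

	// ReconcileTime tracks how long each reconcile call takes;
	// updated with WithLabelValues(name).Observe(seconds).
	ReconcileTime = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name: "controller_runtime_reconcile_time_seconds",
		Help: "Length of time per reconcile per controller",
	}, []string{"controller"})
)

func init() {
	// Register once at package load; the collectors themselves are safe to
	// update concurrently from every controller goroutine.
	metrics.Registry.MustRegister(QueueLength, ReconcileTime)
}
```

Each controller then just calls, for example, ctrlmetrics.ReconcileTime.WithLabelValues(name).Observe(elapsed.Seconds()) from its own goroutine, with no per-controller metric objects needed.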
})
mux := http.NewServeMux()
mux.Handle("/metrics", handler)
server := http.Server{
we might want to consider using the existing kubernetes machinery for this. Can be a follow-up PR
Would you like me to add a TODO?
yes, please
If the user adds webhooks, then we have a separate listener for the webhook. I am assuming the default ports don't conflict.
I've double checked: the other listener defaults to 443, whereas this defaults to 8080.
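For context, the serving side under discussion boils down to something like the following sketch: a dedicated listener on :8080 that serves only the collectors registered with the controller-runtime registry (function and parameter names are illustrative):

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// serveMetrics starts serving the given registry on the metrics port.
// The webhook framework's listener defaults to 443, so the two don't clash.
func serveMetrics(registry *prometheus.Registry) *http.Server {
	// HandlerFor serves only this registry's collectors, keeping the
	// controller metrics separate from anything on the default registry.
	handler := promhttp.HandlerFor(registry, promhttp.HandlerOpts{})

	mux := http.NewServeMux()
	mux.Handle("/metrics", handler)

	server := &http.Server{
		Addr:    ":8080",
		Handler: mux,
	}
	go server.ListenAndServe() // error handling elided for brevity
	return server
}
```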
force-pushed from de1193b to 73a8061
@DirectXMan12 I've changed it all to be globals and also rebased and squashed a bunch of the commits. Does this look more sensible now? One other thing I've changed is that the metrics serving now has a
@JoelSpeed coming late to the party. Will def. take a look at it tomorrow.
Have a review in progress now that I'm back, but GitHub isn't letting me leave review comments (looks like a temporary blip in their system). I'll try again this afternoon.
nit inline, otherwise looks good.
As for disabling listening, not sure there's an easy way if we want to have a nice default, except maybe having a separate option.
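One hypothetical shape for such a separate option, keeping :8080 as the nice default while still letting users turn the listener off entirely (field and function names here are made up for illustration, not the agreed API):

```go
package manager

// Options configures the metrics serving behaviour of the Manager
// (illustrative sketch only).
type Options struct {
	// MetricsBindAddress is the address the metrics endpoint binds to.
	// If empty, it defaults to ":8080".
	MetricsBindAddress string

	// DisableMetrics switches the metrics listener off entirely, e.g. when
	// another component in the binary already serves metrics.
	DisableMetrics bool
}

// metricsAddr resolves the address to serve on and whether to serve at all.
func metricsAddr(o Options) (addr string, serve bool) {
	if o.DisableMetrics {
		return "", false
	}
	if o.MetricsBindAddress == "" {
		return ":8080", true
	}
	return o.MetricsBindAddress, true
}
```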
pkg/manager/internal.go
Outdated
// Shutdown the server when stop is closed
select {
case <-stop:
	server.Shutdown(context.TODO())
context.Background(), unless this is actually a TODO
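For reference, the shutdown path being discussed would look roughly like this once the nit is addressed, i.e. with context.Background() rather than context.TODO() (a sketch only; how the stop channel and server are wired in belongs to the manager):

```go
package main

import (
	"context"
	"log"
	"net/http"
)

// shutdownOnStop shuts the metrics server down once the manager's stop
// channel is closed.
func shutdownOnStop(server *http.Server, stop <-chan struct{}) {
	<-stop
	// context.Background() rather than context.TODO(): nothing is left to
	// fill in here, we simply don't put a deadline on the shutdown.
	if err := server.Shutdown(context.Background()); err != nil {
		log.Printf("error shutting down metrics server: %v", err)
	}
}
```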
@DirectXMan12 I fixed the context.TODO(). As an FYI, I chose the … Eg. …
ack, that makes sense
/approve
fix the rebase issue, then I'll lgtm this
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: DirectXMan12, JoelSpeed
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
// ReconcileErrors is a prometheus counter metrics which holds the total
// number of errors from the Reconciler
ReconcileErrors = prometheus.NewCounterVec(prometheus.CounterOpts{
Besides the ReconcileErrors, I would suggest also adding ReconcileTotal. That way we can see if in the past 5 mins the rate of errors was too high.
small typo in the suggested new metric name, should probably be ReconcileTotal
@droot @DirectXMan12 Do you think this would be a worthwhile metric to integrate into this PR?
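For reference, the suggested counter would sit alongside the existing one along these lines; the metric and label names are illustrative, and the ratio rate(errors)/rate(total) over e.g. 5m then gives the error rate mentioned above:

```go
package ctrlmetrics

import "github.com/prometheus/client_golang/prometheus"

var (
	// ReconcileTotal counts every reconcile attempt per controller,
	// regardless of outcome.
	ReconcileTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "controller_runtime_reconcile_total",
		Help: "Total number of reconciliations per controller",
	}, []string{"controller"})

	// ReconcileErrors counts reconcile attempts that returned an error.
	ReconcileErrors = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "controller_runtime_reconcile_errors_total",
		Help: "Total number of reconciliation errors per controller",
	}, []string{"controller"})
)
```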
Gopkg.lock
Outdated
packages = [
  "prometheus",
  "prometheus/promhttp",
]
pruneopts = "UT"
revision = "c5b7fccd204277076155f10851dad72b76a49317"
version = "v0.8.0"
There are some nice features in master that are not in this release; maybe now or in the future this could be changed to master instead, as this release is old.
I would recommend using the latest stable release, not master.
One more point: since this is a direct dependency, please add a dep constraint in Gopkg.toml so that dep gets this hint while resolving deps for a kubebuilder project.
@DirectXMan12 I've rebased and squashed the last couple of commits into earlier ones; it should be ready to go now.
@JoelSpeed Any update on this? Seems like one of the tests timed out.
@droot Apologies for the delay. I've resolved the conflict and pinned the prometheus dependency to
@droot @DirectXMan12 I have been looking at some of our metrics today and realised that, at present, this implementation of the controller-runtime doesn't serve metrics for non-leader pods, so you will see targets down in Prometheus. This is probably worth addressing at some point, but I wanted to ask whether you'd like me to fix that before merging this, or follow up and add a TODO for it.
I am ok with fixing that as a followup PR. Please file an issue for that.
/lgtm
Add prometheus metrics to internal controller
Add support for creating core type controllers
Fixes #119
Adds prometheus metric server and some metrics to the controller runtime to allow users to see inside the controllers they are building.
Metrics Added:
Changes to operation:
- The metrics server listens on :8080 by default
- The Manager's Start() starts serving the prometheus registry from the manager
- Add() adds metrics from the runnable to the manager's registry
- AddMetrics() to add more metrics to the registry
- Within the add() method they can register their metrics with the ManagerMetrics struct
- GetCollectors() to allow Manager.Add() to register metrics from the controller
- processNextWorkItem updates the metrics within the Controller's Metrics struct