✨ Expose client-go metrics #233

Merged


DirectXMan12
Contributor

This exposes the client-go metrics (client, workqueue, reflector)
in the controller-runtime registry.

It also fixes a typo in the tests, and fixes our Gopkg.toml to prune some unnecessary requires (since I had to do a dep ensure anyway, and it was taking forever without the changes).

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 4, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: DirectXMan12

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 4, 2018
@DirectXMan12
Contributor Author

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 4, 2018
@DirectXMan12 DirectXMan12 force-pushed the features/client-go-metrics branch from 250f34b to d06e3fb Compare December 4, 2018 01:59
@DirectXMan12
Contributor Author

@brancz btw, this first is why I'm hesitant to call the current client-go setup easily usable.

@brancz brancz left a comment

I agree that this is quite a mouthful. Nonetheless the way you’re using the metrics registry here it could just as easily be injected and then this could just be a package in client-go.


// registerClientMetrics sets up the client latency metrics from client-go
func registerClientMetrics() {
var ()

This looks like a leftover

Contributor Author

yes, it is

Contributor

@droot droot left a comment

Code looks good to me. I have a couple of questions.


// this section contains adapters, implementations, and other sundry organic, artisanally
// hand-crafted syntax trees required to convince client-go that it actually wants to let
// someone use its metrics.
Contributor

:)


// NB(directxman12): these are changed to MustRegister from Register. It's not clear why they weren't
// MustRegister in the first place, except maybe to not bring down the controller if the metrics fail
// to register (which shouldn't happen unless there's a duplicate metric).
Contributor

+1


// registerWorkqueueMetrics sets up workqueue (other reconcile) metrics
func registerWorkqueueMetrics() {
workqueuemetrics.SetProvider(workqueueMetricsProvider{})
Contributor

Does this mean metrics from multiple controllers will get combined?

Contributor Author

no, if you look at how the provider is implemented, there's a label for the controller name
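
A rough sketch of why a name label keeps controllers separate, assuming the provider hands out one child series per queue name from a single shared vector. The `gaugeVec`, `gauge`, and `provider` types below are hypothetical stdlib-only stand-ins, not the PR's actual implementation or the Prometheus types.

```go
package main

import (
	"fmt"
	"sync"
)

// gaugeVec is a hypothetical labeled gauge: one metric, one value per "name" label value.
type gaugeVec struct {
	mu     sync.Mutex
	values map[string]float64
}

func newGaugeVec() *gaugeVec { return &gaugeVec{values: map[string]float64{}} }

// gauge is the child series for a single label value.
type gauge struct {
	vec  *gaugeVec
	name string
}

// WithLabelValues returns the child series for the given name.
func (v *gaugeVec) WithLabelValues(name string) gauge { return gauge{vec: v, name: name} }

// Inc increments only the series that belongs to this gauge's label value.
func (g gauge) Inc() {
	g.vec.mu.Lock()
	defer g.vec.mu.Unlock()
	g.vec.values[g.name]++
}

// Value reads the current value of one series.
func (v *gaugeVec) Value(name string) float64 {
	v.mu.Lock()
	defer v.mu.Unlock()
	return v.values[name]
}

// provider mimics a global metrics provider: a single shared vector,
// with each queue receiving its own labeled child.
type provider struct{ depth *gaugeVec }

func (p provider) NewDepthMetric(name string) gauge { return p.depth.WithLabelValues(name) }

func main() {
	p := provider{depth: newGaugeVec()}
	a := p.NewDepthMetric("controller-a") // illustrative controller names
	b := p.NewDepthMetric("controller-b")
	a.Inc()
	a.Inc()
	b.Inc()
	fmt.Println(p.depth.Value("controller-a"), p.depth.Value("controller-b"))
}
```

Even though the provider is set once globally, each controller's queue updates only its own labeled series.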

@@ -0,0 +1,266 @@
/*
Copyright 2016 The Kubernetes Authors.
Contributor

s/2016/2018/ (someone should build a GitHub suggestion bot for this :))

Contributor Author

ack

})
Registry.MustRegister(retries)
return retries
}
Contributor

These workqueue metrics have some overlap with the existing metrics. Shall we remove the higher-level controller_runtime metrics?

Contributor Author

There are things that look like duplicates, but it's not clear to me whether they are actually complete duplicates. Let's evaluate a bit.

Contributor Author

@DirectXMan12 DirectXMan12 Dec 4, 2018

I think we want to kill QueueLength in favor of depth, although the fact that the client-go metrics don't expose queue names as labels is strange to me. The other two I think we want to keep.

@brancz do you have any insight on the subsystem vs label decision there?

Contributor Author

on this subject: kubernetes/kubernetes#71165
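
To make the subsystem-versus-label question concrete, here is a small sketch of the two naming schemes. It assumes, as the comment above implies, that one scheme bakes the queue name into the metric name while the other keeps a fixed name and carries the queue as a label. The helper functions and the `pods` queue name are hypothetical.

```go
package main

import "fmt"

// subsystemStyle bakes the queue name into the metric name,
// so every queue produces a differently named metric.
func subsystemStyle(queue, metric string) string {
	return queue + "_" + metric
}

// labelStyle keeps one metric name under a shared subsystem
// and distinguishes queues with a "name" label.
func labelStyle(subsystem, metric, queue string) string {
	return fmt.Sprintf("%s_%s{name=%q}", subsystem, metric, queue)
}

func main() {
	fmt.Println(subsystemStyle("pods", "depth"))
	fmt.Println(labelStyle("workqueue", "depth", "pods"))
}
```

With the label style, a single query over `workqueue_depth` covers every queue; with the subsystem style, each queue needs its own metric name.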

Contributor Author

@DirectXMan12 DirectXMan12 left a comment

ack on the comments, will fix a bit later today



@DirectXMan12 DirectXMan12 force-pushed the features/client-go-metrics branch from 1dfbd22 to cd73474 Compare December 4, 2018 22:07
@DirectXMan12 DirectXMan12 changed the title ✨ Expose client-go metrics ⚠️ Expose client-go metrics Dec 5, 2018
@DirectXMan12 DirectXMan12 changed the title ⚠️ Expose client-go metrics ✨ Expose client-go metrics Dec 5, 2018
@DirectXMan12 DirectXMan12 force-pushed the features/client-go-metrics branch from cd73474 to 4c17676 Compare December 5, 2018 19:03
@DirectXMan12 DirectXMan12 changed the title ✨ Expose client-go metrics ✨ Expose client-go metrics Dec 6, 2018
@DirectXMan12 DirectXMan12 changed the title ✨ Expose client-go metrics ✨ Expose client-go metrics Dec 6, 2018
@DirectXMan12
Contributor Author

DirectXMan12 commented Dec 7, 2018

> I agree that this is quite a mouthful. Nonetheless the way you’re using the metrics registry here it could just as easily be injected and then this could just be a package in client-go.

Oh sure. I think this particular model is just super clunky ;-) Injected registry + refactor in client-go would definitely address this.
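
A minimal sketch of what an "injected registry" could look like, as opposed to registering against a package-level global. The `Registerer` interface, `memRegistry` type, `RegisterWorkqueueMetrics` function, and the metric names are all hypothetical and only illustrate the shape of the idea, not an actual client-go or controller-runtime API.

```go
package main

import "fmt"

// Registerer is a hypothetical minimal registry interface.
type Registerer interface {
	MustRegister(names ...string)
}

// memRegistry is an in-memory registry used only for this illustration.
type memRegistry struct {
	registered []string
}

func (m *memRegistry) MustRegister(names ...string) {
	m.registered = append(m.registered, names...)
}

// RegisterWorkqueueMetrics receives the registry as a parameter, so the caller
// decides where the metrics end up. The metric names are illustrative.
func RegisterWorkqueueMetrics(reg Registerer) {
	reg.MustRegister("workqueue_depth", "workqueue_retries_total")
}

func main() {
	// Two independent registries: nothing is shared through a global.
	appRegistry := &memRegistry{}
	testRegistry := &memRegistry{}

	RegisterWorkqueueMetrics(appRegistry)

	fmt.Println(len(appRegistry.registered), len(testRegistry.registered))
}
```

Because the function depends only on the interface, the same registration code could live in a shared package and be reused by any consumer that supplies its own registry.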

@brancz

brancz commented Dec 8, 2018

Do you think you could take that on as a follow-up and contribute it to client-go? This is the second time I am reviewing a PR that implements more or less the same thing (the first time was Prometheus implementing these metrics itself for its Kubernetes service discovery). I imagine many more will want to do the same thing. Preferably this would even land first in client-go before moving forward with this PR, but I understand if you first want to make progress with this.

@DirectXMan12
Contributor Author

I can take a follow-up to move some of this into client-go. I'd prefer to get this merged first, then I'll fix up client-go a bit, and then port it back over (it makes it easier since we might not immediately pull in the latest client-go release).

@DirectXMan12 DirectXMan12 force-pushed the features/client-go-metrics branch 2 times, most recently from 96b2df6 to a696eb8 Compare January 10, 2019 22:06
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jan 10, 2019
This exposes the client-go metrics (client, workqueue, reflector)
in the controller-runtime registry.
This marks tests that have a "// TODO: write this" or similar as pending
tests, so that we can keep track of them.
The QueueLength metric is now a duplicate of the workqueue "depth"
metric, so it's no longer needed.
@DirectXMan12 DirectXMan12 force-pushed the features/client-go-metrics branch from a696eb8 to ed6a18c Compare January 10, 2019 22:34
Member

@mengqiy mengqiy left a comment

A few comments.
Otherwise LGTM.


longestRunning = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Subsystem: workQueueSubsystem,
Name: "longest_running_processor_microseconds",
Member

microseconds?
It seems this is not following the convention.

Contributor Author

yeah, but the current code requires it atm. it's fixed in future releases
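
The convention in question is the Prometheus guideline of exposing durations in base units (seconds). As a rough illustration of the gap, the sketch below converts a microsecond gauge value to seconds; the `microsToSeconds` helper and the sample duration are hypothetical and not part of this PR.

```go
package main

import (
	"fmt"
	"time"
)

// microsToSeconds converts a value expressed in microseconds to seconds,
// the base unit Prometheus naming guidelines recommend for durations.
func microsToSeconds(us float64) float64 {
	return us / 1e6
}

func main() {
	running := 1500 * time.Millisecond // illustrative processing time

	// The value as a "..._microseconds" gauge would report it.
	asMicros := float64(running) / float64(time.Microsecond)

	// The same value as a "..._seconds" gauge would report it.
	asSeconds := microsToSeconds(asMicros)

	fmt.Println(asMicros, asSeconds, asSeconds == running.Seconds())
}
```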

return retries.WithLabelValues(name)
}

func (workqueueMetricsProvider) NewLongestRunningProcessorMicrosecondsMetric(name string) workqueuemetrics.SettableGaugeMetric {
Member

Microseconds?

Contributor Author

the current code requires it atm (see above)

@mengqiy
Member

mengqiy commented Jan 11, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 11, 2019
@k8s-ci-robot k8s-ci-robot merged commit c043856 into kubernetes-sigs:master Jan 11, 2019
@DirectXMan12 DirectXMan12 deleted the features/client-go-metrics branch January 14, 2019 21:52
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
5 participants