
Expose workqueue stats #75

Merged · 1 commit · Sep 17, 2018

Conversation

rlguarino
Contributor

This CL adds an http endpoint with one handler "/metrics" that serves
Prometheus metrics exported by MetaController. The metrics are collected
using OpenCensus and exported using the Prometheus HTTP exposition
format.

This CL also configures the Kubernetes workqueue package to collect its
statistics and expose them via the Prometheus http endpoint. We collect
and expose all of the workqueue metrics (Depth, Adds, Latency,
WorkDuration, Retries) tagged with the name of the queue.
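
Roughly, the wiring described above looks like the sketch below. It is illustrative only: the import paths are approximate for the era, and `ocworkqueue.Views()` / `ocworkqueue.MetricsProvider()` are assumed names for whatever the ocworkqueue package exports; the exporter, `view.Register`, and `workqueue.SetProvider` calls are the standard OpenCensus and client-go APIs.

```go
package main

import (
	"net/http"

	"github.com/0xRLG/ocworkqueue"
	"github.com/golang/glog"
	"go.opencensus.io/exporter/prometheus"
	"go.opencensus.io/stats/view"
	"k8s.io/client-go/util/workqueue"
)

// newMetricsMux wires up the /metrics handler described above.
func newMetricsMux() *http.ServeMux {
	// The OpenCensus Prometheus exporter is also an http.Handler that serves
	// the Prometheus HTTP exposition format.
	exporter, err := prometheus.NewExporter(prometheus.Options{})
	if err != nil {
		glog.Fatalf("cannot create Prometheus exporter: %v", err)
	}
	view.RegisterExporter(exporter)

	// Register the workqueue views (Depth, Adds, Latency, WorkDuration,
	// Retries) and point client-go's workqueue at the OpenCensus provider,
	// which tags each measurement with the queue name.
	// ocworkqueue.Views() and ocworkqueue.MetricsProvider() are assumed names.
	if err := view.Register(ocworkqueue.Views()...); err != nil {
		glog.Fatalf("cannot register workqueue views: %v", err)
	}
	workqueue.SetProvider(ocworkqueue.MetricsProvider())

	mux := http.NewServeMux()
	mux.Handle("/metrics", exporter)
	return mux
}
```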

main.go Outdated
mux := http.NewServeMux()
mux.Handle("/metrics", exporter)
srv := &http.Server{
Addr: ":9999",
Contributor Author


@enisoc I want to get your opinion on how we should configure this. My initial guess would be to just use an environment variable, but I wanted to discuss.

Contributor Author


I've added a flag to configure this.

"os"
"os/signal"
"sync"
"syscall"
"time"

"github.com/0xRLG/ocworkqueue"


Is there a benefit to going via OpenCensus instead of reusing the Prometheus provider from core k8s? Whenever possible, I prefer to reuse code and patterns from the core since they're well-exercised.

Contributor Author

@rlguarino · Sep 11, 2018


I didn't even know that existed. I've been using this package for a long time and only just made it open source. There isn't a huge difference between the two: plain old Prometheus and OpenCensus both result in Prometheus metrics being exposed on the debug endpoint. I think OpenCensus ends up being cheaper to collect if you don't report the metrics anywhere, but we don't do that in this PR.

To be honest, since this project is in GoogleCloudPlatform and OpenCensus is driven by Google, using the two together made sense to me.

I really don't mind updating the PR to use the Prometheus exporter from client-go.



Alright that makes sense. Let's just keep OpenCensus. This PR should be good to go then whenever you have a chance to make the other changes below.

main.go Outdated
@@ -42,6 +48,7 @@ import (
var (
discoveryInterval = flag.Duration("discovery-interval", 30*time.Second, "How often to refresh discovery cache to pick up newly-installed resources")
informerRelist = flag.Duration("cache-flush-interval", 30*time.Minute, "How often to flush local caches and relist objects from the API server")
debugAddr = flag.String("debug-addr", "localhost:9999", "The address to bind the debug http endpoints")


Binding to localhost by default means it won't be accessible from outside the Pod, doesn't it? Is that intentional? It looks like earlier you had just ":9999".

Contributor Author


Yeah, I think you're right. I'll change it.
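
For illustration, the default might end up like this (a sketch of the suggested change, not the final diff):

```go
// Bind on all interfaces by default so /metrics is reachable from outside
// the Pod; operators can still restrict it via --debug-addr.
debugAddr = flag.String("debug-addr", ":9999", "The address to bind the debug http endpoints")
```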

main.go Outdated
Handler: mux,
}
go func() {
glog.Fatalf("cannot serve debug endpoint: %v", srv.ListenAndServe())


Maybe this should be logged but not fatal? The debug endpoint isn't strictly necessary for Metacontroller to perform its main duties. If I'm relying on Metacontroller to manage my app, I wouldn't want it to give up just because, for example, the metrics endpoint couldn't bind due to a port conflict.

In any case, we should probably ignore ErrServerClosed here because we call Shutdown() below.
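
A small sketch of that suggestion, reusing the names from the surrounding diff (not the final code):

```go
go func() {
	// Shutdown() below makes ListenAndServe return http.ErrServerClosed,
	// which is a clean exit. Any other error is logged but not fatal, since
	// the debug endpoint isn't required for Metacontroller's core work.
	if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
		glog.Errorf("cannot serve debug endpoint: %v", err)
	}
}()
```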

Contributor Author


You're right. I remember fighting with this log line quite a bit; I should have paid more attention. I'll change this to an Errorf.

@rlguarino force-pushed the ross/2 branch 3 times, most recently from a87d998 to 527886a on September 17, 2018 17:10
@rlguarino
Contributor Author

I addressed the comments and rebased this off master. I think the Gopkg.lock should now be the result of master's Gopkg.lock with my changes to Gopkg.toml, after running dep ensure.

I ran:

$ git checkout origin/master Gopkg.lock
$ rm -rf vendor/
$ dep ensure


@enisoc left a comment


LGTM. Thanks!

@enisoc merged commit 31394a5 into GoogleCloudPlatform:master on Sep 17, 2018