Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HTTP metrics for in-flight requests #5440

Merged
merged 9 commits into from
Jul 1, 2022
3 changes: 2 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,9 +31,10 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#5409](https://github.com/thanos-io/thanos/pull/5409) S3: Add option to force DNS style lookup.
- [#5352](https://github.com/thanos-io/thanos/pull/5352) Cache: Add cache metrics to groupcache.
- [#5391](https://github.com/thanos-io/thanos/pull/5391) Receive: Add relabeling support.
- [#5408](https://github.com/thanos-io/thanos/pull/5391) Receive: Add support for consistent hashrings.
- [#5408](https://github.com/thanos-io/thanos/pull/5408) Receive: Add support for consistent hashrings.
- [#5391](https://github.com/thanos-io/thanos/pull/5391) Receive: Implement api/v1/status/tsdb.
- [#5424](https://github.com/thanos-io/thanos/pull/5424) Receive: export metrics regarding size of remote write requests
- [#5440](https://github.com/thanos-io/thanos/pull/5440) HTTP metrics: export number of in-flight HTTP requests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit but I think the pattern seems to be to end the sentence with . as in the other changelog entries?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this and other changelog entries lacking the period in 7db4419.


### Changed

Expand Down
57 changes: 30 additions & 27 deletions pkg/extprom/http/instrument_server.go
Original file line number Diff line number Diff line change
Expand Up @@ -61,34 +61,37 @@ func (ins *defaultInstrumentationMiddleware) NewHandler(handlerName string, hand
func httpInstrumentationHandler(baseLabels prometheus.Labels, metrics *defaultMetrics, next http.Handler) http.HandlerFunc {
return promhttp.InstrumentHandlerRequestSize(
metrics.requestSize.MustCurryWith(baseLabels),
promhttp.InstrumentHandlerCounter(
metrics.requestsTotal.MustCurryWith(baseLabels),
promhttp.InstrumentHandlerResponseSize(
metrics.responseSize.MustCurryWith(baseLabels),
http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
now := time.Now()

wd := &responseWriterDelegator{w: w}
next.ServeHTTP(wd, r)

requestLabels := prometheus.Labels{"code": wd.Status(), "method": strings.ToLower(r.Method)}
observer := metrics.requestDuration.MustCurryWith(baseLabels).With(requestLabels)
observer.Observe(time.Since(now).Seconds())

// If we find a tracingID we'll expose it as Exemplar.
span := opentracing.SpanFromContext(r.Context())
if span != nil {
spanCtx, ok := span.Context().(jaeger.SpanContext)
if ok && spanCtx.IsSampled() {
observer.(prometheus.ExemplarObserver).ObserveWithExemplar(
time.Since(now).Seconds(),
prometheus.Labels{
"traceID": spanCtx.TraceID().String(),
},
)
promhttp.InstrumentHandlerInFlight(
metrics.inflightHTTPRequests.With(baseLabels),
promhttp.InstrumentHandlerCounter(
metrics.requestsTotal.MustCurryWith(baseLabels),
promhttp.InstrumentHandlerResponseSize(
metrics.responseSize.MustCurryWith(baseLabels),
http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
now := time.Now()

wd := &responseWriterDelegator{w: w}
next.ServeHTTP(wd, r)

requestLabels := prometheus.Labels{"code": wd.Status(), "method": strings.ToLower(r.Method)}
observer := metrics.requestDuration.MustCurryWith(baseLabels).With(requestLabels)
observer.Observe(time.Since(now).Seconds())

// If we find a tracingID we'll expose it as Exemplar.
span := opentracing.SpanFromContext(r.Context())
if span != nil {
spanCtx, ok := span.Context().(jaeger.SpanContext)
if ok && spanCtx.IsSampled() {
observer.(prometheus.ExemplarObserver).ObserveWithExemplar(
time.Since(now).Seconds(),
prometheus.Labels{
"traceID": spanCtx.TraceID().String(),
},
)
}
}
}
}),
}),
),
),
),
)
Expand Down
20 changes: 16 additions & 4 deletions pkg/extprom/http/metrics.go
Original file line number Diff line number Diff line change
Expand Up @@ -9,10 +9,11 @@ import (
)

type defaultMetrics struct {
requestDuration *prometheus.HistogramVec
requestSize *prometheus.SummaryVec
requestsTotal *prometheus.CounterVec
responseSize *prometheus.SummaryVec
requestDuration *prometheus.HistogramVec
requestSize *prometheus.SummaryVec
requestsTotal *prometheus.CounterVec
responseSize *prometheus.SummaryVec
inflightHTTPRequests *prometheus.GaugeVec
}

func newDefaultMetrics(reg prometheus.Registerer, buckets []float64, extraLabels []string) *defaultMetrics {
Expand All @@ -29,26 +30,37 @@ func newDefaultMetrics(reg prometheus.Registerer, buckets []float64, extraLabels
},
append([]string{"code", "handler", "method"}, extraLabels...),
),

requestSize: promauto.With(reg).NewSummaryVec(
prometheus.SummaryOpts{
Name: "http_request_size_bytes",
Help: "Tracks the size of HTTP requests.",
},
append([]string{"code", "handler", "method"}, extraLabels...),
),

requestsTotal: promauto.With(reg).NewCounterVec(
prometheus.CounterOpts{
Name: "http_requests_total",
Help: "Tracks the number of HTTP requests.",
},
append([]string{"code", "handler", "method"}, extraLabels...),
),

responseSize: promauto.With(reg).NewSummaryVec(
prometheus.SummaryOpts{
Name: "http_response_size_bytes",
Help: "Tracks the size of HTTP responses.",
},
append([]string{"code", "handler", "method"}, extraLabels...),
),

inflightHTTPRequests: promauto.With(reg).NewGaugeVec(
prometheus.GaugeOpts{
Name: "http_inflight_requests",
Help: "Current number of active requests.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could add a more specific description, something like:

Suggested change
Help: "Current number of active requests.",
Help: "Current gauge of in-flight requests.",

wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's redundant to describe it as a "gauge" in the help, because counters should have the postfix _total as per Prometheus' naming conventions. If a metric's name doesn't end with _total, it should be or behave like a gauge.

To follow up, I don't think the help should say exactly the same as the metric name. Otherwise the help is not helpful. So I thought, "maybe someone will not know exactly what we mean with 'in-flight' and 'active' is a more commonly used word".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when the metric ends with _total indeed it should indicate a counter, but if it does not end with _total not necessarily means that it is a gauge, as far as I understand (for example)
I've seen other examples where also the name of the metric repeats its type in the help message (e.g. here) but not necessarily means it should be like that.

I think would be ok keeping number - though regarding active requests for me reads a bit more unclear, since I wouldn't know what the definition of an active request would be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be a good idea to define "in-flight" within the help message? 🙂

Copy link
Contributor Author

@douglascamata douglascamata Jun 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jessicalins the metrics in the reference you sent that aren't explicitly a gauge are summaries/histograms. They will though generate counters with the _sum and _total postfixes under the hood.

I improved the help text for the in-flight HTTP requests metrics in
7f0adc4. 👍

},
append([]string{"handler"}, extraLabels...),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the other metrics also consider a method label. Is there any reason not to collect that label here as well?

Copy link
Contributor Author

@douglascamata douglascamata Jun 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked around the codebase and found some examples of this metric and none of them had the method as a label. I also didn't find any obvious advantage in having it. Do you see a good use-case? I am open to add it if it'll be useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm interesting. I guess we could add the method label to all of them but i won't block anything, in other words won't let perfect be the enemy of good.

Why would we want that label? I guess for the same reason we would want that label on any HTTP handler metric, like all of the other metrics in this file have. The mantra in Prometheus is "instrument first, ask questions later". Said differently, better to have too many metrics than too few.

Copy link
Contributor Author

@douglascamata douglascamata Jun 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 32503f5.

I was looking into adding this label to the current code, but promhttp.InstrumentHandlerInFlight comes from prometheus/client_golang, so I wrote a special (and small) handler for this in Thanos.

),
}
}