SLO - http request external duration and error source #989

scottlepp · 2024-05-15T16:53:14Z

What this PR does / why we need it:
Provides middleware for HTTP clients that captures external duration as a custom metric. Since we already capture total duration, we can subtract the external duration from it to get plugin duration.

In discussion with @wbrowne, the API server will expose this new metric, which gets scraped into Prometheus, and we can create SLOs based on duration and error source.

Also, capture the error source here as a label, so we don't have to pass the error source around.

We will take the same approach in SQLDS, so minimal changes will be required per plugin.

Ref https://github.com/grafana/enterprise-datasources/issues/730
Ref https://github.com/grafana/enterprise-datasources/issues/731

marefr

If you think this will work, that's fine by me.

I've never subtracted one histogram from another, so that's why I'm curious.

experimental/slo/slo_middleware.go

marefr · 2024-05-15T17:52:45Z

experimental/slo/slo_middleware.go

+	"github.com/prometheus/client_golang/prometheus/promauto"
+)
+
+var duration = promauto.NewHistogramVec(prometheus.HistogramOpts{


Since this is a histogram, don't you need to specify buckets? Have you verified that subtracting one histogram from another works as you imagine?

I grabbed the code from here as it looked similar. I'm no expert, but from my understanding we need a histogram here, or we would have to have shorter scrape times. Since that code isn't using buckets, I assumed it would work.
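
On the bucket question: as far as I know, client_golang falls back to `prometheus.DefBuckets` when `HistogramOpts.Buckets` is left empty, so a histogram declared without buckets still gets a default set. The sketch below copies those default upper bounds (in seconds) into a standard-library-only example to show how observations land in cumulative buckets; `cumulativeCounts` is a hypothetical helper, not a library function.

```go
package main

import (
	"fmt"
	"math"
)

// defaultBuckets mirrors the upper bounds (in seconds) of prometheus.DefBuckets,
// copied here so the sketch needs only the standard library.
var defaultBuckets = []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5, 10}

// cumulativeCounts returns, for each upper bound plus a final +Inf bucket,
// how many observations are less than or equal to that bound.
func cumulativeCounts(bounds, observations []float64) []int {
	all := append(append([]float64(nil), bounds...), math.Inf(1))
	counts := make([]int, len(all))
	for _, v := range observations {
		for i, le := range all {
			if v <= le {
				counts[i]++
			}
		}
	}
	return counts
}

func main() {
	// Four request durations in seconds: 3ms, 200ms, 3s and 20s.
	counts := cumulativeCounts(defaultBuckets, []float64{0.003, 0.2, 3, 20})
	fmt.Println(counts) // [1 1 1 1 1 2 2 2 2 3 3 4]
}
```

Anything slower than 10 seconds ends up only in the `+Inf` bucket, so whether the defaults are fine depends on how slow the external calls can get.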

@xnyo suggested subtracting the values, but that was with metrics captured from Tempo, which appear to be histograms. Not sure if he ever confirmed that subtraction would work.

Span metrics generate two metrics: a counter that computes requests, and a histogram that computes operation’s durations.

But wasn't he thinking of taking the full latency of a trace, subtracting the downstream span latencies, and then updating a latency histogram? That is, it's not Prometheus doing the subtraction, and you get one metric.

So my question is more: given two different latency histogram metrics, how easy is it to reliably subtract one from the other, and how does that scale and perform?

I haven't tested whether subtracting histograms works reliably, but the way this histogram is set up should replicate what Tempo does with the metrics generator. I think @svennergr mentioned he was doing something similar with histograms in the Elasticsearch datasource as well, so he may have some guidance on whether this will be accurate or not.
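
One general point that bears on the question in this thread: sums are additive, so subtracting the summed external duration from the summed total duration gives the summed plugin duration, whereas quantiles do not subtract that way. The sketch below illustrates this with made-up per-request durations in milliseconds; it is a toy calculation, not a test of the metrics in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// sum adds up all values.
func sum(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// median returns the middle value of an odd-length slice without modifying it.
func median(xs []int) int {
	c := append([]int(nil), xs...)
	sort.Ints(c)
	return c[len(c)/2]
}

func main() {
	// Made-up durations for three requests, in milliseconds.
	total := []int{100, 200, 1000}
	external := []int{90, 50, 100}

	// The true per-request plugin time is total minus external.
	plugin := make([]int, len(total))
	for i := range total {
		plugin[i] = total[i] - external[i]
	}

	// Sums subtract exactly.
	fmt.Println(sum(total)-sum(external), sum(plugin)) // 1060 1060

	// Medians do not: the difference of medians is not the median of differences.
	fmt.Println(median(total)-median(external), median(plugin)) // 110 150
}
```

An SLO built on average plugin duration could therefore rely on the subtraction, while one built on a percentile of plugin duration would need the difference to be observed directly as its own metric.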

experimental/slo/slo_middleware.go

marefr

LGTM, feel free to evaluate. Suggested some minor changes.

experimental/slo/slo_middleware.go

xnyo

Awesome work! I have some minor suggestions but the code LGTM! 🚀

experimental/slo/slo_middleware.go

xnyo · 2024-05-16T09:14:32Z

experimental/slo/slo_middleware.go

+	"github.com/prometheus/client_golang/prometheus/promauto"
+)
+
+var duration = promauto.NewHistogramVec(prometheus.HistogramOpts{


I haven't tested whether subtracting histograms works reliably, but the way this histogram is set up should replicate what Tempo does with the metrics generator. I think @svennergr mentioned he was doing something similar with histograms in the Elasticsearch datasource as well, so he may have some guidance on whether this will be accurate or not.

experimental/slo/slo_middleware.go

Co-authored-by: Giuseppe Guerra <giuseppe.guerra@grafana.com>

scottlepp added 2 commits May 15, 2024 12:41

SLO - http request external duration and error source

6f9223a

fix

ac10102

scottlepp requested a review from a team as a code owner May 15, 2024 16:53

scottlepp requested review from wbrowne, marefr, andresmgot, xnyo, cletter7 and Multimo and removed request for a team May 15, 2024 16:53

scottlepp added 2 commits May 15, 2024 13:00

lint

7cc7eea

test

d5e1e3c

scottlepp requested a review from asimpson May 15, 2024 17:04

marefr approved these changes May 15, 2024

pr suggestions

85eb75a

scottlepp requested a review from marefr May 15, 2024 18:56

marefr approved these changes May 16, 2024

xnyo approved these changes May 16, 2024

scottlepp and others added 5 commits May 20, 2024 10:21

Update experimental/slo/slo_middleware.go

c152402

Co-authored-by: Giuseppe Guerra <giuseppe.guerra@grafana.com>

get name and type outside of roundtripperfunc

46f17db

Merge branch 'main' into slo-duration-error-source

e97f2bb

lint

543a0d6

test

5fad18f

scottlepp merged commit ee141c6 into main May 20, 2024
3 checks passed

scottlepp deleted the slo-duration-error-source branch May 20, 2024 17:09
