SLO - http request external duration and error source #989
Merged
Commits (10)
6f9223a  SLO - http request external duration and error source (scottlepp)
ac10102  fix (scottlepp)
7cc7eea  lint (scottlepp)
d5e1e3c  test (scottlepp)
85eb75a  pr suggestions (scottlepp)
c152402  Update experimental/slo/slo_middleware.go (scottlepp)
46f17db  get name and type outside of roundtripperfunc (scottlepp)
e97f2bb  Merge branch 'main' into slo-duration-error-source (scottlepp)
543a0d6  lint (scottlepp)
5fad18f  test (scottlepp)
experimental/slo/slo_middleware.go
@@ -0,0 +1,47 @@
package slo

import (
	"errors"
	"net/http"
	"syscall"
	"time"

	"github.com/grafana/grafana-plugin-sdk-go/backend"
	"github.com/grafana/grafana-plugin-sdk-go/backend/httpclient"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var duration = promauto.NewHistogramVec(prometheus.HistogramOpts{
	Namespace: "grafana",
	Name:      "plugin_external_requests_duration",
	Help:      "Duration of requests to external services",
}, []string{"plugin", "error_source"})

// Middleware captures duration of requests to external services and the source of errors
func Middleware(plugin string) httpclient.Middleware {
	return httpclient.NamedMiddlewareFunc(plugin, func(opts httpclient.Options, next http.RoundTripper) http.RoundTripper {
		return RoundTripper(plugin, opts, next)
	})
}

// RoundTripper captures duration of requests to external services and the source of errors
func RoundTripper(plugin string, _ httpclient.Options, next http.RoundTripper) http.RoundTripper {
	return httpclient.RoundTripperFunc(func(req *http.Request) (*http.Response, error) {
		start := time.Now()
		var errorSource = "none"

		defer func() {
			duration.WithLabelValues(plugin, errorSource).Observe(time.Since(start).Seconds())
		}()

		res, err := next.RoundTrip(req)
		if res != nil && res.StatusCode >= 400 {
			errorSource = string(backend.ErrorSourceFromHTTPStatus(res.StatusCode))
		}
		if errors.Is(err, syscall.ECONNREFUSED) {
			errorSource = string(backend.ErrorSourceDownstream)
		}
		return res, err
	})
}
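For orientation, here is a minimal sketch of how a plugin could attach this middleware to its outbound HTTP client via the SDK's httpclient package. The plugin name, the use of DefaultMiddlewares, and the request URL are illustrative assumptions rather than part of this PR; the slo import path simply follows from the file's location under experimental/slo.

```go
package main

import (
	"log"

	"github.com/grafana/grafana-plugin-sdk-go/backend/httpclient"
	"github.com/grafana/grafana-plugin-sdk-go/experimental/slo"
)

func main() {
	// Append the SLO middleware to the SDK's default middleware chain so every
	// outbound request records a duration labelled with the plugin and error source.
	client, err := httpclient.New(httpclient.Options{
		Middlewares: append(httpclient.DefaultMiddlewares(), slo.Middleware("my-datasource")),
	})
	if err != nil {
		log.Fatal(err)
	}

	// Any request made through this client now passes through the RoundTripper above.
	resp, err := client.Get("https://example.org/api/health")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	log.Println("status:", resp.Status)
}
```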
Since this is a histogram, don't you need to specify buckets? And have you verified that subtracting one histogram from another works the way you imagine?
I grabbed the code from here as it looked similar. I'm no expert, but from my understanding we need a histogram here, or else we'd need shorter scrape intervals. Since that code isn't using buckets, I assumed it would work.
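For reference on the bucket question: when HistogramOpts.Buckets is left empty, as it is in this PR, client_golang falls back to prometheus.DefBuckets (roughly 5ms to 10s). Below is a minimal sketch of what the declaration could look like with explicit buckets; the boundaries are illustrative assumptions, not values proposed in this PR.

```go
package slo

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Sketch only: the declaration from this PR with explicit buckets. It would
// replace the existing var rather than sit alongside it, since promauto panics
// if the same metric name is registered twice. With Buckets left empty (as in
// the PR), client_golang uses prometheus.DefBuckets instead.
var duration = promauto.NewHistogramVec(prometheus.HistogramOpts{
	Namespace: "grafana",
	Name:      "plugin_external_requests_duration",
	Help:      "Duration of requests to external services",
	// Hypothetical boundaries tuned for outbound HTTP calls; adjust to the
	// latency range actually expected.
	Buckets: []float64{0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30},
}, []string{"plugin", "error_source"})
```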
@xnyo suggested subtracting the values, but that was with metrics captured from Tempo, which appear to be histograms. I'm not sure he ever confirmed that the subtraction would work.
But wasn't he thinking of taking the full latency of a trace, subtracting the downstream span latencies, and then updating a single latency histogram, i.e. it's not Prometheus doing the subtraction and you end up with one metric?
So my question is more: given two different latency histogram metrics, how easy is it to reliably subtract one from the other, and how does that scale/perform?
I haven't tested whether subtracting histograms works reliably, but the way this histogram is set up should replicate what Tempo does with the metrics generator. I think @svennergr mentioned he was doing something similar with histograms in the Elasticsearch datasource as well, so he may have guidance on whether this will be accurate or not.
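To make the subtraction question concrete: the metric added here is exposed as grafana_plugin_external_requests_duration with _bucket, _sum and _count series. Mean latencies derived from _sum and _count can be subtracted in PromQL, whereas quantiles produced by histogram_quantile cannot reliably be subtracted from one another. The sketch below assembles such a query; grafana_plugin_request_duration_seconds is a hypothetical stand-in for whatever overall request-duration histogram would form the other side of the subtraction.

```go
package main

import "fmt"

func main() {
	// Mean external-request latency per plugin, from the histogram added in this PR:
	// rate() over _sum divided by rate() over _count yields seconds per request.
	external := `sum by (plugin) (rate(grafana_plugin_external_requests_duration_sum[5m]))
  / sum by (plugin) (rate(grafana_plugin_external_requests_duration_count[5m]))`

	// Hypothetical overall request-duration histogram; substitute whatever metric
	// actually tracks total plugin request time.
	total := `sum by (plugin) (rate(grafana_plugin_request_duration_seconds_sum[5m]))
  / sum by (plugin) (rate(grafana_plugin_request_duration_seconds_count[5m]))`

	// Subtracting the two means approximates time spent inside the plugin itself.
	// This only works for means; percentiles from histogram_quantile() cannot be
	// subtracted this way, which is the concern raised in the thread above.
	fmt.Printf("(%s)\n-\n(%s)\n", total, external)
}
```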