Original vs hedged request metrics #46

cristaloleg · 2023-08-09T08:35:44Z

Fixes #42

dannykopping

Exactly how I was planning to implement it 🙂
LGTM!

cristaloleg · 2023-08-09T10:02:52Z

Thanks, then I will add tests to cover this case and release a new version soon.

dannykopping · 2023-08-09T10:04:18Z

Really appreciate it @cristaloleg 🙌

hedged.go

cristaloleg · 2023-08-09T19:46:48Z

Tests are added, @dannykopping @joe-elliott. Not sure about the thread above; IMO having 2 counters is much simpler and costs nothing. WDYT?
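For illustration only, here is a minimal sketch of the two-counter design. The `winStats` type and `recordWin` method are hypothetical and are not hedgedhttp's actual `Stats` code. Each outcome gets its own atomic counter, so recording a win is a single increment and reading either value needs no lock. The sketch assumes Go 1.19+ for `atomic.Uint64`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// winStats is a hypothetical type illustrating the two-counter approach:
// one counter for original-request wins and one for hedged-request wins.
type winStats struct {
	originalWins atomic.Uint64
	hedgedWins   atomic.Uint64
}

// recordWin increments the counter for the attempt that produced the winning response.
func (s *winStats) recordWin(hedged bool) {
	if hedged {
		s.hedgedWins.Add(1)
		return
	}
	s.originalWins.Add(1)
}

func main() {
	var s winStats
	s.recordWin(false)
	s.recordWin(false)
	s.recordWin(true)
	// Each counter is read independently; no shared lock is needed.
	fmt.Println(s.originalWins.Load(), s.hedgedWins.Load()) // 2 1
}
```

Keeping the counters separate, rather than storing one total plus a ratio, means neither value has to be derived from the other when it is read.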

I will refactor the tests in the next PR; they look too wordy now.

hedged_test.go

examples_test.go

dannykopping · 2023-08-10T07:29:24Z

🎉🎉🎉

dannykopping · 2023-08-17T11:53:30Z

I tried to integrate this, but long story short: we have many hedged clients and only register the metrics once. That makes the stats hard to use, because one client might report 0 for HedgedRequestWins() while another reports some other value; which one would we use to update the metrics? The answer is that, without major refactoring, we can't.

I solved this by using the return value of the round trip:

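```go
func (rt *limitedHedgingRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	// IsHedgedRequest reports whether this attempt is a hedged copy rather than the original.
	isHedged := hedgedhttp.IsHedgedRequest(req)
	if isHedged {
		// Hedged attempts are rate limited; rejected ones are counted and fail fast.
		if !rt.limiter.Allow() {
			totalRateLimitedHedgeRequests.Inc()
			return nil, ErrTooManyHedgeRequests
		}
		totalHedgeRequests.Inc()
	}

	resp, err := rt.next.RoundTrip(req)
	if err == nil {
		// Count every successful attempt as a win for its kind (original or hedged).
		if isHedged {
			requestsWon.WithLabelValues("hedged").Inc()
		} else {
			requestsWon.WithLabelValues("original").Inc()
		}
	}

	return resp, err
}
```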

Edit: here's the PR grafana/loki#10281

cristaloleg · 2023-08-17T12:50:56Z

Thanks for the update! So, as I understand it, it's not the library's problem, just a consequence of having many hedged clients in one app. Am I right?

BTW, any numbers on how many wins? :D (such info is probably under NDA, but I decided to try :D)

dannykopping · 2023-08-17T13:10:29Z

> So, as I understand it, it's not the library's problem, just a consequence of having many hedged clients in one app. Am I right?

That's correct!

> BTW, any numbers on how many wins? :D (such info is probably under NDA, but I decided to try :D)

I'll update you once I roll this out to production next week 🙂 I'm sure I can share the hedging effectiveness as a percentage without any problems.

dannykopping · 2023-08-18T07:37:15Z

This is from a 12-hour range in our pre-prod environment. It seems we're averaging about a 20% effectiveness rate (i.e. only 1 of every 5 hedged requests wins against the original).
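The effectiveness rate quoted here is the ratio of hedged wins to hedged requests sent. Here is a minimal sketch, assuming the two inputs come from counters like the `hedged` series of `requestsWon` and `totalHedgeRequests` in the snippet above. The function name and the zero guard are illustrative choices, and the numbers in `main` are made up.

```go
package main

import "fmt"

// hedgeEffectiveness returns the fraction of hedged requests that won,
// i.e. hedgedWins / hedgedSent. It returns 0 when no hedges were sent
// so that an idle period does not produce a division by zero.
func hedgeEffectiveness(hedgedWins, hedgedSent uint64) float64 {
	if hedgedSent == 0 {
		return 0
	}
	return float64(hedgedWins) / float64(hedgedSent)
}

func main() {
	// Illustrative numbers only: 1 win for every 5 hedges sent.
	fmt.Printf("%.0f%%\n", 100*hedgeEffectiveness(200, 1000)) // 20%
}
```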

Do you have a good intuition about what this rate should be in order to justify the extra expense?

cristaloleg · 2023-08-18T08:22:39Z

> Do you have a good intuition about what this rate should be in order to justify the extra expense?

Sorry, I have no real formulas; someone with a better background in probability theory should comment on that.

But intuition suggests that it should not be high (possibly even lower than that). Request hedging only targets tail latency, so basically 1-5% of requests.

I don't think there are exact numbers that fit everyone. The sleep between calls, the number of hedged calls and, as a result, the success rate depend heavily on the system and its behaviour. The only thing I can really suggest is to experiment with the numbers.

The target latency should probably be a constant (i.e. an SLO); after that, the sleep and the number of hedged calls should be tuned to minimise the number of calls while keeping latency at the desired level. My 2c.

CC: @storozhukBM

Original vs hedged request metrics

20388a6

cristaloleg mentioned this pull request Aug 9, 2023

Determining the effectiveness of hedging #42

Closed

dannykopping approved these changes Aug 9, 2023

joe-elliott reviewed Aug 9, 2023

hedged.go

add tests

2ae41df

dannykopping reviewed Aug 9, 2023

hedged_test.go

hedged_test.go

cristaloleg added 3 commits August 10, 2023 08:44

fix

a54c044

fix

e4a4fff

fix

518434c

cristaloleg commented Aug 10, 2023

examples_test.go

cristaloleg merged commit 6504b04 into main Aug 10, 2023

cristaloleg deleted the orig-vs-hedg-metric branch August 10, 2023 07:18

dannykopping mentioned this pull request Aug 17, 2023

Track effectiveness of hedged requests grafana/loki#10281

Merged
