Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider offering a full jitter backoff policy #8755

Closed
dbolduc opened this issue Apr 14, 2022 · 6 comments
Closed

Consider offering a full jitter backoff policy #8755

dbolduc opened this issue Apr 14, 2022 · 6 comments
Assignees
Labels
type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@dbolduc
Copy link
Member

dbolduc commented Apr 14, 2022

Here is a write up on full jitter. Apparently the java clients do this.

With pure exponential backoff policy we might have backoff ranges like:
req1: (.5, 1)
req2: (1, 2)
req3: (2, 4)
....

With full jitter, these backoff ranges would look like:
req1: (0, 1)
req2: (0, 2)
req3: (0, 4)
....

(make up your own units)

@dbolduc dbolduc added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Apr 14, 2022
@coryan
Copy link
Contributor

coryan commented Apr 15, 2022

For some services (storage comes to mind) a minimum back off is recommended.

@coryan
Copy link
Contributor

coryan commented Sep 8, 2022

@dbolduc is going to research some more and make a recommendation.

@dbolduc
Copy link
Member Author

dbolduc commented Sep 12, 2022

This is the code to run the simulation in the article: https://github.com/aws-samples/aws-arch-backoff-simulator/blob/master/src/backoff_simulator.py

I added our backoff algorithm:

34a35,39
> class ExpoBackoffCloudCpp(Backoff):
>     def backoff(self, n):
>         v = self.expo(n)
>         return random.uniform(v/2, v)
> 

And a MinJitter (i.e. full jitter, but it doesn't start at 0)

39a45,49
> class ExpoBackoffMinJitter(Backoff):
>     def backoff(self, n):
>         v = self.expo(n)
>         return random.uniform(self.base, v)
> 

Results:
image

image

To summarize (with these exact settings, in this exact model), min jitter is indistinguishable from full jitter. Our strategy is indistinguishable from the jitter strategies in terms of total calls, but takes more time to complete the work.

I will phone a friend before making a recommendation...

@dbolduc
Copy link
Member Author

dbolduc commented Mar 22, 2023

Seems like other client library languages do full jitter only. Given the supposed, slight performance benefit, I think we should implement min-jitter (which is like full jitter, but slightly more general).

I think we should modify the implementation of the existing ExponentialBackoffPolicy, with minimal behavior changes for the current constructor.

I think our default backoff policy should use full jitter.

Background

Our API accepts three parameters:

  • minimum delay
  • maximum delay (Note that this is an overall maximum.)
  • scaling factor

template <typename Rep1, typename Period1, typename Rep2, typename Period2>
ExponentialBackoffPolicy(std::chrono::duration<Rep1, Period1> initial_delay,
std::chrono::duration<Rep2, Period2> maximum_delay,
double scaling)

Aside: I don't agree with the range it sets. I think we should multiply by scaling, not 2. Whatever.

current_delay_range_(2 * initial_delay_),

Min-jitter requires four parameters:

  • minimum delay (0 in the case of full jitter)
  • initial delay upper bound
  • maximum delay
  • scaling factor

Design / Work:

I would break up the work into two PRs:

1. Implement min-jitter

  • add a new constructor + a minimum_delay_ member.
    • I'd probably rename s/current_range_delay_/current_range_upper_bound_/, too.
  • map the parameters for the existing ctor to the members. (requires thinking!)
  • update clone() and OnCompletion() implementations.
  • add/update tests
  • update documentation

2. Update library defaults

We want the RetryPolicyOption to use a similar policy with a min of ms(0).

We want the value of the PollingPolicyOption to be unchanged, so we should set it manually instead of using the value of the RetryPolicyOption)

"Running the generator" means:

ci/cloudbuild/build.sh -t generate-libraries-pr

It may also be useful to generate only the "golden" files. (instead of 100 libraries). This just speeds up development cycles.

env GENERATE_GOLDEN_ONLY=1 ci/cloudbuild/build.sh -t generate-libraries-pr

alevenberg added a commit to alevenberg/google-cloud-cpp that referenced this issue May 9, 2023
 - Add a test for floating point numbers
 - Clarify the naming of current_delay_range_ by adding two parameters (one for the start and one for the end). In the long term we can remove this since we want to implement min jitter in issue googleapis#8755. Then the current_delay_start_ will always equal the initial_delay_.
alevenberg added a commit to alevenberg/google-cloud-cpp that referenced this issue May 9, 2023
 - Use () around min and max to avoid macro expansion
 - Add a test for floating point numbers
 - Clarify the naming of current_delay_range_ by adding two parameters (one for the start and one for the end). In the long term we can remove this since we want to implement min jitter in issue googleapis#8755. Then the current_delay_start_ will always equal the initial_delay_.
@dbolduc dbolduc assigned alevenberg and unassigned dbolduc May 11, 2023
@coryan
Copy link
Contributor

coryan commented Jun 9, 2023

Can we close this?

@alevenberg
Copy link
Member

Yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

No branches or pull requests

3 participants