Skip to content

Commit

Permalink
Allow remote retry max delay to be user configurable
Browse files Browse the repository at this point in the history
This introduces a new option `--remote_retry_max_delay` can be used to change the existing maximum exponential backoff interval used when retrying remote requests. Before this change, there was a hardcoded value controlling this maximum exponential backoff interval, set to `5s`.

Rational
`remote_retries` is useful in masking over temporary disruptions to a remote cluster. If a cluster experiences temporary downtime, it is useful to allow bazel clients to wait for a period of time for the cluster to recover before bailing and giving up. If users cannot configure the maximum exponential backoff delay, one must set a large number for `remote_retries`, each retry eventually waiting for up to 5s. This allows the bazel client to wait for a reasonable amount of time for the cluster to recover.

The problem here is that under certain cluster failure modes, requests may not be handled and failed quickly, rather they may wait until `remote_timeout` before failing. A large `remote_timeout` combined with a large `remote_retries` could lead to waiting for a very long time before finally bailing on a given action.

If a user can bump the `remote_retry_max_delay`, they can control the retry waiting semantics to their own needs.

Closes bazelbuild#16058.

PiperOrigin-RevId: 523680725
Change-Id: I21daba78b91d3157362ca85bb7b1cbbef8a94bb3
  • Loading branch information
joeljeske authored and ShreeM01 committed Apr 12, 2023
1 parent 44544ae commit db1a5ca
Show file tree
Hide file tree
Showing 2 changed files with 17 additions and 3 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,8 @@

package com.google.devtools.build.lib.remote;

import static java.lang.Math.max;

import com.google.common.annotations.VisibleForTesting;
import com.google.common.base.Preconditions;
import com.google.common.base.Throwables;
Expand Down Expand Up @@ -158,16 +160,16 @@ public static class ExponentialBackoff implements Backoff {
Preconditions.checkArgument(jitter >= 0 && jitter <= 1, "jitter must be in the range (0, 1)");
Preconditions.checkArgument(maxAttempts >= 0, "maxAttempts must be >= 0");
nextDelayMillis = initial.toMillis();
maxMillis = max.toMillis();
maxMillis = max(max.toMillis(), nextDelayMillis);
this.multiplier = multiplier;
this.jitter = jitter;
this.maxAttempts = maxAttempts;
}

public ExponentialBackoff(RemoteOptions options) {
this(
/* initial = */ Duration.ofMillis(100),
/* max = */ Duration.ofSeconds(5),
/* initial= */ Duration.ofMillis(100),
/* max= */ options.remoteRetryMaxDelay,
/* multiplier= */ 2,
/* jitter= */ 0.1,
options.remoteMaxRetryAttempts);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -362,6 +362,18 @@ public RemoteBuildEventUploadModeConverter() {
+ "If set to 0, retries are disabled.")
public int remoteMaxRetryAttempts;

@Option(
name = "remote_retry_max_delay",
defaultValue = "5s",
documentationCategory = OptionDocumentationCategory.REMOTE,
effectTags = {OptionEffectTag.UNKNOWN},
converter = RemoteDurationConverter.class,
help =
"The maximum backoff delay between remote retry attempts. Following units can be used:"
+ " Days (d), hours (h), minutes (m), seconds (s), and milliseconds (ms). If"
+ " the unit is omitted, the value is interpreted as seconds.")
public Duration remoteRetryMaxDelay;

@Option(
name = "disk_cache",
defaultValue = "null",
Expand Down

0 comments on commit db1a5ca

Please sign in to comment.