gRPC retry behaves weirdly when a timeout is set along with a retry policy #10336
What version of gRPC are you using? Since you mention retries as a cause, does that mean that if you remove the retryPolicy, the timeout works?
I see you saw #10314. Are you using Netty or OkHttp?

<grpc.version>1.53.0</grpc.version>

Using Netty.
This can be easily reproduced with the config below on the io.grpc.examples.retrying.RetryingHelloWorldClient example:

{
  "methodConfig": [
    {
      "name": [
        {
          "service": "helloworld.Greeter",
          "method": "SayHello"
        }
      ],
      "timeout": "2.000s",
      "retryPolicy": {
        "maxAttempts": 5,
        "initialBackoff": "0.5s",
        "maxBackoff": "30s",
        "backoffMultiplier": 2,
        "retryableStatusCodes": [
          "UNAVAILABLE"
        ]
      }
    }
  ]
}

Just change the method below in io.grpc.examples.retrying.RetryingHelloWorldServer to return Status.UNAVAILABLE for all requests:

@Override
public void sayHello(HelloRequest request, StreamObserver<HelloReply> responseObserver) {
  int count = retryCounter.incrementAndGet();
  // if (random.nextFloat() < UNAVAILABLE_PERCENTAGE) {
  logger.info("Returning stubbed UNAVAILABLE error. count: " + count);
  responseObserver.onError(Status.UNAVAILABLE
      .withDescription("Greeter temporarily unavailable...").asRuntimeException());
  // } else {
  //   logger.info("Returning successful Hello response, count: " + count);
  //   HelloReply reply = HelloReply.newBuilder().setMessage("Hello " + request.getName()).build();
  //   responseObserver.onNext(reply);
  //   responseObserver.onCompleted();
  // }
}
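For context, this is roughly how that config gets applied on the client side. A minimal sketch, assuming the setup of io.grpc.examples.retrying.RetryingHelloWorldClient: the real example loads retrying_service_config.json from the classpath, whereas here an equivalent map is built inline, and the host/port and class name are placeholders, not part of the original report.

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.examples.helloworld.GreeterGrpc;
import io.grpc.examples.helloworld.HelloRequest;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class RetryTimeoutRepro {
  public static void main(String[] args) {
    // Build the same method config as the JSON above. Numeric values are doubles,
    // mirroring how JSON numbers are represented when the config is parsed.
    Map<String, Object> name = new HashMap<>();
    name.put("service", "helloworld.Greeter");
    name.put("method", "SayHello");

    Map<String, Object> retryPolicy = new HashMap<>();
    retryPolicy.put("maxAttempts", 5.0);
    retryPolicy.put("initialBackoff", "0.5s");
    retryPolicy.put("maxBackoff", "30s");
    retryPolicy.put("backoffMultiplier", 2.0);
    retryPolicy.put("retryableStatusCodes", Arrays.asList("UNAVAILABLE"));

    Map<String, Object> methodConfig = new HashMap<>();
    methodConfig.put("name", Arrays.asList(name));
    methodConfig.put("timeout", "2.000s");   // overall deadline for the RPC
    methodConfig.put("retryPolicy", retryPolicy);

    Map<String, Object> serviceConfig = new HashMap<>();
    serviceConfig.put("methodConfig", Arrays.asList(methodConfig));

    ManagedChannel channel = ManagedChannelBuilder.forAddress("localhost", 50051)
        .usePlaintext()
        .defaultServiceConfig(serviceConfig)
        .enableRetry()   // make sure retries are enabled on the channel
        .build();

    GreeterGrpc.GreeterBlockingStub stub = GreeterGrpc.newBlockingStub(channel);
    // With the server above always returning UNAVAILABLE, this call is where the
    // reported hang shows up: it should fail after ~2s but never returns.
    stub.sayHello(HelloRequest.newBuilder().setName("world").build());
  }
}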
I've reproduced this on master. It has been broken at least since 1.52.0; I didn't test further back. And yes, removing the retryPolicy makes the timeout work.

diff --git a/examples/src/main/java/io/grpc/examples/retrying/RetryingHelloWorldServer.java b/examples/src/main/java/io/grpc/examples/retrying/RetryingHelloWorldServer.java
index 165cc72ff..a22980d76 100644
--- a/examples/src/main/java/io/grpc/examples/retrying/RetryingHelloWorldServer.java
+++ b/examples/src/main/java/io/grpc/examples/retrying/RetryingHelloWorldServer.java
@@ -97,16 +97,16 @@ public class RetryingHelloWorldServer {
     @Override
     public void sayHello(HelloRequest request, StreamObserver<HelloReply> responseObserver) {
       int count = retryCounter.incrementAndGet();
-      if (random.nextFloat() < UNAVAILABLE_PERCENTAGE) {
+      //if (random.nextFloat() < UNAVAILABLE_PERCENTAGE) {
         logger.info("Returning stubbed UNAVAILABLE error. count: " + count);
         responseObserver.onError(Status.UNAVAILABLE
             .withDescription("Greeter temporarily unavailable...").asRuntimeException());
-      } else {
-        logger.info("Returning successful Hello response, count: " + count);
-        HelloReply reply = HelloReply.newBuilder().setMessage("Hello " + request.getName()).build();
-        responseObserver.onNext(reply);
-        responseObserver.onCompleted();
-      }
+      //} else {
+      //  logger.info("Returning successful Hello response, count: " + count);
+      //  HelloReply reply = HelloReply.newBuilder().setMessage("Hello " + request.getName()).build();
+      //  responseObserver.onNext(reply);
+      //  responseObserver.onCompleted();
+      //}
     }
   }
 }
diff --git a/examples/src/main/resources/io/grpc/examples/retrying/retrying_service_config.json b/examples/src/main/resources/io/grpc/examples/retrying/retrying_service_config.json
index ff115a9ad..9d202d77b 100644
--- a/examples/src/main/resources/io/grpc/examples/retrying/retrying_service_config.json
+++ b/examples/src/main/resources/io/grpc/examples/retrying/retrying_service_config.json
@@ -8,6 +8,7 @@
         }
       ],
+      "timeout": "2.000s",
       "retryPolicy": {
         "maxAttempts": 5,
         "initialBackoff": "0.5s",
As commented in the closed issue, we had the same problem, and I'm pretty sure the problem was introduced between versions 1.51.3 and 1.52.0.
@turchenkoalex did some detailed investigation in #10838, which I think is related to this. @larry-safran, please look at the text in #10838. I didn't read it carefully, but what it describes is highly likely to be related to this issue, so I'm hopeful the root cause has been found.
The solution proposed in #10838 fixed this. The root cause hypothesis was valid. |
#10884: Fix retries that timeout hanging forever. (grpc#10855) Fixes grpc#10336
Do you have an ETA for when this fix will be released in 1.59.x and 1.60.x?
There was a defect in this fix that was corrected in #11026. Is there a reason why you need 1.59 or 1.60?
Thanks for your response and the pointer to the follow-up fix! #11026 has not been backported to 1.59.x and 1.60.x AFAICT. Do you have plans to do this and perhaps release these versions?
Per the support policy, we wouldn't ordinarily do releases for 1.59 and 1.60. Backports are cheap and easy, so we will sometimes backport to branches that we don't know we'll ever release, so that if there does happen to be a release for some reason the fix would be included. I'm not quite sure why larry-safran backported to that many branches, though.

To be clear, you need both 1.59 and 1.60? Just one or the other wouldn't be enough?

Is there anything we can do to help with the okio version conflicts? It might be good to open a new issue for that to discuss whether there are options. Our Bazel build, for example, is still using 2.10.0.
From the original report, using the retry config shown above:

Expected: the gRPC client should stop retrying once the "timeout": "2.000s" deadline expires. Instead, the client keeps running indefinitely, i.e., it is stuck in the gRPC call to the server.
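To make the expectation concrete, here is a minimal sketch of the client-side call, continuing from the channel and stub set up in the earlier sketch (the names are illustrative, not part of the original report): with "timeout": "2.000s" in the method config, the blocking call should fail after roughly two seconds rather than block indefinitely.

// Continues from the earlier sketch's channel/stub setup; also needs
// io.grpc.StatusRuntimeException and io.grpc.examples.helloworld.HelloReply.
HelloRequest request = HelloRequest.newBuilder().setName("world").build();
try {
  HelloReply reply = stub.sayHello(request);
  System.out.println("Greeting: " + reply.getMessage());
} catch (StatusRuntimeException e) {
  // Expected: once the 2s timeout from the method config expires, retries stop
  // and the call fails here with DEADLINE_EXCEEDED instead of hanging forever.
  System.out.println("RPC failed: " + e.getStatus());
}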