gRPC retry behaves weirdly when a timeout is set along with a retry policy #10336
What version of gRPC are you using? Since you mention retries as a cause, does that mean that if you remove the retryPolicy, the timeout works?
I see you saw #10314. Are you using Netty or OkHttp?

<grpc.version>1.53.0</grpc.version>

Using Netty.
This can be easily reproduced with the config below on the io.grpc.examples.retrying.RetryingHelloWorldClient example:

{
  "methodConfig": [
    {
      "name": [
        {
          "service": "helloworld.Greeter",
          "method": "SayHello"
        }
      ],
      "timeout": "2.000s",
      "retryPolicy": {
        "maxAttempts": 5,
        "initialBackoff": "0.5s",
        "maxBackoff": "30s",
        "backoffMultiplier": 2,
        "retryableStatusCodes": [
          "UNAVAILABLE"
        ]
      }
    }
  ]
}

Just change the method below in io.grpc.examples.retrying.RetryingHelloWorldServer to return Status.UNAVAILABLE for all requests:

@Override
public void sayHello(HelloRequest request, StreamObserver<HelloReply> responseObserver) {
  int count = retryCounter.incrementAndGet();
  // if (random.nextFloat() < UNAVAILABLE_PERCENTAGE) {
  logger.info("Returning stubbed UNAVAILABLE error. count: " + count);
  responseObserver.onError(Status.UNAVAILABLE
      .withDescription("Greeter temporarily unavailable...").asRuntimeException());
  // } else {
  //   logger.info("Returning successful Hello response, count: " + count);
  //   HelloReply reply = HelloReply.newBuilder().setMessage("Hello " + request.getName()).build();
  //   responseObserver.onNext(reply);
  //   responseObserver.onCompleted();
  // }
}
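For context, this is roughly how that config gets applied on the client side. A minimal sketch, assuming the setup of io.grpc.examples.retrying.RetryingHelloWorldClient: the real example loads retrying_service_config.json from the classpath, whereas here an equivalent map is built inline, and the host/port and class name are placeholders, not part of the original report.

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.examples.helloworld.GreeterGrpc;
import io.grpc.examples.helloworld.HelloRequest;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class RetryTimeoutRepro {
  public static void main(String[] args) {
    // Build the same method config as the JSON above. Numeric values are doubles,
    // mirroring how JSON numbers are represented when the config is parsed.
    Map<String, Object> name = new HashMap<>();
    name.put("service", "helloworld.Greeter");
    name.put("method", "SayHello");

    Map<String, Object> retryPolicy = new HashMap<>();
    retryPolicy.put("maxAttempts", 5.0);
    retryPolicy.put("initialBackoff", "0.5s");
    retryPolicy.put("maxBackoff", "30s");
    retryPolicy.put("backoffMultiplier", 2.0);
    retryPolicy.put("retryableStatusCodes", Arrays.asList("UNAVAILABLE"));

    Map<String, Object> methodConfig = new HashMap<>();
    methodConfig.put("name", Arrays.asList(name));
    methodConfig.put("timeout", "2.000s");   // overall deadline for the RPC
    methodConfig.put("retryPolicy", retryPolicy);

    Map<String, Object> serviceConfig = new HashMap<>();
    serviceConfig.put("methodConfig", Arrays.asList(methodConfig));

    ManagedChannel channel = ManagedChannelBuilder.forAddress("localhost", 50051)
        .usePlaintext()
        .defaultServiceConfig(serviceConfig)
        .enableRetry()   // make sure retries are enabled on the channel
        .build();

    GreeterGrpc.GreeterBlockingStub stub = GreeterGrpc.newBlockingStub(channel);
    // With the server above always returning UNAVAILABLE, this call is where the
    // reported hang shows up: it should fail after ~2s but never returns.
    stub.sayHello(HelloRequest.newBuilder().setName("world").build());
  }
}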
I've reproduced this on master. It has been broken at least since 1.52.0; I didn't test further back. And yes, removing the retryPolicy makes the timeout work.

diff --git a/examples/src/main/java/io/grpc/examples/retrying/RetryingHelloWorldServer.java b/examples/src/main/java/io/grpc/examples/retrying/RetryingHelloWorldServer.java
index 165cc72ff..a22980d76 100644
--- a/examples/src/main/java/io/grpc/examples/retrying/RetryingHelloWorldServer.java
+++ b/examples/src/main/java/io/grpc/examples/retrying/RetryingHelloWorldServer.java
@@ -97,16 +97,16 @@ public class RetryingHelloWorldServer {
     @Override
     public void sayHello(HelloRequest request, StreamObserver<HelloReply> responseObserver) {
       int count = retryCounter.incrementAndGet();
-      if (random.nextFloat() < UNAVAILABLE_PERCENTAGE) {
+      //if (random.nextFloat() < UNAVAILABLE_PERCENTAGE) {
         logger.info("Returning stubbed UNAVAILABLE error. count: " + count);
         responseObserver.onError(Status.UNAVAILABLE
             .withDescription("Greeter temporarily unavailable...").asRuntimeException());
-      } else {
-        logger.info("Returning successful Hello response, count: " + count);
-        HelloReply reply = HelloReply.newBuilder().setMessage("Hello " + request.getName()).build();
-        responseObserver.onNext(reply);
-        responseObserver.onCompleted();
-      }
+      //} else {
+      //  logger.info("Returning successful Hello response, count: " + count);
+      //  HelloReply reply = HelloReply.newBuilder().setMessage("Hello " + request.getName()).build();
+      //  responseObserver.onNext(reply);
+      //  responseObserver.onCompleted();
+      //}
     }
   }
 }
diff --git a/examples/src/main/resources/io/grpc/examples/retrying/retrying_service_config.json b/examples/src/main/resources/io/grpc/examples/retrying/retrying_service_config.json
index ff115a9ad..9d202d77b 100644
--- a/examples/src/main/resources/io/grpc/examples/retrying/retrying_service_config.json
+++ b/examples/src/main/resources/io/grpc/examples/retrying/retrying_service_config.json
@@ -8,6 +8,7 @@
         }
       ],
+      "timeout": "2.000s",
       "retryPolicy": {
         "maxAttempts": 5,
         "initialBackoff": "0.5s",
As commented in the closed issue, we had the same problem, and I'm pretty sure the problem was introduced between versions 1.51.3 and 1.52.0.
@turchenkoalex did some detailed investigation in #10838, which I think is related to this. @larry-safran, please look at the text in #10838. I didn't read it carefully, but what it describes is highly likely to be related to this issue, so I'm hopeful the root cause has been found.
The solution proposed in #10838 fixed this. The root cause hypothesis was valid. |
#10884: Fix retries that timeout hanging forever. (grpc#10855) Fixes grpc#10336
Do you have an ETA for when this fix will be released in 1.59.x and 1.60.x?
There was a defect in this fix that was corrected in #11026. Is there a reason why you need 1.59 or 1.60?
Thanks for your response and the pointer to the follow-up fix! #11026 has not been backported to 1.59.x and 1.60.x AFAICT. Do you have plans to do this and perhaps release these versions?
Per the support policy, we wouldn't ordinarily do releases for 1.59 and 1.60. Backports are cheap and easy, so we will sometimes backport to branches that we don't know we'll ever release, so that if there does happen to be a release for some reason the fix would be included. I'm not quite sure why larry-safran backported to that many branches, though.

To be clear, you need both 1.59 and 1.60? Just one or the other wouldn't be enough?

Is there anything we can do to help with the okio version conflicts? It might be good to open a new issue for that to discuss whether there are options. Our Bazel build, for example, is still using 2.10.0.
From the original report, using the retry config shown above:

Expected: the gRPC client should stop retrying once the "timeout": "2.000s" deadline expires. Instead, the client keeps running indefinitely, i.e., it is stuck in the gRPC call to the server.
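To make the expectation concrete, here is a minimal sketch of the client-side call, continuing from the channel and stub set up in the earlier sketch (the names are illustrative, not part of the original report): with "timeout": "2.000s" in the method config, the blocking call should fail after roughly two seconds rather than block indefinitely.

// Continues from the earlier sketch's channel/stub setup; also needs
// io.grpc.StatusRuntimeException and io.grpc.examples.helloworld.HelloReply.
HelloRequest request = HelloRequest.newBuilder().setName("world").build();
try {
  HelloReply reply = stub.sayHello(request);
  System.out.println("Greeting: " + reply.getMessage());
} catch (StatusRuntimeException e) {
  // Expected: once the 2s timeout from the method config expires, retries stop
  // and the call fails here with DEADLINE_EXCEEDED instead of hanging forever.
  System.out.println("RPC failed: " + e.getStatus());
}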