Report on timeout errors #1
BTW I think the solution for the timeouts you are hitting is to make multiple put calls. That might be more complicated on the server side if you need to track state or something. However, it would also make the system more resilient to network errors during transport.
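The "multiple put calls" idea above can be sketched in plain Rust: instead of one long-lived stream that can outlive an aggressive gRPC timeout, split the data into independently retryable chunks. `send_chunk` here is a hypothetical stand-in for one short-lived put request; the real Flight SQL `DoPut` plumbing is not shown.

```rust
// Sketch (assumed names): split a batch into chunks so each network call is
// short-lived and individually retryable, rather than one long stream.
fn upload_in_chunks(
    rows: &[u32],
    chunk_size: usize,
    send_chunk: &mut dyn FnMut(&[u32]),
) -> usize {
    let mut calls = 0;
    for chunk in rows.chunks(chunk_size) {
        // Each call stands in for a separate, short-lived put request.
        send_chunk(chunk);
        calls += 1;
    }
    calls
}
```

The trade-off, as noted above, is that the server may now need to track state across calls to reassemble the chunks.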
First, thanks for the second set of eyes, this is super helpful. I previously was seeing both client timeouts and timeouts between the object store and the server (more in line with executor starvation on the server), but I totally agree that I haven't seen the object store <-> server timeout with the dedicated executor, so I was likely just attributing the client timeouts to what was causing them before. Interesting idea about the multi-put calls. The major downside is that it would require re-inventing most of the bulk ingest capability of the Flight SQL API. I also understand that gRPC is supposed to be usable for both bulk and streaming paradigms, so likely I just need to more aggressively configure the client to handle longer-lived streams?
Yes, I think if you can configure the client to allow longer timeouts that would work well. FWIW we found that many gRPC stacks (like envoy in k8s, as I recall) had the same aggressive 30 second timeouts, so we had to adjust timeouts not just in the Rust clients but also in the golang ones.
This issue contains some notes I took while looking at the code in this repo from @djanderson.
TLDR: I think the timeout errors you are seeing come from the gRPC CLIENT -- and have basically nothing to do with how the server is configured. I didn't see any difference in behavior with or without the DedicatedExecutor.
Background
As background, gRPC uses HTTP requests / responses and doesn't rely on a single long-lived (TCP) connection. Typically gRPC clients, including tonic, enforce a maximum duration for any particular request. Even if the connection is actively consuming data, once the timeout is reached the client will close the connection.
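A toy illustration of that behavior (this is not tonic itself, just a sketch of the semantics): the deadline applies to the whole request, so the client aborts even while data is still flowing.

```rust
use std::time::{Duration, Instant};

// Sketch of a client-side per-request deadline: the request is aborted once
// the overall timeout elapses, regardless of whether data is still arriving.
fn drain_with_deadline(
    chunks: impl Iterator<Item = u64>,
    timeout: Duration,
) -> Result<u64, &'static str> {
    let start = Instant::now();
    let mut total = 0;
    for c in chunks {
        // The deadline is measured against the whole request, not per-chunk progress.
        if start.elapsed() >= timeout {
            return Err("deadline exceeded");
        }
        total += c;
    }
    Ok(total)
}
```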
Running example as is
Without other modification, I see this on my local machine:
This is the classic "client timed out" error from tonic (rust gRPC stack)
Increased client timeout
When I cranked up the tonic / client timeout like this:
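(The exact snippet is not preserved in this copy of the issue; a hedged sketch of raising the per-request timeout on a tonic `Endpoint` might look like the following, where the URL and durations are placeholders.)

```rust
use std::time::Duration;
use tonic::transport::{Channel, Endpoint, Error};

// Sketch: raise tonic's per-request deadline so a long-running stream is not
// cut off by the client. "http://localhost:50051" is a placeholder address.
async fn connect_with_long_timeout() -> Result<Channel, Error> {
    Endpoint::from_static("http://localhost:50051")
        .connect_timeout(Duration::from_secs(5)) // time allowed to establish the connection
        .timeout(Duration::from_secs(60 * 60))   // overall per-request deadline, cranked way up
        .connect()
        .await
}
```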
And then ran the client, it does eventually error with an h2 error:
But looking at the server, I believe the problem is that the localstack container ran out of disk space. The server panics like this:
But when I try to restart the server it produces an error like: