Bidi calls never close, leaking memory client and server #4202
And the client also has ~550 live threads with the name …
And if I replace …
This is pure speculation, but I wouldn't be surprised if Netty is retaining the memory because the stream is HALF_CLOSED. http://httpwg.org/specs/rfc7540.html#rfc.section.8.1 talks about this some.

The client should probably be sending a RST_STREAM to the server to transition it to CLOSED. We'd need to look into Netty's behavior here.
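Purely to illustrate that suggestion (this is not the actual grpc-java change), a hedged sketch of what sending RST_STREAM from a Netty-based client could look like; the class and callback names are invented, and only the Netty Http2ConnectionEncoder and Http2Error APIs are real:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.http2.Http2ConnectionEncoder;
import io.netty.handler.codec.http2.Http2Error;

// Hypothetical helper: reset a stream once the server has sent END_STREAM while
// the client side is still open, so the stream moves from HALF_CLOSED to CLOSED
// and Netty can drop its per-stream state.
final class ResetOnServerClose {
  private final Http2ConnectionEncoder encoder;

  ResetOnServerClose(Http2ConnectionEncoder encoder) {
    this.encoder = encoder;
  }

  void onServerEndStream(ChannelHandlerContext ctx, int streamId, boolean clientHalfClosed) {
    if (!clientHalfClosed) {
      // RST_STREAM(CANCEL) tells the peer we are done with the stream; both
      // sides can then release the resources associated with it.
      encoder.writeRstStream(ctx, streamId, Http2Error.CANCEL.code(), ctx.newPromise());
      ctx.flush();
    }
  }
}
```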
I still see the memory retained in the client when using OkHttp though, which points to this being something in the common code? Sorry for going over this again, but I do still wonder if we're misunderstanding something here. This is my train of thought:

If that's all correct then this isn't a memory leak, it's expected. If it isn't, do you know where that logic breaks down?
In HTTP (and gRPC) the bidi stream is done once the server finishes responding. We should simply throw away anything the client tries to send past that point (which is core to our design). I would fully believe OkHttp has the same bug. In fact, I'd expect OkHttp to have it before Netty.
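To make that semantics concrete, here is a minimal hedged sketch using a hypothetical generated service (EchoGrpc and Msg are placeholders, not names from this issue): once the server's StreamObserver.onCompleted() runs, the RPC is finished even though the client has not half-closed, and anything the client sends afterwards is simply discarded.

```java
import io.grpc.stub.StreamObserver;

// Hypothetical service: service Echo { rpc Chat(stream Msg) returns (stream Msg); }
public class EchoService extends EchoGrpc.EchoImplBase {
  @Override
  public StreamObserver<Msg> chat(StreamObserver<Msg> responseObserver) {
    return new StreamObserver<Msg>() {
      private boolean responded;

      @Override
      public void onNext(Msg request) {
        if (responded) {
          return;  // the RPC is already complete from the server's point of view
        }
        responded = true;
        responseObserver.onNext(request);
        // The server decides it is done after one message. This completes the
        // RPC even though the client has not half-closed yet.
        responseObserver.onCompleted();
      }

      @Override
      public void onError(Throwable t) {
        // Client cancelled or the transport failed; nothing further to do.
      }

      @Override
      public void onCompleted() {
        // May never be reached if the client never half-closes.
      }
    };
  }
}
```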
@alexnixon It looks like there is a problem in Netty clients, but I'm not seeing the same in OkHttp (with which I'm more familiar). Netty clients appear to hold onto resources for the stream in the mapping in …

Are the server and client running in separate processes? At some point I'd imagine you'd get an OOM on the client even if no resources are retained by the stream after completion, simply because you can create streams faster than they can complete (especially if you're using a single-threaded executor, as mentioned in comment #2).

Also, I believe the large number of threads is expected when using the default executor, a cached thread pool. The loop you provided will create 1000 calls on the channel, and each of those can create a thread. At some point we'd expect some of those calls to finish and threads to be reused for new calls, but it's not clear to me that the large number of threads is necessarily related to the retaining of stream resources after the server half-closes.
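As an aside on the executor point, a hedged sketch of bounding the channel's application executor so a burst of calls cannot spin up an unbounded number of threads; the pool size, class name, and address are arbitrary placeholders, not something suggested in this thread.

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.concurrent.Executors;

public final class BoundedExecutorChannel {
  public static ManagedChannel create(String host, int port) {
    // By default the channel uses a shared cached thread pool, which creates a
    // new thread whenever all existing ones are busy, so ~1000 concurrent calls
    // can mean hundreds of live threads. A fixed pool caps that number.
    return ManagedChannelBuilder.forAddress(host, port)
        .executor(Executors.newFixedThreadPool(8))
        .build();
  }
}
```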
@ericgribkoff I think you're right about OkHttp; that seems to be a different issue. The thousands of threads live concurrently and hold some resources, but they do eventually die after a minute or so, at which point the memory is freed. The client/server are running in separate processes, though I'm not sure it's just a creation/completion race given the Netty memory leak persists after 5 minutes. Also, when I initially saw this issue in production it was with a service that slowly went OOM over a period of days.
@alexnixon Yes, there is a genuine leak in Netty clients (I haven't looked too much at the server side yet). One difference in behavior is that the OkHttp client does send a RST_STREAM to the server when the server half-closes, whereas Netty clients do not. I hope to have a fix for this soon.
Fantastic, thanks very much. I'll also note for now that …
#4222 contains a fix for this problem: it changes the Netty client to send a reset stream when it receives an end-of-stream from the server without having already half-closed. There are more details in the comments on #4222, but sending the reset stream frees the stream resources on the client, and receipt of the reset stream frees the stream resources on the server. Since we can't assume all clients will be updated to send the reset stream, I'll need to send out another PR to let the server release stream resources even without this behavior change in the client. But the client updates in #4222 alone are enough to solve this problem, at least as far as I've been able to reproduce and test it.
Using v1.8.0.
The following client/server leaves the bidi RPC call open, holding onto resources and ultimately going OOM if enough calls are made:
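(The original reproduction snippet is not included in this excerpt. Purely as an illustration of the shape described in the thread — a loop of 1000 bidi calls where the client never half-closes while the server completes its response — a hedged sketch might look like this; EchoGrpc and Msg are hypothetical generated classes, not the reporter's actual service.)

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import io.grpc.stub.StreamObserver;

public final class LeakyClient {
  public static void main(String[] args) {
    // Transport security configuration omitted for brevity.
    ManagedChannel channel = ManagedChannelBuilder.forAddress("localhost", 50051).build();
    EchoGrpc.EchoStub stub = EchoGrpc.newStub(channel);

    for (int i = 0; i < 1000; i++) {
      StreamObserver<Msg> requestObserver = stub.chat(new StreamObserver<Msg>() {
        @Override public void onNext(Msg response) { /* read the response */ }
        @Override public void onError(Throwable t) { }
        @Override public void onCompleted() {
          // The server has finished responding, but this client never calls
          // requestObserver.onCompleted() (never half-closes), so on affected
          // versions the stream's resources are retained.
        }
      });
      requestObserver.onNext(Msg.getDefaultInstance());
      // Note: no requestObserver.onCompleted() here.
    }
    // Channel intentionally left running to observe the retained memory.
  }
}
```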
Replacing the client's ClientCall.Listener with one that calls halfClose() upon completion works around the issue.
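The exact listener is not shown in this excerpt. As a rough, hedged sketch of one way such a workaround could be wired up with the low-level ClientCall API (interpreting "upon completion" as the server closing the call), it might look something like this; the wrapper class and trigger point are assumptions, not the reporter's actual code.

```java
import io.grpc.ClientCall;
import io.grpc.Metadata;
import io.grpc.Status;

// Hedged sketch of the described workaround: explicitly half-close the call
// from the client side once the server has finished, so the HTTP/2 stream can
// reach CLOSED and its resources be released.
final class HalfCloseOnCompleteListener<RespT> extends ClientCall.Listener<RespT> {
  private final ClientCall<?, RespT> call;
  private final ClientCall.Listener<RespT> delegate;

  HalfCloseOnCompleteListener(ClientCall<?, RespT> call, ClientCall.Listener<RespT> delegate) {
    this.call = call;
    this.delegate = delegate;
  }

  @Override
  public void onHeaders(Metadata headers) {
    delegate.onHeaders(headers);
  }

  @Override
  public void onMessage(RespT message) {
    delegate.onMessage(message);
  }

  @Override
  public void onClose(Status status, Metadata trailers) {
    try {
      // Server is done responding; half-close our side too instead of leaving
      // the stream half-open indefinitely.
      call.halfClose();
    } catch (IllegalStateException e) {
      // Already half-closed or cancelled; nothing to do.
    }
    delegate.onClose(status, trailers);
  }
}
```

A caller would pass this wrapper to call.start(...) in place of its original listener; with the fix in #4222 this kind of workaround should no longer be necessary on updated clients.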