-
Hello, I have had a problem for months now with an application at my company. We've tried to find out what happened, and if possible I would like to get an opinion from people who are better informed than me on the subject, or even just advice on what I could do to figure out what's going on.

Context

For a bit of context, we're running our applications on a Kubernetes cluster with Argo CD as the CI/CD tool. When this started, our application used Quarkus v2.16.0.Final (I don't know if this is really related to the version).

The jobs

The purpose of those jobs is to stream data with gRPC directly to a database, all in reactive.
Details

So far, we've concluded that the problem comes from off-heap memory allocation. Example for a job:
We've investigated with a few tools. Here are some examples of the results we've had. First, a simple job that asks the server if it can start to sync (basically just returning a boolean). With the JMC (Java Mission Control) tool we get the following warning:
The pod is limited to 256MB. With the JVM options we give it, we have:
In total, we have 250MB, so 6MB less than expected, but we still have the threads, the code cache, etc., which can take up a little space. For class loading, we can see that we load up to 13,000 classes, without ever unloading them. With 512MB the job passes. The metaspace is always 58MB, with 14,911 classes loaded, about 1,911 more than in the job that crashes. Another job that syncs data has 2GB of memory.
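To capture that kind of breakdown from inside the job itself rather than only through JMC, something like the following could be logged. This is a minimal sketch (the class name is made up, it is not part of our application) using the standard java.lang.management MXBeans to dump the memory pools and the loaded-class count:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class MemoryReport {

    public static void main(String[] args) {
        var memory = ManagementFactory.getMemoryMXBean();
        System.out.printf("heap used:     %,d bytes%n", memory.getHeapMemoryUsage().getUsed());
        System.out.printf("non-heap used: %,d bytes%n", memory.getNonHeapMemoryUsage().getUsed());

        // Per-pool breakdown: Metaspace, Compressed Class Space, CodeHeap segments, ...
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.printf("%-30s %,d bytes%n", pool.getName(), pool.getUsage().getUsed());
        }

        // Loaded class count, to compare the jobs that pass against the ones that crash
        System.out.printf("loaded classes: %d%n",
                ManagementFactory.getClassLoadingMXBean().getLoadedClassCount());
    }
}
```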
We've also tested the
-
Without any idea what goes wrong 😁, I would remove the dependency on R2DBC because it's not integrated into Quarkus. R2DBC depends on a specific Netty version which is implicitly overridden by Quarkus, and this can lead to all kinds of problems. My recommendation: replace quarkus-jooq and R2DBC with plain jOOQ as a SQL generator and use the Vert.x reactive database driver, which is supported and well tested by Quarkus/Vert.x.
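For illustration, a minimal sketch of what that combination could look like, assuming the quarkus-reactive-pg-client extension and a made-up sync_item table: jOOQ only renders the SQL, and the Mutiny/Vert.x client does the actual I/O.

```java
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import org.jooq.DSLContext;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;

import io.smallrye.mutiny.Uni;
import io.vertx.mutiny.pgclient.PgPool;

@ApplicationScoped
public class SyncItemRepository {

    // Reactive pool provided by the quarkus-reactive-pg-client extension
    @Inject
    PgPool client;

    // jOOQ used purely as an SQL renderer: no JDBC or R2DBC connection behind it
    private final DSLContext dsl = DSL.using(SQLDialect.POSTGRES);

    public Uni<Long> countItems() {
        // Render the SQL string with jOOQ...
        String sql = dsl.selectCount()
                .from(DSL.table("sync_item"))
                .getSQL();

        // ...and execute it with the Vert.x reactive client.
        // Note: for parameterized queries the placeholder style differs
        // (jOOQ renders '?' while the pg client expects '$1'), so bind
        // parameters need a render setting or manual translation.
        return client.query(sql)
                .execute()
                .map(rows -> rows.iterator().next().getLong(0));
    }
}
```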
-
New post following this response. I spent more time on this and I think I have something: it's a back-pressure problem. The server is much faster than the client, so it seems the client is storing all the "received but not processed yet" messages in off-heap memory instead of the heap. A weird thing about this is that the client memory goes up significantly after the server has sent all its messages (like a "bump" in the client container's used memory), instead of going up a little bit step by step. Here's a repo to reproduce: https://github.com/jdussouillez/quarkus-high-off-heap-mem-usage
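If the buffering theory is right, one way to test it is to bound demand explicitly on the client side. A rough sketch, assuming the incoming Multi comes from the generated Mutiny gRPC stub and writeToDb stands in for the real reactive insert (both names are made up):

```java
import io.smallrye.mutiny.Multi;
import io.smallrye.mutiny.Uni;

public class BoundedSyncConsumer {

    // 'incoming' stands for the Multi returned by the generated Mutiny gRPC stub.
    public Uni<Void> sync(Multi<String> incoming) {
        return incoming
                // Keep at most 256 pending items; fail fast instead of letting the
                // "received but not processed yet" buffer grow without limit.
                .onOverflow().buffer(256)
                // Emit the next item only once the previous write has completed,
                // so the demand signalled upstream stays bounded.
                .onItem().call(this::writeToDb)
                .collect().last()
                .replaceWithVoid();
    }

    private Uni<Void> writeToDb(String record) {
        // Placeholder for the real reactive database insert
        return Uni.createFrom().voidItem();
    }
}
```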
Here are some graphs representing client-side data:
One run used a 512MB heap inside a 1GB container, another a 1GB heap inside a 4GB container. More details, theories and graphs are in the repo: https://github.com/jdussouillez/quarkus-high-off-heap-mem-usage
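To tell whether that bump is NIO direct buffers or Netty's pooled allocator, something like the following could be logged periodically on the client. This is a hypothetical standalone snippet, not part of the reproducer repo:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

import io.netty.buffer.PooledByteBufAllocator;

public class DirectMemoryReport {

    public static void main(String[] args) {
        // JDK-level view: direct ByteBuffers allocated through NIO
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%-8s count=%d used=%,d bytes%n",
                    pool.getName(), pool.getCount(), pool.getMemoryUsed());
        }

        // Netty-level view: memory held by the default pooled allocator
        var metric = PooledByteBufAllocator.DEFAULT.metric();
        System.out.printf("netty pooled direct: %,d bytes, heap: %,d bytes%n",
                metric.usedDirectMemory(), metric.usedHeapMemory());
    }
}
```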
-
I guess you're still using the "old" gRPC support in Quarkus - from gRPC Java? What if you tried using the new one, on both sides - server and client?
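For reference, switching is mostly configuration. If I remember the keys correctly (this is from memory, please double-check against the Quarkus gRPC guide), it looks roughly like this in application.properties, with "sync" as a made-up client name:

```properties
# Use the new Vert.x-based gRPC server instead of the separate grpc-java server
quarkus.grpc.server.use-separate-server=false

# Use the new Quarkus/Vert.x gRPC client for the "sync" client
quarkus.grpc.clients.sync.use-quarkus-grpc-client=true
```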
-
@jdussouillez Hi! As seen in #36204, there can be many contributors to the RSS footprint which can cause OOM kills in containers. A few easy-to-try suggestions:
The mentioned issue also lists other investigations which should be performed to find the culprit.
-
Does this reproducer mandate using Docker to reproduce the issue?
-
Yes please, the other client might have set a "grpc-accept-encoding" that does not accept gzip, preventing the server from using it.
…On Mon, Dec 4, 2023 at 3:32 PM Junior Dussouillez wrote:

do you know if message decompression was happening previously when using the grpc-netty client?

@vietj Yes I think so (even if I'm not 100% sure the messages were compressed between both; maybe I could have a look with Wireshark or something like this). My server Quarkus configuration was the same when I ran some tests with the old gRPC client. See #36691 (reply in thread). No memory bumps, and the app worked. I will run the old client again just to be sure, and run the async-profiler on it.
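For context on what controls that header with the grpc-java client: outbound compression is set per stub, while the "grpc-accept-encoding" advertised to the server comes from the channel's decompressor registry. A rough, hypothetical sketch (the SyncService stub name is made up):

```java
import io.grpc.DecompressorRegistry;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class GrpcCompressionExample {

    public static void main(String[] args) {
        ManagedChannel channel = ManagedChannelBuilder.forAddress("localhost", 9000)
                .usePlaintext()
                // The default registry already includes gzip; whatever is registered
                // here is what ends up advertised in "grpc-accept-encoding".
                .decompressorRegistry(DecompressorRegistry.getDefaultInstance())
                .build();

        // Hypothetical generated stub: withCompression() only affects outbound
        // (request) messages, not what the server is allowed to send back.
        // SyncServiceGrpc.SyncServiceBlockingStub stub =
        //         SyncServiceGrpc.newBlockingStub(channel).withCompression("gzip");

        channel.shutdown();
    }
}
```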
-
Yes, but Quarkus uses the 4.x branch.
…On Mon, Dec 4, 2023 at 3:35 PM Francesco Nigro wrote:

eclipse-vertx/vertx-grpc#81 let's see: is it the right branch?