-
Hello, I have had a problem for months now with an application at my company. We've tried to find out what happened, and if possible I would like to get an opinion from people who are better informed than me on the subject, or even just advice on what I could do to figure out what's going on.

Context

For a bit of context, we're running our applications on a Kubernetes cluster with Argo CD as the CI/CD tool. When this started, our application used Quarkus v2.16.0.Final (I don't know if this is really related to the version).

The jobs

The purpose of those jobs is to stream data with gRPC directly to a database, all in reactive.
Details

So far, we've concluded that the problem comes from off-heap memory allocation. Example for a job:
We've investigated with a few tools. Here are some examples of the results we've had. First, a simple job that asks the server if it can start to sync (basically just returning a boolean). With the JMC (Java Mission Control) tool we get the following warning:
The pod is limited to 256MB. With the JVM options we give it, we have:
In total, we have 250MB, so 6MB less than expected, but we still have the threads, the code cache, etc., which can take up a little space. For class loading, we can see that we load up to 13,000 classes, without ever unloading them. With 512MB the job passes. The metaspace is always 58MB, with 14,911 classes loaded, about 1,911 more than in the job that crashes. Another job that syncs data has 2GB of memory.
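To capture that kind of breakdown from inside the job itself rather than only through JMC, something like the following could be logged. This is a minimal sketch (the class name is made up, it is not part of our application) using the standard java.lang.management MXBeans to dump the memory pools and the loaded-class count:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class MemoryReport {

    public static void main(String[] args) {
        var memory = ManagementFactory.getMemoryMXBean();
        System.out.printf("heap used:     %,d bytes%n", memory.getHeapMemoryUsage().getUsed());
        System.out.printf("non-heap used: %,d bytes%n", memory.getNonHeapMemoryUsage().getUsed());

        // Per-pool breakdown: Metaspace, Compressed Class Space, CodeHeap segments, ...
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.printf("%-30s %,d bytes%n", pool.getName(), pool.getUsage().getUsed());
        }

        // Loaded class count, to compare the jobs that pass against the ones that crash
        System.out.printf("loaded classes: %d%n",
                ManagementFactory.getClassLoadingMXBean().getLoadedClassCount());
    }
}
```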
We've also tested the
-
Without any idea what goes wrong 😁, I would remove the dependency on R2DBC because it's not integrated into Quarkus. R2DBC depends on a specific Netty version which is implicitly overridden by Quarkus, and this can lead to all kinds of problems. My recommendation: replace quarkus-jooq and R2DBC with plain jOOQ as a SQL generator and use the Vert.x reactive database driver, which is supported and well tested by Quarkus/Vert.x.
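For illustration, a minimal sketch of what that combination could look like, assuming the quarkus-reactive-pg-client extension and a made-up sync_item table: jOOQ only renders the SQL, and the Mutiny/Vert.x client does the actual I/O.

```java
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

import org.jooq.DSLContext;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;

import io.smallrye.mutiny.Uni;
import io.vertx.mutiny.pgclient.PgPool;

@ApplicationScoped
public class SyncItemRepository {

    // Reactive pool provided by the quarkus-reactive-pg-client extension
    @Inject
    PgPool client;

    // jOOQ used purely as an SQL renderer: no JDBC or R2DBC connection behind it
    private final DSLContext dsl = DSL.using(SQLDialect.POSTGRES);

    public Uni<Long> countItems() {
        // Render the SQL string with jOOQ...
        String sql = dsl.selectCount()
                .from(DSL.table("sync_item"))
                .getSQL();

        // ...and execute it with the Vert.x reactive client.
        // Note: for parameterized queries the placeholder style differs
        // (jOOQ renders '?' while the pg client expects '$1'), so bind
        // parameters need a render setting or manual translation.
        return client.query(sql)
                .execute()
                .map(rows -> rows.iterator().next().getLong(0));
    }
}
```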
-
New post following this response. I spent more time on this and I think I have something: it's a back-pressure problem. The server is much faster than the client, so it seems the client is storing all the "received but not processed yet" messages in off-heap memory instead of the heap. A weird thing about this is that the client memory goes up significantly after the server has sent all its messages (like a "bump" in the client container's used memory), instead of going up a little bit step by step. Here's a repo to reproduce: https://github.com/jdussouillez/quarkus-high-off-heap-mem-usage
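If the buffering theory is right, one way to test it is to bound demand explicitly on the client side. A rough sketch, assuming the incoming Multi comes from the generated Mutiny gRPC stub and writeToDb stands in for the real reactive insert (both names are made up):

```java
import io.smallrye.mutiny.Multi;
import io.smallrye.mutiny.Uni;

public class BoundedSyncConsumer {

    // 'incoming' stands for the Multi returned by the generated Mutiny gRPC stub.
    public Uni<Void> sync(Multi<String> incoming) {
        return incoming
                // Keep at most 256 pending items; fail fast instead of letting the
                // "received but not processed yet" buffer grow without limit.
                .onOverflow().buffer(256)
                // Emit the next item only once the previous write has completed,
                // so the demand signalled upstream stays bounded.
                .onItem().call(this::writeToDb)
                .collect().last()
                .replaceWithVoid();
    }

    private Uni<Void> writeToDb(String record) {
        // Placeholder for the real reactive database insert
        return Uni.createFrom().voidItem();
    }
}
```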
Here are some graphs representing client-side data:
One run used a 512MB heap inside a 1GB container, another a 1GB heap inside a 4GB container. More details, theories and graphs are in the repo: https://github.com/jdussouillez/quarkus-high-off-heap-mem-usage
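To tell whether that bump is NIO direct buffers or Netty's pooled allocator, something like the following could be logged periodically on the client. This is a hypothetical standalone snippet, not part of the reproducer repo:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

import io.netty.buffer.PooledByteBufAllocator;

public class DirectMemoryReport {

    public static void main(String[] args) {
        // JDK-level view: direct ByteBuffers allocated through NIO
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%-8s count=%d used=%,d bytes%n",
                    pool.getName(), pool.getCount(), pool.getMemoryUsed());
        }

        // Netty-level view: memory held by the default pooled allocator
        var metric = PooledByteBufAllocator.DEFAULT.metric();
        System.out.printf("netty pooled direct: %,d bytes, heap: %,d bytes%n",
                metric.usedDirectMemory(), metric.usedHeapMemory());
    }
}
```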
-
I guess you're still using the "old" gRPC support in Quarkus - from gRPC Java? What if you tried using the new one, on both sides - server and client?
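For reference, switching is mostly configuration. If I remember the keys correctly (this is from memory, please double-check against the Quarkus gRPC guide), it looks roughly like this in application.properties, with "sync" as a made-up client name:

```properties
# Use the new Vert.x-based gRPC server instead of the separate grpc-java server
quarkus.grpc.server.use-separate-server=false

# Use the new Quarkus/Vert.x gRPC client for the "sync" client
quarkus.grpc.clients.sync.use-quarkus-grpc-client=true
```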
-
@jdussouillez Hi! As seen in #36204, there can be many contributors to the RSS footprint which can cause OOM kills in containers. A few easy-to-try suggestions:
The mentioned issue also lists other investigations which should be performed to find the culprit.
-
Does this reproducer mandate using Docker to reproduce the issue?
-
Yes please, the other client might have set a "grpc-accept-encoding" that does not accept gzip, preventing the server from using it.
…On Mon, Dec 4, 2023 at 3:32 PM Junior Dussouillez wrote:

do you know if message decompression was happening previously when using the grpc-netty client?

@vietj Yes I think so (even if I'm not 100% sure the messages were compressed between both; maybe I could have a look with Wireshark or something like this). My server Quarkus configuration was the same when I ran some tests with the old gRPC client. See #36691 (reply in thread). No memory bumps, and the app worked. I will run the old client again just to be sure, and run the async-profiler on it.
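For context on what controls that header with the grpc-java client: outbound compression is set per stub, while the "grpc-accept-encoding" advertised to the server comes from the channel's decompressor registry. A rough, hypothetical sketch (the SyncService stub name is made up):

```java
import io.grpc.DecompressorRegistry;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class GrpcCompressionExample {

    public static void main(String[] args) {
        ManagedChannel channel = ManagedChannelBuilder.forAddress("localhost", 9000)
                .usePlaintext()
                // The default registry already includes gzip; whatever is registered
                // here is what ends up advertised in "grpc-accept-encoding".
                .decompressorRegistry(DecompressorRegistry.getDefaultInstance())
                .build();

        // Hypothetical generated stub: withCompression() only affects outbound
        // (request) messages, not what the server is allowed to send back.
        // SyncServiceGrpc.SyncServiceBlockingStub stub =
        //         SyncServiceGrpc.newBlockingStub(channel).withCompression("gzip");

        channel.shutdown();
    }
}
```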
-
Yes, but Quarkus uses the 4.x branch.
…On Mon, Dec 4, 2023 at 3:35 PM Francesco Nigro wrote:

eclipse-vertx/vertx-grpc#81 let's see: is it the right branch?