Lettuce metrics creates lots of long arrays and gives an out-of-memory error. #1210

Closed
omjego opened this issue Jan 20, 2020 · 3 comments
Labels: type: bug (A general bug)
Milestone: 5.2.2

omjego commented Jan 20, 2020

Current Behavior

We are using Lettuce 5.0.4.RELEASE in our production environment to fetch data from Redis. While scaling out, we hit a very strange issue: the app ran out of memory under moderate traffic. I took a heap dump of the app and analyzed it, and found that Lettuce was allocating lots of AtomicLongArrays, and those objects accounted for almost 80% of the memory. When I forced a GC with the YourKit profiler, I could see a GC pause in the graphs, but no memory was freed. This happened three times; only when I forced GC for the fourth time did the JVM release most of its memory.
After walking the GC roots of those long arrays, I found that they are related to the command latency metrics in Lettuce: the arrays belonged to the ConcurrentHashMap allocated by the DefaultCommandLatencyCollector class.
I'm not sure what's happening. Is this expected behavior?

I'm attaching screenshots from YourKit.

[Screenshot: YourKit memory analysis, 2020-01-20 2:22 PM]

[Screenshot: YourKit memory analysis, 2020-01-20 2:41 PM]

Environment

  • Lettuce version(s): 5.0.4.RELEASE
omjego added the type: bug (A general bug) label Jan 20, 2020
mp911de (Collaborator) commented Jan 20, 2020

So basically, there are two issues:

  1. Memory reclaimed only after 4th full GC.
  2. Huge growth in LatencyStats objects along with their child objects.

We're using org.HdrHistogram and org.LatencyUtils to keep track of individual command latencies. Latency stats are keyed by source and target endpoints (IP and port) along with the command type (GET, SET, HMGET, …).

We have 194 commands. Given that there are ~ 72000 CommandLatencyIds, there could be the following causes:

  • Enabled local distinction in CommandLatencyCollectorOptions (disabled by default)
  • A huge number of Redis connections (over 300) that are or were active during the measurement window
  • A custom command type that implements ProtocolKeyword without implementing equals(…) and hashCode(). Looking at the Lettuce code, we have issues here when using dynamic Redis commands ("Redis Command Interfaces") with Redis modules or non-standard Redis commands (see the sketch right after this list).
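
To make the third point concrete, here is a minimal sketch of such a custom command type. The class name ModuleCommand and its fields are hypothetical, not taken from this issue; the point is that without equals()/hashCode() every instance becomes a distinct metrics key.

import java.nio.charset.StandardCharsets;

import io.lettuce.core.protocol.ProtocolKeyword;

// Hypothetical custom command type. Without equals()/hashCode(), every instance
// that ends up in a latency metrics key is distinct, so latency tracking objects
// grow with each command invocation instead of staying bounded.
public class ModuleCommand implements ProtocolKeyword {

    private final String name;

    public ModuleCommand(String name) {
        this.name = name;
    }

    @Override
    public byte[] getBytes() {
        return name.getBytes(StandardCharsets.US_ASCII);
    }

    @Override
    public String name() {
        return name;
    }

    // Name-based equality keeps the number of metrics keys bounded by the
    // number of distinct command names.
    @Override
    public boolean equals(Object o) {
        return o instanceof ModuleCommand && name.equals(((ModuleCommand) o).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}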

Do you have a heap dump where you could inspect the various CommandLatencyId instances?

Given that the memory could be reclaimed after a couple of GC cycles, I'm not sure how to classify this issue. It could be that a metrics run released the latency objects and the fourth GC run was then actually able to evacuate them from memory.

Can you check whether any of the points mentioned above applies to your setup?

omjego (Author) commented Jan 21, 2020

Hi @mp911de, thank you for the quick response. Yes, I did inspect the CommandLatencyId objects, and most of them relate to the custom Redis module command that we use in our codebase, even though that command implements the Commands interface. Please refer to the following code.

import io.lettuce.core.dynamic.Commands;
import io.lettuce.core.dynamic.annotation.Command;

public interface CustomRedisCommand extends Commands
{
    @Command("ip2contry.getIp2country")
    String ipToCountry(String ip);
}

I checked the first two points as well. We don't provide CommandLatencyCollectorOptions when creating ClientResources, so local distinction stays at its default (disabled). And we're using a connection pool with max-active set to 60, so the number of connections won't exceed that.
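
For completeness, here's roughly how our ClientResources are created. This is a sketch rather than our exact production code, and it assumes the Lettuce 5.x builder API; spelling out the collector options only shows that localDistinction is off, which is also what you get when no options are passed at all.

import io.lettuce.core.RedisClient;
import io.lettuce.core.metrics.DefaultCommandLatencyCollectorOptions;
import io.lettuce.core.resource.ClientResources;
import io.lettuce.core.resource.DefaultClientResources;

public class ClientSetupSketch {

    public static void main(String[] args) {
        // Omitting commandLatencyCollectorOptions(...) entirely gives the same
        // defaults; it is shown here only to make the localDistinction setting visible.
        ClientResources resources = DefaultClientResources.builder()
                .commandLatencyCollectorOptions(DefaultCommandLatencyCollectorOptions.builder()
                        .localDistinction(false) // default; true would key metrics per local endpoint too
                        .build())
                .build();

        RedisClient client = RedisClient.create(resources, "redis://localhost");

        // ... use the client, then release resources on shutdown
        client.shutdown();
        resources.shutdown();
    }
}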

mp911de added this to the 5.2.2 milestone Jan 21, 2020
mp911de added a commit that referenced this issue Jan 21, 2020
…1210

We now use the String representation of a command for equality and hashCode computation.
mp911de (Collaborator) commented Jan 21, 2020

We updated CommandLatencyId to consider the String representation of a command name. This change leads to the accumulation of metrics that share the same command name instead of unbounded growth of latency tracking objects.
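
Roughly, the idea behind the change looks like the following simplified sketch (not the actual CommandLatencyId source): equality and hashCode of the metrics key are derived from the command's String name rather than from the ProtocolKeyword instance.

import java.net.SocketAddress;
import java.util.Objects;

// Simplified sketch of the idea behind the fix, not the actual Lettuce code:
// keying on the command's String name means dynamic command instances without
// equals()/hashCode() no longer create a new key per invocation.
final class LatencyKey {

    private final SocketAddress local;
    private final SocketAddress remote;
    private final String commandName; // e.g. derived from ProtocolKeyword.name()

    LatencyKey(SocketAddress local, SocketAddress remote, String commandName) {
        this.local = local;
        this.remote = remote;
        this.commandName = commandName;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof LatencyKey)) {
            return false;
        }
        LatencyKey other = (LatencyKey) o;
        return local.equals(other.local) && remote.equals(other.remote)
                && commandName.equals(other.commandName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(local, remote, commandName);
    }
}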

A snapshot build is available that contains the fix. Care to give lettuce-core-5.2.2.BUILD-SNAPSHOT a spin to see whether it addresses your issue? See the readme for snapshot usage.

mp911de closed this as completed Jan 30, 2020