
Memory leak with a simple Cache using only weak keys and weak values #167

Closed
lauredogit opened this issue Jun 26, 2017 · 23 comments

@lauredogit

Hi,

When using a simple Cache built by specifying only .weakKeys() and .weakValues(), where the keys and values are kept alive by other data structures that hold strong references and manage their own lifecycle, we observed a memory leak in production with Caffeine that we had not observed with Guava.

With this configuration, we are using the BoundedLocalCache with a LocalCacheFactory.WI and are creating NodeFactory.WW Nodes.

In the WW Node, there is a WeakValueReference which stores a strong reference to the key.

This strong reference caused a memory leak occupying 2/3 of the heap.
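The pinning effect described here can be sketched with plain `java.lang.ref` types. This is a hypothetical stand-in, not Caffeine's actual class: a strong `keyReference` field on the value's weak reference keeps the key reachable, so the "weak" key can never be reclaimed.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class StrongKeyLeakDemo {
  // Hypothetical stand-in for the buggy WeakValueReference: it stores the
  // raw key strongly instead of the key's own weak reference.
  static final class LeakyValueReference<V> extends WeakReference<V> {
    final Object keyReference; // strong reference to the key

    LeakyValueReference(Object keyReference, V value, ReferenceQueue<V> queue) {
      super(value, queue);
      this.keyReference = keyReference;
    }
  }

  // Returns true when the key survives GC despite only "weak" bookkeeping.
  static boolean keyIsPinned() {
    Object key = new Object();
    WeakReference<Object> weakKey = new WeakReference<>(key);
    LeakyValueReference<String> valueRef =
        new LeakyValueReference<>(key, "value", new ReferenceQueue<>());

    key = null;   // drop the caller's strong reference to the key
    System.gc();  // a hint, but the key is still strongly reachable

    // valueRef.keyReference keeps the key alive, so weakKey is not cleared.
    boolean pinned = weakKey.get() != null;
    return pinned && valueRef.keyReference != null; // keeps valueRef live here
  }

  public static void main(String[] args) {
    System.out.println(keyIsPinned()); // true: the key leaked
  }
}
```

Because the value reference is itself strongly reachable from the map entry, the strong field forms an unbroken chain to the key.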

We did not set up a Cache#cleanUp task, as we expected our cache to be accessed often enough to perform its maintenance tasks without one.

The Cache gets about 11 writes per second in production, mainly of unique keys, i.e., we are using it as a store rather than as a cache.

I guess our use case does not really fit, and we may need to schedule a cleanUp of the Cache every now and then.

If you need additional details, feel free to ask.

Best regards,
Dominique

@ben-manes
Owner

Thanks for the report and sorry for the inconvenience.

As you noted, NodeFactory.WW holds only weak references:

static class WW<K, V> implements Node<K, V> {
  volatile WeakKeyReference<K> key;
  volatile WeakValueReference<V> value;
  ...

The map should be keyed by node.getKeyReference(), i.e. the weak reference rather than the strong one, and lookups should use Reference.LookupKeyReference to match.
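That keying scheme can be illustrated with simplified stand-ins (not Caffeine's real classes): the stored key is a weak reference whose `equals`/`hashCode` delegate to the referent, and probes use a strongly held lookup wrapper with a compatible contract, so a plain `HashMap` can find the entry.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

public class WeakKeyLookupDemo {
  // Simplified sketch of a weak key whose equality delegates to the referent.
  static class WeakKeyRef<K> extends WeakReference<K> {
    final int hashCode; // cached so the hash survives referent collection

    WeakKeyRef(K key, ReferenceQueue<K> queue) {
      super(key, queue);
      this.hashCode = key.hashCode();
    }

    @Override public int hashCode() { return hashCode; }

    @Override public boolean equals(Object o) {
      if (o instanceof WeakKeyRef) {
        Object other = ((WeakKeyRef<?>) o).get();
        return (other != null) && other.equals(get());
      } else if (o instanceof LookupKeyRef) {
        return ((LookupKeyRef<?>) o).key.equals(get());
      }
      return false;
    }
  }

  // Simplified sketch of a lookup key: holds the key strongly and is used
  // only for map probes, so no WeakReference is allocated per read.
  static final class LookupKeyRef<K> {
    final K key;

    LookupKeyRef(K key) { this.key = key; }

    @Override public int hashCode() { return key.hashCode(); }

    @Override public boolean equals(Object o) {
      if (o instanceof WeakKeyRef) {
        return key.equals(((WeakKeyRef<?>) o).get());
      } else if (o instanceof LookupKeyRef) {
        return key.equals(((LookupKeyRef<?>) o).key);
      }
      return false;
    }
  }

  static String lookup() {
    Map<Object, String> data = new HashMap<>();
    data.put(new WeakKeyRef<>("alpha", new ReferenceQueue<>()), "v1");
    // The probe matches the stored weak key by referent equality.
    return data.get(new LookupKeyRef<>("alpha"));
  }

  public static void main(String[] args) {
    System.out.println(lookup()); // v1
  }
}
```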

ReferenceTest should be asserting this behavior. Unfortunately, it seems to mostly have value-based tests and not enough weak-key tests. When I quickly hack one of the tests to be key-based, it passes:

@Test(dataProvider = "caches")
@CacheSpec(keys = ReferenceType.WEAK, values = {ReferenceType.STRONG, ReferenceType.WEAK},
    expireAfterAccess = Expire.DISABLED, expireAfterWrite = Expire.DISABLED,
    maximumSize = Maximum.DISABLED, weigher = CacheWeigher.DEFAULT,
    population = Population.FULL, stats = Stats.ENABLED, removalListener = Listener.CONSUMING)
public void put_weak(Cache<Integer, Integer> cache, CacheContext context) {
  cache.put(10000000, context.absentValue());
  context.clear();
  GcFinalization.awaitFullGc();

  long count = context.initialSize() - cache.estimatedSize() + 1;
  if (context.population() != Population.SINGLETON) {
    assertThat(count, is(0L));
  }
  assertThat(cache, hasRemovalNotifications(context, count, RemovalCause.COLLECTED));
  verifyWriter(context, (verifier, writer) -> verifier.deletions(count, RemovalCause.COLLECTED));
}

Can you provide a failing test, or direct me to where the strong reference is being held? I agree that at your write rate an explicit cleanUp shouldn't be necessary. Every write should trigger a cleanUp, and empty WeakReference objects should be cheap even if held a little too long.

@lauredogit
Author

lauredogit commented Jun 26, 2017

Hi Ben,

The strong reference is held by com.github.benmanes.caffeine.cache.References$WeakValueReference:

/**
 * The value in a cache that holds values weakly. This class retains a reference to the key in
 * the advent that the value is reclaimed so that the entry can be removed from the cache in
 * constant time.
 */
static final class WeakValueReference<V> extends WeakReference<V>
    implements InternalReference<V> {
  private final Object keyReference;

  public WeakValueReference(@Nonnull Object keyReference,
      @Nullable V value, @Nullable ReferenceQueue<V> queue) {
    super(value, queue);
    this.keyReference = keyReference;
  }

  @Override
  public Object getKeyReference() {
    return keyReference;
  }

  @Override
  public boolean equals(Object object) {
    return referenceEquals(object);
  }

  @Override
  @SuppressWarnings("PMD.UselessOverridingMethod")
  public int hashCode() {
    return super.hashCode();
  }
}

The keyReference field is a strong reference and prevents the GC from reclaiming the keys, which caused our leak.

If I understood correctly, you need the strong reference for the com.github.benmanes.caffeine.cache.BoundedLocalCache#drainValueReferences method; unfortunately, if the cache does not call this method often enough, the memory leak happens.
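The role of that back-pointer can be sketched with plain `java.lang.ref` types (hypothetical stand-ins, not Caffeine's code): when a value is reclaimed, the reference queue hands back only the Reference object itself, so it must carry its key for the dead entry to be removed in constant time. Here the reclamation is simulated with a manual `enqueue()` so the demo is deterministic.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

public class DrainDemo {
  // Sketch of a weak value that remembers the key it was stored under.
  static final class ValueRef<V> extends WeakReference<V> {
    final Object keyReference;

    ValueRef(Object keyReference, V value, ReferenceQueue<V> queue) {
      super(value, queue);
      this.keyReference = keyReference;
    }
  }

  // Hypothetical drain loop: each reclaimed value points back at its key,
  // so the dead mapping is removed in O(1) without scanning the map.
  static int drain(Map<Object, ValueRef<?>> data, ReferenceQueue<?> queue) {
    int drained = 0;
    Reference<?> ref;
    while ((ref = queue.poll()) != null) {
      data.remove(((ValueRef<?>) ref).keyReference);
      drained++;
    }
    return drained;
  }

  // Simulates the value being reclaimed by enqueueing it manually.
  static int sizeAfterDrain() {
    ReferenceQueue<String> queue = new ReferenceQueue<>();
    Map<Object, ValueRef<?>> data = new HashMap<>();
    ValueRef<String> ref = new ValueRef<>("key1", "value1", queue);
    data.put("key1", ref);

    ref.enqueue(); // stand-in for the GC clearing and enqueueing the value
    drain(data, queue);
    return data.size();
  }

  public static void main(String[] args) {
    System.out.println(sizeAfterDrain()); // 0: the entry was removed
  }
}
```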

@ben-manes
Owner

Oh, thanks!

I think this was a mistake in the generated code. The keyReference field is an Object so that it can be a WeakKeyReference. The generated code is mistakenly:

public final void setValue(V value, ReferenceQueue<V> referenceQueue) {
  ((Reference<V>) getValueReference()).clear();
  UnsafeAccess.UNSAFE.putObject(this, VALUE_OFFSET, new WeakValueReference<V>(key, value, referenceQueue));
}

I think if we changed this to call getKeyReference() instead of key then it would work.

@ben-manes
Owner

I should clarify that keyReference is either key if strongly held or (should be) a WeakKeyReference if weakly held. I think changing the code generator to account for this will resolve the problem. I'll also add unit tests to better cover this case.

It's morning in PST and I won't be able to get to this until the evening. I'll try to get it fixed and released tonight, if that's okay.

@lauredogit
Author

lauredogit commented Jun 26, 2017

Well, it seems so.

In com.github.benmanes.caffeine.cache.BoundedLocalCache#put(K, V, boolean, boolean), you are creating new Nodes with:

          node = nodeFactory.newNode(key, keyReferenceQueue(),
              value, valueReferenceQueue(), newWeight, now);

and with the WW NodeFactory it does:

  WW {
    <K, V> Node<K, V> newNode(K key, ReferenceQueue<K> keyReferenceQueue, V value,
        ReferenceQueue<V> valueReferenceQueue, int weight, long now) {
      return new WW<>(key, keyReferenceQueue, value, valueReferenceQueue, weight, now);
    }

    <K, V> Node<K, V> newNode(Object keyReference, V value, ReferenceQueue<V> valueReferenceQueue,
        int weight, long now) {
      return new WW<>(keyReference, value, valueReferenceQueue, weight, now);
    }

    <K> Object newLookupKey(K key) {
      return new LookupKeyReference<K>(key);
    }

    <K> Object newReferenceKey(K key, ReferenceQueue<K> referenceQueue) {
      return new WeakKeyReference<K>(key, referenceQueue);
    }
  },

which creates a WW Node:

  static class WW<K, V> implements Node<K, V> {
    protected static final long KEY_OFFSET = UnsafeAccess.objectFieldOffset(WW.class, "key");

    protected static final long VALUE_OFFSET = UnsafeAccess.objectFieldOffset(WW.class, "value");

    volatile WeakKeyReference<K> key;

    volatile WeakValueReference<V> value;

    WW(K key, ReferenceQueue<K> keyReferenceQueue, V value, ReferenceQueue<V> valueReferenceQueue,
        int weight, long now) {
      UnsafeAccess.UNSAFE.putObject(this, KEY_OFFSET, new WeakKeyReference<K>(key, keyReferenceQueue));
      UnsafeAccess.UNSAFE.putObject(this, VALUE_OFFSET, new WeakValueReference<V>(key, value, valueReferenceQueue));
    }

And in the WW Node, the key is currently not being wrapped in a WeakReference in new WeakValueReference<V>(key, value, valueReferenceQueue).

Regarding the issue, thank you very much for your prompt reaction.

But no hurry, you can take your time, we are currently applying a workaround in production.

And congratulations for Caffeine, by the way, it's really nice!

@ben-manes
Owner

Thanks. I haven't had a chance to work on this tonight. I'll have it out by Monday if this week remains hectic, but I'll try to get it resolved sooner. I think most of the work here is writing the missing test cases that assert the failure and the fix.

@ben-manes
Owner

Reminder to self: this is also a bug for weakKeys + softValues, unsurprisingly, since the generated code is parameterized.

@ben-manes
Owner

I have an improved unit test that fails on master and passes with the fix (branch). I need to review all of the test cases in ReferenceTest to ensure they are generalized to capture this scenario.

The constructors are now generated as:

WW(K key, ReferenceQueue<K> keyReferenceQueue, V value, ReferenceQueue<V> valueReferenceQueue,
    int weight, long now) {
  this(new WeakKeyReference<K>(key, keyReferenceQueue), value, valueReferenceQueue, weight, now);
}

WW(Object keyReference, V value, ReferenceQueue<V> valueReferenceQueue, int weight, long now) {
  UnsafeAccess.UNSAFE.putObject(this, KEY_OFFSET, keyReference);
  UnsafeAccess.UNSAFE.putObject(this, VALUE_OFFSET, new WeakValueReference<V>(keyReference, value, valueReferenceQueue));
}
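The effect of this fix can be sketched with plain `java.lang.ref` stand-ins (hypothetical, not Caffeine's actual classes): the value reference now holds the key's WeakReference rather than the raw key, so nothing on the value side pins the key and it becomes collectible once callers drop it.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class FixedChainDemo {
  // Sketch of the corrected wiring: keyReference is itself a WeakReference,
  // so the value-side bookkeeping no longer holds the key strongly.
  static final class FixedValueReference<V> extends WeakReference<V> {
    final WeakReference<?> keyReference;

    FixedValueReference(WeakReference<?> keyReference, V value, ReferenceQueue<V> queue) {
      super(value, queue);
      this.keyReference = keyReference;
    }
  }

  // Returns true when the key is reclaimed once no strong reference remains.
  static boolean keyIsCollectible() {
    Object key = new Object();
    WeakReference<Object> weakKey = new WeakReference<>(key);
    FixedValueReference<String> valueRef =
        new FixedValueReference<>(weakKey, "value", new ReferenceQueue<>());

    key = null; // the only strong reference to the key is gone
    for (int i = 0; i < 10 && weakKey.get() != null; i++) {
      System.gc(); // nudge the collector; typically clears promptly
    }
    return weakKey.get() == null && valueRef.keyReference == weakKey;
  }

  public static void main(String[] args) {
    System.out.println(keyIsCollectible());
  }
}
```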

@lauredogit
Author

Hi Ben,

Great news, this is indeed the needed fix for the WW Node!

As you said, you should also apply the same fix to the WSo Node, which uses weak keys and soft values.

  static class WSo<K, V> implements Node<K, V> {
    protected static final long KEY_OFFSET = UnsafeAccess.objectFieldOffset(WSo.class, "key");

    protected static final long VALUE_OFFSET = UnsafeAccess.objectFieldOffset(WSo.class, "value");

    volatile WeakKeyReference<K> key;

    volatile SoftValueReference<V> value;

    WSo(K key, ReferenceQueue<K> keyReferenceQueue, V value, ReferenceQueue<V> valueReferenceQueue,
        int weight, long now) {
      UnsafeAccess.UNSAFE.putObject(this, KEY_OFFSET, new WeakKeyReference<K>(key, keyReferenceQueue));
      UnsafeAccess.UNSAFE.putObject(this, VALUE_OFFSET, new SoftValueReference<V>(key, value, valueReferenceQueue));
    }

    WSo(Object keyReference, V value, ReferenceQueue<V> valueReferenceQueue, int weight, long now) {
      UnsafeAccess.UNSAFE.putObject(this, KEY_OFFSET, keyReference);
      UnsafeAccess.UNSAFE.putObject(this, VALUE_OFFSET, new SoftValueReference<V>(keyReference, value, valueReferenceQueue));
    }

which should become:

    WSo(K key, ReferenceQueue<K> keyReferenceQueue, V value, ReferenceQueue<V> valueReferenceQueue,
        int weight, long now) {
      this(new WeakKeyReference<K>(key, keyReferenceQueue), value, valueReferenceQueue, weight, now);
    }

@ben-manes
Owner

The code generator change also fixed WSo, matching your suggestion. The new unit tests should cover that case, since they are parameterized to run against all configurations matching the specification. Of course, that only helps when the tests are written and not flawed, which is how this case went unnoticed. (It also results in millions of test executions, which now exceeds the maximum build time on Travis.)

Sorry for the delay; it's been an exhausting week. I'll have this released by Sunday night (PST) so that it is available Monday. Thanks for being patient.

@ben-manes
Owner

I finished the tests (see branch), but there are two tooling issues that I need to address, so sorry, this will take another few days to work out. Specifically, I need to split the Travis build to fit within the time limit, and the latest ErrorProne causes a compilation failure:

/Users/ben/projects/caffeine/caffeine/src/test/java/com/github/benmanes/caffeine/cache/MultiThreadedTest.java:100: error: An unhandled exception was thrown by the Error Prone static analysis plugin.
      (cache, key) -> cache.get(key),
                               ^
     Please report this at https://github.com/google/error-prone/issues/new and include the following:

     error-prone version: 2.0.21
     Stack Trace:
     java.lang.NullPointerException
        at com.google.errorprone.matchers.IsLastStatementInBlock.matches(IsLastStatementInBlock.java:31)

@ben-manes
Owner

Seems ErrorProne was fixed in google/error-prone@b9b492f.

@ben-manes
Owner

Sorry, I spent the holiday fighting with TravisCI and haven't gotten it stable yet. Despite splitting the build into multiple jobs, it still exceeds the timeout. That's probably because they run on the same virtual machine, so the parallelism doesn't get any additional resources. I might need to switch to another CI that offers a longer build time.

@ben-manes
Owner

I think I was able to get Travis under control by reducing the number of concurrent jobs to 1. The build is then split into 3 jobs, each of which can run for up to 50 minutes. The only flaw is that code coverage can only be obtained for one of the jobs, and the largest job is near the limit.

If I am able to get a pass on Travis then I'll release sometime tomorrow.

@ben-manes
Owner

TravisCI support agreed to increase the timeout if their idea didn't work, which sadly it didn't. So I'm waiting to hear back; once the change is applied I can fully validate the build and release.

Sorry for the long delay and noisy responses. Just want to keep you informed that I am actively trying to get this released and not ignoring the problem.

@ben-manes
Owner

@davsclaus - LruCache
@benson-git - ChannelPool
@DavyLandman - RascalExecutionContext

According to a GitHub search, your projects may be affected by this bug. I have not evaluated when it was introduced, so it may have always existed, going unnoticed due to flawed tests. This bug causes entries in caches with weakKeys + weak/softValues to not be collected, due to a strong reference and a failed lookup. As you have not reported this issue, hopefully it is a minor problem in your use cases, but please do upgrade promptly. Sorry for the inconvenience.

A release is pending TravisCI support to enable additional compute time so that I can fully validate the build (suite passes locally). I will update this bug when the release is live.

@phraktle
Contributor

phraktle commented Jul 6, 2017

Just an off-topic remark: wow @ben-manes :) Beyond building the best caching library, your level of attentiveness is exemplary. Thanks for your efforts!

@ben-manes
Owner

Thanks, but my view is the opposite. It's an attempt to show respect and gratitude for the patience and understanding shown toward any frustrations my mistakes induced. I can't earn that generosity, but I can try to show that it is not taken for granted.

@davsclaus

davsclaus commented Jul 6, 2017 via email

@DavyLandman

DavyLandman commented Jul 6, 2017

I have to say, if all library authors were as proactive as @ben-manes, we would be a lot less wary of new dependencies. So thanks for the report; updating today or tomorrow (after the update release).

@ben-manes
Owner

Released 2.5.3. Please remember to upgrade.

Thanks for the kind words.

DavyLandman added a commit to usethesource/vallang that referenced this issue Jul 9, 2017
@yasarhike

yasarhike commented May 30, 2024

Hey Ben, when I set the maximumSize property of the cache, the behaviour is unpredictable. Can you please explain the behaviour?

@ben-manes
Owner

Please open a new issue, @yasarhike. Size eviction does include a bit of randomization to protect against denial-of-service attacks, as predictable behavior is exploitable.

jurgenvinju pushed a commit to usethesource/java-air that referenced this issue Mar 4, 2025