
LoadingCache::get returns value after invoking RemovalListener #177

Closed
jandam opened this issue Aug 15, 2017 · 14 comments

Comments

@jandam

jandam commented Aug 15, 2017

LoadingCache::get returns the identical instance on which the RemovalListener was already called. I would expect the value to be invalidated/evicted before RemovalListener::onRemoval is called.

    private static final int PAGE_COUNT = 1 << 10;

    LoadingCache<Long, Page> cache = Caffeine.newBuilder()
            .maximumSize(PAGE_COUNT)
            .executor(Runnable::run)  // run notifications on the calling thread
            .removalListener(removalListener)
            .build(loader);

I'm experiencing the issue in the following sequence (Tn = thread number n):

  • BEGIN
    T1: cache.get(key1)
  • the get of value1 for key1 starts (value1 is present in the cache)
    T2: cache.get(key2)
  • the value for key2 is missing, so it is automatically loaded
    T2: loader.load(key2)
  • triggers eviction of key1, cause=SIZE
    T2: return cache.get(key2)
  • returns the newly created value for key2
    T1: removalListener.onRemoval(key1, value1, cause=SIZE)
  • notification that value1 for key1 was removed
  • another issue:
    I think that xValue = cache.getIfPresent(key1) must satisfy xValue == null || xValue != value1. But sometimes I get xValue == value1, meaning value1 is still present in the cache.
    T1: return cache.get(key1)
  • ERROR (sketched below)
  • returns the same instance of value1 for key1 on which onRemoval was already called
  • I would expect that value1 for key1 is protected from eviction, or that loader.load(key1) is called.
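
A condensed sketch of the failing expectation (Page is my value type from the setup above):

    // T1, after removalListener.onRemoval(key1, value1, cause=SIZE) has fired:
    Page again = cache.get(key1);  // expected: a freshly loaded instance
    assert again != value1;        // ERROR: sometimes this is the released value1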

Thank you for any advice/analysis.

@ben-manes
Owner

Sorry, I don't understand the problem. Can you provide a test case?

@jandam
Author

jandam commented Aug 15, 2017

Problem:
LoadingCache.get returns a value on which RemovalListener.onRemoval was already called.
I want to use onRemoval for recycling resources, so I "release" the resource there. But LoadingCache.get returns the already "released" resource. I would expect it to return a newly "loaded" instance.
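
The pattern looks roughly like this (Page, PAGE_COUNT, and Page.load/release are simplified stand-ins for my real resources):

    // simplified sketch of the recycling pattern
    LoadingCache<Long, Page> cache = Caffeine.newBuilder()
            .maximumSize(PAGE_COUNT)
            .removalListener((Long key, Page page, RemovalCause cause) ->
                    page.release())         // recycle the resource on removal
            .build(key -> Page.load(key));  // acquire a fresh resource on a miss

    Page page = cache.get(key);             // sometimes an already released Page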

Tomorrow I will try to send a test case.

@ben-manes
Owner

You may be observing that the removal listener is asynchronous from the operation. It is not called atomically within the map operation that deleted the mapping; it happens sometime afterwards.

* Notifies the listener that a removal occurred at some point in the past.
* <p>
* This does not always signify that the key is now absent from the cache, as it may have already
* been re-added.

That's nicer in the common case where you don't want to delay other writes or the calling thread, but it only works when the work is fairly decoupled. If you need to run code synchronously with the removal then you can try CacheWriter instead. The writer is invoked within the atomic computation, so a failure bubbles up to the caller.
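
A rough sketch with your Page type (not a drop-in; the write side is a no-op here):

    // CacheWriter#delete runs inside the map computation, so the release
    // completes before the removal is visible to other threads
    LoadingCache<Long, Page> cache = Caffeine.newBuilder()
            .maximumSize(PAGE_COUNT)
            .writer(new CacheWriter<Long, Page>() {
              @Override public void write(Long key, Page page) {
                // nothing to persist on insert or update
              }
              @Override public void delete(Long key, Page page, RemovalCause cause) {
                page.release();  // synchronous with the removal
              }
            })
            .build(loader);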

I'm not sure if the rest is similar or not. Since the operations delegate to ConcurrentHashMap they should be atomic.

@ben-manes
Owner

hmm, I am not sure then. The internal node might be recycled for expiration or weak/soft values, but that isn't the case here. I agree it shouldn't be able to give you a stale read after the mapping is gone, though a reader could obtain the value just prior to that.

@Maaartinus

I guess I understand the OP. The problem is not that the notification happens later, but rather that after the notification containing value1, the same value1 is still present. The test case in pseudocode could look like

cache.put(key1, value1);
cache.addListener(new Listener() {
    onRemoval(K key1, V value1, Cause cause) {
        assertTrue(value1 != cache.getIfPresent(key1));
    }
});
causeEviction();

No idea if this really happens, but I'd bet I'm interpreting it right, as he speaks about recycling resources.
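
In Caffeine the listener is registered at build time, so a concrete version of the pseudocode might look like this (page1/page2 are hypothetical values; eviction is forced by exceeding maximumSize):

    // same-thread executor so the notification runs synchronously
    AtomicReference<Cache<Long, Page>> ref = new AtomicReference<>();
    Cache<Long, Page> cache = Caffeine.newBuilder()
            .maximumSize(1)
            .executor(Runnable::run)
            .removalListener((Long key, Page value, RemovalCause cause) ->
                    assertTrue(value != ref.get().getIfPresent(key)))
            .build();
    ref.set(cache);

    cache.put(1L, page1);
    cache.put(2L, page2);  // exceeds maximumSize(1)
    cache.cleanUp();       // force eviction and the removal notification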

@ben-manes
Owner

I just checked all calls to BoundedLocalCache#notifyRemoval and it is invoked after the computation completes. If @jandam is using a version prior to 2.4.0, there was a stale notification bug that caused the listener to observe the wrong value. A similar problem could exist, so I'm not discounting having bugs and flawed tests. If he's able to reproduce it in a unit test then hopefully I can work it out from there and resolve it.

@jandam
Author

jandam commented Aug 15, 2017 via email

@jandam
Author

jandam commented Aug 16, 2017

Tested on 2.5.2 and 2.5.4

@jandam
Author

jandam commented Aug 16, 2017

Attached a test with CacheLoader. The same issue occurs with the two-argument cache.get(key, mappingFunction).
caffeine-issue-177.zip

@ben-manes
Owner

In your test, I think there is an inherent race that can't be solved by the cache. The live value can be read from the cache by T1, then the entry evicted and released by T2, while the value is still in use by T1. A context switch makes that entirely possible. Imagine if you reduced the cache size to zero: you would expect that to happen too. If you change your code to that, it fails even without threads.

The cache size and eviction policy make all the difference in your test, since entries are loaded and not reused. If the cache size is large enough then the race won't occur, because eviction is delayed just long enough not to crash. Switching to Guava works at your cache size but fails at smaller ones. That's due to its use of LRU, where eviction of the accessed entry is delayed by O(n) operations. Caffeine may evict much sooner to avoid cache pollution, which increases the hit rate by favoring frequency over recency.

In your case you need to provide additional synchronization around the value to handle races on mutable state. A simple approach is to synchronize on the page during usage, validating it before use under the lock. If that is too slow, you could use reference counting (sketched below), where the page's count covers in-cache plus active readers, the removal listener decrements, and the last reader performs the final clean-up when the count reaches zero. You might also consider phantom references, to let the GC do that implicitly so that releasing is delayed until the value is unreachable (e.g. via a proxy); then use the referent to look up the page and release it. There are other techniques, like optimistic reads with validation afterwards (e.g. StampedLock). Regardless, you'll need to manage these types of races.
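
A rough sketch of the reference counting idea (a hypothetical Page, not a drop-in implementation):

    import java.util.concurrent.atomic.AtomicInteger;

    // count = in-cache (1) + active readers; the last owner frees the resource
    final class Page {
      private final AtomicInteger refs = new AtomicInteger(1);  // 1 while cached

      /** A reader calls this before use; false means the page was already freed. */
      boolean tryAcquire() {
        for (int n = refs.get(); n > 0; n = refs.get()) {
          if (refs.compareAndSet(n, n + 1)) {
            return true;
          }
        }
        return false;
      }

      /** Called by each reader when done, and by the removal listener on eviction. */
      void release() {
        if (refs.decrementAndGet() == 0) {
          free();  // last owner performs the final clean-up
        }
      }

      private void free() { /* recycle buffers, close handles, etc. */ }
    }

A reader then retries if it loses the race with eviction:

    Page page = cache.get(key);
    if (page.tryAcquire()) {
      try { /* use the page */ } finally { page.release(); }
    }  // else: raced with eviction, call cache.get(key) again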

@jandam
Author

jandam commented Aug 16, 2017 via email

@ben-manes
Owner

No problem, happy to be of help.

@ben-manes
Owner

@jandam I began a FAQ, so please let me know as you encounter gotchas we should add.

@jandam
Author

jandam commented Aug 22, 2017 via email
