Refreshing values with a RemovalListener on a BoundedLocalCache with direct executor causes values to be refreshed twice #872
Thank you for the unit test, for identifying the problematic commit, and for the explanation. You're right that this caused a double removal, and conditionalizing the fallback passes your test. It seems, then, that this should be an easy fix. I had naively thought that since clearing the cache is not linearizable, performing that fallback logic unconditionally would do no harm, as it would merely pick up new additions. I hadn't considered your case, where those additions are reinserts of a removed key, so the deletion happens twice, which is certainly unexpected behavior. I think something like the diff below might be good, where we track the overflow keys and remove them one-by-one afterwards. I might tweak it some more to strengthen the attempt to avoid duplicates, e.g. due to map resizes during traversal.

```diff
diff --git a/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java b/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java
index 708cac17..a9913a5d 100644
--- a/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java
+++ b/caffeine/src/main/java/com/github/benmanes/caffeine/cache/BoundedLocalCache.java
@@ -46,6 +46,7 @@ import java.lang.ref.WeakReference;
 import java.time.Duration;
 import java.util.AbstractCollection;
 import java.util.AbstractSet;
+import java.util.ArrayList;
 import java.util.Collection;
 import java.util.Collections;
 import java.util.Comparator;
@@ -53,6 +54,7 @@ import java.util.HashMap;
 import java.util.IdentityHashMap;
 import java.util.Iterator;
 import java.util.LinkedHashMap;
+import java.util.List;
 import java.util.Map;
 import java.util.NoSuchElementException;
 import java.util.Objects;
@@ -2033,6 +2035,7 @@ abstract class BoundedLocalCache<K, V> extends BLCHeader.DrainStatusRef
   @Override
   public void clear() {
+    List<K> fallbackKeys = List.of();
     evictionLock.lock();
     try {
       // Discard all pending reads
@@ -2053,10 +2056,17 @@ abstract class BoundedLocalCache<K, V> extends BLCHeader.DrainStatusRef
       // Discard all entries
       int threshold = (WRITE_BUFFER_MAX / 2);
       long now = expirationTicker().read();
+
       for (var node : data.values()) {
-        if (writeBuffer.size() >= threshold) {
+        var key = node.getKey();
+        if ((key != null) && (writeBuffer.size() >= threshold)) {
           // Fallback to one-by-one to avoid excessive lock hold times
-          break;
+          if (fallbackKeys.isEmpty()) {
+            fallbackKeys = new ArrayList<>(data.size());
+          }
+          fallbackKeys.add(key);
+        } else {
+          removeNode(node, now);
         }
-        removeNode(node, now);
       }
@@ -2065,7 +2075,7 @@ abstract class BoundedLocalCache<K, V> extends BLCHeader.DrainStatusRef
       }
 
       // Remove any stragglers, such as if released early to more aggressively flush incoming writes
-      for (Object key : keySet()) {
+      for (K key : fallbackKeys) {
         remove(key);
       }
```
When writing tests for this, it turned out that, regardless of the release, this can get stuck in an infinite loop. That requires using weak keys and a populated cache, which causes the following test to time out:

```java
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.Timeout;

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import com.github.benmanes.caffeine.cache.RemovalCause;

public class CacheTest {
  private final LoadingCache<Integer, Integer> cache = Caffeine.newBuilder()
      .weakKeys()
      .executor(Runnable::run)
      .removalListener(this::listenRemoval)
      .build(key -> 1);

  private void listenRemoval(Integer key, Integer _value, RemovalCause cause) {
    System.out.printf("Removing %s%n", key);
    cache.refresh(key);
  }

  @Test @Timeout(5)
  public void testCacheEvictionRefresh2() {
    for (int i = 0; i < 1000; i++) {
      cache.get(i);
    }
    cache.invalidateAll();
  }
}
```
A minimal reproducer. The hash codes have to vary enough (e.g. random) to trigger this case:

```java
import java.util.concurrent.ThreadLocalRandom;

import org.junit.jupiter.api.Test;

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import com.github.benmanes.caffeine.cache.RemovalCause;

public class CacheTest {
  private final LoadingCache<Object, Object> cache = Caffeine.newBuilder()
      .removalListener(this::listenRemoval)
      .executor(Runnable::run)
      .build(key -> key);

  private void listenRemoval(Object key, Object value, RemovalCause cause) {
    System.out.printf("Removing %s%n", key);
    cache.refresh(key);
  }

  @Test
  public void testCacheEvictionRefresh() {
    for (int i = 0; i < 1000; i++) {
      cache.get(ThreadLocalRandom.current().nextInt());
    }
    cache.invalidateAll();
  }
}
```
This case was effectively the same as if using a `ConcurrentHashMap` directly:

```java
var map = new ConcurrentHashMap<Integer, Integer>();
for (int i = 0; i < 1000; i++) {
  map.put(ThreadLocalRandom.current().nextInt(), 1);
}
for (var key : map.keySet()) {
  map.remove(key);
  map.put(key, key);
  System.out.println(key);
}
```

The solution is to loop over a snapshot of the keys, e.g. as sketched below.
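A minimal, self-contained sketch of that snapshot idea, assuming the plain `ConcurrentHashMap` example above (illustrative only; not necessarily the exact code shipped in the fix):

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

public class SnapshotIteration {
  public static void main(String[] args) {
    var map = new ConcurrentHashMap<Integer, Integer>();
    for (int i = 0; i < 1000; i++) {
      map.put(ThreadLocalRandom.current().nextInt(), 1);
    }
    // Copy the keys first: the loop is bounded by the snapshot, so entries
    // re-inserted during iteration are never revisited and it must terminate.
    for (var key : List.copyOf(map.keySet())) {
      map.remove(key);
      map.put(key, key);
      System.out.println(key);
    }
  }
}
```

Since the loop is driven by the immutable copy rather than a live, weakly consistent iterator over the map, re-added entries cannot extend the iteration.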
Released in 3.1.5
Thank you for the super fast response and fix! 🎉
Thank you for the great library!

I have a setup where I use a cache to store the result of a network request that can possibly be very slow. Certain user interactions block on having the value, so the goal is to avoid cache misses at all costs. To achieve this, I use a removal listener that checks whether the removed entry was evicted or explicitly removed, and if so calls `refresh` to reload the entry. The actual size of the cache is very small, so there are no memory concerns with preventing entries from being removed.

I'm having an issue after a recent bump from `3.1.2` to `3.1.3`, which I believe to have been introduced by this change: 68fbff8. After that change, the `removeNode` call causes values to be evicted and refreshed, followed by the subsequent `remove` call, which removes and refreshes the values a second time. Here's an example test that works on version `3.1.2` but not on `3.1.3`:

Note that this is only an issue when using a direct executor or a very fast async refresh, where the value is refreshed and re-added to the cache between `removeNode` and `remove`.
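For reference, here is a minimal sketch of the setup described above, assuming a direct executor and a listener that reloads entries on eviction or explicit removal (the class name, sizes, and loader are illustrative, not the reporter's original test):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import com.github.benmanes.caffeine.cache.RemovalCause;

public class RefreshOnRemovalExample {
  private final Map<String, Integer> removalCounts = new ConcurrentHashMap<>();
  private final LoadingCache<String, String> cache = Caffeine.newBuilder()
      .maximumSize(100)                  // a bound selects BoundedLocalCache
      .executor(Runnable::run)           // direct executor, as in the report
      .removalListener(this::onRemoval)
      .build(key -> "value-for-" + key); // stands in for the slow network call

  private void onRemoval(String key, String value, RemovalCause cause) {
    removalCounts.merge(key, 1, Integer::sum);
    if (cause.wasEvicted() || (cause == RemovalCause.EXPLICIT)) {
      cache.refresh(key); // reload so the entry never goes missing
    }
  }

  public static void main(String[] args) {
    var example = new RefreshOnRemovalExample();
    example.cache.get("a");
    example.cache.invalidateAll();
    System.out.println(example.removalCounts.get("a"));
  }
}
```

Per the report, the count printed at the end would be 1 on `3.1.2` but 2 on `3.1.3`, because the value re-inserted by `refresh` is removed again by `clear()`'s unconditional fallback pass.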