LoadingCache with expireAfterAccess(...) effectively ignores invalidateAll() if cache reload is in progress #251

Closed
jkarshin opened this issue Jun 11, 2018 · 4 comments

@jkarshin

This may be related to #193, in that a cache load takes precedence over a cache invalidate operation in some situations.

Consider the following scenario involving a LoadingCache:

  1. LoadingCache is initially empty
  2. Thread-1 performs a cache read for some key
  3. The LoadingCache invokes the loader lambda to load the value for the key (on Thread-1)
  4. While the lambda is running, Thread-2 updates the underlying value for the key (perhaps in a DB)
  5. While the lambda is running, Thread-2 calls invalidateAll() on the cache
  6. After the call to invalidateAll() completes, Thread-2 performs a cache read for the same key as in step 2.

For the initial read (on Thread-1), I would expect the result to be indeterminate, since there is a race between the lambda's execution and the write performed by Thread-2. However, I think it is reasonable to expect that after the call to invalidateAll(), the cache read performed by Thread-2 should trigger a cache reload, which would read the newly written value.

For a LoadingCache without expireAfterAccess(...)/expireAfterWrite(...), this appears to be the case. The invalidateAll() call is blocked until the initial cache load completes, and the second cache read causes a reload.

However, if expireAfterAccess(...)/expireAfterWrite(...) is called when creating the LoadingCache, invalidateAll() is NOT blocked. The cache read on Thread-2 is still blocked by the cache load on Thread-1 (since they are accessing the same key); however, the cache is never actually invalidated. Subsequent cache reads yield the stale value from the initial load.

Code to reproduce (Java version: 1.8.0_162, Caffeine version: 2.6.2):

    private static final String KEY = "KEY";

    @Test
    void testConcurrentReadWrite() throws Exception {
        // Use an atomic int to simulate a DB
        AtomicInteger dbValue = new AtomicInteger(0);

        // Create a cache that will retrieve a value from the DB and sleep
        LoadingCache<String, Integer> cache = Caffeine.newBuilder()
                .expireAfterAccess(Duration.ofMinutes(9001))
                .build(key -> {
                    int toReturn = dbValue.get();
                    log("Performing cache reload, db value: " + toReturn);
                    sleep(1000);
                    log("Returning from cache reload...");
                    return toReturn;
                });

        /*
         * Thread #1 will perform a cache read. Since the cache is empty, this
         * should trigger a cache load which will retrieve '0'.
         */
        Thread t1 = new Thread(() -> {
            log("Performing cache read...");
            int result = cache.get(KEY);
            log("Cache read result: " + result);
        });

        /*
         * Thread #2 will perform a write while Thread #1 is reloading the cache.
         * Thread #1's read will be unaffected by this write.
         * After the write completes, we will perform a cache read.
         */
        Thread t2 = new Thread(() -> {
            // Sleep to ensure the cache reload is in progress when we call invalidate
            sleep(500);

            // "Write" to the db and invalidate the cache.
            dbValue.set(1);
            log("Wrote to DB. Invalidating cache...");
            cache.invalidateAll();
            log("Finished invalidating cache. Performing cache read...");
            int result = cache.get(KEY);
            log("Cache read result: " + result);
        });

        // Launch threads and wait
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }

    // **********************************
    // *  Helpers
    // **********************************

    private void log(final String s) {
        System.out.println(LocalTime.now() + ", " + Thread.currentThread()
                .getName() + ": " + s);
    }

    private void sleep(final long millis) {
        try {
            Thread.sleep(millis);
        }
        catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

Executing the above test yields the following output:

15:57:17.479, Thread-1: Performing cache read...
15:57:17.482, Thread-1: Performing cache reload, db value: 0
15:57:17.972, Thread-2: Wrote to DB. Invalidating cache...
15:57:17.973, Thread-2: Finished invalidating cache. Performing cache read...
15:57:18.483, Thread-1: Returning from cache reload...
15:57:18.486, Thread-1: Cache read result: 0
15:57:18.488, Thread-2: Cache read result: 0

Notice how the cache invalidation completes immediately and the second cache read returns the stale cached result.

Executing the same test with the expireAfterAccess(...) removed yields the following output:

15:58:59.706, Thread-1: Performing cache read...
15:58:59.711, Thread-1: Performing cache reload, db value: 0
15:59:00.197, Thread-2: Wrote to DB. Invalidating cache...
15:59:00.714, Thread-1: Returning from cache reload...
15:59:00.714, Thread-1: Cache read result: 0
15:59:00.714, Thread-2: Finished invalidating cache. Performing cache read...
15:59:00.714, Thread-2: Performing cache reload, db value: 1
15:59:01.715, Thread-2: Returning from cache reload...
15:59:01.715, Thread-2: Cache read result: 1

Note that this time, the invalidateAll() operation is blocked until the initial cache load completes. Additionally, the second cache read results in a cache reload, which returns the updated value.

In fairness, the Javadoc for invalidateAll(...) does state that "The behavior of this operation is undefined for an entry that is being loaded and is otherwise not present." That said, it feels extreme that a call to invalidateAll(...) is silently ignored, without any indication, if a load happens to be in progress.

Additionally, for what it's worth, if the invalidateAll(...) call is replaced with invalidate(KEY), the invalidation is blocked and the cache is invalidated as I would expect (even if .expireAfterAccess(...) is present).

Just wondering if this is all expected, and if so, is there a way to do an invalidateAll() operation and be guaranteed that the next cache read will result in a cache reload?
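In the meantime, one workaround that seems to follow from the observation above is for the application to track the keys it has read and invalidate them individually, since per-key invalidation does block on an in-flight load. A rough sketch (the key-tracking set and the read/invalidate helpers are mine, not part of the cache API):

    private final Set<String> knownKeys = ConcurrentHashMap.newKeySet();

    private Integer read(LoadingCache<String, Integer> cache, String key) {
        // Record every key the application reads so it can be invalidated later.
        knownKeys.add(key);
        return cache.get(key);
    }

    private void invalidateAllKnown(LoadingCache<String, Integer> cache) {
        // invalidate(key) blocks on an in-flight load for that key, unlike
        // invalidateAll() when a bound such as expireAfterAccess is configured.
        for (String key : knownKeys) {
            cache.invalidate(key);
        }
    }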

Thanks in advance!

@ben-manes
Owner

Unfortunately, this is a limitation imposed by the underlying ConcurrentHashMap, which does not expose a key iterator that includes in-flight operations. For more advanced configurations, invalidateAll() requires iterating over the cache, removing each item by key, and sending the value to a listener (RemovalListener and CacheWriter).

If no bounds are configured, then UnboundedLocalCache is used as a lightweight adapter over ConcurrentHashMap. It will use Map#clear() when no listeners are involved. That method is aware of these in-flight loads, so it blocks as desired.

When a bound is configured, BoundedLocalCache is used instead, which is a much more complex decorator. It removes entries via an iterator, in this case the values() iterator, and all of the map iterators hide a pending load. We cannot use clear() since it would not let us do the necessary processing work.
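To make the difference concrete, here is a small standalone ConcurrentHashMap sketch (not Caffeine code) of the two behaviors described above; sleep is a helper like the one in the report:

    ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();

    // Simulate a slow load: computeIfAbsent holds the bin while the mapping runs.
    Thread loader = new Thread(() -> map.computeIfAbsent("key", k -> {
        sleep(1000);
        return 1;
    }));
    loader.start();
    sleep(100); // let the compute start

    // The map's iterators do not see the entry that is still being computed,
    // so an iterator-based removal finds nothing to remove and returns at once.
    map.keySet().forEach(map::remove);

    // clear() locks each bin, so it waits for the compute to finish
    // and then removes the freshly loaded entry.
    map.clear();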

The only solution to this would be to fork ConcurrentHashMap so that we could peek at its internals. There are justifiable reasons to do so, such as better memory packing, but it also incurs a high maintenance cost. For now I've preferred not to, especially since Value Types could dramatically change the internals. Unfortunately, without an obvious mechanism, I don't think we can fix this.

I haven't looked at the internal iterators recently, so I don't recall whether they block (e.g. removeIf). Most likely they don't, since they use the iterator rather than a blind sweep of the underlying table[].

@jkarshin
Author

That is unfortunate. Thanks for the quick response.

@ben-manes
Owner

ben-manes commented Jun 11, 2018

Sorry for the bad news. The only other option that I've thought of is to maintain an in-flight key set, e.g. via ConcurrentHashMap.newKeySet(). If we inserted/removed the key during the computation, then that key set could be inspected during the invalidateAll() iteration. Obviously that still leaves races, so it is not perfect.
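Roughly, the same idea can be sketched outside the library at the application level (loadFromDb is a made-up loader for illustration, and the small races mentioned above still exist):

    Set<String> inFlight = ConcurrentHashMap.newKeySet();

    LoadingCache<String, Integer> cache = Caffeine.newBuilder()
            .expireAfterAccess(Duration.ofMinutes(5))
            .build(key -> {
                // Record the key for the duration of the load.
                inFlight.add(key);
                try {
                    return loadFromDb(key); // hypothetical loader
                } finally {
                    inFlight.remove(key);
                }
            });

    void invalidateEverything() {
        // Remove the visible entries, then the keys whose loads are in flight;
        // invalidate(key) blocks until the pending load for that key settles.
        cache.invalidateAll();
        inFlight.forEach(cache::invalidate);
    }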

@ben-manes
Owner

I found the thread where I discussed this with Doug Lea.

On 12/20/2015 02:09 AM, Benjamin Manes wrote:
Oh, one issue that I don't have any good suggestions for resolving is invalidation of suppressed keys due to computations #13. The request is that a clear operation (aka invalidateAll) should block until all in-flight load entries are complete and subsequently removed. While this is done by CHM#clear(), it cannot be done by a decorator, which needs to process the entries (listeners, policy, etc.). All iterations of the map suppress entries until they have fully materialized.

The desire to do this is racy, even using clear, which non-atomically traverses. So there's only weak consistency wrt arriving/ongoing/completed computeIfAbsent's etc. In principle, CHM could offer a method with similar properties, but it still wouldn't provide a simple-to-state guarantee.

The only solution I know is for users themselves to arrange quiescent points if they need a strict demarcation. (And I wonder if this desire about invalidateAll obscures assumptions they might have about consistency.)

-Doug
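For completeness, the quiescent point Doug describes can be arranged at the application level, for example with a ReadWriteLock that every reader and the invalidator agree to go through (a sketch under that assumption, not something the library provides):

    ReadWriteLock lock = new ReentrantReadWriteLock();

    Integer read(LoadingCache<String, Integer> cache, String key) {
        lock.readLock().lock();
        try {
            // Any load triggered here completes while the read lock is held.
            return cache.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    void invalidateAllQuiesced(LoadingCache<String, Integer> cache) {
        // The write lock waits for all in-flight reads (and their loads) to finish.
        lock.writeLock().lock();
        try {
            cache.invalidateAll();
        } finally {
            lock.writeLock().unlock();
        }
    }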
