
Unexpected expireAfterWrite behavior when computation of entry duration > expireAfterWrite value #191

Closed
perlambaek opened this issue Oct 10, 2017 · 9 comments


@perlambaek

We are experiencing some behavior we cannot really make sense of. When running the following test, we expect:

  • For the first 2 seconds, cache entries are returned instantly (until expireAfterWrite is reached).
  • Thereafter, each of the 50 threads takes 3 seconds to get a new entry (since recomputing now takes 3 seconds).
  • For the next 2 seconds, entries are again returned instantly (since a new entry was just written and is valid for the expireAfterWrite duration).
  • Rinse and repeat.

Instead we see non-deterministic behavior, with threads waiting anywhere from 3 to 12 seconds with no real pattern. The output of the test shows the timings drifting off fairly quickly.

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import com.google.common.base.Stopwatch;
import org.junit.Test;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CacheTest {
    int count = 0;

    @Test
    public void caffeine() throws Exception {
        ExecutorService executor = Executors.newCachedThreadPool();

        // Loader: the first load returns immediately; every subsequent load sleeps
        // 3 seconds, i.e. longer than the 2-second expireAfterWrite used below.
        com.github.benmanes.caffeine.cache.CacheLoader<String, String> loadingFunc = (String key) -> {
            count++;
            if (count > 1) {
                Thread.sleep(3000);
            }
            return "foo";
        };


        LoadingCache<String, String> testCache = Caffeine.newBuilder()
                                                         .executor(executor)
                                                         .expireAfterWrite(2, TimeUnit.SECONDS)
                                                         .build(loadingFunc);

        // 50 reader threads each request the same key every 100 ms and print how long the lookup took.
        ExecutorService pool = Executors.newFixedThreadPool(50);
        for (int i = 0; i < 50; i++) {
            pool.execute(() -> {
                while (true) {
                    Stopwatch stopwatch = Stopwatch.createStarted();
                    testCache.get("");
                    stopwatch.stop();
                    System.out.println("normal: " + stopwatch.elapsed(TimeUnit.MILLISECONDS));
                    try {
                        Thread.sleep(100);
                    } catch (InterruptedException ignored) {
                    }

                }
            });
        }

        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
@ben-manes
Owner

ben-manes commented Oct 10, 2017

My initial guess is that it is due to the compute time exceeding the expiration time. Currently the clock is read once to minimize expensive calls to System.nanoTime(); that single reading is used both to decide whether the cached value has expired and as the writeTime of the newly computed entry. Thus your threads may have to recompute immediately, which causes the next one in line to recompute as well. A chain of C->B->A would lead to ~9 seconds, and printouts that vary by ~3s depending on how long the chain was.

We could re-read the current time on the write so the entry isn't penalized for wait time and computation. For loads that are quick relative to the expiration time it doesn't matter; for very tight cases like yours it does. But I am also unsure how common that usage is and, if it does occur, whether it isn't already problematic for other reasons. It would be nice to understand the use-case better.
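As a rough illustration of the chain described above (a hedged sketch of the timing only, not the actual Caffeine internals): the stored write time is the clock value read before the load started, so a 3-second load under a 2-second expireAfterWrite produces an entry that is already expired by the time it becomes visible.

import java.util.concurrent.TimeUnit;

public class ExpiredOnArrival {
    public static void main(String[] args) {
        long expireAfterWrite = TimeUnit.SECONDS.toNanos(2);
        long loadStart = 0;                                // clock read once, at the start of the load
        long loadDuration = TimeUnit.SECONDS.toNanos(3);   // the 3-second sleep in the test above
        long writeTime = loadStart;                        // timestamp recorded for the new entry
        long firstReadAfterLoad = loadStart + loadDuration;
        // The entry is already older than expireAfterWrite when it is installed, so the
        // next reader starts another 3-second load and the delays chain together.
        System.out.println(firstReadAfterLoad - writeTime >= expireAfterWrite); // true
    }
}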

@perlambaek
Author

So we kind of stumbled on this while trying to debug an issue where our caches seem to get stuck, unable to ever really return any value. We are not entirely sure this is the cause, but it still felt odd; we will do more testing around expireAfterWrite and see if the issue persists.

As for the use case: we have some work that is normally pretty fast (<1 second), but during certain peak times it may suddenly take up to 90 seconds to complete instead of 1 second, due to unavailable database connections, I/O and so on. This makes us exceed our expireAfterWrite (not by design, and not something we want), even though 99.999% of the time we are well within it. My guess is that we then end up with a whole chain of threads wanting to recompute the value, bringing our system to a grinding halt, as no cache value ever really seems to be produced in that case. (We are taking steps to reduce the 90 seconds, as that is also not desired behavior, but the contract of expireAfterWrite still seems broken.)

We often support peak loads of >500 pageviews/sec, so having 50 threads hitting the caches at a 100 ms interval is not at all unusual.

I am not familiar with the inner workings of this, but it seems to me that when one thread finishes recomputing, all others waiting for the result should be given that value regardless of whether the expiration has been reached: the value was just produced, so it cannot have exceeded expireAfterWrite unless the duration was effectively 0. Subsequent reads would then see a correct writeTime if it were set at the end of the computation.

Please let me know if you need more details or something is unclear.

@ben-manes
Owner

I think that is fair. I don't know if I will get a chance to walk through the code during the week, but will try to get a release for you over the weekend.

If you use an AsyncLoadingCache then the behavior will be as you expect. Is that an okay short-term fix? You merely change build(loadingFunc) to buildAsync(loadingFunc).synchronous().

The async cache stores a future that subsequent callers block on, and the write time is set when it completes, so the entry doesn't expire while in-flight like you are seeing now. If we set it the same way in the synchronous case too, then a waiter or subsequent read will get the full duration as expected. It only impacts narrow cases like yours, but the cost is tiny, so I just need to work out the tests to assert the change.
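For reference, a minimal sketch of that workaround applied to the test above (executor, loadingFunc, and the 2-second expiry are taken from the reporter's test; only the build call changes):

        LoadingCache<String, String> testCache = Caffeine.newBuilder()
                                                         .executor(executor)
                                                         .expireAfterWrite(2, TimeUnit.SECONDS)
                                                         .buildAsync(loadingFunc)  // waiters share one in-flight future
                                                         .synchronous();           // blocking LoadingCache view

Since the async cache records the write time when the future completes, the freshly loaded value keeps its full 2-second lifetime even when the load itself takes longer than that.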

@perlambaek
Author

Yea, I think we can work around it, so no rush on our account. Thank you very much for the swift response, and for a great library!

@ben-manes
Owner

Sorry that I didn't get a chance to work on this over the weekend. Hopefully I'll be able to catch up during my commute this week. Thanks for being patient.

@ben-manes
Owner

Really sorry that I'm not on top of this. I usually try to have a one-week turnaround on bugs. It's just been hectic lately and hard to decompress enough to switch contexts.

@ben-manes
Owner

I found a little time Sunday night and I have the fix passing your test. I'll try to work on the unit tests on the train tomorrow. Again, apologies for not being able to work on this more promptly.

@ben-manes
Owner

I stole some time at work to wrap this up. I'll release this evening if the CI passes.

ben-manes added a commit that referenced this issue Oct 30, 2017
Previously, the current time was read once at the beginning of the
computation. This was used to determine if the entry had expired and,
when computed, the associated timestamp. For long computations and short
expiration times, this could result in the newly computed entry being
expired. A chain of computations would build up, each with timestamps
older than the last due to waiting for its predecessor.

Now the ticker is read again after the computation completes and the
timestamps are set. This means there are two reads on a computation,
but only 1 in the happy path of a cache hit. Note that obtaining the
time can be an expensive operation, so minimizing this is useful for
high loads.

Updated dependencies and fixed issues discovered by Spotbugs and
ErrorProne.
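A hedged sketch of the change described in this commit message (illustrative only, not the actual Caffeine source): the ticker is read a second time once the loader returns, and that later value becomes the entry's write time.

import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

public class WriteTimeSketch {
    // Previous behavior: one clock read, taken before the load, reused as the write time.
    static long singleRead(LongSupplier ticker, Runnable load) {
        long startTime = ticker.getAsLong();
        load.run();
        return startTime;
    }

    // Fixed behavior: re-read the clock after the load so the entry gets its full lifetime.
    static long reRead(LongSupplier ticker, Runnable load) {
        ticker.getAsLong();               // pre-load read, still used for the expiration check
        load.run();
        return ticker.getAsLong();        // post-load read, stored as the entry's write time
    }

    public static void main(String[] args) {
        long[] clock = {0};
        Runnable slowLoad = () -> clock[0] += TimeUnit.SECONDS.toNanos(3); // each load advances time by 3s
        long oldWriteTime = singleRead(() -> clock[0], slowLoad); // 0s: already older than a 2s expiry
        long newWriteTime = reRead(() -> clock[0], slowLoad);     // 6s: the moment this load finished
        System.out.println(TimeUnit.NANOSECONDS.toSeconds(oldWriteTime) + " vs "
                + TimeUnit.NANOSECONDS.toSeconds(newWriteTime));  // prints: 0 vs 6
    }
}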
@ben-manes
Owner

Released. Thank you for being patient and apologies for being slow to resolve this.
