
Synchronous Cache might block on put() #224

Closed
panghy opened this issue Feb 28, 2018 · 18 comments


panghy commented Feb 28, 2018

I am trying to get a test to reproduce this but we have observed that a synchronous cache can block on put() calls if there's a background refresh going on (it's blocked within the ConcurrentHashMap).

I would have expected the code to either abandon the in-flight refresh (continue execution but ignore the results) or accept the put() and when the refresh returns, overwrite the value that was supplied by the put(). Instead, the put() call blocks.

This does not seem to happen to the Async flavor.

@ben-manes (Owner)

Is this with a current version? I think we fixed it in v2.3.0 but it has been a while (see #56).


panghy commented Feb 28, 2018

Yeah, mostly. We caught the stacktrace on 2.6.1 (essentially the put() call is blocked on CHM).


panghy commented Feb 28, 2018

Actually, it wasn't a refresh that blocked it but a get():

    CountDownLatch latch = new CountDownLatch(1);
    LoadingCache<String, String> cache = Caffeine.newBuilder()
        .build(s -> {
          if (s.equals("foo")) {
            latch.countDown();
            Thread.sleep(30000);
            return "bar";
          }
          return "world";
        });
    // trigger the load, which will block for 30s
    Thread t = new Thread(() -> cache.get("foo"));
    t.start();
    latch.await();
    // put a value; this is not expected to block
    cache.put("foo", "world");
    // expect to see a value of "world"
    assertEquals("world", cache.get("foo"));

cache.put() is where it blocks; the assertEquals passes afterwards, though.


panghy commented Feb 28, 2018

And the Async version doesn't block.

    CountDownLatch latch = new CountDownLatch(1);
    AsyncLoadingCache<String, String> cache = Caffeine.newBuilder().
        buildAsync(s -> {
          if (s.equals("foo")) {
            latch.countDown();
            Thread.sleep(30000);
            return "bar";
          }
          return "world";
        });
    // trigger the load, which will block for 30s
    Thread t = new Thread(() -> cache.get("foo"));
    t.start();
    latch.await();
    // put a value; this is not expected to block
    cache.put("foo", CompletableFuture.completedFuture("world"));
    // expect to see a value of "world"
    assertEquals("world", cache.get("foo").get());

@ben-manes (Owner)

Oh, this is as expected, isn't it?

This is a computeIfAbsent behind the scenes to memoize the value when loading. In Guava a put will stampede the value and the computation will be dropped (notified as replaced). This would be similar to google/guava#1881 which caused some people problems. Since we're building off of ConcurrentHashMap, we get its behavior which is stricter than Guava's.

The async version is much simpler because the mapping is established immediately and the value computed asynchronously. In that case you can replace an in-flight future. If you instead use the synchronous() view, we should be blocking for the current value to be complete before establishing the mapping.
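A stdlib-only model of that design (a sketch of the idea, not Caffeine's actual internals): the map holds a CompletableFuture that is installed immediately, so a later put can simply swap in a completed future without waiting for the in-flight load.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class AsyncModelDemo {
  public static void main(String[] args) {
    ConcurrentHashMap<String, CompletableFuture<String>> map = new ConcurrentHashMap<>();

    // The loader installs an incomplete future immediately; the slow
    // computation completes it later, outside any map-internal lock.
    CompletableFuture<String> inFlight = new CompletableFuture<>();
    map.put("foo", inFlight);

    // A put() does not block: it just replaces the mapping.
    map.put("foo", CompletableFuture.completedFuture("world"));

    // When the stale load finally finishes, nobody observes it via the map.
    inFlight.complete("bar");
    System.out.println(map.get("foo").join()); // "world"
  }
}
```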


panghy commented Feb 28, 2018

So the synchronous version actually has blocking puts against the same key, then (compared to the async flavor). I always thought the asynchronous version exhibited the same behavior but enabled asynchronous programming paradigms.

We ultimately just switched to the async cache, but perhaps this is just a documentation issue then (or maybe it was, I dunno).


panghy commented Feb 28, 2018

Of interest, get() with a mapping function has the same behavior in both the synchronous and asynchronous versions (which I assume it would, since in both cases the computation is in flight and so the function is ignored).

    CountDownLatch latch = new CountDownLatch(1);
    AsyncLoadingCache<String, String> cache = Caffeine.newBuilder()
        .buildAsync(s -> {
          if (s.equals("foo")) {
            latch.countDown();
            Thread.sleep(30000);
            return "bar";
          }
          return "world";
        });
    // trigger the load, which will block for 30s
    Thread t = new Thread(() -> cache.get("foo"));
    t.start();
    latch.await();
    // get with a mapping function; does not replace the in-flight load
    cache.get("foo", key -> "world");
    // expect to see a value of "bar"
    assertEquals("bar", cache.get("foo").get());

@ben-manes (Owner)

We can definitely update the JavaDoc if you have suggestions (or a PR). I do think it makes sense for AsyncLoadingCache#put to be non-blocking, as the existing future may not be completed yet. There were enough slight differences that I didn't feel comfortable making AsyncLoadingCache implement LoadingCache<K, CompletableFuture<V>>, though I would have liked to.

In the async case, you are establishing a mapping immediately and the future populates asynchronously. It can fail, or be null, and any consumers will receive that. You are explicitly asking for it to work outside of a computeIfAbsent block in ConcurrentHashMap, other than for the basics of setting the future.

One could argue that it's okay to do something similar in the synchronous case, like Guava did, but then you'll run into linearization complaints like the issue above. The choice is gray for a cache, but it wasn't for a regular map, which is why ConcurrentHashMap is more pessimistic. Since I'd prefer not to fork or write my own hash table, we respect its behavior. In most cases I think it's less surprising, and as you've shown, the async cache is an escape hatch that is probably acceptable when there is a performance issue.
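To see that this blocking comes from ConcurrentHashMap itself rather than anything the cache adds, here is a minimal stdlib-only sketch (no Caffeine involved): a put() on a key stalls while a computeIfAbsent() for the same key is still computing, because the computation runs inside the bin lock.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class ChmBlockingDemo {
  public static void main(String[] args) throws InterruptedException {
    ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
    CountDownLatch started = new CountDownLatch(1);

    // Simulate a slow load memoized via computeIfAbsent.
    Thread loader = new Thread(() -> map.computeIfAbsent("foo", k -> {
      started.countDown();
      try {
        Thread.sleep(2000);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return "bar";
    }));
    loader.start();
    started.await();

    // put() on the same key waits until the computation releases the bin lock.
    long t0 = System.nanoTime();
    map.put("foo", "world");
    long elapsedMs = (System.nanoTime() - t0) / 1_000_000;

    System.out.println("put() waited ~" + elapsedMs + " ms");
    System.out.println(map.get("foo")); // the put lands after the compute, so "world"
  }
}
```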


panghy commented Feb 28, 2018

Alright, this makes sense. I guess this is mainly going to be a synchronous loading cache issue. Background refreshes are fine in that case too, as far as we can tell (meaning if a background refresh is in flight, it does get preempted by the put). Perhaps it's just the load() path that can potentially block puts that needs a documented gotcha.

@ben-manes (Owner)

Yes. But also consider the return value of asMap().put when a load is in progress. Should it block, replace, and return the loaded value? Or should it insert and the load be canceled as a replacement? Both are valid implementation decisions, I think.


panghy commented Feb 28, 2018

Yeah, this led me to think that perhaps you want this behavior to be configurable (perhaps even to cancel the future and interrupt it). Anyways, closing, since the issue should really be either a) a documentation request/PR or b) configurable behavior for the synchronous loading cache when dealing with puts (although even in the asynchronous case you could say that whichever returns first, the refresh or the put, could determine the replacement policy).

@ben-manes (Owner)

(a) This would be great if you have a PR or suggestions for the FAQ page.
(b) That would make a lot of sense if we had a custom hash table. I might be a tad hesitant about a low-level tweak, but it would be a reasonable ask. Since we didn't fork CHM (like Guava did), unfortunately I'd have to say no, as modifying its logic would be required. Guava was stuck on JDK5's CHM (rewritten in JDK8), and I am sure it will be heavily modified in ~JDK11 when value types are implemented. I think it's just too much complexity to absorb, since we'd also want to do other tricks for memory efficiency, unfortunately.

P.S. Closing per "Anyways, closing since the issue..."

@ben-manes (Owner)

A refresh(key) does not populate the cache until the asynchronous load completes, and only if the entry was unchanged in the meantime. Guava has a nice optimization to ignore redundant refreshes that we haven't implemented yet, as explicit calls are rare.

Have you considered using an AsyncLoadingCache to compute asynchronously? You can then call get(key) and it will not block other than initializing the map with the future.

@ben-manes (Owner)

Sorry, I missed your last paragraph. You can use the synchronous() view to minimize code impact by interacting with it like a LoadingCache.

@michaelrebello

I ended up in this thread when a put() and refresh() combination blocked. Sorry for posting on a closed issue thread.

I was trying to replace Guava with Caffeine and implement a totally async cache using getIfPresent() and refresh() only (no get() calls, since get() loads the data inline; in my case, if a cache entry is not present, I call refresh() and continue without it, but I strictly don't want to block).

Some observations I made:

1. With the use case above I ended up making too many refresh() calls, since the cache entry initially was not loaded. This might only be the case during the initial load, as subsequent loads are taken care of by refreshAfterWrite().

2. To stop this behavior I tried to load an initial dummy value into the cache by calling put(k, dummyVal) and then refresh(), and I ran into the put() block. The idea was that, for calls after the first one, I don't get a null return from getIfPresent() and don't keep calling refresh(), as mentioned in #1 above. I assumed refresh() would ultimately replace the dummy value.

Pseudocode #1:

    Integer getValue() {
      Integer val = loadingCache.getIfPresent(k);
      if (val == null) {
        loadingCache.refresh(k);
      }
      return val;
    }

Pseudocode #2:

    Integer getValue() {
      Integer val = loadingCache.getIfPresent(k);
      if (val == null) {
        loadingCache.put(k, dummyVal);
        loadingCache.refresh(k);
      }
      return val;
    }

Is AsyncLoadingCache the solution for this use case? (When does the async loading cache get loaded? I do not want to call future.get() and block.) I have not explored it yet, as it needs some changes to my code, which was using the Guava cache earlier.

@michaelrebello

Sure. I had just deleted my comment, thinking the reply would come late and that I'd post it again once I'd done a little more work on this. But I noticed there was a reply from you, so I have put my comment back before your two comments so that other readers do not lose the context. Thank you for the blazingly fast reply, too.


michaelrebello commented Sep 2, 2018

Just a doubt about AsyncLoadingCache.get(): it returns a future. Is it necessary for me to call get() on that future in order to load the cache? In my use case I am okay with not having a cache entry at that moment, as long as it loads at some point later and I ultimately find it. I would not want to make a blocking call anywhere in my main thread where I look up the cache.

Also, when I deleted my comment, I thought I'd make the change below and see if I can avoid an explicit refresh() call on the cache:

For the first-ever call to the cache lookup (I have an external way to detect it, rather than just relying on a null return from getIfPresent()), just set the dummy value in the cache and do not refresh it explicitly, as refreshAfterWrite will trigger an async refresh anyway.

@ben-manes (Owner)

Your main thread doesn’t need to block, as the future will execute by default on a ForkJoinPool. You can query its state or chain handlers, so that should allow your caller to skip if still computing.
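For example (a stdlib sketch against the CompletableFuture the cache returns, not any Caffeine-specific API): you can poll the future without blocking, or attach a callback that runs when the load completes.

```java
import java.util.concurrent.CompletableFuture;

public class NonBlockingLookup {
  public static void main(String[] args) {
    // Stand-in for the future returned by AsyncLoadingCache.get(key).
    CompletableFuture<String> pending = new CompletableFuture<>();

    // Non-blocking poll: returns the fallback while the load is in flight.
    System.out.println(pending.getNow("not-loaded-yet")); // "not-loaded-yet"

    // Or chain a handler instead of calling the blocking get().
    pending.thenAccept(v -> System.out.println("loaded: " + v));

    pending.complete("bar"); // triggers the handler: prints "loaded: bar"
    System.out.println(pending.getNow("not-loaded-yet")); // "bar"
  }
}
```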
