Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BoundedLocalCache throws NPE on Shutdown if CacheLoader returns a null. #35

Closed
tellisnz opened this issue Dec 1, 2015 · 8 comments
Closed

Comments

@tellisnz
Copy link

tellisnz commented Dec 1, 2015

This:

    @Test
    public void throwsNullPointerException() {
        LoadingCache<String, Object> cache = Caffeine.newBuilder()
                .maximumSize(10000)
                .build(key -> null);
        cache.get("NullKey");
    }

Throws:

Exception in thread "ForkJoinPool.commonPool-worker-1" java.lang.NullPointerException
    at com.github.benmanes.caffeine.cache.BoundedLocalCache$RemovalTask.run(BoundedLocalCache.java:1131)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.drainWriteBuffer(BoundedLocalCache.java:1025)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.maintenance(BoundedLocalCache.java:904)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.cleanUp(BoundedLocalCache.java:888)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache$$Lambda$4/1020154737.run(Unknown Source)
    at java.util.concurrent.ForkJoinTask$RunnableExecuDisconnected from the target VM, address: '127.0.0.1:49629', transport: 'socket'
teAction.exec(ForkJoinTask.java:1402)

on shutdown. Is this desired behaviour or have I done something wrong above?

compile 'com.github.ben-manes.caffeine:caffeine:2.0.1'

Should the check on BondedLocalCache line 1812:

    if (node == null) {
      afterWrite(null, new RemovalTask(removed[0]), now);
      return null;
    }

maybe include removed[0] not being null? I.e.

    if (node == null && removed[0] != null) {
      afterWrite(null, new RemovalTask(removed[0]), now);
      return null;
    }
@ben-manes
Copy link
Owner

That's an excellent point and thanks for the test case + analysis. I think your fix is correct, except most likely the added check should be only around afterWrite so that the method eagerly returns.

I'll review to make sure there are tests for all cases when the CacheLoader (or mapping function) returns null (e.g. refresh) and probably have a patch release out tomorrow or the day after.

@tellisnz
Copy link
Author

tellisnz commented Dec 1, 2015

Perfect, thanks Ben.

@ben-manes
Copy link
Owner

There's a test case for the scenario in a generalized fashion (so all configurations can be verified). It looks like I missed this because I disabled logging to avoid spamming the results as some tests assert the exceptional conditions. Since the internal data structures of the cache weren't corrupted (e.g. removal task had nothing to do) the validation aspect passes. So the issue is benign except for scary log messages.

Since this is the executor failing but the logger swallowing the cases, I'll probably have to do something quirky like supply an executor that remembers if it failed. I think figuring out how to communicate this as a test failure (so other cases are caught) will be the tricky part of this bug fix. For example asMap().compute(key, (k, v) -> null) has the same problem with AddTask. Ideally the test executor should highlight that scenario as well.

@seidler2547
Copy link

I was about to finish changing our current caching system over to Caffeine as I stumbled upon this also. It is also harmful if "cleanUp" is called explicitely on a Cache since then this (unchecked and undeclared) Exception is thrown from the cleanUp method, which will lead to problems.

@ben-manes
Copy link
Owner

@seidler2547 I agree cleanUp shouldn't be expected to throw an exception in general. Adding a guard makes it tricky to figure out a nice way to test since I can instrument the executor which delegates to cleanUp. Since a same-thread executor could be used, changing to an unsafe method is still leaky. If we assume my code won't throw because its fixed, then the executor and such things behave fine without a catch guard clause.

A problem is that CacheWriter might throw on delete, which could be on an eviction. That cancels the mapping change and the outer operation, which might be a cleanUp. Caffeine.writer(...) is documented with a warning,

Warning: any exception thrown by {@code writer} will be propagated to the {@code Cache} user.

So I think its okay to say that a CacheWriter.delete() can propagate to the cleanUp() caller, just like it might on any other write. The user has to be careful when writing a CacheWriter anyways since they are using an advanced integration option.

I think if we can agree that's okay, then testing isn't hindered.

ben-manes added a commit that referenced this issue Dec 2, 2015
When an absent computation returns null, a add/removal task was being
improperly scheduled. As there was no Node, this null field was then
causing an NPE during the maintenance cycle. Usually that was on a
background executor, but could be visible if `cleanUp` is called.

This wasn't caught by tests since the executor's was swalled, logged,
and logging was disabled to avoid spamming the report. Now the executor
is instrumented so that the validation can assert no failures occurred.
This detected the two bugs found with existing tests, so this provides
more coverage than a one-off fix.
@ben-manes
Copy link
Owner

Okay, cleanup() now avoids propagating the exceptions as expected. It took me a little bit to get all the details sorted out I guess.

I'll release 2.0.2 now with this fix. Thanks again.

@ben-manes
Copy link
Owner

This is now released. It took a little while to work out some test issues. Should sync from staging to central within 2 hours.

@tellisnz
Copy link
Author

tellisnz commented Dec 2, 2015

Looks good, thanks for the speedy response Ben!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants