About ForkJoinPool.commonPool #323
If I recall correctly, a library that spawns threads has to be very careful not to cause classloader leaks (for example, Guava's …). By being more of a library than a framework, we don't have our own lifecycle (e.g. start, stop). This makes it feel natural to use like any other data structure, which I think we'd want to keep. Instead, custom frameworks can be built on top of it (e.g. Spring's or JCache's) to give lifecycle semantics, a registry, etc. Internally to Caffeine, I avoid calling …. We can add some JavaDoc somewhere, but I don't really see this as Caffeine's problem, and the alternatives appear kind of ugly.
Yeah, spawning thread pools in a library is not good practice for sure, but the alternative is that refreshes are silently eaten. This is obviously not just Caffeine; even parallelStream() on something that does a join() somewhere could saturate the FJP.cP, and for what seem to be the favorite new paradigms (Streams and CompletableFutures), it seems odd that it's so easy to break. Maybe a global supplyable threadpool for all Caffeine caches, or we just have to Guice everything and create a Factory for Caffeine caches. =p
The only solution I might be comfortable with is to allow you to specify the default threadpool somehow, e.g. via a service loader. My concern is that it could change semantics globally, e.g. if you use a library that also uses Caffeine (or not, if shaded), and be a surprise. I'm not certain how that would behave and where it might fall apart. I think FJP.cP is very attractive and surprisingly brittle: it works really well in the happy path, but not with joins or other blocking I/O. I think it is okay to build up your own abstractions and patterns. It's best for us to be unopinionated, to remain flexible enough for applications to be strongly opinionated.
Yeah, I was mainly hoping someone would at least get to this page from a Google search and understand the issue better. :) |
Just as an example (to highlight the brittleness of this and how someone could accidentally misuse it), a loading cache loading some rows from FoundationDB (https://github.com/apple/foundationdb) could do:
And that would throw a RejectedExecutionException at the for loop because of the .join() in https://github.com/apple/foundationdb/blob/master/bindings/java/src/main/com/apple/foundationdb/RangeQuery.java#L250.

Closing for now.
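To illustrate the failure mode described above, here is a hedged, self-contained sketch (stdlib only, not the FoundationDB code): it blocks every worker of `ForkJoinPool.commonPool()` the way a loader calling `join()`/`get()` on I/O futures would, leaving any refresh task queued behind them. The class and latch names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;

public class CommonPoolStarvation {
  public static void main(String[] args) throws Exception {
    ForkJoinPool pool = ForkJoinPool.commonPool();
    int workers = Math.max(pool.getParallelism(), 1);
    System.out.println("commonPool parallelism = " + workers);

    // Tie up every worker with a task that blocks, the way a cache loader
    // calling join() on an I/O future would.
    CountDownLatch release = new CountDownLatch(1);
    CompletableFuture<?>[] blockers = new CompletableFuture<?>[workers];
    for (int i = 0; i < blockers.length; i++) {
      blockers[i] = CompletableFuture.runAsync(() -> {
        try {
          release.await(); // parks a common-pool worker
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }, pool);
    }
    System.out.println("workers blocked: " + blockers.length);

    // Any refresh submitted to the common pool at this point would sit in
    // the queue behind these blockers until they are released.
    release.countDown();
    CompletableFuture.allOf(blockers).join();
    System.out.println("released");
  }
}
```

In a real application the blockers are not latches but I/O waits deep inside library code, which is what makes the starvation hard to spot.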
I agree, thanks for highlighting it. Sorry, but I don't think there's much for us to do here. Hopefully others see this and refactor their code to be more polite in threadpool usage. FJP is awesome for churning through tasks, but pretty awful when anything blocks. I'm excited by all the progress on fibers and how transparent they'll be when they land in the JVM.
P.S. @panghy, since you use fdb you might be interested in apple/foundationdb#1518. Hopefully someone will grab a trace and we can make a good argument towards adopting Caffeine (rather than their random replacement cache). |
Always looking forward to improvements to FDB, we are trying our best over here to get things moved over to open-source so we can start contributing our share of features/improvements. No doubt TinyLFU will be better, if it's a knob that one can change, we can at least help with benchmarking it right now. |
Seems like we hit this in production (coincidentally when Caffeine was upgraded from 2.7.0 to 2.8.8, but I couldn't find anything obvious in the release notes/commits that would start triggering this). We have many different cache instances that are all created without any explicit executor at the moment, and the …
I'm not familiar with the Caffeine code base, but I just skimmed through our code base for usages of …

```java
package com.github.benmanes.caffeine.cache;

final class Async {
  // ...

  /** Returns the value when completed successfully or null if failed. */
  static @Nullable <V> V getWhenSuccessful(@Nullable CompletableFuture<V> future) {
    try {
      return (future == null) ? null : future.join();
    } catch (CancellationException | CompletionException e) {
      return null;
    }
  }

  // ...
}
```

I didn't check all usages of it and in which contexts it's used, but there's no guard around this, so I'm just curious if this is ok or if it should be guarded with an …
The cache only calls a blocking method on the future (join, get) if the operation must return the future's value:

```java
AsyncCache<K, V> asyncCache = Caffeine.newBuilder().buildAsync();
var future = new CompletableFuture<V>();
asyncCache.put(key, future);

// returns null if not yet complete (via getIfReady)
V absent = asyncCache.synchronous().getIfPresent(key);

// does not block, as Cache#put does not return a value
asyncCache.synchronous().put(key, newValue);

// blocks on an existing future in order to return the old value
V oldValue = asyncCache.synchronous().asMap().put(key, newValue);
```

Most likely the FJP is being consumed by your I/O tasks, which makes it sneaky as it is a bounded, shared pool. Someday when Loom lands, accidental thread starvation won't be so easy.
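As a stdlib-only sketch of why the `getWhenSuccessful` pattern quoted earlier is safe on completed futures: `join()` on a future that is already done returns (or throws) immediately without needing a pool worker, and an exceptional completion is swallowed to `null`. The helper below copies the semantics of the snippet above; the class name is hypothetical.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class GetWhenSuccessfulDemo {
  /** Value on success, null on failure or cancellation (mirrors Async.getWhenSuccessful). */
  static <V> V getWhenSuccessful(CompletableFuture<V> future) {
    try {
      return (future == null) ? null : future.join();
    } catch (CancellationException | CompletionException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    // join() on an already-completed future returns immediately and does not
    // consume a ForkJoinPool worker, so this path cannot be starved.
    System.out.println(getWhenSuccessful(CompletableFuture.completedFuture("ok")));

    // A future that completed exceptionally yields null rather than throwing.
    CompletableFuture<String> failed = new CompletableFuture<>();
    failed.completeExceptionally(new IllegalStateException("boom"));
    System.out.println(getWhenSuccessful(failed));
  }
}
```

The danger discussed in this thread is only with `join()`/`get()` on futures that are *not yet* complete while running on a saturated common pool.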
If you are using these views, note that the implementation was improved for correctness between those releases.
Thanks for the pointers 👍 In this case we are only using plain … Many of the …
After seeing issues of this sort in production numerous times, I wonder if this should be clearer in the docs.

Essentially, if you use the default executor for refresh and happen to call any code that works with CompletableFutures (assuming for a second that you can't control that code), and it calls .join() or .get() on a CompletableFuture anywhere, there's a chance that the refresh will fail, since the FJP will throw once parallelism is reached (unlike a cached ThreadPool, where the number of threads can be bounded but the work queue might not be). Small caches are probably okay, but if you have a large enough set of elements and refreshes happen frequently, the parallelism limit isn't that difficult to hit.

I wonder if it's more user-friendly for Caffeine itself to use its own threadpool (spawned on demand) as the default executor, rather than having to assume that someone will either override the executor or refactor the code so that any join() or get() is propagated all the way to the top, so that we can do an async refresh with an AsyncCacheLoader.
A classic stack trace would look something like this:
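For readers landing here from a search: Caffeine does let you override the default executor via `Caffeine.executor(Executor)`, which routes maintenance and refresh work away from the common pool. The sketch below uses only the stdlib to show the kind of pool you might supply; the class name and the "refresh" task are hypothetical stand-ins, not Caffeine's internals.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class DedicatedExecutorSketch {
  public static void main(String[] args) throws Exception {
    // A dedicated pool; with Caffeine you would pass this to
    // Caffeine.newBuilder().executor(pool) so cache work does not
    // compete with (or get starved by) FJP.commonPool() users.
    ExecutorService pool = Executors.newCachedThreadPool();

    // Hypothetical stand-in for a refresh task: on a cached pool a new
    // thread is created if all existing ones are busy, so the task still
    // runs even when other work is blocking.
    Future<String> refresh = pool.submit(() -> "refreshed");
    System.out.println(refresh.get(5, TimeUnit.SECONDS));

    pool.shutdown();
  }
}
```

The trade-off, as noted earlier in the thread, is that a cached pool bounds neither threads nor queue growth in the same way, so the choice of executor is itself an application-level decision.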