Possible JDK16 issue in 3.0.0: ThreadLocal not initialised? #515
Comments
Thanks. That is certainly surprising. As you saw from the code, it tries first to use VarHandles. Glancing at the code, I do see my mistake in the thread local version, so I apologize for that. I'll have a fix out by the end of the week.
Hm, in that case we also need to figure out why it's not doing the VarHandles thing if, as you say (and it seemed to me), it was fine on JDK 15. For my part I'll run against a full JDK rather than my jlinked server-jre, to check that at least it's still happy with jdk.unsupported on board. Edit: Yes, it is.
JDK 16 enables more restrictive access to JDK internals by default. Unfortunately that also disallows our ability to use VarHandles in this class, because we use Doug Lea's approach from Striped64, which relies on the thread's internal hash probe. I was hoping that field could be inspected by VarHandles rather than Unsafe, and that this would punch through the module layer. Sadly that isn't true, so it fell back to the thread local version, where I messed up the initialization order. I am not super thrilled with the thread local version as it won't be quite as fast, since Java's ThreadLocal implementation is not known for its speed. I am not really sure if there is any better way, since the JDK doesn't provide a public facility like the thread probe used internally by ThreadLocalRandom and Striped64.
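For illustration, a ThreadLocal-based probe of the kind this fallback describes might look like the sketch below. This is not the actual StripedBuffer.ThreadLocalProbe code; the names are illustrative, and the initialization is done lazily in the accessor so no ordering bug can occur.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Illustrative per-thread probe kept in a ThreadLocal when Unsafe/VarHandles are unavailable. */
final class ThreadLocalProbe {
  // initialValue() guarantees the probe is set before the first read,
  // avoiding any dependence on static initialization order
  private static final ThreadLocal<int[]> PROBE =
      ThreadLocal.withInitial(() -> new int[] { ThreadLocalRandom.current().nextInt() | 1 });

  static int get() {
    return PROBE.get()[0];
  }

  static void advance(int probe) {
    // xorshift, the same rehash step Striped64 uses to move a contended thread
    probe ^= probe << 13;
    probe ^= probe >>> 17;
    probe ^= probe << 5;
    PROBE.get()[0] = probe;
  }
}
```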
A pre-release of 3.0.1 will be published as a snapshot build by the CI, or is available via jitpack. That should fix the initialization problem. I'd like to find a more optimal approach than a thread local, so I'll leave this open for a little bit to nag me before I cut a release this week. Any ideas would be greatly appreciated!

For context, we use multiple ring buffers to record the entries that were read and then replay these events against the eviction policy later on, under a lock (see the sketch below). This avoids every thread needing to lock in order to manipulate an LRU, where that manipulation is cheap but the frequency of locking is not. As a library, we don't know how many cache instances exist or how much concurrency an instance is subjected to. To be friendly, instead of pre-allocating a ring buffer array we let it grow on demand. Because multiple threads might hash to the same array index, we use the probe to rehash so that under load every active thread lands on an independent buffer. This is how Striped64 (and therefore LongAdder) grows its cell table.
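A rough sketch of that record-and-replay split, with illustrative names, a fixed rather than growable stripe count, and a plain ArrayBlockingQueue standing in for the lock-free ring buffers:

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

/** Illustrative striped read buffer: lossy recording, replayed under a single lock. */
final class ReadBuffers<E> {
  private final Queue<E>[] buffers;
  private final ReentrantLock evictionLock = new ReentrantLock();

  @SuppressWarnings("unchecked")
  ReadBuffers(int stripes, int capacity) {
    buffers = new Queue[stripes];  // stripes assumed to be a power of two
    for (int i = 0; i < stripes; i++) {
      buffers[i] = new ArrayBlockingQueue<>(capacity);
    }
  }

  /** Records a read; dropping the event when the buffer is full is acceptable because the policy is lossy. */
  void record(E event, int probe) {
    Queue<E> buffer = buffers[probe & (buffers.length - 1)];
    buffer.offer(event);  // may fail under load; that's fine
  }

  /** Replays buffered events against the eviction policy if no other thread is already doing so. */
  void drainIfNeeded(Consumer<E> policy) {
    if (evictionLock.tryLock()) {
      try {
        for (Queue<E> buffer : buffers) {
          for (E e; (e = buffer.poll()) != null; ) {
            policy.accept(e);
          }
        }
      } finally {
        evictionLock.unlock();
      }
    }
  }
}
```

The real implementation additionally grows the buffer table under contention, per the description above, but the record/drain split is the same shape.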
The performance impact is tolerable; the benchmark I ran measures single-threaded throughput. The best case is to simply hash the thread's id. That hash is fixed, so multiple threads that index to the same array slot would suffer contention. Any dynamic mapping needs to reassign threads or purge stale thread mappings; for example, if we had a concurrent map keyed by thread, stale entries would have to be purged as threads terminate. This is why Doug uses the probe to rehash with, thereby spreading the load evenly. In that case Unsafe and VarHandles perform equally. The thread local version drops significantly, but is still acceptable on my laptop.
The impact to the cache on the multi-threaded benchmark was likewise acceptable. After chatting with Doug, he is kindly considering what a generic, user-facing facility for the probe might look like.
Released 3.0.1. Thanks again for reporting this bug.
Yes, my fallback when Caffeine breaks is a pretty naively written (by me, years ago) simple LRU cache wrapped around a ConcurrentHashMap, so I'm sure this still beats it. :-) (They both work behind a wrapper interface in our app so it's just a config change to switch cache impl.)
@StrangeNoises I found writing a good concurrent LRU was really tricky. Did you try anything fancy? I first tried Clock (but disliked the unpredictable scan lengths), then lock-free lists (but buggy and high contention at the tail), and finally learned of the buffering approach. Another nice idea that I didn't try was random sampling, but it's very inflexible. Memcached simply ignores LRU updates for 60s after an entry was touched and uses a timer thread to avoid system call overhead.

Eventually I realized the trick is to view it as request sampling, as the hot/cold nature of a cache means popular items will be moved into the warm region and any cold item can be evicted equally well. Thus, scaling an LRU means dropping 99% of the events, as 1% is enough to obtain the same hit rate. As long as the hash table is consistent, the policy itself can be very lossy.

I spent a lot of time bewildered and banging my head iterating on that, with the much simpler ConcurrentLinkedHashMap as the result. Then years later I got tired of never coding due to meetings and started again over my train rides by learning what researchers did in newer eviction policies...
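A tiny sketch of the Memcached-style trick mentioned above; the class name, the nanosecond bookkeeping, and the 60-second window constant are illustrative assumptions rather than Memcached's or Caffeine's actual code:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative entry metadata: skip LRU updates for entries touched within the last 60 seconds. */
final class LazyLruEntry {
  private static final long WINDOW_NANOS = 60_000_000_000L;  // 60s window
  private final AtomicLong lastTouched = new AtomicLong();

  /** Returns true only if the eviction policy should be told about this read. */
  boolean shouldUpdatePolicy(long nowNanos) {
    long last = lastTouched.get();
    if (nowNanos - last < WINDOW_NANOS) {
      return false;  // recently touched: drop the event, the hit rate barely changes
    }
    // only one racing reader wins the right to update the policy
    return lastTouched.compareAndSet(last, nowNanos);
  }
}
```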
Like I said, mine was "naive" and probably fails at your first hurdle. Its main virtues are that it does actually work and is better than the thing it replaced (something that came down to us from the dot-com era without even any synchronization, though we could at least stick it in a synchronized wrapper). When I saw yours, and the credentials and recommendations it came with, I basically thought: well, if that's not better than the thing I knocked up in about a week from first principles and without a compsci background, we're really in trouble!

I only even keep ours around as a fallback - and keep Caffeine behind a shim to the same API layer - precisely because of Caffeine's use of unsafe/private APIs in the JDK. We don't otherwise do that, which means we might be missing some performance opportunities, but OTOH we can be pretty fearless about jumping onto the latest JDK on release day to play with its new public API toys. I hate being trapped on old Java versions, and won't let it happen again. That's why I'm actually pretty relaxed about the relative loss of performance in your fallback method here, as we made that choice a policy long ago. As long as there is a fallback and it actually works ;-) it's still likely to be a win, being the result of more careful and researched consideration than I was able to give mine.

For a while, and until recently, we were using embedded Tomcat serving HTTPS directly, and their HTTPS connector required jdk.unsupported as well. Our not needing to do that any more nearly coincided with Caffeine 3.0.0 coming out, so we were finally able to jlink our server-jre without jdk.unsupported again. But then JDK 16 hit.

Let's see... I'm sure it'll horrify you but... it really does just wrap a ConcurrentHashMap.
I certainly remember pains upgrading JDKs because random application code depended upon bugs (for whatever bad reason), used abandoned libraries with an ancient ASM, etc. Someday I hope to scrub Unsafe for real, and thanks for being understanding of my foolishness with this fallback logic. I'll refactor things a tiny bit to get a better test harness, as obviously I didn't have proper coverage to catch that.

Honestly, that LRU sounds a lot better than most that I've seen. If it's not broken, then a lot are much slower than what they are replacing. Even really smart engineers screw that up royally (ElasticSearch, Spring). Your choice of a watermark, a sort, and asynchronous eviction was a good one, as it is fairly robust (the old caches in HBase & Solr used that too). I dislike that the cost increases as the cache size grows, because random system pauses are one of those annoying production performance hiccups (e.g. some unoptimized SQL query saturating the db i/o). The only flaw in Caffeine's approach is that the eviction policy's FrequencySketch has an amortized reset operation, but that is hardware friendly and never visible in my profiles. Soon maybe we can play with SIMD (the new Vector API) and make that magical. 😄

Anyway, I won't bug you anymore as we head into the weekend. Thanks for talking tech a bit, as I suffer from PHB syndrome and miss the fun of hacking something unsightly but full of neat tricks.
I don't mind at all. I obviously did worry about the potential for pauses during eviction, as I found this missive sent into the future by past-me (git blame says 2016-me):
(Probably faster now, as both the computers and Java are....)
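From the description across these comments (a ConcurrentHashMap wrapper with per-entry access times, a watermark, a sort by recency, and asynchronous eviction), a fallback of that shape might look roughly like the sketch below. The class and field names are assumptions for illustration, not the actual code being discussed.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative naive LRU: ConcurrentHashMap plus watermark-triggered, asynchronous eviction. */
final class NaiveLruCache<K, V> {
  private record Entry<V>(V value, long lastAccess) {}

  private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
  private final AtomicBoolean evicting = new AtomicBoolean();
  private final int watermark;

  NaiveLruCache(int watermark) {
    this.watermark = watermark;
  }

  V get(K key) {
    Entry<V> entry = map.computeIfPresent(key, (k, e) -> new Entry<>(e.value(), System.nanoTime()));
    return (entry == null) ? null : entry.value();
  }

  void put(K key, V value) {
    map.put(key, new Entry<>(value, System.nanoTime()));
    if (map.size() > watermark && evicting.compareAndSet(false, true)) {
      // sort by recency and evict the oldest entries off the caller's thread
      CompletableFuture.runAsync(this::evictOldest).whenComplete((r, e) -> evicting.set(false));
    }
  }

  private void evictOldest() {
    map.entrySet().stream()
        .sorted(Comparator.comparingLong((Map.Entry<K, Entry<V>> e) -> e.getValue().lastAccess()))
        .limit(Math.max(0, map.size() - watermark))
        .forEach(e -> map.remove(e.getKey(), e.getValue()));
  }
}
```

As noted in the thread, the eviction pass scans and sorts the whole map, so its cost grows with the cache size.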
Doug suggested using incremental hashing on the thread id instead of the randomized hash probe. A benchmark showed this to be roughly the same as the unsafe version, which is faster than the thread local fallback. I removed all of those probe variants, and this will be the only strategy in the next release. The overall logic stays the same: a lazily grown table, where buffers are added based on thread contention. Java's counters have to retry indefinitely, so Doug's approach of converging to a perfect hashing is optimal there. Caffeine is allowed to drop events, so it retries up to a limit (3x), and a less perfect distribution has no user-visible impact. Most likely there wouldn't be a difference had LongAdder used incremental hashing instead, but being JDK internal it could go with the theoretically better version. I suppose that I over-complicated it by borrowing too much of Doug's code. 🙂

Now the strategy looks like this (simplified):

```java
// derive a well-mixed hash and an odd increment from the thread id
long z = mix64(Thread.currentThread().getId());
int increment = (int) (z >>> 32) | 1;
int h = (int) z;
for (int attempt = 0; attempt < ATTEMPTS; attempt++) {
  var buffer = buffers[h & mask];
  var result = buffer.offer(e);
  if (result == Buffer.FAILED) {
    h += increment;  // probe a different slot on contention
  } else {
    return result;
  }
}
return Buffer.FAILED;
```

where `mix64` is a 64-bit bit-mixing function applied to the thread id.
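For reference, a mixing function of that shape could look like the following; this sketch assumes a Murmur3-style 64-bit finalizer, which may differ from the constants Caffeine actually uses.

```java
/** A Murmur3-style 64-bit finalizer; illustrative of what a mix64 helper might look like. */
static long mix64(long z) {
  z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
  z = (z ^ (z >>> 33)) * 0xc4ceb9fe1a85ec53L;
  return z ^ (z >>> 33);
}
```

Mixing decorrelates the table index from the raw thread id and yields an independent odd increment from the high bits, so threads with adjacent ids don't walk the same probe sequence.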
So I just upgraded our project to JDK 16 and tried to actually run it. It failed with this:
(The rest of the stack trace is in our code)
It looks like it's trying to read values from that ThreadLocal in StripedBuffer.ThreadLocalProbe before it's initialised.
I've been running 3.0.0 on JDK 15 for a while just fine, with jdk.unsupported absent, so this looks like it may be a JDK 16 thing. Don't know if it's relevant, but they did do some deprecation work in ThreadGroup. OTOH that's only deprecation (for removal later), nothing actually removed.