Thread lock in LocalCache.lockedGetOrLoad #3602
Comments
Notably, the threads handling HTTP requests and the ones sending emails end up waiting on a ReentrantLock of some sort. I could recheck all the thread dumps we have, but the last method called in Guava is always
We obviously haven't managed to reproduce the issue. It occurs roughly once per week, but there is no firm frequency; it happened 3 times today for no good reason we can tell. I do not know anything about Java, but could it be that a lock is not released? For example, if a thread is abruptly aborted? |
For reference: GerritCodeReview/gerrit@b4cb044 |
I believe the right answer, regardless of the caching library, is to not rely on the cache's internal locking. Gerrit's caches have circular dependencies which can become problematic, e.g. It was unclear whether this problem is due to an actual application deadlock caused by circular dependencies, or due to too coarse-grained locking / bugs in the cache implementation. If they have an application-induced deadlock (A req B, C req A, B req C), this has to be fixed within their logic. The cache cannot do anything but try to detect it if possible. If the resources are loaded in a safe order, then there are two options that they might consider. Future-based cache: This is similar to how Guava's cache works, which has hashmap locking and entry locking. In Caffeine this can be done using Use
|
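For readers following along, the future-based idea described above can be sketched with JDK classes only. This is a minimal illustration under stated assumptions, not Gerrit's, Guava's, or Caffeine's actual code: `FutureCache` and `loader` are hypothetical names, and a plain `ConcurrentHashMap` stands in for the cache. The map is touched only briefly to publish a future, and the expensive load runs outside any map lock.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a cache; not part of Guava, Caffeine, or Gerrit.
final class FutureCache<K, V> {
  private final ConcurrentHashMap<K, CompletableFuture<V>> futures = new ConcurrentHashMap<>();
  private final Function<K, V> loader;

  FutureCache(Function<K, V> loader) {
    this.loader = loader;
  }

  V get(K key) {
    CompletableFuture<V> future = futures.get(key);
    if (future == null) {
      CompletableFuture<V> created = new CompletableFuture<>();
      // Publishing the future is the only step that contends on the map.
      future = futures.putIfAbsent(key, created);
      if (future == null) {
        // This thread won the race, so it loads the value without holding a map lock.
        try {
          V value = loader.apply(key);
          created.complete(value);
          return value;
        } catch (RuntimeException | Error e) {
          // Drop the failed entry so a later call can retry, then wake any waiters.
          futures.remove(key, created);
          created.completeExceptionally(e);
          throw e;
        }
      }
    }
    // Another thread is loading (or has loaded) the value; wait for its result.
    return future.join();
  }
}
```

A loader that re-enters `get` for the key it is already loading would still block on its own unfinished future, so this sketch does not remove the need to load resources in a safe order.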
FWIW, we don't have anyone who's an expert on We've been generally pointing people toward caffeine since it's better maintained and Ben does know this stuff. |
@ben-manes hi! Is there anything we can do to https://github.com/GerritCodeReview/gerrit/blob/stable-2.15/gerrit-server/src/main/java/com/google/gerrit/server/account/AccountCacheImpl.java#L85, in your opinion, to try and make sure it doesn't deadlock? Or, if it tries to, that a timeout happens? |
@ben-manes thank you for the detailed explanation, unfortunately it is really out of my league. In a thread dump from today, I noticed threads were blocked on that thread:
And there is no occurrence of the lock it is waiting for ( I was hoping to eventually find a good old deadlock between threads as you have mentioned: A req B, C req A, B req C. But the JVM never notices any deadlock and I have no idea how to figure out what could have set that object. Given you recommend and Given @cgdecker's comment that I would like to thank you all for the very quick triage, useful leads and information. That is very much appreciated. |
I wonder if changing it to:
will work around the issue? I'm not sure if it still needs the try {} catch part too? |
@paladox That change would mean the cache never loads the value. Currently it calls You could either take control of the locking yourself, e.g. using @hashar, @paladox
This would mean multiple threads could perform the expensive load at once, but avoids deadlocking. That's likely the simplest solution and good enough? |
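The trade-off described above (duplicate loads are possible, but no cache lock is held while loading) can be sketched as follows. This is an illustration only and not necessarily the snippet posted in the thread: `RacyLoadCache` and `loader` are hypothetical names, and a `ConcurrentHashMap` stands in for the cache. With Guava, the analogous calls would be `Cache.getIfPresent` and `Cache.put`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a cache used in "look up, load outside, then put" style.
final class RacyLoadCache<K, V> {
  private final ConcurrentHashMap<K, V> values = new ConcurrentHashMap<>();
  private final Function<K, V> loader;

  RacyLoadCache(Function<K, V> loader) {
    this.loader = loader;
  }

  V get(K key) {
    V cached = values.get(key);
    if (cached != null) {
      return cached;
    }
    // No cache lock is held here, so the loader may safely call back into this cache.
    // Two threads can both reach this point and load the same key twice.
    V loaded = loader.apply(key); // assumed to return a non-null value
    values.put(key, loaded);
    return loaded;
  }
}
```

Because the load happens outside the cache, a loader for one key can read another key from the same cache without blocking on an entry lock; the cost is the occasional redundant load.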
Thank you sooooo much @ben-manes!! Question about the second part: I don't see 'loadById' in the code to use? |
Somewhere the
Instead you would need to call that loading function, e.g. |
@ben-manes when doing:
I get:
Using loadById also failed. Using ByIdLoader directly failed too. |
You probably want someone who is familiar with the Gerrit code to make the changes. In short, you want to add |
@ben-manes I managed to figure out how to get it to work: |
perfect. Then I think this can be closed from the Guava side as won't fix, per @cgdecker's comment. Hope that solves your problems. |
Yup! Thanks! |
@ben-manes hi, I'm wondering if you know how I can convert:
To use your workaround? (This is for Gerrit master, where this file differs from 2.15.) I tried:
But the tests fail with:
I'm guessing I'm missing something obvious, or maybe I did it slightly wrong? |
assertThat(result.averageGet).isNotNull(); This is because you are no longer loading through the cache, so |
Thank you! https://gerrit-review.googlesource.com/c/gerrit/+/239494 <-- updated that :) |
May I ask, is there any conclusion about this issue? |
@flying1020 at Wikimedia we encountered the issue in Gerrit (a Java-based code review system written by Google). The issue disappeared for us after we migrated to new hardware AND upgraded the underlying Debian operating system (from Jessie to Buster). We probably raised the JVM heap at the same time. Maybe it was caused by a Linux kernel issue, OpenJDK or a system library. But we certainly haven't seen it again! |
@hashar Thank you for your reply. I encountered this issue today; the thread stacks are the same as in your comment above, and thousands of threads are blocked at |
Note, though, that There is one exception, though, recursive cache loader: [1] https://gerrit-review.googlesource.com/c/gerrit/+/244612 |
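One way to make a recursive loader fail fast rather than hang is to wrap the loading function in a per-thread guard. The sketch below is an assumption-laden illustration, not something taken from Guava or from the Gerrit change linked above: `RecursionGuard` is a hypothetical name, and it only detects re-entrant loads of the same key on the same thread, not cycles that span several threads.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

// Hypothetical wrapper around a loading function; not a Guava or Gerrit class.
final class RecursionGuard<K, V> implements Function<K, V> {
  // Keys that the current thread is in the middle of loading.
  private final ThreadLocal<Set<K>> inProgress = ThreadLocal.withInitial(HashSet::new);
  private final Function<K, V> delegate;

  RecursionGuard(Function<K, V> delegate) {
    this.delegate = delegate;
  }

  @Override
  public V apply(K key) {
    Set<K> loading = inProgress.get();
    if (!loading.add(key)) {
      // The same thread asked for a key it is already loading: report it instead of blocking.
      throw new IllegalStateException("Recursive load of key: " + key);
    }
    try {
      return delegate.apply(key);
    } finally {
      loading.remove(key);
    }
  }
}
```

An exception with the offending key in its message is usually easier to diagnose from logs than a thread dump full of blocked threads.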
Yes, we have a StackOverflowError and the lock is not released, just the same as hashar |
Hi, Wikimedia uses Gerrit 2.15.14. Since earlier this year we have experienced a lot of thread locks on LocalCache.lockedGetOrLoad, which we can only resolve by restarting Gerrit.
We use Guava version 24.1.1-jre.
https://github.com/GerritCodeReview/gerrit/blob/stable-2.15/lib/guava.bzl#L1
We are not the only ones to experience this issue (they tried replacing Guava with Caffeine but had to revert due to other problems) (https://bugs.chromium.org/p/gerrit/issues/detail?id=7645#c17).
Our downstream task: https://phabricator.wikimedia.org/T224448.
Thread dump: https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTkvMDkvMTEvLS1nZXJyaXQtanN0YWNrLnR4dC0tMTEtNTEtNDI=
I'm not sure if a newer version fixes this.
Also, I'm not sure what other information you would like us to collect?