-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WrongMethodTypeException after upgrade to 3.1.7 #1111
Comments
I don't have as much information as I would like quite yet, posted early in case others have observed this issue. When I uncover additional details or manage a minimal reproducer, I will update this issue. |
I think any details you have, like jdk version, would help. We primarily test on jdk11 with a few configurations evaluated on later jdks (e.g. benchmarks) that are hotspot/graal based. |
So far I've seen this on jdk-17 using Amazon Corretto 17.0.8.7.1 on linux hosts. I'm not confident that it's necessarily specific to jdk-17, because the vast majority of the services we run use jdk-17. The failure is not consistently reproducible in the environments where it has occurred, so I suspect there may be a cache interaction ordering component (or the cache key doesn't take into account some piece of class-loading/lookup state), as I wasn't able to force the failure by executing the same interactions with caffeine in isolation. |
do you use multiple classloaders where threads might cross that boundary? For example, iirc Tomcat hosts each application in its own classloader but ForkJoinPool threads cut across these, so |
That was my first thought as well, but these applications are roughly dropwizard-shaped microservices which use a single standard classloader. The rate of reproduction is very low, I've only seen ~10 occurrences across thousands of jvm launches. |
It's weird too because, if I understand correctly, |
To me this looks like a Hotspot bug... I haven't seen this anywhere. In addition the error message makes no sense. The signature matches exactly: expected ()NodeFactory but found ()NodeFactory |
The crazy error message comes from the fact that the method ypes are compared with identity not equals: https://github.com/openjdk/jdk17/blob/4afbcaf55383ec2f5da53282a1547bac3d099e9d/src/java.base/share/classes/java/lang/invoke/Invokers.java#L531 No idea why this is done like that. There are many places where invokeExact is used, so this bug is really strange. As workaround you can go back to invoke, but this will slowdown calls because exact type information is known before. |
It does seem as though |
I imagine our workaround will be a fallback path once we settle on the cause, e.g. use |
Maybe it is a multithreading problem and some bug in openjdk does not correctly intern/canonicalize the method type instance. We have to figure out if theres some caching involved. If there's really an issue, we may need to open openjdk bug report. |
Haha, add a try...catch around and call invoke on exception 😜 I have never seen this in our code. Invoke exact always works as expected. I really think this is a muktithreading bug in the cache of jdk. 😕 |
I wonder if it is expunging a temporary When it optimistically creates a I am kind of reaching, but it seems just plausible enough to happen, not be obvious to the author, and pretty hard to orchestrate that race. does this seem believable? |
The more that I look at the code, the more that I convince myself their concurrency is correct (assuming no gc/hotspot visibility race). Searching around all could find was FasterXML/jackson-modules-base#142, which was closed as not reproducible. Since you did not have this problem in the prior release, which used invoke, that seems like a safe fallback. The major difference with the previous release is that we now cache and reuse the instantiated Ideally you could dig in far enough to open a hotspot bug, or if the issue goes away we could close as not reproducible like the Jackson team did and hope the cosmic rays stay in our favor. Let me know what you'd like to do and a release timeline. |
Hi, caffeine/caffeine/src/main/java/com/github/benmanes/caffeine/cache/NodeFactory.java Line 151 in b1462dd
Actually this creates a new MethodType on each call so it does the work of invoke(). I would remove the method type adaption and just call invoke. In short! Lookup constructor and call it with invoke (like before). When we implemented this first time, the code was different (we cached the constructor handles and called them. But here we do the whole reflective lookup and call it directly a single time after adapting the type. That is too complex and brings nothing. You should also check the other one that uses invoke exact. If pattern is same, remove adaption, too. |
I checked the other ones: https://github.com/search?q=repo%3Aben-manes%2Fcaffeine%20invokeExact&type=code The one here is the only one that adapts on each call. The other 2 instances use adaption of method type a single time and cache result. |
I think the other one does it to, it just does it earlier than the search snippet shows caffeine/caffeine/src/main/java/com/github/benmanes/caffeine/cache/LocalCacheFactory.java Lines 117 to 129 in b1462dd
|
If you'd like to send a PR or provide a snippet then I'd happily merge that. I think I know what you mean but you're much more of an expert here 🙂 |
The alternative is to define the method type once as static final. Adapting and creating a new method type on each call looks wrong. 😜 |
Will do. |
Oh okay, so we should revert back to invoke like before? That seemed simple and effective. The main benefit from our prior discussion ended up being that we cached the factories so the method handle is used only once to create the factory instance. Then the factory is cached in a map, where using a factory uses direct calls. So ideally we only use the method handle once per unique policy combination (size, size + expire, etc). Then even old school reflection would have been equivalent performance-wise as the cost is paid once, rather than each time a new cache instance is built. |
For the case here, it is not made for performance, reflection would be fine. The main difference is: Because we use a lookup, we can handle package private classes without The LocalCacheFactory is a bit different: Here we should use invokeExact (as it looks to be called more often), but here we are caching the MethodHandles. When creating them, we should not do it on each lookup, instead define it once as static final. I have a PR almost ready, running tests. |
haha, we have an exhaustive suite so you might prefer sending a draft pr and letting the CI burn it’s cpu cycles instead of yours. |
…n (to work around a possible concurrency bug in JDK)
Here you are: #1114 |
I have not changed the benchmark, but I think you know wthat I did - maybe adapt the benchmark classes, too. Actually the LocalCacheFactory is the way how one should normally use MethodType instances: Do not create them over an over and keep them as static finals. If you look at the code inside the JDK how they use it for bootstrapping. I am quite sure that there is some strange problem inside the JDK that sometimes creates non-interned MethodTypes under high load. |
beautiful, I’ll merge this in after the CI completes. I’ll try to cut a release in a day or two when I find the time. |
The LambdaMetaFactory benchmark looks fine (...and its not productive code). I would remove that as this is no longer used (but that's up to you). |
…k around a possible concurrency bug in JDK) (#1114)
Thank you both! |
Of course it would be great if you could start up your services with your jdk using a preview build of the library and report if you see any problems. |
Released in 3.1.8 Thanks again for all the help fixing this @uschindler, @carterkozak |
I don't have a reproducer for this yet, but I've seen it pop up in a variety of contexts when building a caffeine cache. This has occurred on a few configurations using both
build
andbuildAsync
. #905 seems like the most likely culprit.The text was updated successfully, but these errors were encountered: