Investigate possible AOT performance issues #31307
Comments
Thanks @bclozel and @snicoll! I really appreciate your help! In a new branch, I can try to reduce the package nesting in the core module, remove some unneeded classes, and check whether I get similar results, so that the sample is a bit easier to investigate.
Hey everyone.
With AOT, an instance supplier is used to create bean instances. With the indexer, instantiateUsingFactoryMethod is used instead. For some reason, instantiateUsingFactoryMethod is about twice as fast, at least for me. And a little bit more info:
Hello, for transparency reasons I should add that @snicoll approached me and asked if I could (look into/help with) that and there has been some email communication around it, where I shared some findings already. I will try to summarize these findings here now.

I should also note that I'm only a community member, not a part of the Spring team. So take all of the following with a grain of salt. Moreover, I don't want to argue against the timing difference between obtainFromSupplier and instantiateUsingFactoryMethod that was found earlier. I haven't found it to be the real root cause, but there might indeed be an issue. Probably not the most relevant one, as you will notice by the end of this comment.

First investigation step: Async-Profiler

I've profiled the AOT example vs. the indexed example with the async-profiler with several modes and settings. E.g.:
Unfortunately, none of the profiles showed any major difference in the HTML flamegraphs. (Slight differences, of course, but nothing that would have explained a difference of a full second, or so I thought.) What they did show was that with AOT the stack is a bit deeper with obtainFromSupplier in play. Via the

Second investigation step: Type pollution

What I noticed during some code checks was that
BUT as you can see, the count (6) was fairly low and its rank (656) was not relevant either. While it pops up several times for each lambda or bean that's going through

Third investigation step: Syscalls

The async-profiler, while being a great tool, is still a sampling profiler after all. One can of course tune the sampling rate, but it's still sampling, so my thought was that we might be missing something. Wall-clock profiles usually also show a lot of irrelevant (idle) threads. And while you can exclude these to a certain degree as well, I'm a friend of using
This showed the following difference of TOP 5 syscalls in AOT vs. indexed:
Not just the general count was up with AOT, but more substantially

SIDE-NOTE: While diving through the JDK code to seek confirmation that read & lseek are used during classloading (which they are when classes are read from a JAR), I wondered why

Fourth investigation step: Classloading

Classloading is a bit simpler to "profile" and should have been one of the earlier steps, I suppose. At least before tracking the syscalls, but hey, that's how my brain worked during the investigation. First I wanted to get a basic understanding of how many classes were loaded in the two different approaches, so I executed the following:
The JVM has a few interesting perf-counters that can be printed with the above command. Grepping for
Great. So we not only confirmed an increased number of classes being loaded (~2-3k), but also (unsurprisingly) a difference of a whole second being spent there. Being onto something, I added
So I did this again for both AOT and indexed. I sanitized the lines a bit (e.g. stripped away the prefix

While looking at the diff, I noticed lots of bytebuddy classes, but also some

Since the example isn't that large, only Hibernate could have been an explanation for the additional classes being loaded.

Fifth investigation step: Finding the "root" cause

With Hibernate in mind, I looked again at the profiles from step 1 and indeed noticed something that sparked my interest: AnnotationMetadataSourceProcessorImpl::processEntityHierarchies.

Let's speed this investigation up. I basically looked through the code of the mentioned method and came up with the following explanation. In the example project provided here there are two modules: core & webapp. Inside core there are entities, in webapp there are none. The context-indexer is only inside core, so only in core a spring.components file is generated that contains the following (omitted the other entities & repositories for brevity).
However, the PersistenceManagedTypes bean inside JpaBaseConfiguration only scans the webapp package, not core! This ultimately leads to the persistence unit info not containing any managed classes. Why this doesn’t break is beyond my understanding, because I have never used the indexer, but I assume it isn’t a problem because repositories are also part of the index and the managed classes would likely only be relevant if the repositories were scanned normally as well. Maybe? Anyhow. In the AOT example, however, the managed types inside the AOT-generated JpaBaseConfiguration__BeanDefinitions contain the entities
Therefore the metadata building processes the managed classes inside e.g. AnnotationMetadataSourceProcessorImpl::processEntityHierarchies. And while I’m sure there are more places inside the whole flow that deal with managed types, all of them combined, together with properly registered managed entities, ultimately cause lots of additional classes to be loaded in the startup phase.

In order to test my theory I reworked the example project to have only one module, which eliminates the difference in the indexer and correctly uses managed types. AOT was even slightly faster in that example. At least there was no drastic performance regression in AOT. I effectively slowed down the indexed example with that setup, because managed types were now properly registered and handled there as well. I guess the question you need to answer is whether the managed classes should be registered in the indexed example. And why the indexer doesn’t fail without the entities being registered as managed types. (I'm sure @snicoll can say a word here)

Summary & Learnings

With all of the above, there is no real performance regression in AOT, in my opinion. It's rather a misbehaviour of the indexed example, which does not register managed types inside the persistence unit info. Nonetheless, it revealed a potential future scaling problem for large projects caused by (https://bugs.openjdk.org/browse/JDK-8180450) and an optimization opportunity when looking up classes from JARs (https://bugs.openjdk.org/browse/JDK-8301621) that is worth evaluating further, because it likely helps in all normal Spring workloads where the app is started as a fat-jar.

This was quite a bit of fun, to be honest. The investigation once again showed that profiling is important. But quickly jumping to conclusions based on just one aspect is often too easy... I noticed, for example, that I'm "biased" these days to look into the most recent issue(s) I had on other projects.
If you have a hammer, everything looks like a nail after all ;-) . E.g. it could have saved me some time if I hadn't looked into inlining or type pollution. While I didn't believe they were the real issues in the first place, they often were issues in recent similar investigations where the picture was not obvious enough from the start... Ruling out ideas was just as important, and providing contextual information via different tools and then combining it into a bigger picture was once again the key to success. At least success in the sense of finding a performance difference. Whether that's the result you wanted is not for me to decide. But I'm sharing this a bit more in-depth in the hope of passing on some knowledge about tools and techniques for investigating things like this, and I hope that makes up for the maybe unsatisfying results ;-)

Cheers,
@dreis2211 Kudos!! Your explanation is quite satisfying to me. I'm curious to understand why the indexer is not breaking and whether this misbehavior can somehow "suggest" some improvement on the AOT side as well, but if there is no gain in startup time when everything behaves correctly, I think we are done with the "Indexer vs AOT" question.

PS: I don't know if you had the chance to observe some improvement in "AOT vs Standard" mode while using a single-module application. In my multi-module example there is not much gain from using AOT, or rather, it oscillates between being a few seconds faster and a few seconds slower. Honestly, it may also depend on a variable load on my machine while testing, so I'd just like to know if you experienced the same or something else.

Thanks again for your time,
It's just starting the app with an empty persistence unit and no Spring Data repositories. I haven't had the time to investigate but the way the index is created in your sample looks wrong to me. If you use the app to do something with the persistence unit, you'll see it breaking with the indexer.
There's no gain because the setup is wrong as we've just described. Chris already answered that:
The index "only" replaces classpath scanning, which is a tiny bit of what the |
@snicoll By "standard" I mean "no use of AOT, nor indexer", just starting the app the standard way. It's clear to me that the setup was wrong in the indexer case, and that explains why the indexer appears to be faster, since it's skipping scans of entities and persistence stuff. Still, it doesn't explain why I had small gains or worse results vs the "standard" ("not compiled nor indexed") mode, in which the setup should work correctly (and if not in the example app, I had equal results in a production application that has been fully working for months). I'll maybe try to play with the tools pointed out by Chris and do some profiling on my own. Thanks again for your patience and the effort put into this.
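For anyone repeating the class-loading comparison described in the fourth investigation step, the diff of two class-load logs can be sketched roughly as below. This is only an illustration, not the tooling used in this thread: the class name `ClassLoadDiff` and its methods are made up, and the log format is an assumption (lines shaped like `[0.1s][info][class,load] com.example.Foo source: ...`, with any bracketed prefix stripped before the class name is read).

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Spring or the JDK.
class ClassLoadDiff {

    // Assumed line shape: "[...][...] <class name> source: ...".
    // Strips leading "[...]" groups, then takes the first token.
    static String className(String line) {
        String rest = line.replaceFirst("^(\\[[^\\]]*\\])+\\s*", "").trim();
        int space = rest.indexOf(' ');
        return space < 0 ? rest : rest.substring(0, space);
    }

    // Class names that appear in the first log but not in the second,
    // sorted alphabetically so related packages end up next to each other.
    static Set<String> onlyInFirst(List<String> first, List<String> second) {
        Set<String> result = new TreeSet<>();
        for (String line : first) {
            result.add(className(line));
        }
        for (String line : second) {
            result.remove(className(line));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> aot = List.of(
                "[0.1s][info][class,load] com.example.A source: file:/app.jar",
                "[0.2s][info][class,load] com.example.B source: file:/app.jar");
        List<String> indexed = List.of(
                "[0.1s][info][class,load] com.example.A source: file:/app.jar");
        System.out.println(onlyInFirst(aot, indexed));
    }
}
```

Sorting the result makes it easier to spot whole packages that are only loaded in one of the two runs.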
Then it means that Spring is not the limiting factor and something else is taking time during startup. We've already seen the effect of starting Hibernate with 0 entities versus with the expected entities. I am going to close this as the issue is no longer actionable.
To clarify the above point: I basically moved everything into one module. These are the very rough timings after restructuring the project on my machine.

AOT "Normal" (no index, no AOT) Indexed

As the index is also fairly small, there is little gain over the "normal" example (if any at all). AOT is indeed faster than all the other variants once the project does the same work in all 3 comparisons (i.e. correctly processing Hibernate metadata and registering managed types). Keep in mind that there are still lots of classes being loaded on startup (10k or more) and read+lseek are still being executed. And it connects to a database. This is (network) I/O. Depending on your machine this can be slower or faster.

As @snicoll outlined, I think there's no real actionable item left here. I do think that there's still value in keeping an eye on possible performance improvements, but the given example unfortunately doesn't suffice for that. For this particular example AOT is indeed the fastest option.
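The class-loading numbers mentioned above can also be read from inside a running JVM. The following is a minimal sketch using the standard `java.lang.management` API rather than the perf-counters used earlier in this thread; the class name `ClassLoadStats` is made up for illustration.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Spring.
class ClassLoadStats {

    // Number of classes currently loaded in this JVM.
    static int loadedClassCount() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return bean.getLoadedClassCount();
    }

    // Total number of classes loaded since the JVM started,
    // including classes that have been unloaded in the meantime.
    static long totalLoadedClassCount() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return bean.getTotalLoadedClassCount();
    }

    public static void main(String[] args) {
        System.out.println("currently loaded: " + loadedClassCount());
        System.out.println("total loaded:     " + totalLoadedClassCount());
    }
}
```

Logging these two values once the application has finished starting, in each of the three variants, gives a rough way to check whether they really load a comparable number of classes.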
As reported by @shodo in #30431
We will investigate this sample application to find out why in this case AOT is not faster than the context-indexer.