Parallelize bean scanning in SmallRyeFaultToleranceProcessor #45509

Closed
jin-harmoney opened this issue Jan 10, 2025 · 27 comments · Fixed by #45512
Labels: area/smallrye, kind/enhancement (New feature or request)
Milestone: 3.17.7

Comments

@jin-harmoney commented Jan 10, 2025

Description

Current behavior:

Beans are scanned sequentially here: https://github.com/quarkusio/quarkus/blob/main/extensions/smallrye-fault-tolerance/deployment/src/main/java/io/quarkus/smallrye/faulttolerance/deployment/SmallRyeFaultToleranceProcessor.java#L234

According to the build metrics, this build step takes about 50% of the total build time. We have 1500 beans in our application.

Image

Desired behavior:

Is there a way to perform bean scanning in parallel?

I'd be happy to contribute if someone could point me in the right direction. I checked other build steps, but I couldn't find any parallel behavior, so I'm not sure where to start.

Implementation ideas

No response
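
A minimal sketch of what parallel scanning could look like, assuming the per-bean work is independent and thread-safe; the class, record, and method names below are illustrative and not the actual SmallRyeFaultToleranceProcessor API:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ParallelScanSketch {

    // Stand-in for whatever the processor computes per bean.
    record ScanResult(String beanClass, boolean hasFaultToleranceMethods) { }

    // Scan each bean in parallel and merge the per-bean results afterwards.
    // Whatever scanOne() touches (Jandex index, shared builders, ...) would
    // have to be safe to call from multiple threads.
    static Map<String, ScanResult> scanAll(List<String> beanClasses) {
        Map<String, ScanResult> results = new ConcurrentHashMap<>();
        beanClasses.parallelStream()
                .forEach(beanClass -> results.put(beanClass, scanOne(beanClass)));
        return results;
    }

    static ScanResult scanOne(String beanClass) {
        // Placeholder for the real per-bean work (annotation lookups, method descriptors, ...).
        return new ScanResult(beanClass, false);
    }
}
```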

jin-harmoney added the kind/enhancement label Jan 10, 2025

quarkus-bot commented Jan 10, 2025

/cc @Ladicek (smallrye), @jmartisk (smallrye), @phillip-kruger (smallrye), @radcortez (smallrye)

@gsmet (Member) commented Jan 10, 2025

Could you try getting us some profiling data following the instructions in https://github.com/quarkusio/quarkus/blob/main/TROUBLESHOOTING.md#profiling-application-startup-with-async-profiler ?

Don't hesitate to ping me here or on Zulip if you need help.

@jin-harmoney (Author) commented

@gsmet sure! Is this sufficiently detailed, or are you looking for something specific?

Image

@gsmet (Member) commented Jan 10, 2025

I think it would be interesting to see what's going on in createMethodDescriptor.

Do you have a lot of methods annotated with fault tolerance annotations?

@gsmet (Member) commented Jan 10, 2025

My wild guess is that it's going to be the getClassProxy() calls.

@gsmet (Member) commented Jan 10, 2025

Could you try this small patch: #45512 ?

I wonder if it could help.

There are instructions on how to build Quarkus here: https://github.com/quarkusio/quarkus/blob/main/CONTRIBUTING.md#building-main (but you will need to build my PR branch instead of main), and the following paragraph there explains how to update the Quarkus version in your project.

That being said, in your profiling, it doesn't look like it's Fault Tolerance taking the most time?

Let me know how it goes.

@gsmet (Member) commented Jan 10, 2025

BTW, your startup looks interesting, so I would really be interested in the full HTML output of Async Profiler so that we can look at all the details (see the instructions in my first comment above).

@Ladicek (Contributor) commented Jan 13, 2025

Agree with @gsmet that getClassProxy() is probably the culprit here (I could also remove streams from createMethodDescriptor() just to avoid that cost, but the main thing it does is a bunch of getClassProxy() calls), so it would be good to test with his patch first.

Other than that, it should be relatively straightforward to parallelize FT scanning. Just not sure if that's necessary at the moment.
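
For illustration only, one plausible shape of a getClassProxy() optimization (memoizing the lookups per class name), using entirely made-up names; the actual patch in #45512 may take a different approach:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: resolve each class proxy once per build instead of
// once per annotated method. resolveClassProxy() is a stand-in for whatever
// the processor actually does.
class ClassProxyCacheSketch {

    private final Map<String, Class<?>> cache = new ConcurrentHashMap<>();

    Class<?> getClassProxy(String className) {
        return cache.computeIfAbsent(className, this::resolveClassProxy);
    }

    private Class<?> resolveClassProxy(String className) {
        try {
            // Stand-in resolution; the real build step would not load application classes like this.
            return Class.forName(className, false, Thread.currentThread().getContextClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Could not resolve " + className, e);
        }
    }
}
```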

@gsmet (Member) commented Jan 13, 2025

@Ladicek it might also be a good idea to drop the DotName creation in FaultToleranceScanner#getAnnotation(). I suspect we are creating a lot of them in this case for the very same annotations each time.

@gsmet (Member) commented Jan 13, 2025

But TBH, I'm very, very curious about what's going on in the two BeanProcessor calls :).

@Ladicek (Contributor) commented Jan 13, 2025

Good point @gsmet, we already have those DotNames pre-created as constants in io.quarkus.smallrye.faulttolerance.deployment.DotNames.
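
For reference, the pattern being suggested, sketched with the real MicroProfile annotation name but an illustrative holder class (the actual constants live in io.quarkus.smallrye.faulttolerance.deployment.DotNames):

```java
import org.jboss.jandex.DotName;
import org.jboss.jandex.MethodInfo;

final class FaultToleranceDotNamesSketch {

    // Create the DotName once and reuse it, instead of calling
    // DotName.createSimple(...) on every getAnnotation() lookup.
    static final DotName ASYNCHRONOUS =
            DotName.createSimple("org.eclipse.microprofile.faulttolerance.Asynchronous");

    static boolean isAsynchronous(MethodInfo method) {
        return method.annotation(ASYNCHRONOUS) != null;
    }
}
```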

@jin-harmoney (Author) commented

FYI I'm working on extracting the info that was requested. Will reply here in a bit.

@jin-harmoney (Author) commented Jan 13, 2025

@gsmet :

Do you have a lot of methods annotated with fault tolerance annotations?

  • We're only using the fault tolerance @Asynchronous annotation at the moment, and we only have 7 methods annotated with @Asynchronous.
  • That said, we do have 1500 beans in the application (it's a single WAR monolith; we're migrating from WildFly to Quarkus).

That being said, in your profiling, it doesn't look like it's Fault Tolerance taking the most time?

  • Correct, according to the build metrics in the UI, Fault Tolerance takes around 50% of the time. That is, if you compare the total time (around 10 sec) with the time that FaultTolerance takes (around 5 sec).
  • The reason we don't see the same thing reflected in the profiling is that profiling does not take the parallel execution of the build steps into account. So while FaultTolerance might only account for 10% of the total CPU time, it takes 50% of the wall-clock build time because the step runs sequentially.

I think it would be interesting to see what's going on in createMethodDescriptor.

I'm not used to working with AsyncProfiler, so I'll first share the details from JProfiler and try the patch. Afterwards, I'll provide the details from AsyncProfiler.

JProfiler export & screenshots:

Call_tree.zip

Image

Image

I'll now try with the patch and report back afterward.

@jin-harmoney (Author) commented

@gsmet

Results with your patch:

The FaultTolerance step is no longer significantly slower than the other build steps:

Image

Image

Thank you all for the prompt responses & actions!

Let me know if I can help with something else, or if there are other tips & tricks I could check out to speed up the live reload :)

@gsmet (Member) commented Jan 13, 2025

Nice! I'm still wondering why you would get it called so many times if you have so few methods annotated with FT annotations.
Are the annotations present on a parent class common to a lot of your beans?

I would very much be interested in a picture of what's going on in the two BeanProcessor entries that are at the top, and in ClassTransformingBuildStep. If you can expand the big culprits in them one by one, we can understand what's going on there. Or maybe there's a way you can export the profiling info, if it doesn't contain sensitive data (we can discuss in private at gsmet at redhat dot com)?

I'm not sure we will be able to improve things, but you are pushing things to the limit and it's an interesting use case to collect data on, as an extreme use case sometimes highlights issues we can fix.

@gsmet (Member) commented Jan 13, 2025

I had a look at the call tree zip above. I think it would be interesting to have a closer look at the details in PanacheEntityClassAccessorGenerationVisitor, but I can't click through the call tree export.

Do you have a lot of Panache entities? If you're coming from WildFly, I would suspect you're not using Panache yet?

I'm very interested in having the details of the profiling for this one (together with the BeanProcessor ones if they are still a thing), but I'm also wondering if you could drop Panache entirely if you're not using it?

@jin-harmoney (Author) commented

The JProfiler HTML export does indeed seem to be a static file, so I can expand the full tree, but that makes the HTML very hard to read. I'll see if I can get Async Profiler up and running.

Are the annotations present on a parent class common to a lot of your beans?

  • 7 methods are annotated with @Asynchronous
  • Of those 7, there is 1 method in an abstract class. That abstract class has 87 implementations.
  • So in total: 93 bean methods annotated with @Asynchronous

Do you have a lot of Panache entities? If you're coming from WildFly, I would suspect you're not using Panache yet?

  • Correct, we don't use Panache. We don't have the quarkus-hibernate-orm-panache dependency in our pom.

PanacheEntityClassAccessorGenerationVisitor:

Image

BeanProcessor

Case 1:

This one is interesting (String.format taking quite some time, so I assume a lot of invocations):

Image

Could be caused by this find&replace that I still need to do:

Image

Case 2:

Image

Case 3:

Also an interesting one I think:

Image

@gsmet (Member) commented Jan 13, 2025

Hmmm, it's a shame we don't have more details about what's going on in EnhancerImpl.

@gsmet (Member) commented Jan 13, 2025

As for the rest:

BeanProcessor

Case 1: yeah, I think this should go away once you fix the warnings, which is good news.
Case 2: looks as expected.
Case 3: yeah, it's indeed interesting. My guess is that you end up with:

        // - iterate over dependencyMap entries and process beans for which all dependencies were already processed
        // - when a bean is processed the map entry is removed
        // - if we're stuck and the map is not empty, we found a circular dependency (and throw an ISE)
        Predicate<BeanInfo> isNotDependencyPredicate = new Predicate<BeanInfo>() {
            @Override
            public boolean test(BeanInfo b) {
                return !isDependency(b, dependencyMap);
            }
        };
        Predicate<BeanInfo> isNormalScopedOrNotDependencyPredicate = new Predicate<BeanInfo>() {
            @Override
            public boolean test(BeanInfo b) {
                return b.getScope().isNormal() || !isDependency(b, dependencyMap);
            }
        };
        Predicate<BeanInfo> isNotProducerOrNormalScopedOrNotDependencyPredicate = new Predicate<BeanInfo>() {
            @Override
            public boolean test(BeanInfo b) {
                // Try to process non-producer beans first, including declaring beans of producers
                if (b.isProducer()) {
                    return false;
                }
                return b.getScope().isNormal() || !isDependency(b, dependencyMap);
            }
        };

        boolean stuck = false;
        while (!dependencyMap.isEmpty()) {
            if (stuck) {
                throw circularDependenciesNotSupportedException(dependencyMap);
            }
            stuck = true;
            // First try to process beans that are not dependencies
            stuck = addBeans(beanAdder, dependencyMap, processed, isNotDependencyPredicate);
            if (stuck) {
                // It seems we're stuck but we can try to process normal scoped beans that can prevent a circular dependency
                stuck = addBeans(beanAdder, dependencyMap, processed, isNotProducerOrNormalScopedOrNotDependencyPredicate);
                if (stuck) {
                    stuck = addBeans(beanAdder, dependencyMap, processed, isNormalScopedOrNotDependencyPredicate);
                }
            }
        }

being quite heavy when you have a lot of beans and doing a lot of contains(). I wonder if it would be better with the bean injection map being a Map<BeanInfo, Set<BeanInfo>> backed by HashSets (rough sketch after the list below), but:

  • I'm not entirely sure it would be a better compromise (you would have to calculate the hashCode).
  • OK, that's probably a gazillion BeanInfo.equals() calls, but in the end 100 ms compared to what you have is relatively negligible.
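
A rough sketch of that idea, using a placeholder BeanInfo type parameter rather than the real ArC classes:

```java
import java.util.Map;
import java.util.Set;

// Illustrative only: with Map<BeanInfo, Set<BeanInfo>> (HashSet values), the
// repeated "is this bean still a dependency of something unprocessed?" check
// becomes a hash lookup per map entry instead of a List.contains() scan.
class DependencyCheckSketch<BeanInfo> {

    boolean isDependency(BeanInfo bean, Map<BeanInfo, Set<BeanInfo>> dependencyMap) {
        for (Set<BeanInfo> dependencies : dependencyMap.values()) {
            if (dependencies.contains(bean)) {
                return true;
            }
        }
        return false;
    }
}
```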

I will ping @mkouba @manovotn and @Ladicek in case they want to poke in this area.

@gsmet (Member) commented Jan 13, 2025

I think you should probably focus on getting more info about the Hibernate ORM enhancer part. I'm not sure how easy it would be to improve things but I think it's worth trying to understand what's going on and if we can improve.

/cc @yrodiere @mbladel @marko-bekhta

@jin-harmoney (Author) commented

I would indeed be interested in disabling the Panache enhancements since we don't use them.

Details about the Hibernate part (JProfiler doesn't instrument org.hibernate by default):

Image

BeanProcessor case 1 is indeed resolved after changing to package-private.

@yrodiere (Member) commented

I think you should probably focus on getting more info about the Hibernate ORM enhancer part. I'm not sure how easy it would be to improve things but I think it's worth trying to understand what's going on and if we can improve.

/cc @yrodiere @mbladel @marko-bekhta

@Sanne worked specifically on improving the enhancement of large models in 2024, and made very significant improvements -- at the cost of increased complexity (I don't follow what's going on, personally :) ). Which version of Quarkus are we talking about?

I would indeed be interested to disable the panache enhancements since we don't use that.

This class covers all the enhancement except the Panache-specific parts. It's absolutely needed, unfortunately.

@jin-harmoney (Author) commented Jan 13, 2025

We're currently on 3.16.3 (looks like I need to upgrade), but my last screenshots are using a local build from the patch you provided: #45512.

FYI we have a single persistence unit with 159 entities.

@yrodiere (Member) commented Jan 13, 2025

We're currently on 3.16.3 (looks like I need to upgrade), but my last screenshots are using a local build from the patch you provided: #45512.

FYI we have a single persistence unit with 159 entities.

Okay, then you should be benefiting from the performance improvements already, added in 3.14: #40329

It's odd though, those improvements brought enhancement down to ~1s for a (presumably much larger) test model, so 159 entity types shouldn't be that slow... Though knowing Sanne, his computer is probably very, very powerful and fine-tuned, so there's that :]

@gsmet (Member) commented Jan 13, 2025

TBH I'm not sure if it's the profiling making things slow but the numbers are quite high, compared to what I would expect.

@jin-harmoney (Author) commented

I don't see a significant difference between live-reload times with or without the profiler attached. Locally it's around 7 sec; in my k8s cluster (using k3s), it's around 10 sec. I don't have any memory/CPU limits, neither at the Java level nor at the k8s level.

My laptop specs:

Image

quarkus-bot added this to the 3.18 - main milestone Jan 13, 2025
@Sanne (Member) commented Jan 13, 2025

It's odd though, those improvements brought enhancement down to ~1s for a (presumably much larger) test model, so 159 entity types shouldn't be that slow...

It's been a while so I don't remember all the details, but at a high level: there are two different phases in which the Hibernate enhancer is applied: one for regular entities getting enhanced, and one for enhanced proxies being generated.

Please bear in mind that the model I had been optimizing (the one which should now take about 1s) had a lot of regular entities but very few cases of enhanced proxies; my optimisations applied to the first case, but there's still room for improvement on the enhanced-proxies case. I'd hope to return to that eventually, but I don't have an "interesting" user report or a benchmark to focus on; perhaps we should generate a synthetic one.

gsmet modified the milestones: 3.18 - main, 3.17.7 Jan 14, 2025