Parallelize bean scanning in SmallRyeFaultToleranceProcessor #45509

Closed
jin-harmoney opened this issue Jan 10, 2025 · 27 comments · Fixed by #45512
Labels: area/smallrye, kind/enhancement (New feature or request)
Milestone: 3.17.7

Comments

@jin-harmoney commented Jan 10, 2025

Description

Current behavior:

Beans are scanned sequentially here: https://github.com/quarkusio/quarkus/blob/main/extensions/smallrye-fault-tolerance/deployment/src/main/java/io/quarkus/smallrye/faulttolerance/deployment/SmallRyeFaultToleranceProcessor.java#L234

According to the build metrics, this build step takes about 50% of the total build time. We have 1500 beans in our application.

Image

Desired behavior:

Is there a way to perform bean scanning in parallel?

I'd be happy to contribute if someone could point me in the right direction. I checked other build steps, but I couldn't find any parallel behavior, so I'm not sure where to start.

Implementation ideas

No response
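
A minimal sketch of what parallel scanning could look like, assuming the per-bean work is independent and thread-safe; the class, record, and method names below are illustrative and not the actual SmallRyeFaultToleranceProcessor API:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ParallelScanSketch {

    // Stand-in for whatever the processor computes per bean.
    record ScanResult(String beanClass, boolean hasFaultToleranceMethods) { }

    // Scan each bean in parallel and merge the per-bean results afterwards.
    // Whatever scanOne() touches (Jandex index, shared builders, ...) would
    // have to be safe to call from multiple threads.
    static Map<String, ScanResult> scanAll(List<String> beanClasses) {
        Map<String, ScanResult> results = new ConcurrentHashMap<>();
        beanClasses.parallelStream()
                .forEach(beanClass -> results.put(beanClass, scanOne(beanClass)));
        return results;
    }

    static ScanResult scanOne(String beanClass) {
        // Placeholder for the real per-bean work (annotation lookups, method descriptors, ...).
        return new ScanResult(beanClass, false);
    }
}
```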

jin-harmoney added the kind/enhancement label Jan 10, 2025

quarkus-bot commented Jan 10, 2025

/cc @Ladicek (smallrye), @jmartisk (smallrye), @phillip-kruger (smallrye), @radcortez (smallrye)

@gsmet (Member) commented Jan 10, 2025

Could you try getting us some profiling data following the instructions in https://github.com/quarkusio/quarkus/blob/main/TROUBLESHOOTING.md#profiling-application-startup-with-async-profiler ?

Don't hesitate to ping me here or on Zulip if you need help.

@jin-harmoney (Author) commented

@gsmet sure! Is this sufficiently detailed, or are you looking for something specific?

Image

@gsmet (Member) commented Jan 10, 2025

I think it would be interesting to see what's going on in createMethodDescriptor.

Do you have a lot of methods annotated with fault tolerance annotations?

@gsmet (Member) commented Jan 10, 2025

My wild guess is that it's going to be the getClassProxy() calls.

@gsmet (Member) commented Jan 10, 2025

Could you try this small patch: #45512 ?

I wonder if it could help.

There are instructions on how to build Quarkus here: https://github.com/quarkusio/quarkus/blob/main/CONTRIBUTING.md#building-main (but you will need to build my PR branch instead of main), and the following paragraph there explains how to update the Quarkus version in your project.

That being said, in your profiling, it doesn't look like it's Fault Tolerance taking the most time?

Let me know how it goes.

@gsmet (Member) commented Jan 10, 2025

BTW, your startup looks interesting, so I would really be interested in the full HTML output of Async Profiler so that we can look at all the details (see the instructions in my first comment above).

@Ladicek (Contributor) commented Jan 13, 2025

Agree with @gsmet that getClassProxy() is probably the culprit here (I could also remove streams from createMethodDescriptor() just to avoid that cost, but the main thing it does is a bunch of getClassProxy() calls), so it would be good to test with his patch first.

Other than that, it should be relatively straightforward to parallelize FT scanning. Just not sure if that's necessary at the moment.
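
For illustration only, one plausible shape of a getClassProxy() optimization (memoizing the lookups per class name), using entirely made-up names; the actual patch in #45512 may take a different approach:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: resolve each class proxy once per build instead of
// once per annotated method. resolveClassProxy() is a stand-in for whatever
// the processor actually does.
class ClassProxyCacheSketch {

    private final Map<String, Class<?>> cache = new ConcurrentHashMap<>();

    Class<?> getClassProxy(String className) {
        return cache.computeIfAbsent(className, this::resolveClassProxy);
    }

    private Class<?> resolveClassProxy(String className) {
        try {
            // Stand-in resolution; the real build step would not load application classes like this.
            return Class.forName(className, false, Thread.currentThread().getContextClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Could not resolve " + className, e);
        }
    }
}
```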

@gsmet (Member) commented Jan 13, 2025

@Ladicek it might also be a good idea to drop the DotName creation in FaultToleranceScanner#getAnnotation(). I suspect we are creating a lot of them in this case for the very same annotations each time.

@gsmet (Member) commented Jan 13, 2025

But TBH, I'm very, very curious about what's going on in the two BeanProcessor calls :).

@Ladicek (Contributor) commented Jan 13, 2025

Good point @gsmet, we already have those DotNames pre-created as constants in io.quarkus.smallrye.faulttolerance.deployment.DotNames.
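
For reference, the pattern being suggested, sketched with the real MicroProfile annotation name but an illustrative holder class (the actual constants live in io.quarkus.smallrye.faulttolerance.deployment.DotNames):

```java
import org.jboss.jandex.DotName;
import org.jboss.jandex.MethodInfo;

final class FaultToleranceDotNamesSketch {

    // Create the DotName once and reuse it, instead of calling
    // DotName.createSimple(...) on every getAnnotation() lookup.
    static final DotName ASYNCHRONOUS =
            DotName.createSimple("org.eclipse.microprofile.faulttolerance.Asynchronous");

    static boolean isAsynchronous(MethodInfo method) {
        return method.annotation(ASYNCHRONOUS) != null;
    }
}
```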

@jin-harmoney (Author) commented

FYI I'm working on extracting the info that was requested. Will reply here in a bit.

@jin-harmoney (Author) commented Jan 13, 2025

@gsmet :

Do you have a lot of methods annotated with fault tolerance annotations?

  • We're only using the fault tolerance @Asynchronous annotation at the moment, and we only have 7 methods annotated with @Asynchronous.
  • That said, we do have 1500 beans in the application (it's a single WAR monolith; we're migrating from WildFly to Quarkus).

That being said, in your profiling, it doesn't look like it's Fault Tolerance taking the most time?

  • Correct, according to the build metrics in the UI, Fault Tolerance takes around 50% of the time. That is, if you compare the total time (around 10 sec) with the time that FaultTolerance takes (around 5 sec).
  • The reason we don't see the same thing reflected in the profiling is that profiling does not take the parallel execution of the build steps into account. So while FaultTolerance might only account for 10% of the total CPU time, it takes 50% of the wall-clock build time because the step runs sequentially.

I think it would be interesting to see what's going on in createMethodDescriptor.

I'm not used to working with AsyncProfiler, so I'll first share the details from JProfiler and try the patch. Afterwards, I'll provide the details from AsyncProfiler.

JProfiler export & screenshots:

Call_tree.zip

Image

Image

I'll now try with the patch and report back afterward.

@jin-harmoney (Author) commented

@gsmet

Results with your patch:

The FaultTolerance step is no longer significantly slower than the other build steps:

Image

Image

Thank you all for the prompt responses & actions!

Let me know if I can help with something else, or if there are other tips & tricks I could check out to speed up the live reload :)

@gsmet (Member) commented Jan 13, 2025

Nice! I'm still wondering why you would get it called so many times if you have so few methods annotated with FT annotations.
Are the annotations present on a parent class common to a lot of your beans?

I would very much be interested in a picture of what's going on in the two BeanProcessor entries that are at the top, and in ClassTransformingBuildStep. If you can expand the big culprits in them one by one, we can understand what's going on there. Or maybe there's a way you can export the profiling info, if it doesn't contain sensitive data (we can discuss in private at gsmet at redhat dot com)?

I'm not sure we will be able to improve things, but you are pushing things to the limit and it's an interesting use case to collect data on, as an extreme use case sometimes highlights issues we can fix.

@gsmet (Member) commented Jan 13, 2025

I had a look at the call tree zip above. I think it would be interesting to have a closer look at the details in PanacheEntityClassAccessorGenerationVisitor, but I can't click through the call tree export.

Do you have a lot of Panache entities? If you're coming from WildFly, I would suspect you're not using Panache yet?

I'm very interested in having the details of the profiling for this one (together with the BeanProcessor ones if they are still a thing), but I'm also wondering if you could drop Panache entirely if you're not using it?

@jin-harmoney (Author) commented

The JProfiler HTML export does indeed seem to be a static file, so I can expand the full tree, but that makes the HTML very hard to read. I'll see if I can get Async Profiler up and running.

Are the annotations present on a parent class common to a lot of your beans?

  • 7 methods are annotated with @Asynchronous
  • Of those 7, there is 1 method in an abstract class. That abstract class has 87 implementations.
  • So in total: 93 bean methods annotated with @Asynchronous

Do you have a lot of Panache entities? If you're coming from WildFly, I would suspect you're not using Panache yet?

  • Correct, we don't use Panache. We don't have the quarkus-hibernate-orm-panache dependency in our pom.

PanacheEntityClassAccessorGenerationVisitor:

Image

BeanProcessor

Case 1:

This one is interesting (String.format taking quite some time, so I assume a lot of invocations):

Image

Could be caused by this find&replace that I still need to do:

Image

Case 2:

Image

Case 3:

Also an interesting one I think:

Image

@gsmet (Member) commented Jan 13, 2025

Hmmm, it's a shame we don't have more details about what's going on in EnhancerImpl.

@gsmet (Member) commented Jan 13, 2025

As for the rest:

BeanProcessor

Case 1: yeah, I think this should go away once you fix the warnings, which is good news.
Case 2: looks as expected.
Case 3: yeah, it's indeed interesting. My guess is that you end up with:

        // - iterate over dependencyMap entries and process beans for which all dependencies were already processed
        // - when a bean is processed the map entry is removed
        // - if we're stuck and the map is not empty, we found a circular dependency (and throw an ISE)
        Predicate<BeanInfo> isNotDependencyPredicate = new Predicate<BeanInfo>() {
            @Override
            public boolean test(BeanInfo b) {
                return !isDependency(b, dependencyMap);
            }
        };
        Predicate<BeanInfo> isNormalScopedOrNotDependencyPredicate = new Predicate<BeanInfo>() {
            @Override
            public boolean test(BeanInfo b) {
                return b.getScope().isNormal() || !isDependency(b, dependencyMap);
            }
        };
        Predicate<BeanInfo> isNotProducerOrNormalScopedOrNotDependencyPredicate = new Predicate<BeanInfo>() {
            @Override
            public boolean test(BeanInfo b) {
                // Try to process non-producer beans first, including declaring beans of producers
                if (b.isProducer()) {
                    return false;
                }
                return b.getScope().isNormal() || !isDependency(b, dependencyMap);
            }
        };

        boolean stuck = false;
        while (!dependencyMap.isEmpty()) {
            if (stuck) {
                throw circularDependenciesNotSupportedException(dependencyMap);
            }
            stuck = true;
            // First try to process beans that are not dependencies
            stuck = addBeans(beanAdder, dependencyMap, processed, isNotDependencyPredicate);
            if (stuck) {
                // It seems we're stuck but we can try to process normal scoped beans that can prevent a circular dependency
                stuck = addBeans(beanAdder, dependencyMap, processed, isNotProducerOrNormalScopedOrNotDependencyPredicate);
                if (stuck) {
                    stuck = addBeans(beanAdder, dependencyMap, processed, isNormalScopedOrNotDependencyPredicate);
                }
            }
        }

being quite heavy when you have a lot of beans and doing a lot of contains(). I wonder if it would be better with the bean injection map being a Map<BeanInfo, Set<BeanInfo>> backed by HashSets (rough sketch after the list below), but:

  • I'm not entirely sure it would be a better compromise (you would have to calculate the hashCode).
  • OK, that's probably a gazillion BeanInfo.equals() calls, but in the end 100 ms compared to what you have is relatively negligible.
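
A rough sketch of that idea, using a placeholder BeanInfo type parameter rather than the real ArC classes:

```java
import java.util.Map;
import java.util.Set;

// Illustrative only: with Map<BeanInfo, Set<BeanInfo>> (HashSet values), the
// repeated "is this bean still a dependency of something unprocessed?" check
// becomes a hash lookup per map entry instead of a List.contains() scan.
class DependencyCheckSketch<BeanInfo> {

    boolean isDependency(BeanInfo bean, Map<BeanInfo, Set<BeanInfo>> dependencyMap) {
        for (Set<BeanInfo> dependencies : dependencyMap.values()) {
            if (dependencies.contains(bean)) {
                return true;
            }
        }
        return false;
    }
}
```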

I will ping @mkouba @manovotn and @Ladicek in case they want to poke in this area.

@gsmet (Member) commented Jan 13, 2025

I think you should probably focus on getting more info about the Hibernate ORM enhancer part. I'm not sure how easy it would be to improve things but I think it's worth trying to understand what's going on and if we can improve.

/cc @yrodiere @mbladel @marko-bekhta

@jin-harmoney (Author) commented

I would indeed be interested in disabling the Panache enhancements since we don't use them.

Details about the Hibernate part (JProfiler doesn't instrument org.hibernate by default):

Image

BeanProcessor case 1 is indeed resolved after changing to package-private.

@yrodiere (Member) commented

I think you should probably focus on getting more info about the Hibernate ORM enhancer part. I'm not sure how easy it would be to improve things but I think it's worth trying to understand what's going on and if we can improve.

/cc @yrodiere @mbladel @marko-bekhta

@Sanne worked specifically on improving the enhancement of large models in 2024, and made very significant improvements -- at the cost of increased complexity (I don't follow what's going on, personally :) ). Which version of Quarkus are we talking about?

I would indeed be interested to disable the panache enhancements since we don't use that.

This class covers all the enhancement except the Panache-specific parts. It's absolutely needed, unfortunately.

@jin-harmoney (Author) commented Jan 13, 2025

We're currently on 3.16.3 (looks like I need to upgrade), but my last screenshots are using a local build from the patch you provided: #45512.

FYI we have a single persistence unit with 159 entities.

@yrodiere (Member) commented Jan 13, 2025

We're currently on 3.16.3 (looks like I need to upgrade), but my last screenshots are using a local build from the patch you provided: #45512.

FYI we have a single persistence unit with 159 entities.

Okay, then you should be benefiting from the performance improvements already, added in 3.14: #40329

It's odd though, those improvements brought enhancement down to ~1s for a (presumably much larger) test model, so 159 entity types shouldn't be that slow... Though knowing Sanne, his computer is probably very, very powerful and fine-tuned, so there's that :]

@gsmet (Member) commented Jan 13, 2025

TBH I'm not sure if it's the profiling making things slow but the numbers are quite high, compared to what I would expect.

@jin-harmoney (Author) commented

I don't see a significant difference between live-reload times with or without the profiler attached. Locally it's around 7 sec; in my k8s cluster (using k3s), it's around 10 sec. I don't have any memory/CPU limits, neither at the Java level nor at the k8s level.

My laptop specs:

Image

quarkus-bot added this to the 3.18 - main milestone Jan 13, 2025
@Sanne (Member) commented Jan 13, 2025

It's odd though, those improvements brought enhancement down to ~1s for a (presumably much larger) test model, so 159 entity types shouldn't be that slow...

It's been a while so I don't remember all the details, but at a high level: there are two different phases in which the Hibernate enhancer is applied: one for regular entities getting enhanced, and one for enhanced proxies being generated.

Please bear in mind that the model I had been optimizing (the one which should now take about 1s) had a lot of regular entities but very few cases of enhanced proxies; my optimisations applied to the first case, but there's still room for improvement on the enhanced-proxies case. I'd hope to return to that eventually, but I don't have an "interesting" user report or a benchmark to focus on; perhaps we should generate a synthetic one.

gsmet modified the milestones: 3.18 - main, 3.17.7 Jan 14, 2025