Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FISH-5768 Clustered CDI Events not Received if Sender Application Deployed on Receiver Instance #5435

Merged
merged 4 commits into from
Oct 8, 2021

Conversation

Pandrex247
Copy link
Member

@Pandrex247 Pandrex247 commented Oct 4, 2021

Description

Bug Fix.

Clustered CDI Events do not get resolved correctly if both the sender application and receiver application are deployed to the same instance. The cause is our old favourite: class loaders.

When Weld comes to resolve the observer methods for the event, it fails to do a .equals comparison on the event class due to it having been loaded by the sender class loader rather than the receiver.

The fix here partially reverts #5010 - notably the bit in PayaraValueHolder where the error was tracked down to.

Also moves the PayaraValueHolder class out of JCache module because it's not used there.

Important Info

Blockers

None

Testing

New tests

None

Testing Performed

Testing Environment

Windows 10 - JDK 8

Documentation

N/A

Notes for Reviewers

None

When CDI event class has been loaded by multiple classes on receiver, don't switch classloader in PayaraValueHolder#getValue().

Also moves PayaraValueHolder out of JCache module since it isn't used there, and log an info message at fine when resolving
cdi event qualifiers outside of the current classloader

Signed-off-by: Andrew Pielage <[email protected]>
Signed-off-by: Andrew Pielage <[email protected]>
@Pandrex247
Copy link
Member Author

Jenkins test please

@Pandrex247
Copy link
Member Author

Jenkins test please

@Pandrex247 Pandrex247 marked this pull request as ready for review October 5, 2021 11:57
@lprimak
Copy link
Contributor

lprimak commented Oct 6, 2021

I am quite sure this isn't going to work. I put this in originally in response to another bug. However, I am not quite sure which one it is but I distinctly remember the original code was necessary

What that piece of code does is find the appropriate application class loader for the component id of the sender and sets it for the duration of deserialization, which is necessary to avoid ClassNotFoundException

@pdudits
Copy link
Contributor

pdudits commented Oct 6, 2021

What that piece of code does is find the appropriate application class loader for the component id of the sender and sets it for the duration of deserialization, which is necessary to avoid ClassNotFoundException

The problem we faced was that it didn't find appropriate application class loader for receiving application when (another instance of) sending application was also deployed. Receiver's classloader is already set as the thread context classloader earlier in the process and that's the only one that makes sense for reception of clustered event as that's the only one the application can process.

@lprimak
Copy link
Contributor

lprimak commented Oct 6, 2021

I remember the issue... it was spamming the logs with CNFs every time the message was sent.
However, I don't think this was the only issue, there are others if I remember correctly...
This was a piece of code that I was going back-and-forth with many times.

The problem we faced was that it didn't find appropriate application class loader for receiving application when (another instance of) sending application was also deployed. Receiver's classloader is already set as the thread context classloader earlier in the process and that's the only one that makes sense for reception of clustered event as that's the only one the application can process.

I think I see a cleaner solution that won't break things...

  • Try deserializing with existing CL context
  • If it doesn't work (throws a CNF), use existing code to set the sender's class loader and try again
  • only if all fails, print CNF, as in existing code

I believe that will save the original intent and fix the current issue.

WDYT?

@Pandrex247
Copy link
Member Author

I remember the issue... it was spamming the logs with CNFs every time the message was sent.
However, I don't think this was the only issue, there are others if I remember correctly...

Can you think what they might be?

PayaraValueHolder is only used by the clustered event bus, and I'm not getting CNFEs unnecessarily logged.
They're caught here:

                } catch (IOException | ClassNotFoundException ex) {
                    Logger.getLogger(ClusteredCDIEventBusImpl.class.getName())
                            .log(ex.getCause() instanceof IllegalStateException? Level.FINE : Level.INFO,
                                    "Received Event but could not process it", ex);

I'll need to double-check if they're thrown without this change.

The only other case I think you might be referring to is when qualifiers start to get involved, but this change doesn't affect that (error gets logged and ignored):

Logger.getLogger(PayaraClusteredCDIEventImpl.class.getName()).log(Level.INFO, "Problem determining the qualifier type of an Event ignoring", ex);

I originally had a description of the issue in the PR description but removed it because the problem was found to happen even before Hazelcast 4 was introduced (5.2020.7 was the last community release before Hazelcast 4 was introduced if you want to compare yourself).

@lprimak
Copy link
Contributor

lprimak commented Oct 6, 2021

I remember all these issues came up when testing the following:
https://github.com/mikecroft/BakingJavaEE8MicroPi.git
https://github.com/OndrejM/Payara-Micro-Reactive-Example

in different deployment combinations (not comprehensive):

  • Each deployed in one instance
  • Both deployed in one instance
  • Another app (unrelated) deployed on all instances

@lprimak
Copy link
Contributor

lprimak commented Oct 6, 2021

The issue is when CNF is thrown, it's unclear whether the app is not deployed or something else went wrong.
Original code just throws CNF whenever it can't deserialize, which would happen in all instances in the cluster when
just the sender is deployed and sending messages, which is not good.

@Pandrex247
Copy link
Member Author

Pandrex247 commented Oct 7, 2021

I tested the clustered CDI events from Payara-Examples against master and this branch - both throw and catch the CNFE, but my branch actually successfully resolves it within Weld.

That reactive app seems broken on master (also on this branch) - deploying it actually causes the server or micro instance to hang (seems to cause Hazelcast to freak out). I'm going to say that's a separate issue though since it affects both branches.

Interestingly the baking app doesn't seem to throw any CNFEs (on either branch), but the case still remains that on master no observers are resolved for received events vs. this branch where they are.

So I'm not seeing any active case for adding anything to PR, on both cases CNFE exceptions get thrown, but my branch seems to be "more correct", at least from the view that it has the clustered cdi event bus working.
Also bear in mind that Weld isn't throwing a CNFE exception when it fails to resolve an observer - it just fails a .equals comparison and so assumes that the observer isn't for that event.

Signed-off-by: Andrew Pielage <[email protected]>
@Pandrex247 Pandrex247 merged commit a4921fd into payara:master Oct 8, 2021
@Pandrex247 Pandrex247 deleted the FISH-5768-Comm branch October 8, 2021 08:20
JamesHillyard pushed a commit to JamesHillyard/Payara that referenced this pull request Oct 28, 2021
FISH-5768 Clustered CDI Events not Received if Sender Application Deployed on Receiver Instance
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants