Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUSTCOM-76: Fix Thread synchronization in WebappClassloader #4378

Merged
merged 2 commits into from
Jan 8, 2020
Merged

CUSTCOM-76: Fix Thread synchronization in WebappClassloader #4378

merged 2 commits into from
Jan 8, 2020

Conversation

rdebusscher
Copy link

@rdebusscher rdebusscher commented Dec 12, 2019

Description

This is a bug fix

Customer reports occasionally a NullPointerException in org.glassfish.web.loader.WebappClassLoader when starting up his Payara Micro (5.193.1) Application.

Caused by: java.lang.NullPointerException
at org.glassfish.web.loader.WebappClassLoader.findResourceInternalFromJars(WebappClassLoader.java:2908)
at org.glassfish.web.loader.WebappClassLoader.findResourceInternal(WebappClassLoader.java:2761)
at org.glassfish.web.loader.WebappClassLoader.findClassInternal(WebappClassLoader.java:2651)

There is a synchronized access on a non final property causing multithreading issues in the addJar() and findResourceInternalFromJars()

Testing

New tests

Testing Performed

  • Manual smoke testing with a few customer reperoducers to verify normal classloader behaviour

Test suites executed

  • Payara Samples

Testing Environment

Oracle 8u181, mvn 3.5.4

Documentation

N/A

Notes for Reviewers

Also added {} to if statements if missing.

Update 16/12

I used to the (already present) jarFilesLock object within the findResourceInternalFromJars method since that one was already protecting the access to property jarFiles.
This means that accessing the jar files is now synchronized with start up but also with shutdown.

import org.apache.naming.resources.WebDirContext;
import org.apache.naming.resources.Resource;
import org.apache.naming.resources.ResourceAttributes;
import org.apache.naming.resources.*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would stay with a list of classes if it is less than 11.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's up for debate - personally I hate using * unless I'm literally using every single class :P

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my experience is that if you use more than 10 classes from the same package in imports, usually the new class asks to be refactored, so finally we end up in the same nice state :)

@@ -837,56 +838,58 @@ public synchronized void addJar(String jar, JarFile jarFile, File file)
logger.log(Level.FINER, "addJar({0})", jar);
}

// See IT 11417
super.addURL(getURL(file));
synchronized (jarFilesPropertyLock) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method is already synchronized, how could this help? Two locks can cause deadlock ... but ... in previous version jarFiles are used also as a lock, so that's the same ...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the method addJar() is synchronized, but other parts of the class requires synchronized access to the property jarFiles. But the synchronization on the method does not synchronize on the property.

In this class, there are synchronization block on 6 different items.

@@ -2711,7 +2729,7 @@ protected ResourceEntry findResourceInternal(String name, String path) {
entry = findResourceInternalFromRepositories(name, path);

if (entry == null) {
synchronized (jarFiles) {
Copy link
Contributor

@dmatej dmatej Dec 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't be better to extract the synchronized block to a synchronized method, so it would use the same "default" lock?
Or all accesses to the final field should use own but same lock as you implemented, but I would be afraid of deadlocks ... or use Concurrent implementation if it would make sense (I don't think it does here).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to change the synchronization fundamentally since there seems a lot of synchronization logic in the code. And changing those things means that we probably break the server functionality.

@pdudits
Copy link
Contributor

pdudits commented Dec 13, 2019

Were you able to confirm that the race condition is between findResourceInternalFromJars and addJar?

My hypothesis would be that it is race condition between findResourceInternalFromJars and closeJars.

findResourceInternalFromJars confirms that jars are still open by invoking openJars:

openJars synchronizes on jarFilesLock:

But before findResourceInternalFromJars proceeds to accessing the jars (or while it is doing it), closeJars get called, and clears the array:

It synchronizes on same lock liken openJars, but that method exited a while ago. So if something needs synchronization, it might be findResourceInternalFromJars itself.

@rdebusscher
Copy link
Author

@pdudits closeJars is also a possibility, but unlikely since issue occurs during startup of application within a @singleton @startup @PostConstruct method. So runtime is not stopping.

But I'll add it since it can lead to some issues in the future.

@pdudits
Copy link
Contributor

pdudits commented Dec 16, 2019

@pdudits closeJars is also a possibility, but unlikely since issue occurs during startup of application within a @singleton @startup @PostConstruct method. So runtime is not stopping.

But closeJars is not only called on stopping. Quite the opposite, it's called at the end of startup to fight windows' habit of sticking to file references and preventing their deletion:

https://github.com/rdebusscher/Payara/blob/b1e5172/appserver/web/web-core/src/main/java/org/apache/catalina/core/StandardContext.java#L5788

@rdebusscher
Copy link
Author

I used to the (already present) jarFilesLock object within the findResourceInternalFromJars method since that one was already protecting the access to property jarFiles.
This means that accessing the jar files is now synchronized with start up but also with shutdown.

Copy link
Contributor

@pdudits pdudits left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking with "Hide whitespace changes" turned on, the change looks good.

But this one passage from indented part really bothers me:

Comment on lines +860 to +865
String[] result = new String[paths.length + 1];
for (int i = 0; i < paths.length; i++) {
result[i] = paths[i];
}
result[paths.length] = jar;
paths = result;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 4 idioms for appending arrays are quite disturbing here.

The design might stem from pre-1.4 era, where there was nothing similar to CopyOnWriteArrayList<PathEntry>, and is instead simulated with multiple arrays. But these should at least be refactored into form paths = appendArray(paths, entry). And internally use Arrays.copyOf(arr, arr.length+1).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, but the goal of this PR is not to update some already existing code constructs.

@rdebusscher
Copy link
Author

But closeJars is not only called on stopping. Quite the opposite, it's called at the end of startup to fight windows' habit of sticking to file references

But jars are reopened immediately again to have the classloader working. So not an efficient part of this old code.

@rdebusscher
Copy link
Author

Customer verified custom build with this changes and reports issue is fixed. Good to be merged in master

@Pandrex247
Copy link
Member

Jenkins test please

@Cousjava Cousjava merged commit ce2fec4 into payara:master Jan 8, 2020
@rdebusscher rdebusscher deleted the CUSTCOM-76 branch January 20, 2020 11:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants