Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUSTCOM-76 WebappClassLoader Synchronisation Test #4430

Merged
merged 2 commits into from
Jan 15, 2020
Merged

CUSTCOM-76 WebappClassLoader Synchronisation Test #4430

merged 2 commits into from
Jan 15, 2020

Conversation

MattGill98
Copy link
Contributor

Description

Synchronisation issues in the WebappClassLoader have already been fixed,
but not tested properly. This commit adds a test for the synchronisation
issues that caused CUSTCOM-76.

Testing

Testing Performed

Ran the unit test with -Dsurefire.rerunFailingTestsCount=500 before and after the changes in #4378.

Synchronisation issues in the WebappClassLoader have already been fixed,
but not tested properly. This commit adds a test for the synchronisation
issues that caused CUSTCOM-76.
@MattGill98 MattGill98 requested a review from dmatej January 13, 2020 16:41
@MattGill98 MattGill98 self-assigned this Jan 13, 2020
@MattGill98
Copy link
Contributor Author

jenkins test please

@dmatej
Copy link
Contributor

dmatej commented Jan 13, 2020

Test will pass if all tasks complete without exception (-> future not completed -> TimeoutException -> swallowed -> pass)
Test will pass if all tasks end up in deadlock (-> future not completed -> TimeoutException -> swallowed ->pass)

  • is this what you wanted?

Test will fail if any task complete with an exception (any) (-> future completed -> rethrowing the cause)

Nevermind, the test passed even with 600 threads in the pool, 100000 tasks (*3) and 30 second timeout and Thread.yield instead of sleep.

Then I played bit more with that ... see comments in code. I'm still not sure with the target of the test (maybe add javadoc with some human readable description of the purpose).

for (JarEntry entry : Collections.list(jarFile.entries())) {
webappClassLoader.findResource(entry.getName());
// System.out.println("Looked up " + resourceEntry);
Thread.sleep(0, 100);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes slowing down of all lookup tasks. When I tried to run much more threads to cause deadlock, I replaced this with Thread.yield(). But it is not generally better, it depends on what we hunt.

Copy link

@rdebusscher rdebusscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running the test on my system suggests that there is a deadlock.

The test could pass without any threads running in time when running on
a slow machine. This commit makes sure that a minimum number of threads
run before completing. A few other changes were made to make the test
more reliable.
executor = Executors.newFixedThreadPool(60);

// Require a minimum number of executions before completing
latch = new CountDownLatch(EXECUTION_COUNT);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means we just wait fore the first 100 tasks of the 300 tasks to be finished. With the current code (Thread.sleep(0, 100); in the lookup() method) this means only the 'add' and 'closeJar' tasks are finished. No 'lookup' task is finished by the time the test checks for an error.

public void shutdown() throws InterruptedException {
if (executor != null) {
executor.shutdownNow();
assertTrue("Executor could not shutdown. This could mean there is a deadlock.",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

executor.awaitTermination termination always return true, also in the case the tasks isn't finished. In that case, the executor send an interrupted signal and InterruptedException occurs (verified with try catch in lookup method).

}

// Wait for tasks to execute
assertTrue("The tasks didn't finish in the allowed time.",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It waits only for 100 tasks of the 300 submitted. Since 'add' and 'closeJar' are small/fast tasks, this will never be an issue (except in case of deadlock)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this is there exclusively for deadlocks

for (JarEntry entry : Collections.list(jarFile.entries())) {
webappClassLoader.findResource(entry.getName());
// System.out.println("Looked up " + resourceEntry);
Thread.sleep(0, 100);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sleep means that only a 'few' lookups are performed (in my test, around 3000 in the 130 ms the test is running).

By replacing it with Thread.yield(), the number of lookup actions is increased to around 21000 which mean the chance of having a concurrency issue (which we like to detect with this test) is increased significantly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thread.yield() causes a huge amount of false positives on the test. I imagine this is because it reduces the total time spent on this task

Copy link
Contributor

@dmatej dmatej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build passed, but I still have little a doubt if with this counts of threads and pool capacity and tasks it can detect lock issues.On current intels and JDKs are synchronizations pretty fast, so from my experience running hundreds or thousands is better (but may take too much time on slower environments (hw/config)).
Also what I tested - too many files in classloader (thousand) slows down the classloading to nearly unusable state (999 threads waiting to call addJar and one working inside). I'm not sure if it is alright, it took also 8GB heap, so maybe there is some close missing ...? (but it is not a problem of this test)
It depends ... I believe you :-)

@rdebusscher rdebusscher self-requested a review January 15, 2020 10:10
Copy link

@rdebusscher rdebusscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had an offline conversation about my comments and current state is better to see differences (with and without the fix in WebappClassloader)

Good for me to be merged.

@MattGill98
Copy link
Contributor Author

jenkins test please

@MattGill98 MattGill98 merged commit d89fb81 into payara:master Jan 15, 2020
MarkWareham pushed a commit to MarkWareham/Payara that referenced this pull request Feb 3, 2020
CUSTCOM-76 WebappClassLoader Synchronisation Test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants