JTReg test fail - java/nio/Buffer/DirectBufferAllocTest.java #4473

Open
ben-walsh opened this issue Jan 28, 2019 · 17 comments
@ben-walsh
Contributor

Regression observed with the 11.0.2 RC1 build ...

Test https://github.com/ibmruntimes/openj9-openjdk-jdk11/blob/openj9/test/jdk/java/nio/Buffer/DirectBufferAllocTest.java is failing intermittently with the following ...

STDOUT:
Allocating direct ByteBuffers with capacity 1048576 bytes, using 8 threads for 5 seconds...
STDERR:
java.lang.RuntimeException: Errors encountered!
	at DirectBufferAllocTest.main(DirectBufferAllocTest.java:157)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
	at java.base/java.lang.Thread.run(Thread.java:825)
	Suppressed: java.lang.OutOfMemoryError: Direct buffer memory
		at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
		at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
		at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
		at DirectBufferAllocTest.lambda$main$0(DirectBufferAllocTest.java:126)
		at DirectBufferAllocTest$$Lambda$16.00000000F2D5FD80.call(Unknown Source)
		at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
		at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
		at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
		... 1 more

A 100x run showed a 20% failure rate - https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/915.

A 100x run with -Xint showed a 23% failure rate - https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/919.

I have kicked off a further 100x run with -Xint at https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/923.

This intermittent failure has not been observed in the nightly Windows builds. To double-check, I have kicked off a 100x run with -Xint at https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/922.

This intermittent failure has not been observed on any non-Windows platform.

@pshipton
Member

Not sure this is a release blocker, but I have added it to the 0.12 milestone for now.
@ben-walsh does this test pass reliably on the 0.11 release build? i.e. is it really a regression?

@DanHeidinga

@pdbain-ibm can you please take a look.

@pdbain-ibm
Contributor

Will do.

@ben-walsh
Contributor Author

ben-walsh commented Jan 28, 2019

@pshipton

I meant regression compared to the nightly builds. I have kicked off https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/927 to get a data point to compare against previous release.

Having looked with Andrew L at the change that added the relevant code to Bits.java (https://bugs.openjdk.java.net/browse/JDK-6857566), the slower the test machine, the more likely the OOM is to be triggered during the test. Given the likely variability of the CPU resource allocated to the test machine during a 100x run, I suspect it is simply the case that, on top of that, Windows is slower than the other OSes, unless there is something particularly less efficient in our memory allocation on Windows compared to the other OSes.
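For anyone unfamiliar with that change, here is a much-simplified paraphrase of the throttling idea JDK-6857566 introduced. This is an illustrative sketch only, not the actual java.nio.Bits source; `ReserveBackoffSketch`, `reserveOrThrow` and `MAX_SLEEPS` are placeholder names I made up. The point is that the allocator only retries for a bounded amount of time before throwing, so reclamation on a slow machine can fail to catch up within that budget.

```java
import java.util.function.BooleanSupplier;

// Illustrative paraphrase only -- NOT the real java.nio.Bits code. It mirrors
// the shape of the behaviour added under JDK-6857566: try to reserve direct
// memory, nudge the GC, then retry with a bounded exponential back-off before
// giving up with "Direct buffer memory".
public class ReserveBackoffSketch {

    private static final int MAX_SLEEPS = 9; // assumed bound, for illustration

    static void reserveOrThrow(BooleanSupplier tryReserve) throws InterruptedException {
        if (tryReserve.getAsBoolean()) {
            return;                          // fast path: capacity still available
        }
        System.gc();                         // ask the VM to process dead buffers
        long sleepMillis = 1;
        for (int sleeps = 0; sleeps < MAX_SLEEPS; sleeps++) {
            if (tryReserve.getAsBoolean()) {
                return;                      // reclamation caught up in time
            }
            Thread.sleep(sleepMillis);       // bounded back-off: 1, 2, 4, ... ms
            sleepMillis <<= 1;
        }
        if (tryReserve.getAsBoolean()) {
            return;
        }
        throw new OutOfMemoryError("Direct buffer memory");
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a reservation that never succeeds: after roughly half a
        // second of back-off this throws, which is the failure the test sees.
        reserveOrThrow(() -> false);
    }
}
```

The total back-off in a sketch like this is well under a second, which matches the intuition above: a machine starved of CPU during a 100x Grinder run may simply not complete enough reference processing inside that window.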

I have also kicked off https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/924 to compare against Linux with same build level.

It is probably worth re-prioritising this issue once those results are in.

UPDATE : Also kicked off https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/928 to compare against last nightly.

@pdbain-ibm
Contributor

@ben-walsh could you please also test with -Xgcpolicy:optthruput -verbose:gc?

@dmitripivkine would you kindly review this?

Looking at the log file, we see `run main/othervm -XX:MaxDirectMemorySize=128m DirectBufferAllocTest` and `Allocating direct ByteBuffers with capacity 1048576 bytes, using 8 threads for 5 seconds`. We will exhaust the pool after 128 allocations, or 16 allocations per thread.
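For reference, here is a minimal standalone sketch of that allocation pattern (my own illustration, not the jtreg test itself; the class and variable names are made up). Run with -XX:MaxDirectMemorySize=128m, the eight threads can only have 128 of these 1 MiB buffers reserved at once, i.e. roughly 16 per thread, so progress beyond that depends entirely on how quickly the dropped buffers are reclaimed:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Sketch of the allocation pattern only -- NOT the jtreg test itself.
// Run with: java -XX:MaxDirectMemorySize=128m DirectAllocSketch
public class DirectAllocSketch {

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        final int capacity = 1024 * 1024;   // 1 MiB per buffer
        final int threads = 8;              // 128 MB limit / 1 MiB = 128 live buffers max
        final long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> results = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            results.add(pool.submit(() -> {
                while (System.nanoTime() < deadline) {
                    // The buffer reference is dropped immediately, but its
                    // native memory is only released once the GC/Cleaner
                    // processes the dead buffer.
                    ByteBuffer.allocateDirect(capacity);
                }
                return null;
            }));
        }
        for (Future<?> f : results) {
            f.get();   // surfaces any "Direct buffer memory" OOM as an ExecutionException
        }
        pool.shutdown();
    }
}
```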

@dmitripivkine
Contributor

DBB uses object finalization to release its native memory (it needs sun.misc.Cleaner to run). If the finalizer thread(s) cannot be executed fast/often enough, a native OOM is possible.
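To illustrate the race being described, here is an assumed, simplified standalone example (not the test code; the class name and the suggested -XX:MaxDirectMemorySize value are mine). Dropped DirectByteBuffers only give their native memory back once the GC discovers them and their Cleaner runs, so whether a tight allocation loop ever hits the "Direct buffer memory" OOM depends on how well that reclamation keeps pace on the particular VM and machine:

```java
import java.nio.ByteBuffer;

// Simplified illustration of the race described above -- not the test code.
// The native memory behind a DirectByteBuffer is only returned once the GC
// notices the dead buffer and its Cleaner runs. If allocation outpaces that
// processing, the reservation in java.nio.Bits can fail with
// "java.lang.OutOfMemoryError: Direct buffer memory" even though every
// previously allocated buffer is already unreachable.
public class CleanerRaceSketch {

    public static void main(String[] args) {
        final int oneMiB = 1024 * 1024;
        // Run with e.g. -XX:MaxDirectMemorySize=16m to make the limit easy to hit.
        for (int i = 0; i < 100_000; i++) {
            ByteBuffer.allocateDirect(oneMiB); // reference dropped immediately
            // Whether this loop ever throws depends on how quickly the
            // Cleaner / reference-processing threads reclaim the earlier
            // buffers relative to the allocation rate.
        }
    }
}
```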

@ben-walsh
Contributor Author

ben-walsh commented Jan 28, 2019

Kicked off https://ci.adoptopenjdk.net/view/work%20in%20progress/job/SXA-BenWalsh-Grinder/19 with the requested -Xgcpolicy:optthruput -verbose:gc

@pshipton
Member

As this is not a new failure, I've removed it from the 0.12 milestone plan.

@ben-walsh
Contributor Author

https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/927/consoleText - 100x -Xint against last release build shows a 37% failure rate.

https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/924/consoleText - 100x -Xint against RC1 build on Linux shows a 0% failure rate.

https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/928/consoleText - 100x -Xint against latest nightly - 30% failure rate but UNINTERESTING RESULT - due to adoptium/temurin-build#869, it wrongly tested against RC1 build again.

@ben-walsh
Contributor Author

Kicked off https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/933 to force 100x -Xint test against latest nightly.

@ben-walsh
Contributor Author

https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/933/consoleText - 100x -Xint against latest nightly - 25% failure rate.

@ben-walsh
Contributor Author

@ben-walsh ben-walsh changed the title JTReg test fail - Windows : java/nio/Buffer/DirectBufferAllocTest.java JTReg test fail - Windows & MacOS : java/nio/Buffer/DirectBufferAllocTest.java Feb 4, 2019
@pshipton pshipton added comp:vm and removed comp:vm labels Feb 4, 2019
@ben-walsh
Contributor Author

Adjusting title to reflect platform breadth.

@ben-walsh ben-walsh changed the title JTReg test fail - Windows & MacOS : java/nio/Buffer/DirectBufferAllocTest.java JTReg test fail - java/nio/Buffer/DirectBufferAllocTest.java Feb 6, 2019
@adamfarley
Contributor

@pshipton
Member

@adamfarley your link is for a HotSpot build. If the problem also occurs with HotSpot, it seems like a problem with the test itself.

@adamfarley
Contributor

adamfarley commented Jun 30, 2020

My mistake. Will open an openjdk-tests issue.

If the fix for that issue also results in 0% failures on j9, I'll come back and close this.
