Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JTReg Failures: well known file exceptions on macos/hotspot #1274

Open
M-Davies opened this issue Apr 9, 2020 · 28 comments
Open

JTReg Failures: well known file exceptions on macos/hotspot #1274

M-Davies opened this issue Apr 9, 2020 · 28 comments

Comments

@M-Davies
Copy link

M-Davies commented Apr 9, 2020

Describe the bug
On mac machines for 11 and 14 hotspot the following tests are failing due to pretty much the same error

com/sun/jdi/ProcessAttachTest.java => JDK11 & 14
jdk/jfr/jcmd/TestJcmdStartStopDefault.java => JDK11 & 14
com/sun/jdi/ProcessAttachTest.sh => JDK8 only

To Reproduce
Steps to reproduce the behavior (or Grinder rerun link):

  1. Go to https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder
  2. Kick off a grinder with these parameters
  1. See error in console log

Expected behavior
Tests pass

Screenshots
ProcessAttachTest.java

01:55:19  java.io.IOException: well-known file /var/folders/rt/8_by_h8n5g9d5jhp66cf_7d80000gr/T/.java_pid18038 is not secure: file's group should be the current group (which is -1) but the group is 0
01:55:19  	at jdk.jdi/com.sun.tools.jdi.ProcessAttachingConnector.attach(ProcessAttachingConnector.java:106)
01:55:19  	at ProcessAttachTest.tryDebug(ProcessAttachTest.java:107)
01:55:19  	at ProcessAttachTest.runTest(ProcessAttachTest.java:87)
01:55:19  	at ProcessAttachTest.main(ProcessAttachTest.java:66)
01:55:19  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
01:55:19  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
01:55:19  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
01:55:19  	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
01:55:19  	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
01:55:19  	at java.base/java.lang.Thread.run(Thread.java:834)

TestJcmdStartStopDefault.java

Standard Output: 

Running DCMD 'JFR.start' through 'PidJcmdExecutor'
Executing command '[/Users/jenkins/workspace/Test_openjdk11_hs_sanity.openjdk_x86-64_mac/openjdkbinary/j2sdk-image/Contents/Home/bin/../bin/jcmd, 18106, JFR.start]'
Command returned with exit code 1
---------------- stdout ----------------
18106:

---------------- stderr ----------------
java.io.IOException: well-known file /var/folders/rt/8_by_h8n5g9d5jhp66cf_7d80000gr/T/.java_pid18106 is not secure: file's group should be the current group (which is -1) but the group is 0
	at jdk.attach/sun.tools.attach.VirtualMachineImpl.checkPermissions(Native Method)
	at jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:108)
	at jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
	at jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
	at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
	at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)

----------------------------------------
    
Standard Error:

 stdout: [18106:
];
 stderr: [java.io.IOException: well-known file /var/folders/rt/8_by_h8n5g9d5jhp66cf_7d80000gr/T/.java_pid18106 is not secure: file's group should be the current group (which is -1) but the group is 0
	at jdk.attach/sun.tools.attach.VirtualMachineImpl.checkPermissions(Native Method)
	at jdk.attach/sun.tools.attach.VirtualMachineImpl.<init>(VirtualMachineImpl.java:108)
	at jdk.attach/sun.tools.attach.AttachProviderImpl.attachVirtualMachine(AttachProviderImpl.java:58)
	at jdk.attach/com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:207)
	at jdk.jcmd/sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:114)
	at jdk.jcmd/sun.tools.jcmd.JCmd.main(JCmd.java:98)
]
 exitValue = 1

java.lang.RuntimeException: Expected to get exit value of [0]

	at jdk.test.lib.process.OutputAnalyzer.shouldHaveExitValue(OutputAnalyzer.java:396)
	at jdk.jfr.jcmd.JcmdHelper.jcmd(JcmdHelper.java:111)
	at jdk.jfr.jcmd.JcmdHelper.jcmd(JcmdHelper.java:116)
	at jdk.jfr.jcmd.TestJcmdStartStopDefault.main(TestJcmdStartStopDefault.java:50)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127)
	at java.base/java.lang.Thread.run(Thread.java:834)

com/sun/jdi/ProcessAttachTest.sh

19:29:52  Exception in thread "main" java.io.IOException: well-known file /var/folders/rc/_hq7pkh54kb0dk8vvlt34pz00000gr/T/.java_pid88080 is not secure: file's group should be the current group (which is -1) but the group is 0
19:29:52  	at com.sun.tools.jdi.ProcessAttachingConnector.attach(ProcessAttachingConnector.java:104)
19:29:52  	at ProcessAttachDebugger.main(ProcessAttachDebugger.java:68)
19:29:52  Waiting for port file to be written...
19:29:52  Waiting 1 second(s) ...
19:29:52  Waiting 2 second(s) ...
19:29:52  Waiting 3 second(s) ...
19:29:52  Waiting 4 second(s) ...
19:29:52  Waiting 5 second(s) ...
...
Timeout!

Additional context
It appears to be a perm issue on the surface. The test is creating items in the /var/directory which is a sensitive directory to be creating files in. Its possible that this is likely caused by the the debugger that the test creates not having the same perms as the java_pid file.

@M-Davies
Copy link
Author

Recent builds of 11 are intermittently passing. A result of this test failing on all machines except for test-macincloud-macos1010-x64-1/2

@M-Davies
Copy link
Author

@smlambert can you transfer this issue to infrastructure?

@karianna karianna transferred this issue from adoptium/aqa-tests Apr 14, 2020
@karianna karianna added the bug label Apr 14, 2020
@karianna karianna added this to the April 2020 milestone Apr 14, 2020
@gdams
Copy link
Member

gdams commented Apr 14, 2020

This is to do with the fact that Jenkins isn't in a security group:

test-macincloud-macos1014-x64-:

test-macincloud-macos1014-x64-1:~ admin$ dscl . search /Groups GroupMembership jenkins
test-macincloud-macos1014-x64-1:~ $ 

test-macincloud-macos1010-x64-1

test-macincloud-macos1010-x64-1:~ admin$ dscl . search /Groups GroupMembership jenkins
com.apple.sharepoint.group.1		GroupMembership = (
    jenkins
)

@gdams
Copy link
Member

gdams commented Apr 14, 2020

testing with a group added to Jenkins user:

https://ci.adoptopenjdk.net/job/Grinder/2812/

@gdams gdams self-assigned this Apr 14, 2020
@gdams
Copy link
Member

gdams commented Apr 14, 2020

test failed, trying with a reboot first: https://ci.adoptopenjdk.net/job/Grinder/2813/console

@gdams
Copy link
Member

gdams commented Apr 14, 2020

hmm maybe not. TMPDIR is defined on all the machines. On macOS it seems to default to some weird /var/ dir

@gdams
Copy link
Member

gdams commented Apr 14, 2020

trying with _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp

@gdams
Copy link
Member

gdams commented Apr 14, 2020

hmm so the test shows that it picked up the top dir:

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp

but then runs the test without it:

12:38:37  Command line: [/Users/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/Contents/Home/bin/../bin/java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n -classpath /Users/jenkins/workspace/Grinder/openjdk-tests/TKG/test_output_158686425467/jdk_svc_sanity_0/work/classes/0/com/sun/jdi/ProcessAttachTest.d ProcessAttachTestTarg ]

@smlambert any idea why?

@smlambert
Copy link
Contributor

As per our Slack conversation, test job sets JVM_OPTIONS for each run, you can append additional options by using EXTRA_OPTIONS param (or replace the entire set by using JVM_OPTIONS).

Screen Shot 2020-04-14 at 10 42 27 AM

@gdams
Copy link
Member

gdams commented Apr 15, 2020

using EXTRA_OPTIONS https://ci.adoptopenjdk.net/job/Grinder/2822

trying with EXTRA_OPTIONS=-J-Djava.io.tmpdir=/tmp

https://ci.adoptopenjdk.net/job/Grinder/2823

@smlambert
Copy link
Contributor

Adding this from our DM in Slack for search-ability later:

Command definition for that test:
https://github.com/AdoptOpenJDK/openjdk-tests/blob/master/openjdk/playlist.xml#L721-L729

Command that gets run:
command that gets run:
08:44:08 "/Users/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/Contents/Home/bin/..//bin/java" -Xmx512m -jar "/Users/jenkins/workspace/Grinder/openjdk-tests/TKG/../../jvmtest/openjdk/jtreg/lib/jtreg.jar"
08:44:08 -agentvm -a -ea -esa -v:fail,error,time,nopass -retain:fail,error,.dmp,javacore.,heapdump.,.trc -ignore:quiet -timeoutFactor:8 -xml:verify -concurrency:2 -vmoptions:"-Xmx512m -Djava.io.tmpdir=/tmp"
08:44:08 -w ""/Users/jenkins/workspace/Grinder/openjdk-tests/TKG/../TKG/test_output_15869546463959/jdk_svc_sanity_0"/work"
08:44:08 -r "/Users/jenkins/workspace/Grinder/openjdk-tests/TKG/../../jvmtest/openjdk/report"
08:44:08 -jdk:"/Users/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/Contents/Home/bin/../"
08:44:08 -exclude:"/Users/jenkins/workspace/Grinder/openjdk-tests/TKG/../openjdk/openjdk-jdk/test/jdk/ProblemList.txt"
08:44:08 -exclude:"/Users/jenkins/workspace/Grinder/openjdk-tests/TKG/../../jvmtest/openjdk/ProblemList_openjdk11.txt"
08:44:08 "/Users/jenkins/workspace/Grinder/openjdk-tests/TKG/../openjdk/openjdk-jdk/test/jdk:jdk_svc_sanity"; \

EXTRA_OPTIONS is appended to JVM_OPTIONS, which gets fed to jtreg via -vmoptions parameter (see https://openjdk.java.net/jtreg/vmoptions.html)

@M-Davies
Copy link
Author

Exclude PR adoptium/aqa-tests#1734

@karianna
Copy link
Contributor

@M-Davies I've landed that PR - can this be closed in that case?

@gdams
Copy link
Member

gdams commented Apr 17, 2020

@karianna probably not, the issue still exists, we've just skipped the test for now

@M-Davies
Copy link
Author

M-Davies commented Apr 20, 2020

This is also occurring for com/sun/jdi/ProcessAttachTest.sh on test-macincloud-macos1013-x64-* machines (test-macincloud-macos1010-x64-* machines are unaffected) on JDK8_hotspot_mac builds.

@gdams Are there any differences between the 1010 and 1013 machines that could cause this? Maybe that'll lead you to the solution for this?

19:29:52  Exception in thread "main" java.io.IOException: well-known file /var/folders/rc/_hq7pkh54kb0dk8vvlt34pz00000gr/T/.java_pid88080 is not secure: file's group should be the current group (which is -1) but the group is 0
19:29:52  	at com.sun.tools.jdi.ProcessAttachingConnector.attach(ProcessAttachingConnector.java:104)
19:29:52  	at ProcessAttachDebugger.main(ProcessAttachDebugger.java:68)
19:29:52  Waiting for port file to be written...
19:29:52  Waiting 1 second(s) ...
19:29:52  Waiting 2 second(s) ...
19:29:52  Waiting 3 second(s) ...
19:29:52  Waiting 4 second(s) ...
19:29:52  Waiting 5 second(s) ...
...
Timeout!

Exclude PR: adoptium/aqa-tests#1741

@smlambert
Copy link
Contributor

Same core issue for failures in a system test target HCRLateAttachWorkload_previewEnabled (deep history noting it passes on macos1010 but not on macos1012/13/14 machines). This error will occur in any test that invokes jdk.attach/sun.tools.attach.VirtualMachineImpl.checkPermissions(Native Method)

stderr Exception in thread "main" java.io.IOException: well-known file /var/folders/h1/b_v48zpd1zd54jblxnx58p3r0000gr/T/.java_pid1386 is not secure: file's group should be the current group (which is -1) but the group is 0
[2020-10-13T21:04:53.246Z] AG stderr at jdk.attach/sun.tools.attach.VirtualMachineImpl.checkPermissions(Native Method)

@Haroon-Khel Haroon-Khel added this to the May 2021 milestone May 18, 2021
@Haroon-Khel Haroon-Khel modified the milestones: May 2021, June 2021 Jun 21, 2021
@sxa sxa changed the title JTReg Failures: well known file exceptions JTReg Failures: well known file exceptions on macos/hotspot Jul 5, 2021
@sxa
Copy link
Member

sxa commented Jul 5, 2021

Grinder re-run from the ealiier parameters is now running ok as per 1021 1022 1023 - do we need further work on this or can it now be closed (or retested if the test has actually been disabled)? I see it's been re-assiged to @smlambert ...

@smlambert
Copy link
Contributor

Someone should remove the 4 exclusions and rerun tests:
How to find exclusions? search for https://github.com/adoptium/infrastructure/issues/1274 in aqa-tests repo results in 4 findings:

  • ProblemList_openjdk8.txt
  • ProblemList_openjdk11.txt
  • ProblemList_openjdk14.txt
  • ProblemList_openjdk15.txt

@sxa
Copy link
Member

sxa commented Jul 5, 2021

@karianna Was there a reason why this was re-assigned to Shelley in March?

@smlambert
Copy link
Contributor

Created adoptium/aqa-tests#2716 as good first issue to verify this infra issue has been addressed.

@karianna
Copy link
Contributor

karianna commented Jul 5, 2021

@karianna Was there a reason why this was re-assigned to Shelley in March?

I suspect I was thinking it was ready for a re-test but forgot to add a comment explaining that.

@sxa
Copy link
Member

sxa commented Jul 5, 2021

Anyone can handle this so I will unassign.

@sxa sxa modified the milestones: June 2021, July 2021 Jul 5, 2021
@Haroon-Khel Haroon-Khel modified the milestones: July 2021, August Aug 3, 2021
@sxa sxa modified the milestones: August 2021, September 2021 Sep 23, 2021
@sxa
Copy link
Member

sxa commented Sep 23, 2021

@smlambert Have we seen any problems as a result of the exclusions being reverted in the aforementioned PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Todo
Development

No branches or pull requests

7 participants