Run jextract on core files on linux, aix, osx #8772

Open
pshipton opened this issue Mar 6, 2020 · 21 comments

Comments

@pshipton
Member

pshipton commented Mar 6, 2020

When a core file is produced on Linux, AIX, or OSX systems, the shared libraries are not included in the core. The system libraries are required in order to look at stack traces in the core file. jextract gathers all the shared libraries used by a core and compresses them along with the core file for further analysis. When core files are created in a test run, running jextract on one of them will provide the information required without needing access to the machine where the problem occurred.

@pshipton
Member Author

pshipton commented Mar 6, 2020

@AdamBrousseau
Contributor

Will this be required on the compile job as well?

@smlambert
Contributor

Purposefully naive question, can the VM do that as it crashes?

@pshipton
Member Author

pshipton commented Mar 6, 2020

Will this be required on the compile job as well?

Yes, any core file.

Purposefully naive question, can the VM do that as it crashes?

Kind of. For example, the following will do it.
bin/java -Xdump:system:events=vmstop -Xdump:tool:events=vmstop,exec="bin/jextract %last"

Some notes with this approach.

  • we don't need to run jextract for every core; one per test run should be enough. It's not harmful to run it on every core, but it takes extra time and space.
  • the test needs to run with the correct option for jextract to run. This can be set in the OPENJ9_JAVA_OPTIONS environment variable. -Xdump options combine, so unless the test is setting another -Xdump:tool option, the one to invoke jextract will take effect.
  • the option needs to know where to find jextract. Any version is usually fine; it doesn't need to be the jextract from the JVM under test.
  • the events part needs to specify the reason why the core was created. Although my example uses vmstop, a test should specify gpf+abort+traceassert+corruptcache.
  • I assume this will work for something unexpected like a gpf, but didn't try it.
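
As a rough illustration of the notes above, here is a minimal Python sketch of how a test harness might build the option and add it to OPENJ9_JAVA_OPTIONS. The helper names and the jextract path are hypothetical, and whether the quotes around the exec value are parsed the same way inside the environment variable as on the command line is an assumption that should be checked.

```python
# Events a test should watch for, as listed in the notes above.
DEFAULT_EVENTS = ("gpf", "abort", "traceassert", "corruptcache")


def jextract_dump_option(jextract_path, events=DEFAULT_EVENTS):
    """Build an -Xdump:tool option that runs jextract on the last dump (%last)."""
    # Assumption: the quoting mirrors the command-line example in this comment.
    return '-Xdump:tool:events={},exec="{} %last"'.format(
        "+".join(events), jextract_path
    )


def with_jextract(env, jextract_path):
    """Return a copy of env with the option appended to OPENJ9_JAVA_OPTIONS."""
    new_env = dict(env)
    existing = new_env.get("OPENJ9_JAVA_OPTIONS", "").strip()
    option = jextract_dump_option(jextract_path)
    # Existing options are kept; -Xdump options are expected to combine.
    new_env["OPENJ9_JAVA_OPTIONS"] = (existing + " " + option).strip()
    return new_env
```

A harness could pass the returned dictionary as the environment of the test process, for example `subprocess.run(cmd, env=with_jextract(os.environ, "/path/to/jextract"))`, where the path is a placeholder.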

@pshipton
Member Author

pshipton commented Mar 6, 2020

@DanHeidinga fyi

@DanHeidinga
Member

@pshipton It might be worth having a conversation with Manqing about jextract. If it's this easy, I'm somewhat surprised we haven't done this already with the IBM SDK to help with the "must gather" service list...

@pshipton
Member Author

pshipton commented Mar 6, 2020

@manqingl any thoughts about the above, using -Xdump:tool to automatically run jextract on a core file?

@manqingl

manqingl commented Mar 6, 2020

This is definitely a good idea. We have been depending on L2 to work with customers to get the system core files jextracted. This has not been working well so far. In the last week, I had to go back to two different customers to run jextract on the system core files again. Without the native libraries, I could not even get a decent native stack trace from native debuggers (because of the OS mismatch between the debug machines in EcuRep and the runtime machines customers were using).

Jextract is not necessary for zOS.

I am not sure if the -Xdump:tool option would work; I assume we would have to ensure the system core files were generated and renamed first. If it works, it will be a very useful feature.

I personally would prefer to have at least the first system core file jextracted by default. This could be a little too aggressive. Let me raise it in our service community.

@JBKingdon

+1 for having something like jextract run automatically. However jextract does a lot more than just collect the libraries and has been known to either fail or take so long to run as to be unusable. Is there currently any consumer of the non-library information, and if not, is this a good opportunity to create a new tool that just gathers libraries (and perhaps a few other files such as javacore/snap/jitdump?) and is hopefully more robust and quicker to run?

@klangman
Contributor

klangman commented Mar 6, 2020

Automating jextract would save a step that is missed quite frequently, but this would also add to the time it takes for the JVM to end after a crash and consume more disk space. The most important thing is that the core file is not affected. Any reduction in the reliability of obtaining an intact system core file is IMHO unacceptable.

@pshipton
Member Author

pshipton commented Mar 6, 2020

However jextract does a lot more than just collect the libraries

@JBKingdon I think for Java 8+ current versions it only collects the libraries, have you seen differently?

The most important thing is that the core file is not affected.

jextract zips up the core file, but doesn't delete or otherwise touch the original file.

@klangman
Contributor

klangman commented Mar 6, 2020

jextract zips up the core file, but doesn't delete or otherwise touch the original file.

It's the unknown that I worry about. Some strange interaction that was not expected. Maybe I am a bit paranoid, but without system core files our ability to service the JVM is almost gone. We should have a way to disable this feature in case it causes some issues with core file generation.

@JBKingdon

@JBKingdon I think for Java 8+ current versions it only collects the libraries, have you seen differently?

Oh great! I only look for the libraries so I hadn't realised the other stuff had gone :) I did hear that someone tried running jextract on a 300GB core and it didn't finish in a reasonable period of time, but the case hasn't reached me yet so I don't know what the version is.

@manqingl

manqingl commented Mar 6, 2020

In the past, the most likely causes of jextract failure have been:

(1) The system core file is truncated.
(2) A customized executable is not specified.
(3) Insufficient disk space.
(4) Wrong machine/wrong product installation locations (?)

I tend to think the situation can be better if we do it at JVM runtime.
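
To make causes (1) and (3) concrete, a wrapper could run a cheap check before invoking jextract. The following is a minimal Python sketch, not part of jextract: `preflight_problems` and the `headroom` factor are hypothetical, the empty-file test only catches the most obvious truncation, and the assumption that the zipped output needs roughly the size of the core in free space is a guess.

```python
import os
import shutil


def preflight_problems(core_path, output_dir, headroom=1.0):
    """Return a list of reasons jextract would likely fail; empty if none found."""
    problems = []
    if not os.path.isfile(core_path):
        problems.append("core file not found: " + core_path)
        return problems

    core_size = os.path.getsize(core_path)
    if core_size == 0:
        # Only the most obvious case; a partially written core is not detected.
        problems.append("core file is empty (possibly truncated)")

    # Assumption: the output needs about core_size * headroom bytes.
    free = shutil.disk_usage(output_dir).free
    needed = int(core_size * headroom)
    if free < needed:
        problems.append(
            "need about %d bytes free, only %d available" % (needed, free)
        )
    return problems
```

An empty list does not guarantee that jextract will succeed; it only rules out the failures that are easy to detect up front.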

@pshipton
Member Author

pshipton commented Mar 6, 2020

I did hear that someone tried running jextract on a 300GB core and it didn't finish in a reasonable period of time

jextract does two things:

  • it processes the core file looking for shared libraries. This probably doesn't take any longer on a bigger core file, but I'm not 100% sure.
  • it zips up the core file. I can imagine that zipping a 300GB core file would take a considerable amount of time.

I tend to think the situation can be better if we do it at JVM runtime.

We'd try to automatically run jextract (if it can be found) rather than expecting the user to do it. This won't help with truncated core file or disk space availability. Maybe it could help with a customized executable, and should solve wrong machine/wrong product issues.

@klangman
Contributor

klangman commented Mar 6, 2020

Maybe instead of compressing the core file, we just create a zip file with just the libraries (and maybe snap, javacore, jitdump, heapdump files) so that it simplifies the must-gather process.
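
A minimal Python sketch of that idea follows, assuming the list of library and diagnostic file paths has already been obtained by some other means (discovering the libraries from the core is the part jextract does and is not shown here). `bundle_diagnostics` is a hypothetical helper, not an existing tool.

```python
import os
import zipfile


def bundle_diagnostics(zip_path, files):
    """Zip the given files (libraries, javacore, snap, ...) without the core.

    Missing files are skipped. Returns the archive names that were added.
    """
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
        for path in files:
            if not os.path.isfile(path):
                continue
            # Keep the full directory structure so that libraries with the
            # same name in different directories do not collide.
            arcname = os.path.abspath(path).lstrip(os.sep)
            bundle.write(path, arcname)
            added.append(arcname)
    return added
```

Because the core file itself is left out, the archive stays small even when the core is very large, at the cost of having to transfer the core separately.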

@smithwil

smithwil commented Mar 9, 2020

If this worked reliably it would definitely help diagnose problems.

There was an earlier attempt at a jvmti library to set Xdump options to run jextract as described and also to collect the other types of dump into a single zip file. It was not well adopted because it was not on by default and when it was switched on, people did not like seeing unexpected processing start up after a java crash or other error. I think these issues could be overcome with careful user interface design, good documentation and communication with people who use J9.

@paulcheeseman

If we do use -Xdump:tool to achieve this, we should probably specify opts=ASYNC as well.

https://www.eclipse.org/openj9/docs/xdump/#tool-dumps

@Tom-Poe

Tom-Poe commented Mar 9, 2020

jextract needs to know if the process in question was running with or without compressedrefs.
(-J-Xcompressedrefs)
Can compressed ref usage be determined at the point when jextract would be run?

@pshipton
Member Author

pshipton commented Mar 9, 2020

jextract needs to know if the process in question was running with or without compressedrefs.

I believe this requirement is obsolete; I'll double-check and update the User Guide.

@pshipton
Member Author

pshipton commented Mar 9, 2020

Created eclipse-openj9/openj9-docs#532 to update the User Guide.
