This repository has been archived by the owner on Aug 5, 2021. It is now read-only.

calling get_type from java after making a few PyObject instances causes a SIGSEGV #112

Open
rajha-korithrien opened this issue Oct 27, 2017 · 5 comments

Comments

@rajha-korithrien

rajha-korithrien commented Oct 27, 2017

Platform: CentOS Linux release 7.3.1611
Java: OpenJDK Runtime Environment (build 1.8.0_131-b12)
Python: Python 2.7.5 (default, Nov 6 2016, 00:28:07) [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
JPY: from master 0.9-SNAPSHOT commit 4f9aacc

This may be related to issues #74, #85 (I am only guessing here)

The following code can be used to reproduce the error

import org.jpy.PyLib;
import org.jpy.PyModule;
import org.jpy.PyObject;

/**
 * This class is used to show that jpy 0.9-SNAPSHOT master branch from commit 4f9aacc066695426a048d011660394871c542aeb can be easily made to segfault.
 * This seems to have something to do with how jpy is doing reference counting... but the underlying cause is still a mystery.
 * @author rajha.korithrien
 */
public class SegfaultExample {

    public static void main(String[] args){
        PyLib.startPython();

        PyModule builtIn = PyModule.getBuiltins();
        PyModule jpy = PyModule.importModule("jpy");
        PyModule np = PyModule.importModule("numpy");

        PyObject array = np.call("ones", 100);
        PyObject maxValue = array.callMethod("max");

        for(int i = 0; i < 5; i++){
            PyObject bool = builtIn.call("isinstance", maxValue, np.getAttribute("float32"));
            builtIn.call("print", bool);
        }

        PyObject jStringClass = jpy.call("get_type", "java.lang.String");
    }
}

If the loop is changed to, say, only 2 iterations, the error does not occur.

The result from the above code is:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f1507abcd17, pid=22765, tid=0x00007f157d182700
#
# JRE version: OpenJDK Runtime Environment (8.0_131-b12) (build 1.8.0_131-b12)
# Java VM: OpenJDK 64-Bit Server VM (25.131-b12 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libpython2.7.so.1.0+0x89d17]  PyObject_Malloc+0xa7
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/rajha/Projects/DAVE-Legacy/machine-learning-research/hs_err_pid22765.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
False
False
False
False
False

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

Loading it up in gdb I can get the following backtrace.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f0f76e86700 (LWP 24026)]
0x00007f0f54f95d17 in PyObject_Malloc () from /lib64/libpython2.7.so.1.0
(gdb) bt
#0  0x00007f0f54f95d17 in PyObject_Malloc () from /lib64/libpython2.7.so.1.0
#1  0x00007f0f54f92aed in _PyObject_New () from /lib64/libpython2.7.so.1.0
#2  0x00007f0f54afb7ab in JObj_FromType (jenv=jenv@entry=0x7f0f700159e0, type=0x7f0f70223dd0, objectRef=0x7f0f70149698) at src/main/c/jpy_jobj.c:45
#3  0x00007f0f54af7a90 in JType_AddClassAttribute (jenv=jenv@entry=0x7f0f700159e0, declaringClass=declaringClass@entry=0x7f0f7065e420)
    at src/main/c/jpy_jtype.c:1135
#4  0x00007f0f54af8eef in JType_GetType (jenv=jenv@entry=0x7f0f700159e0, classRef=classRef@entry=0x7f0f7009b4c0, resolve=resolve@entry=0 '\000')
    at src/main/c/jpy_jtype.c:183
#5  0x00007f0f54af9677 in JType_InitSuperType (jenv=jenv@entry=0x7f0f700159e0, type=type@entry=0x7f0f70230760, resolve=resolve@entry=0 '\000')
    at src/main/c/jpy_jtype.c:946
#6  0x00007f0f54af8ebf in JType_GetType (jenv=jenv@entry=0x7f0f700159e0, classRef=0x7f0f7009b4d8, resolve=resolve@entry=0 '\000') at src/main/c/jpy_jtype.c:161
#7  0x00007f0f54af9bc8 in JType_CreateParamDescriptors (jenv=jenv@entry=0x7f0f700159e0, paramCount=paramCount@entry=1, 
    paramClasses=paramClasses@entry=0x7f0f7009b4e0) at src/main/c/jpy_jtype.c:1343
#8  0x00007f0f54af86a0 in JType_ProcessMethod (jenv=jenv@entry=0x7f0f700159e0, type=type@entry=0x7f0f7023e900, methodKey=methodKey@entry=0x7f0f3c0552d0, 
    methodName=methodName@entry=0x7f0f54b01708 "__jinit__", returnType=returnType@entry=0x0, paramTypes=paramTypes@entry=0x7f0f7009b4e0, 
    isStatic=isStatic@entry=1 '\001', mid=0x7f0f706a6210) at src/main/c/jpy_jtype.c:884
#9  0x00007f0f54af8990 in JType_ProcessClassConstructors (jenv=jenv@entry=0x7f0f700159e0, type=type@entry=0x7f0f7023e900) at src/main/c/jpy_jtype.c:991
#10 0x00007f0f54af8c4f in JType_ResolveType (jenv=0x7f0f700159e0, type=0x7f0f7023e900) at src/main/c/jpy_jtype.c:826
#11 0x00007f0f54af8de3 in JType_GetType (jenv=jenv@entry=0x7f0f700159e0, classRef=0x7f0f7009b4b8, resolve=<optimized out>) at src/main/c/jpy_jtype.c:212
#12 0x00007f0f54af9494 in JType_GetTypeForName (jenv=jenv@entry=0x7f0f700159e0, typeName=<optimized out>, resolve=<optimized out>) at src/main/c/jpy_jtype.c:118
#13 0x00007f0f54af24e4 in JPy_get_type (self=<optimized out>, args=0x7f0f54a17690, kwds=0x0) at src/main/c/jpy_module.c:477
#14 0x00007f0f54f578e3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#15 0x00007f0f54fe96f7 in PyEval_CallObjectWithKeywords () from /lib64/libpython2.7.so.1.0
#16 0x00007f0f54aff337 in PyLib_CallAndReturnObject (jenv=jenv@entry=0x7f0f700159e0, pyObject=pyObject@entry=0x7f0f54a14be8, 
    isMethodCall=isMethodCall@entry=0 '\000', jName=jName@entry=0x7f0f76e858b8, argCount=argCount@entry=1, jArgs=jArgs@entry=0x7f0f76e858a8, 
    jParamClasses=jParamClasses@entry=0x0) at src/main/c/jni/org_jpy_PyLib.c:962
#17 0x00007f0f54aff4e5 in Java_org_jpy_PyLib_callAndReturnObject (jenv=0x7f0f700159e0, jLibClass=<optimized out>, objId=139703821093864, 
    isMethodCall=<optimized out>, jName=0x7f0f76e858b8, argCount=1, jArgs=0x7f0f76e858a8, jParamClasses=0x0) at src/main/c/jni/org_jpy_PyLib.c:789
#18 0x00007f0f5fff19e4 in ?? ()
#19 0x00007f0f76e858a8 in ?? ()
#20 0x0000000000000000 in ?? ()
(gdb)

I am clueless when it comes to the internals of CPython, but I am a pretty good Java/C guy. If someone can point me in the right direction, I may have some cycles to put into this.
Thanks!

P.S. The problem is also reproducible on:

macOS 10.12.6
JRE version: Java(TM) SE Runtime Environment (8.0_152-b16) (build 1.8.0_152-b16)
Python 2.7.14 [GCC 4.2.1 Compatible Apple LLVM 9.0.0]

But there you have to increase the loop iteration count to 10.

@forman
Member

forman commented Oct 30, 2017

@rajha-korithrien thanks for the detailed report. Indeed it seems #74 is back.

@forman forman self-assigned this Oct 30, 2017
@rajha-korithrien
Author

I am happy to try to figure out how to fix this. Do you have any hints/thoughts about where to start?
Thanks!

@forman
Member

forman commented Nov 1, 2017

Thanks for your offer! This is very welcome as my focus is currently on other projects.

Just guessing, but this kind of problem usually originates from incorrect use of the memory allocation/release API of JNI or the reference counting API of CPython. The contract is, of course, to release Java objects that have been allocated and to decrease the reference count of Python objects that have been referenced, unless those references are return values.

I'd start by modifying your example code to see if there are variations / configurations that impact the problem, e.g. move the np.getAttribute() outside the loop. The hope would be to find a candidate jpy Java API call for further debugging and a detailed analysis of the C/JNI code downstream that call.

Note that you can get extra debugging output if you set some diagnostic flags, e.g.

PyLib.Diag.setFlags(PyLib.Diag.F_MEM | PyLib.Diag.F_EXEC);

In your example, you are

  1. calling a Java method call() of the org.jpy.PyModule class which is implemented in C/JNI which
  2. invokes the Python function get_type() of the jpy package which is again implemented in C/JNI

We are using jpy in many projects making lots of jpy.get_type() calls, but usually directly from Python.
Maybe reentering the jpy.so C/JNI code produces the problem. Again, just guessing.

Finally, it could make sense to have a debug switch that allows wrapping all CPython INCREF/DECREF calls (using C macros) so that we can observe them and output some statistics at desired breakpoints.
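
A minimal sketch of what such a debug switch could look like. This is not jpy code: `DemoObject`, the `DEMO_*` macros and the counters are hypothetical names invented for illustration, and the stand-in struct replaces the real `PyObject` so that the example compiles without `Python.h`. In jpy itself the macros would presumably wrap CPython's `Py_INCREF`/`Py_DECREF`, which also deallocate the object when the count reaches zero; the sketch leaves that out.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for a CPython object header (assumption: real code would use
 * PyObject from Python.h and wrap Py_INCREF/Py_DECREF instead). */
typedef struct {
    long refcnt;
} DemoObject;

/* Hypothetical global statistics; these names are not part of jpy. */
static long demo_incref_count = 0;
static long demo_decref_count = 0;

/* Debug switch: build with -DDEMO_REFCOUNT_DEBUG=0 to disable tracing. */
#ifndef DEMO_REFCOUNT_DEBUG
#define DEMO_REFCOUNT_DEBUG 1
#endif

/* Increment the count, record the call and log where it came from. */
static void demo_traced_incref(DemoObject *op, const char *file, int line)
{
    ++op->refcnt;
    ++demo_incref_count;
    fprintf(stderr, "INCREF %p -> %ld (%s:%d)\n",
            (void *) op, op->refcnt, file, line);
}

/* Decrement the count, record the call and catch over-releases early. */
static void demo_traced_decref(DemoObject *op, const char *file, int line)
{
    assert(op->refcnt > 0); /* a DECREF without a matching reference */
    --op->refcnt;
    ++demo_decref_count;
    fprintf(stderr, "DECREF %p -> %ld (%s:%d)\n",
            (void *) op, op->refcnt, file, line);
}

/* Number of INCREFs not yet matched by a DECREF since program start. */
static long demo_refcount_balance(void)
{
    return demo_incref_count - demo_decref_count;
}

#if DEMO_REFCOUNT_DEBUG
#define DEMO_INCREF(op) demo_traced_incref((op), __FILE__, __LINE__)
#define DEMO_DECREF(op) demo_traced_decref((op), __FILE__, __LINE__)
#else
#define DEMO_INCREF(op) ((void) ++(op)->refcnt)
#define DEMO_DECREF(op) ((void) --(op)->refcnt)
#endif
```

With tracing enabled, every call site reports its file and line on stderr, and `demo_refcount_balance()` can be inspected at a breakpoint (for example from gdb) before and after a suspicious call such as `get_type` to see whether the counts drift.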

@sbarnoud
Contributor

Release 0.9 should fix that (it corrects an invalid reference count when passing parameters back and forth between Java and Python).
Could you try it?

@forman
Member

forman commented Feb 12, 2018

No, not yet. We required a non-snapshot release with the current set of fixes for Python 3.5 and 3.6. Any upcoming fixes will go into a 0.9.x.
