Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SegFault in PyCall #11395

Closed
yuyichao opened this issue May 21, 2015 · 23 comments
Closed

SegFault in PyCall #11395

yuyichao opened this issue May 21, 2015 · 23 comments

Comments

@yuyichao
Copy link
Contributor

See comments in JuliaPy/PyCall.jl#144

Reproduce with PyCall, run the following script or run in REPL without the printing.

using PyCall
const pltm = pyimport("numpy")
print(pltm)
const plt = pywrap(pltm)
print(plt)

The PR (JuliaPy/PyCall.jl#144) is required to make PyCall working.

GC_VERIFY didn't pick up anything.

I am having a hard time removing the PyCall dependency for reproducing it this time since it is too big (might still try though...). It is not clear whether this is a PyCall.jl bug or a julia bug but since the crash is in not in python I suspect it is julia...

@yuyichao
Copy link
Contributor Author

This is the farthest I could reduce now.

using PyCall

# Othe module does not produce the same error since there's no `True_`
o = pyimport("numpy")
mems = PyCall.pycall(PyCall.inspect["getmembers"], PyCall.PyObject, o)

# Do not call convert -- No SegFault
convert(Vector{Tuple{AbstractString, PyCall.PyObject}}, mems)

# Do not create a new bareodule -- No SegFault
m = Module(:pyimport)
# Do not create nested module -- No SegFault
m = eval(m, Expr(:toplevel,
                 Expr(:module, false, :__anon__, Expr(:block)),
                 :__anon__))

function convert2(o::PyObject)
    try
        info = PyCall.PyArray_Info(o)
        println(1)
        try
            println(2)
            copy(PyCall.PyArray{Bool, length(info.sz)}(o, info))
        catch
            println(3)
        end
    catch
        error()
        PyCall.py2array(Bool, o)
    end
end

# Do not convert any of the following python object -- No SegFault
convert2(o["True_"])

copy(PyDict{PyAny, PyAny}(o["__builtins__"]))

# Change to global value assignment -- No SegFault
eval(m, :(f(s) = 0))

@garrison
Copy link
Member

I am trying to bisect -- I was indeed able to reproduce this using 2e256b9 (the last commit before #11293) and PyCall master.

@yuyichao
Copy link
Contributor Author

@garrison OK good, that is actually what I'm going to do next.
I'm just trying to see if I can get rid of the PyCall dependency.
At least I am not the only one that can reproduce this issue.

@garrison
Copy link
Member

I am suspicious that 18fa555 might have led to the problem, but my bisecting is not yet conclusive.

@garrison
Copy link
Member

My bisect in the end identified 63849fd as the first bad commit. That's not to say that the problem didn't start sooner, as there were commits I almost marked as "good" because using PyPlot worked but your test script above did not. So there might be some intermittent nature to this bug. I'd be curious to know what your bisecting reveals.

@yuyichao
Copy link
Contributor Author

I'm not bisecting now and will probably not be doing that until this weekend.
I would not be surprised if it is caused by #11274 though. It is almost certainly related to modules....

@simonster
Copy link
Member

It probably shouldn't be causing Julia to segfault, but on 0.4 you can now construct an anonymous bare module directly instead of using that ugly hack.

@yuyichao
Copy link
Contributor Author

This is probably GC related. turning off gc before pyimport "fixes" the issue. Smells like a missing GC root...

@yuyichao
Copy link
Contributor Author

This looks like a missing root and it is indeed where the SegFault'd pointer was allocated. However, rooting it here didn't help...

@tkelman
Copy link
Contributor

tkelman commented May 22, 2015

@yuyichao when you link to lines of code, be sure to hit y so github canonicalizes the link with a commit sha, so the same line number remains accurate in the future

@yuyichao
Copy link
Contributor Author

@tkelman Thanks for the tip. I always manually find the file in the commit when I want to do that and that's why I only do it when I think it worth keeping for a long time. Will do that from now on =)

This is sth related to typeinference and GC

using PyCall

# Othe module does not produce the same error since there's no `True_`
gc()

# Do not create a new bareodule -- No SegFault
m = Module(:pyimport)
gc()
# Do not create nested module -- No SegFault
m = eval(m, Expr(:toplevel,
                 Expr(:module, false, :__anon__, Expr(:block)),
                 :__anon__))

gc()
function PyObject2()
    PyObject(p, [true])
end
gc()

gc()
precompile(PyObject2, ())
gc()

eval(m, :(f(s) = 0))
ccall(:jl_gc_disable, Void, ())

When I watch the pointer that got a segfault, it does get assigned multiple times in typeinf although I was too lazy to track the generated code.....

@yuyichao
Copy link
Contributor Author

Note that in the above example, none of the PyCall functions are actually called and the variable p I'm calling PyObject with (ok, not really since I'm not calling PyObject2) is not even defined....

@yuyichao
Copy link
Contributor Author

Repro without PyCall

immutable PyObject_struct
    ob_refcnt::Int
    ob_type::Ptr{Void}
end

typealias PyPtr Ptr{PyObject_struct} # type for PythonObject* in ccall

# Macro version of pyinitialize() to inline initialized? check
macro pyinitialize()
    :(initialized::Bool ? nothing : pyinitialize())
end

#########################################################################
# Wrapper around Python's C PyObject* type, with hooks to Python reference
# counting and conversion routines to/from C and Julia types.

type PyObject
    o::PyPtr # the actual PyObject*
    function PyObject(o::PyPtr)
        po = new(o)
        finalizer(po, pydecref)
        return po
    end
    PyObject() = PyObject(convert(PyPtr, C_NULL))
end

PyObject(o::PyObject) = o

pyenv(f::Function) = Base.withenv(f, "PYTHONIOENCODING"=>"UTF-8")

pysys(python::AbstractString, var::AbstractString) = pyenv() do
    chomp(readall(`$python -c "import sys; print(sys.$var)"`))
end

# global flags to make sure we don't initialize too many times
initialized = false # whether Python is initialized

# initialize the Python interpreter (no-op on subsequent calls)
function pyinitialize(python::AbstractString)
    global initialized
    if !initialized::Bool
        libpy,programname = try
            dlopen_libpython(python), pysys(python, "executable")
        end
    end
    return
end

pyinitialize() = pyinitialize(get(ENV, "PYTHON", "python"))

#########################################################################

# "embed" a reference to jo in po, using the weak-reference mechanism
function pyembed(po::PyObject, jo::Any)
    @pyinitialize
end

# make a PyObject that embeds a reference to keep, to prevent Julia
# from garbage-collecting keep until o is finalized.
PyObject(o::PyPtr, keep::Any) = pyembed(PyObject(o), keep)

# Othe module does not produce the same error since there's no `True_`
gc()

# Do not create a new bareodule -- No SegFault
m = Module(:pyimport)
gc()
# Do not create nested module -- No SegFault
m = eval(m, Expr(:toplevel,
                 Expr(:module, false, :__anon__, Expr(:block)),
                 :__anon__))

gc()
function PyObject2()
    PyObject(p, [true])
end
gc()

precompile(PyObject2, ())
gc()

eval(m, :(f(s) = 0))
ccall(:jl_gc_disable, Void, ())

@yuyichao
Copy link
Contributor Author

m = Module(:_import)
m = eval(m, Expr(:toplevel,
                 Expr(:module, false, :__anon__, Expr(:block)),
                 :__anon__))

# Very likely the overridden `m = Module(:_import)` got leaked here
# Assign the `__anon__` module to another variable surpresses the SegFault
# Although it doesn't trigger a SegFault without calling `precompile` below
gc()
precompile(Base.withenv, (Function, Pair{ASCIIString, ASCIIString}))

eval(m, :(f(s) = 0))

@yuyichao
Copy link
Contributor Author

Is the child module holding a reference to the parent module? (In GC sense)

@carnaval
Copy link
Contributor

Yes but modules are treated specially in gc and the parent module is not marked here. I'll push a fix this is wrong. Thanks for finding this ! (as well as the missing root w/ add_std_imports).

@yuyichao
Copy link
Contributor Author

@carnaval Thanks! Here are some other places that I tried. They all looks suspicious to me but if they were indeed wrong, it would be surprising that they ever worked. Could you have a look/comment too? Thanks

@yuyichao
Copy link
Contributor Author

The module part might be special cases but the deserializeing part looks strange for me...

@yuyichao
Copy link
Contributor Author

And maybe the deserializing part is related to #11397 ?

@carnaval
Copy link
Contributor

(1) and (2) don't need a wb since expr was just created.
(3) and (4) are in the serializer which should disable gc around it anyway.
(5) is definitely needed, I'll put it in the patch.
(6) is a bit special : there is no need for the gc roots because both modules are always reachable through jl_current_module and the binding or later through jl_current_module and jl_current_module->parent. The wb for the binding is not written this way since bindings are not first class. In this case it is always correct to have the wb for the enclosing object but this is old code and since then we have a separate wb for bindings to avoid remarking the whole module.

It does not matter that much in this case but well.

@carnaval
Copy link
Contributor

#11397 is using the julia serializer, not the dump.c one

carnaval added a commit that referenced this issue May 22, 2015
Also a missing root for new modules. Thanks to @yuyiacho for finding
those issues. Fix #11395.
@yuyichao
Copy link
Contributor Author

@carnaval Thnks for the explaination. That makes sense.

@yuyichao
Copy link
Contributor Author

Confirmed that my final repro and using PyPlot in REPL both works.

mbauman pushed a commit to mbauman/julia that referenced this issue Jun 6, 2015
Also a missing root for new modules. Thanks to @yuyichao for finding
those issues. Fix JuliaLang#11395.
tkelman pushed a commit to tkelman/julia that referenced this issue Jun 6, 2015
Also a missing root for new modules. Thanks to @yuyichao for finding
those issues. Fix JuliaLang#11395.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants