
Sporadic segfaults in v0.6.0-rc2 using ODBC package #22256

Closed · galenlynch opened this issue Jun 6, 2017 · 12 comments · Fixed by #22282

@galenlynch (Contributor)

I experience sporadic segfaults when using the ODBC.jl package, but only on Julia v0.6.0-rc2 (not v0.5.2 or v0.6.0-rc1). At first I thought this was a problem with the ODBC package, but given the functions in the coredump (which seem to be related to memory allocation?) and the fact that I only see the segfaults on v0.6.0-rc2, I think this may be a Julia bug and not an ODBC bug.

I don't know how to make a 'minimal example' of this bug, since it necessarily involves interacting with a database. However, the following causes a segfault about 80% of the time in Julia v0.6.0-rc2:

julia> versioninfo()
Julia Version 0.6.0-rc2.0
Commit 68e911be53* (2017-05-18 02:31 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)

julia> using ODBC; db = ODBC.DSN("PostgreSQL")
WARNING: Method definition ==(Base.Nullable{S}, Base.Nullable{T}) in module Base at nullable.jl:238 overwritten in module NullableArrays at /home/galen/.julia/v0.6/NullableArrays/src/operators.jl:128.
ODBC.DSN(PostgreSQL)

julia> ODBC.query(db, "SELECT 1;")

signal (11): Segmentation fault
while loading no file, in expression starting on line 0
julia_type_to_llvm at /home/galen/julia-0.6.0-rc2/src/cgutils.cpp:382
typed_store at /home/galen/julia-0.6.0-rc2/src/cgutils.cpp:1221
emit_setfield at /home/galen/julia-0.6.0-rc2/src/cgutils.cpp:2249
emit_builtin_call at /home/galen/julia-0.6.0-rc2/src/codegen.cpp:3073
emit_call at /home/galen/julia-0.6.0-rc2/src/codegen.cpp:3441
emit_expr at /home/galen/julia-0.6.0-rc2/src/codegen.cpp:4139
emit_stmtpos at /home/galen/julia-0.6.0-rc2/src/codegen.cpp:4058 [inlined]
emit_function at /home/galen/julia-0.6.0-rc2/src/codegen.cpp:6242
jl_compile_linfo at /home/galen/julia-0.6.0-rc2/src/codegen.cpp:1256
jl_compile_for_dispatch at /home/galen/julia-0.6.0-rc2/src/gf.c:1672
jl_compile_method_internal at /home/galen/julia-0.6.0-rc2/src/julia_internal.h:305 [inlined]
jl_call_method_internal at /home/galen/julia-0.6.0-rc2/src/julia_internal.h:352 [inlined]
jl_apply_generic at /home/galen/julia-0.6.0-rc2/src/gf.c:1930
stream! at /home/galen/.julia/v0.6/DataStreams/src/DataStreams.jl:239
unknown function (ip: 0x7fbcc1756e6a)
jl_call_fptr_internal at /home/galen/julia-0.6.0-rc2/src/julia_internal.h:337 [inlined]
jl_call_method_internal at /home/galen/julia-0.6.0-rc2/src/julia_internal.h:356 [inlined]
jl_apply_generic at /home/galen/julia-0.6.0-rc2/src/gf.c:1930
#stream!#5 at /home/galen/.julia/v0.6/DataStreams/src/DataStreams.jl:151
jl_call_fptr_internal at /home/galen/julia-0.6.0-rc2/src/julia_internal.h:337 [inlined]
jl_call_method_internal at /home/galen/julia-0.6.0-rc2/src/julia_internal.h:356 [inlined]
jl_apply_generic at /home/galen/julia-0.6.0-rc2/src/gf.c:1930
jl_apply at /home/galen/julia-0.6.0-rc2/src/julia.h:1422 [inlined]
jl_invoke at /home/galen/julia-0.6.0-rc2/src/gf.c:51
#query#14 at /home/galen/.julia/v0.6/ODBC/src/Source.jl:352 [inlined]
query at /home/galen/.julia/v0.6/ODBC/src/Source.jl:352
query at /home/galen/.julia/v0.6/ODBC/src/Source.jl:346
unknown function (ip: 0x7fbcc17337a6)
jl_call_fptr_internal at /home/galen/julia-0.6.0-rc2/src/julia_internal.h:337 [inlined]
jl_call_method_internal at /home/galen/julia-0.6.0-rc2/src/julia_internal.h:356 [inlined]
jl_apply_generic at /home/galen/julia-0.6.0-rc2/src/gf.c:1930
do_call at /home/galen/julia-0.6.0-rc2/src/interpreter.c:75
eval at /home/galen/julia-0.6.0-rc2/src/interpreter.c:242
jl_interpret_toplevel_expr at /home/galen/julia-0.6.0-rc2/src/interpreter.c:34
jl_toplevel_eval_flex at /home/galen/julia-0.6.0-rc2/src/toplevel.c:575
jl_toplevel_eval_in at /home/galen/julia-0.6.0-rc2/src/builtins.c:496
eval at ./boot.jl:235
unknown function (ip: 0x7fbcdd8bfd4f)
jl_call_fptr_internal at /home/galen/julia-0.6.0-rc2/src/julia_internal.h:337 [inlined]
jl_call_method_internal at /home/galen/julia-0.6.0-rc2/src/julia_internal.h:356 [inlined]
jl_apply_generic at /home/galen/julia-0.6.0-rc2/src/gf.c:1930
eval_user_input at ./REPL.jl:66
unknown function (ip: 0x7fbcdd93b28f)
jl_call_fptr_internal at /home/galen/julia-0.6.0-rc2/src/julia_internal.h:337 [inlined]
jl_call_method_internal at /home/galen/julia-0.6.0-rc2/src/julia_internal.h:356 [inlined]
jl_apply_generic at /home/galen/julia-0.6.0-rc2/src/gf.c:1930
macro expansion at ./REPL.jl:97 [inlined]
#1 at ./event.jl:73
unknown function (ip: 0x7fbcc171636f)
jl_call_fptr_internal at /home/galen/julia-0.6.0-rc2/src/julia_internal.h:337 [inlined]
jl_call_method_internal at /home/galen/julia-0.6.0-rc2/src/julia_internal.h:356 [inlined]
jl_apply_generic at /home/galen/julia-0.6.0-rc2/src/gf.c:1930
jl_apply at /home/galen/julia-0.6.0-rc2/src/julia.h:1422 [inlined]
start_task at /home/galen/julia-0.6.0-rc2/src/task.c:267
unknown function (ip: 0xffffffffffffffff)
Allocations: 6448532 (Pool: 6446872; Big: 1660); GC: 7
[1]    25988 segmentation fault (core dumped)  julia0.6

Here is a coredump.txt produced by a different instance of the same bug.

@tkelman (Contributor) commented Jun 7, 2017

Can you bisect to identify the specific commit that introduced the issue?

@vtjnash (Member) commented Jun 7, 2017

Looks like #21620 (fixed by #22019 (comment))

@tkelman (Contributor) commented Jun 7, 2017

which has no tests

@galenlynch (Contributor, Author)

205bf77 is the first bad commit

@tkelman (Contributor) commented Jun 7, 2017

Thanks for the bisect, @galenlynch. Ideally we could reduce the test case to something that doesn't depend on ODBC.

@galenlynch (Contributor, Author)

Yeah no problem!

I have no idea what's causing the segfault inside ODBC because I don't understand the coredump or the error message that is produced when it occurs. The fact that it's stochastic, and seems to be influenced by the garbage collector, makes it harder to figure out.

Given that the bad commit seems to change how Base infers function return types, maybe comparing the @code_warntype output before and after 205bf77 would help? ODBC.jl has a lot of untyped method signatures, functions with superfluous parametrization, etc., which may make type inference a challenge.
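
For example, something like this could be run on both commits and the output diffed (just a sketch; which call is worth inspecting is a guess on my part):

using ODBC
db = ODBC.DSN("PostgreSQL")
# Show what inference concludes for the failing call; any non-concrete
# (::Any) results would be candidates for the codegen bug.
@code_warntype ODBC.query(db, "SELECT 1;")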

@galenlynch (Contributor, Author) commented Jun 7, 2017

I've determined that the segfault does not occur during execution of code in ODBC.jl, but instead during execution of code in DataStreams.jl (v0.1.3).

After inserting a bunch of println statements, it seems like the segfault can happen as early as immediately before this line, and usually before anything in the setrows! function runs. However, sometimes the segfault occurs much later, and sometimes it doesn't occur at all.

I tried out some simple code to see if using an 'interface' package like DataStreams.jl to set the field of a subtype defined in another package caused the segfault, but that doesn't seem to reproduce the problem.
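
Roughly what I tried (illustrative names; everything in one non-precompiled session, which is presumably why it behaves correctly):

abstract type AbstractSource end        # stands in for the interface package's type

type Inner{P}                           # mutable type with a spurious parameter
    val::Int
end

type MySource <: AbstractSource         # the "implementation" subtype
    inner::Inner
end

setval!(s) = isdefined(s, :inner) ? (s.inner.val = 1; nothing) : nothing

s = MySource(Inner{true}(2))
setval!(s)
println(s.inner.val)                    # prints 1 here, as expected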

@galenlynch (Contributor, Author)

Whether or not the bad commit segfaults seems to depend on what environment Julia runs in.

If I try to query a database in Julia by itself, I get a segfault about 80% of the time. If I run Julia in gdb and try to run a query, it does not segfault and exits normally. If I run Julia in valgrind it does not segfault either, but it stays alive long enough to produce a new error that never appears outside of valgrind:

valgrind --log-file=valgrindout.txt --tool=memcheck --smc-check=all-non-file --suppressions=$PWD/../../contrib/valgrind-julia.supp ./julia-debug dberr.jl
WARNING: Method definition ==(Base.Nullable{S}, Base.Nullable{T}) in module Base at nullable.jl:238 overwritten in module NullableArrays at /home/glynch/.julia/v0.6/NullableArrays/src/operators.jl:128.
ERROR: LoadError: type is immutable
Stacktrace:
 [1] setrows!(::ODBC.Source, ::Int64) at /home/glynch/.julia/v0.6/DataStreams/src/DataStreams.jl:56
 [2] stream!(::ODBC.Source, ::Type{DataStreams.Data.Column}, ::DataFrames.DataFrame, ::DataStreams.Data.Schema{true}, ::DataStreams.Data.Schema{true}, ::Array{Function,1}) at /home/glynch/.julia/v0.6/DataStreams/src/DataStreams.jl:239
 [3] #stream!#5(::Array{Any,1}, ::Function, ::ODBC.Source, ::Type{DataFrames.DataFrame}, ::Bool, ::Dict{Int64,Function}) at /home/glynch/.julia/v0.6/DataStreams/src/DataStreams.jl:151
 [4] stream!(::ODBC.Source, ::Type{DataFrames.DataFrame}, ::Bool, ::Dict{Int64,Function}) at /home/glynch/.julia/v0.6/DataStreams/src/DataStreams.jl:145
 [5] include_from_node1(::String) at ./loading.jl:569
 [6] include(::String) at ./sysimg.jl:14
 [7] process_options(::Base.JLOptions) at ./client.jl:305
 [8] _start() at ./client.jl:371
while loading /home/glynch/julia-0.6.0-rc2/usr/bin/dberr.jl, in expression starting on line 4

Where dberr.jl is a simple test script that produces the problem:

using ODBC, DataStreams, DataFrames
db = ODBC.DSN("PostgreSQL")
src = ODBC.Source(db, "SELECT 1;")
Data.stream!(src, DataFrame, false, Dict{Int, Function}()) # this function call segfaults

If I run the same valgrind command on the last good commit of Julia, I do not get this error.

Here is the output of valgrind from commit 205bf77: valgrindout_firstbad.txt
And here is the output of valgrind from the commit just before it, e2e21c5: valgrindout_lastgood.txt
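
For reference, 'type is immutable' is the error Julia 0.6 raises when code tries to set a field of an immutable value, e.g.:

immutable Point   # Julia 0.6 syntax for an immutable type
    x::Int
end

p = Point(1)
p.x = 2           # ERROR: type is immutable (exact wording varies by code path)

Assuming ODBC.Source is declared with 'type' (i.e. mutable), which setrows! succeeding on other commits implies, hitting this error at all suggests the runtime is consulting corrupted type information rather than reporting a genuine user error.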

tkelman added this to the 0.6.x milestone on Jun 8, 2017
@galenlynch (Contributor, Author) commented Jun 8, 2017

Aha! @vtjnash, you were right: it is related to #21620. I made a 'minimal' example that shows memory corruption using an interface similar to DataStreams.jl's, although I'm sure I could cook up an even more minimal example that doesn't use an interface.

First I make a FooInterface module:

__precompile__(true)
module FooInterface

export setbar_inner

abstract type FooType end # not exported, similar to DataStreams.jl

setbar_inner(a) = isdefined(a, :bar) ? (a.bar.inner = 1; nothing) : nothing # similar to DataStreams.jl

end

Then I make a second module, Foo, that uses the interface:

__precompile__(true)
module Foo
using FooInterface

type Bar{AParameter}
    inner::Int
end

type FooChild <: FooInterface.FooType
    bar::Bar
end

function __init__()
    a_foo = FooChild(Bar{true}(2))
    setbar_inner(a_foo)
    println("Attempted to set foo to 1, actually set it to ", a_foo.bar.inner)
end
end

This results in stochastic memory corruption, presumably the same problem flagged by the uninitialized-memory warnings in the valgrind logs:

for i in {1..8}
do
julia0.6 -e 'using Foo'
done
Attempted to set foo to 1, actually set it to 2
Attempted to set foo to 1, actually set it to 2
Attempted to set foo to 1, actually set it to 2
Attempted to set foo to 1, actually set it to 2
Attempted to set foo to 1, actually set it to 40532396646334466
Attempted to set foo to 1, actually set it to 40532396646334466
Attempted to set foo to 1, actually set it to 2
Attempted to set foo to 1, actually set it to 2
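
My guess at why this pattern is sensitive (just my reading, not verified against the compiler): the field declaration bar::Bar leaves the field abstractly typed, since Bar without its parameter is not a leaf type, so the store in setbar_inner cannot be fully specialized across the precompiled module boundary:

julia> isleaftype(Foo.Bar)        # Bar without its parameter is a UnionAll
false

julia> isleaftype(Foo.Bar{true})  # only the fully parameterized type is concrete
true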

@vtjnash (Member) commented Jun 8, 2017

Thanks, that was a great reduction!

@galenlynch (Contributor, Author)

Will this fix the valgrind errors? Are they related to this bug, or are they separate?

@vtjnash (Member) commented Jun 8, 2017

Should fix those too.
