
create_some_context causes Segmentation fault from REPL #176

Closed
lwabeke opened this issue Jun 10, 2019 · 6 comments

Comments

@lwabeke
Contributor

lwabeke commented Jun 10, 2019

Hi

When I call create_some_context from the REPL, it causes a segmentation fault that closes Julia. However, I can run the example code successfully from the bash command line (./run_examples.sh).

Julia Version 1.1.1
Commit 55e36cc (2019-05-16 04:10 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin15.6.0)
  CPU: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, haswell)

I traced it (by manually executing the code line by line), and it got at least as far as line 140 of src/context.jl:
ctx_id = api.clCreateContext(

I suspect it has something to do with environment variables/paths, and possibly with module initialisation code that executes differently in the REPL than when running a Julia script from the bash command line.

Pkg.test("OpenCL") mostly works, but reports 3 failures:

Test Summary:                       | Pass  Fail  Total
OpenCL.Program                      |   63     3     66
  OpenCL.Program source constructor |    3            3
  OpenCL.Program info               |   24           24
  OpenCL.Program build              |   12           12
  OpenCL.Program source code        |    3            3
  OpenCL.Program binaries           |   21     3     24

I'm not sure how to find out which environment variables/paths are used during the ccall, or how to trace it further.
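One way to compare the two environments is to print, in both the REPL and the script, which OpenCL library the process has actually loaded and which dynamic-loader variables are set, then diff the two outputs. This is a minimal sketch that relies only on the Libdl standard library; the helper names `loaded_libs` and `loader_env` are hypothetical, and the choice of variables (`DYLD_*`, `LD_LIBRARY_PATH`) is an assumption about what is most likely to affect library resolution. Run it after `using OpenCL` and a simple call such as `cl.devices(:gpu)`, since the library may only be loaded on the first `ccall` into it.

```julia
using Libdl

# Hypothetical helper: paths of loaded shared libraries containing `pattern`.
loaded_libs(pattern::AbstractString) = filter(path -> occursin(pattern, path), Libdl.dllist())

# Hypothetical helper: dynamic-loader environment variables that can change
# which library a ccall resolves to (the selection of names is an assumption).
function loader_env()
    return Dict(k => v for (k, v) in ENV
                if startswith(k, "DYLD_") || k == "LD_LIBRARY_PATH")
end

# Print both so that the REPL run and the script run can be diffed.
println("Loaded OpenCL libraries: ", loaded_libs("OpenCL"))
println("Loader variables: ", loader_env())
```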
@jpsamaroo
Member

Can you post the stacktrace you get from the segfault, as well as the OpenCL driver you're using (I assume it's Apple's OpenCL implementation)? It could be that GC runs earlier in the REPL and uncovers a bug.
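A cheap way to probe the GC hypothesis is to run the failing call with the collector switched off and see whether the crash goes away. The helper below is a hypothetical sketch built on `GC.enable`, which returns the previous GC state. A segfault is not a Julia exception, so the `finally` clause only matters when the call returns or throws normally; if the REPL still crashes with GC disabled, that would argue against the GC explanation.

```julia
# Hypothetical diagnostic helper: run `f()` with the garbage collector disabled
# and restore the previous GC state afterwards, even if `f` throws.
function without_gc(f)
    previous = GC.enable(false)  # returns whether GC was enabled before
    try
        return f()
    finally
        GC.enable(previous)
    end
end

# Usage in the REPL (requires OpenCL.jl and an OpenCL device):
# using OpenCL
# device, ctx, queue = without_gc(cl.create_compute_context)
```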

@davidbp
Contributor

davidbp commented Jun 29, 2019

Here is what I get on OS X with similar specs:

julia> versioninfo()
Julia Version 1.1.0
Commit 80516ca202 (2019-01-21 21:24 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, ivybridge)

The test:

julia> Pkg.test("OpenCL") 
   Testing OpenCL
 Resolving package versions...
    Status `/var/folders/mg/sxhvxdv96mqgprwhzxv8gwbw0000gn/T/tmpZQxe4Y/Manifest.toml`
  [08131aa3] OpenCL v0.8.0
  [2a0f44e3] Base64  [`@stdlib/Base64`]
  [8ba89e20] Distributed  [`@stdlib/Distributed`]
  [b77e0a4c] InteractiveUtils  [`@stdlib/InteractiveUtils`]
  [8f399da3] Libdl  [`@stdlib/Libdl`]
  [37e2e46d] LinearAlgebra  [`@stdlib/LinearAlgebra`]
  [56ddb016] Logging  [`@stdlib/Logging`]
  [d6f4376e] Markdown  [`@stdlib/Markdown`]
  [de0858da] Printf  [`@stdlib/Printf`]
  [9a3f8284] Random  [`@stdlib/Random`]
  [9e88b42a] Serialization  [`@stdlib/Serialization`]
  [6462fe0b] Sockets  [`@stdlib/Sockets`]
  [8dfed614] Test  [`@stdlib/Test`]
  [4ec0a83e] Unicode  [`@stdlib/Unicode`]
Test Summary: | Pass  Total
layout        |    2      2
Test Summary:   | Pass  Total
OpenCL.Platform |   13     13
Couldn't compile kernel: 
    1   : 
    2   :     __kernel void test() {
    3   :         int c = 1 + 1;
    4   :     };
With following build error:
No kernels or only kernel prototypes found when build executable.
Couldn't compile kernel: 
    1   : 
    2   :     __kernel void test() {
    3   :         int c = 1 + 1;
    4   :     };
With following build error:
<program source>:5:13: warning: unused variable 'c'
        int c = 1 + 1;
            ^
No kernels or only kernel prototypes found.

Couldn't compile kernel: 
    1   : 
    2   :     __kernel void test() {
    3   :         int c = 1 + 1;
    4   :     };
With following build error:
<program source>:5:13: warning: unused variable 'c'
        int c = 1 + 1;
            ^
No kernels or only kernel prototypes found.

Callback works
Callback works
Callback works
Callback works
Test Summary:  | Pass  Total
OpenCL.Context |   50     50
Callback works
Callback works
Callback works
Callback works
Callback works
Test Summary: | Pass  Total
OpenCL.Device |  122    122

┌ Warning: Platform Apple does not seem to suport out of order queues:CLError(code=-30, CL_INVALID_VALUE)
└ @ Main.TestOpenCL ~/.julia/packages/OpenCL/vsBez/test/test_cmdqueue.jl:16
OpenCL Error: | [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue (cld returned: -35). |
OpenCL Error: | [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue (cld returned: -35). |
OpenCL Error: | [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue: -30 |
OpenCL Error: | [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue (cld returned: -35). |
OpenCL Error: | [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue: -30 |
Test Summary:   | Pass  Total
OpenCL.CmdQueue |   61     61
OpenCL Error: | [CL_INVALID_VALUE] : OpenCL Error : clCreateCommandQueue failed: Device failed to create queue: -30 |
Test Summary: | Pass  Total
OpenCL.Minver |   20     20
Test Summary: | Pass  Total
OpenCL.Event  |   48     48
OpenCL.Program binaries: Test Failed at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:86
  Expression: prg2[:binaries] == binaries
   Evaluated: Dict{OpenCL.cl.Device,Array{UInt8,N} where N}() == Dict{OpenCL.cl.Device,Array{UInt8,N} where N}(OpenCL.Device(Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz on Apple @0x00000000ffffffff)=>[0x62, 0x70, 0x6c, 0x69, 0x73, 0x74, 0x30, 0x30, 0xd4, 0x01  …  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0xee])
Stacktrace:
 [1] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:86
 [2] top-level scope at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
 [3] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:76
 [4] top-level scope at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
 [5] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:3
OpenCL.Program binaries: Test Failed at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:86
  Expression: prg2[:binaries] == binaries
   Evaluated: Dict{OpenCL.cl.Device,Array{UInt8,N} where N}() == Dict{OpenCL.cl.Device,Array{UInt8,N} where N}(OpenCL.Device(AMD Radeon HD - FirePro D300 Compute Engine on Apple @0x0000000001021c00)=>[0x62, 0x70, 0x6c, 0x69, 0x73, 0x74, 0x30, 0x30, 0xd4, 0x01  …  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x35, 0xd2])
Stacktrace:
 [1] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:86
 [2] top-level scope at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
 [3] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:76
 [4] top-level scope at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
 [5] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:3
OpenCL.Program binaries: Test Failed at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:86
  Expression: prg2[:binaries] == binaries
   Evaluated: Dict{OpenCL.cl.Device,Array{UInt8,N} where N}() == Dict{OpenCL.cl.Device,Array{UInt8,N} where N}(OpenCL.Device(AMD Radeon HD - FirePro D300 Compute Engine on Apple @0x0000000002021c00)=>[0x62, 0x70, 0x6c, 0x69, 0x73, 0x74, 0x30, 0x30, 0xd4, 0x01  …  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x35, 0xd2])
Stacktrace:
 [1] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:86
 [2] top-level scope at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
 [3] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:76
 [4] top-level scope at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
 [5] top-level scope at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:3
Test Summary:                       | Pass  Fail  Total
OpenCL.Program                      |   63     3     66
  OpenCL.Program source constructor |    3            3
  OpenCL.Program info               |   24           24
  OpenCL.Program build              |   12           12
  OpenCL.Program source code        |    3            3
  OpenCL.Program binaries           |   21     3     24
ERROR: LoadError: LoadError: Some tests did not pass: 63 passed, 3 failed, 0 errored, 0 broken.
in expression starting at /Users/macpro/.julia/packages/OpenCL/vsBez/test/test_program.jl:1
in expression starting at /Users/macpro/.julia/packages/OpenCL/vsBez/test/runtests.jl:30
ERROR: Package OpenCL errored during testing
Stacktrace:
 [1] pkgerror(::String, ::Vararg{String,N} where N) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/Types.jl:120
 [2] #test#66(::Bool, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/Operations.jl:1328
 [3] #test at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:0 [inlined]
 [4] #test#44(::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:193
 [5] test at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:178 [inlined]
 [6] #test#43 at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:175 [inlined]
 [7] test at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:175 [inlined]
 [8] #test#42 at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:174 [inlined]
 [9] test at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:174 [inlined]
 [10] #test#41(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::String) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:173
 [11] test(::String) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/API.jl:173
 [12] top-level scope at none:0

@lwabeke
Contributor Author

lwabeke commented Jul 3, 2019

Hi

I just ran ]update and I'm still getting the same crash.

If I open Julia and run ]test OpenCL, I get essentially the same output as @davidbp: some tests pass and others fail, but those failures don't kill the Julia process.

Running create_some_context (or, as in the session below, cl.create_compute_context) as the first command after using OpenCL in a new Julia session causes the crash:

Leons-MacBook-Pro:~ lwabeke$ /Applications/Julia-1.1.app/Contents/Resources/julia/bin/julia 
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.1.1 (2019-05-16)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using OpenCL

julia> device, ctx, queue = cl.create_compute_context()
Segmentation fault: 11
Leons-MacBook-Pro:~ lwabeke$ 


I tried to get a stack backtrace; this is the best I can do at the moment. I guess I would need a custom build of Julia with debugging enabled to get more detail?

Leons-MacBook-Pro:~ lwabeke$ lldb /Applications/Julia-1.1.app/Contents/Resources/julia/bin/julia 
(lldb) target create "/Applications/Julia-1.1.app/Contents/Resources/julia/bin/julia"
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 52, in <module>
    import weakref
  File "/usr/local/Cellar/python@2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/weakref.py", line 14, in <module>
    from _weakref import (
ImportError: cannot import name _remove_dead_weakref
Current executable set to '/Applications/Julia-1.1.app/Contents/Resources/julia/bin/julia' (x86_64).
(lldb) run
Process 19629 launched: '/Applications/Julia-1.1.app/Contents/Resources/julia/bin/julia' (x86_64)
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.1.1 (2019-05-16)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using OpenCL

julia> device, ctx, queue = cl.create_compute_context()
Process 19629 stopped
* thread #7, queue = 'com.apple.root.utility-qos', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00007fff62131232 libsystem_c.dylib`strlen + 18
libsystem_c.dylib`strlen:
->  0x7fff62131232 <+18>: pcmpeqb (%rdi), %xmm0
    0x7fff62131236 <+22>: pmovmskb %xmm0, %esi
    0x7fff6213123a <+26>: andq   $0xf, %rcx
    0x7fff6213123e <+30>: orq    $-0x1, %rax
Target 0: (julia) stopped.
(lldb) bt
* thread #7, queue = 'com.apple.root.utility-qos', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00007fff62131232 libsystem_c.dylib`strlen + 18
    frame #1: 0x00007fff4292c882 OpenCL`___lldb_unnamed_symbol2$$OpenCL + 916
    frame #2: 0x00007fff620af5fa libdispatch.dylib`_dispatch_call_block_and_release + 12
    frame #3: 0x00007fff620a7db8 libdispatch.dylib`_dispatch_client_callout + 8
    frame #4: 0x00007fff620a9b2c libdispatch.dylib`_dispatch_root_queue_drain + 902
    frame #5: 0x00007fff620a9755 libdispatch.dylib`_dispatch_worker_thread3 + 101
    frame #6: 0x00007fff623f9169 libsystem_pthread.dylib`_pthread_wqthread + 1387
    frame #7: 0x00007fff623f8be9 libsystem_pthread.dylib`start_wqthread + 13

@jpsamaroo
Member

Since we don't get a readable backtrace after the segfault, could you run the call to cl.create_some_context or cl.create_compute_context under Debugger.jl and see which line it crashes on? It is probably a ccall, since the crash seems to happen in C. I suspect we're passing arguments incorrectly or something similar.

@kose-y

kose-y commented Jul 6, 2019

I have the same issue on OSX with OpenCL.jl v0.8.0.

julia> versioninfo()
Julia Version 1.1.1
Commit 55e36cc (2019-05-16 04:10 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin15.6.0)
  CPU: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

The initial call here was device = last(cl.devices(:gpu)); @enter cl.Context(device).
Here is the status and backtrace right before the segmentation fault:

In clCreateContext(arg1, arg2, arg3, arg4, arg5, arg6) at /Users/kose/.julia/packages/OpenCL/vsBez/src/api.jl:18
 17          function $func($(args_in...))
>18              ccall(($(string(func)), libopencl),
 19                     $ret_type,
 20                     $arg_types,
 21                     $(args_in...))
 22          end

About to run: (<suppressed 140 bytes of output>)(Ptr{Nothing} @0x0000000000000000, 1, <suppressed 46 bytes of output>, Ptr{Nothing} @0x000000012abbef10, <suppressed 147 bytes of output>, Base.RefValue{Int32}(0))
1|debug> bt
[1] clCreateContext(arg1, arg2, arg3, arg4, arg5, arg6) at /Users/kose/.julia/packages/OpenCL/vsBez/src/api.jl:18
  | arg1::Ptr{Nothing} = Ptr{Nothing} @0x0000000000000000
  | arg2::Int64 = 1
  | arg3::Array{Ptr{Nothing},1} = Ptr{Nothing}[Ptr{Nothing} @0x0000000001021c00]
  | arg4::Ptr{Nothing} = Ptr{Nothing} @0x000000012abbef10
  | arg5::Base.CFunction = Base.CFunction(Ptr{Nothing} @0x000000012abbec40, OpenCL.cl.raise_context_error, Ptr{Nothing} @0x0000000000000000, Ptr{Nothing} @0x0000000000000000)
  | arg6::Base.RefValue{Int32} = Base.RefValue{Int32}(0)
[2] #Context#44(properties, callback, , devs) at /Users/kose/.julia/packages/OpenCL/vsBez/src/context.jl:140
  | properties::Nothing = nothing
  | callback::Nothing = nothing
  | ::DataType = OpenCL.cl.Context
  | devs::Array{OpenCL.cl.Device,1} = OpenCL.cl.Device[OpenCL.Device(AMD Radeon Pro 560 Compute Engine on Apple @0x0000000001021c00)]
  | ctx_properties::Ptr{Nothing} = Ptr{Nothing} @0x0000000000000000
  | n_devices::Int64 = 1
  | device_ids::Array{Ptr{Nothing},1} = Ptr{Nothing}[Ptr{Nothing} @0x0000000001021c00]
  | err_code::Base.RefValue{Int32} = Base.RefValue{Int32}(0)
  | payload::typeof(OpenCL.cl.raise_context_error) = OpenCL.cl.raise_context_error
  | f_ptr::Base.CFunction = Base.CFunction(Ptr{Nothing} @0x000000012abbec40, OpenCL.cl.raise_context_error, Ptr{Nothing} @0x0000000000000000, Ptr{Nothing} @0x0000000000000000)
  | i::Int64 = 1
  | d::OpenCL.cl.Device = OpenCL.Device(AMD Radeon Pro 560 Compute Engine on Apple @0x0000000001021c00)
[3] Type(#temp#, , devs) at none:0
  | ::DataType = OpenCL.cl.Context
  | devs::Array{OpenCL.cl.Device,1} = OpenCL.cl.Device[OpenCL.Device(AMD Radeon Pro 560 Compute Engine on Apple @0x0000000001021c00)]
  | properties::Nothing = nothing
  | callback::Nothing = nothing
[4] #Context#45(properties, callback, , d) at /Users/kose/.julia/packages/OpenCL/vsBez/src/context.jl:150
  | properties::Nothing = nothing
  | callback::Nothing = nothing
  | ::DataType = OpenCL.cl.Context
  | d::OpenCL.cl.Device = OpenCL.Device(AMD Radeon Pro 560 Compute Engine on Apple @0x0000000001021c00)
[5] Type(d) at /Users/kose/.julia/packages/OpenCL/vsBez/src/context.jl:150
  | d::OpenCL.cl.Device = OpenCL.Device(AMD Radeon Pro 560 Compute Engine on Apple @0x0000000001021c00)
1|debug> s
Segmentation fault: 11
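For context on the Base.CFunction argument (arg5) in the backtrace above: passing a closure to `@cfunction` with `$` produces a `Base.CFunction` object, and the Julia manual notes that this object must stay rooted for as long as C code may use its pointer. The following minimal sketch shows that pattern with libc's `qsort` instead of OpenCL, so it runs without a GPU. The function name `sort_by_callback!` is hypothetical. It illustrates the rooting rule behind the GC hypothesis; it does not show that OpenCL.jl violates it.

```julia
# Hypothetical example: sort a Vector{Cdouble} in place with libc's qsort,
# passing a Julia *closure* as the C comparison callback.
# Note: closure cfunctions rely on LLVM trampolines and are not supported on
# every platform.
function sort_by_callback!(A::Vector{Cdouble}; descending::Bool = false)
    s = descending ? -1 : 1
    # With Ref{Cdouble} argument types, the closure receives the Cdouble values
    # that qsort's `const void*` arguments point to.
    compare = (a, b) -> Cint(s * (a < b ? -1 : (a > b ? 1 : 0)))
    # `$compare` makes this a closure cfunction, returned as a Base.CFunction.
    cb = @cfunction($compare, Cint, (Ref{Cdouble}, Ref{Cdouble}))
    # qsort only calls back while the ccall is running, so rooting `cb` and `A`
    # for the duration of the call is sufficient here.
    GC.@preserve cb A begin
        ccall(:qsort, Cvoid, (Ptr{Cdouble}, Csize_t, Csize_t, Ptr{Cvoid}),
              A, length(A), sizeof(Cdouble), cb)
    end
    return A
end
```

A library that keeps the pointer after the call returns needs a different arrangement. The OpenCL specification allows clCreateContext's pfn_notify to be called asynchronously during the lifetime of the context. In that case, the Base.CFunction would have to be stored somewhere long-lived, such as a field of the wrapping object, so that it outlives the ccall.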

@juliohm
Member

juliohm commented Oct 1, 2022

I built OpenCL_jll with BinaryBuilder.jl and plan to refactor the package to load it instead. In theory, this will install the binary dependencies for the end user. I will close the issue, since it is hard to reproduce in 2022.

@juliohm juliohm closed this as completed Oct 1, 2022

5 participants