Segfault in TBB, maybe parfors related. #5973

Closed
stuartarchibald opened this issue Jul 9, 2020 · 9 comments
Labels: bug - segfault (Bugs that cause SIGSEGV, SIGABRT, SIGILL, SIGBUS), threading (Issue involving the threading layers)

Comments

@stuartarchibald
Contributor

Reporting a bug

There's a problem in the branch https://github.com/stuartarchibald/numba/tree/pr_5463_segfault (head is c84bfbd); it's a clone of PR #5463, which is segfaulting on CI. It's reproducible locally with:

./runtests.py -m 1 -v -b -j "2,None,18" --exclude-tags='long_running' numba.tests

and in fact any -m <X> will segfault. However, this:

./runtests.py -v -b -j "2,None,18" --exclude-tags='long_running' numba.tests

will not. The difference is that the former uses the multiprocessing test runner, whereas the latter does not.

Backtrace is:

test_issue4963_globals (numba.tests.test_parfors.TestParfors) ... 
Program received signal SIGSEGV, Segmentation fault.
tbb::internal::generic_scheduler::allocate_task (this=this@entry=0x7fffd60b2400, number_of_bytes=number_of_bytes@entry=104, 
    parent=parent@entry=0x0, context=0x7fffffff8b80) at ../../src/tbb/scheduler.cpp:357
357     ../../src/tbb/scheduler.cpp: No such file or directory.
(gdb) bt
#0  tbb::internal::generic_scheduler::allocate_task (this=this@entry=0x7fffd60b2400, 
    number_of_bytes=number_of_bytes@entry=104, parent=parent@entry=0x0, context=0x7fffffff8b80)
    at ../../src/tbb/scheduler.cpp:357
#1  0x00007fffdbe6363b in tbb::internal::allocate_root_with_context_proxy::allocate (this=0x7fffffff8b78, size=104)
    at ../../src/tbb/task.cpp:64
#2  0x00007fffdbe3cd08 in tbb::interface7::internal::delegated_function<parallel_for(void*, char**, unsigned long*, unsigned long*, void*, unsigned long, unsigned long, int)::{lambda()#1} const, void>::operator()() const ()
   from <numba_path>/numba/numba/np/ufunc/tbbpool.cpython-37m-x86_64-linux-gnu.so
#3  0x00007fffdbe69250 in tbb::interface7::internal::task_arena_base::internal_execute (this=0x7fffffff8f50, d=...)
    at ../../src/tbb/arena.cpp:1054
#4  0x00007fffdbe3d60e in parallel_for(void*, char**, unsigned long*, unsigned long*, void*, unsigned long, unsigned long, int)
    () from <numba_path>/numba/numba/np/ufunc/tbbpool.cpython-37m-x86_64-linux-gnu.so
#5  0x00007ffff7e0c17e in numba::tests::test_parfors::TestParfors::test_issue4963_globals::$3clocals$3e::test_impl$24118 ()
#6  0x00007ffff7e0c2fc in cpython::numba::tests::test_parfors::TestParfors::test_issue4963_globals::$3clocals$3e::test_impl$24118 ()
#7  0x000055555568be05 in cfunction_call_varargs (kwargs=0x0, args=args@entry=0x0, func=0x7fffd1d9ac80)
    at /tmp/build/80754af9/python_1585235023510/work/Objects/call.c:757
stuartarchibald added the bug - segfault and threading labels on Aug 12, 2020
@jakirkham
Contributor

FWIW TBB caused us some issues with aarch64 builds in conda-forge recently as well.

@stuartarchibald
Contributor Author

Potential leak: opencv/opencv#6843

@stuartarchibald
Contributor Author

Related? https://community.intel.com/t5/Intel-oneAPI-Threading-Building/Segmentation-Fault-caused-by-parallel-for/m-p/1085795#M13548 However, I thought we'd already been down the route of getting TBB to build with -flifetime-dse=1 etc.

@stuartarchibald
Contributor Author

I don't think it's the above; this may be memory corruption.

The program segfaults in this source as follows:

Program received signal SIGSEGV, Segmentation fault.
tbb::internal::generic_scheduler::allocate_task (this=this@entry=0x7fffd548ce00, number_of_bytes=number_of_bytes@entry=104, 
    parent=parent@entry=0x0, context=0x7fffffff8cb0) at ../../src/tbb/scheduler.cpp:357
357     ../../src/tbb/scheduler.cpp: No such file or directory.
(gdb) bt
#0  tbb::internal::generic_scheduler::allocate_task (this=this@entry=0x7fffd548ce00, 
    number_of_bytes=number_of_bytes@entry=104, parent=parent@entry=0x0, context=0x7fffffff8cb0)
    at ../../src/tbb/scheduler.cpp:357
#1  0x00007fffdba7e63b in tbb::internal::allocate_root_with_context_proxy::allocate (this=0x7fffffff8ca8, size=104)
    at ../../src/tbb/task.cpp:64
#2  0x00007fffdba55d08 in tbb::interface7::internal::delegated_function<parallel_for(void*, char**, unsigned long*, unsigned long*, void*, unsigned long, unsigned long, int)::{lambda()#1} const, void>::operator()() const ()
   from <path>/numba/numba/np/ufunc/tbbpool.cpython-37m-x86_64-linux-gnu.so
#3  0x00007fffdba84250 in tbb::interface7::internal::task_arena_base::internal_execute (this=0x7fffffff9080, d=...)
    at ../../src/tbb/arena.cpp:1054
#4  0x00007fffdba5660e in parallel_for(void*, char**, unsigned long*, unsigned long*, void*, unsigned long, unsigned long, int)
    () from <path>/numba/numba/np/ufunc/tbbpool.cpython-37m-x86_64-linux-gnu.so
#5  0x00007fffa207e255 in ?? ()
#6  0x0000000000000005 in ?? ()
#7  0x0000000000000004 in ?? ()
#8  0x00007fffdbe83730 in ?? ()
#9  0x00007fffffff9308 in ?? ()
#10 0x0000000000006e11 in ?? ()
#11 0x000055556ee3b2d0 in ?? ()
#12 0x000055556d9d8aa0 in ?? ()
#13 0x00007fffffff9300 in ?? ()
#14 0x00007fffffff9248 in ?? ()
#15 0x000055556d9d8ae0 in ?? ()
#16 0x000055556ee3b300 in ?? ()
#17 0x000055556c2d2960 in ?? ()
#18 0x0000000000000009 in ?? ()
#19 0x000000000000000b in ?? ()
#20 0x0000000000000000 in ?? ()
(gdb) disassemble
Dump of assembler code for function tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*):
   0x00007fffdba85710 <+0>:     push   %r14
   0x00007fffdba85712 <+2>:     push   %r13
   0x00007fffdba85714 <+4>:     mov    %rcx,%r13
   0x00007fffdba85717 <+7>:     push   %r12
   0x00007fffdba85719 <+9>:     mov    %rdx,%r12
   0x00007fffdba8571c <+12>:    push   %rbp
   0x00007fffdba8571d <+13>:    mov    %rdi,%rbp
   0x00007fffdba85720 <+16>:    push   %rbx
   0x00007fffdba85721 <+17>:    cmp    $0xc0,%rsi
   0x00007fffdba85728 <+24>:    ja     0x7fffdba857c0 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+176>
   0x00007fffdba8572e <+30>:    mov    0x88(%rdi),%rbx
   0x00007fffdba85735 <+37>:    test   %rbx,%rbx
   0x00007fffdba85738 <+40>:    je     0x7fffdba85788 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+120>
   0x00007fffdba8573a <+42>:    mov    -0x8(%rbx),%rax
   0x00007fffdba8573e <+46>:    mov    %rax,0x88(%rdi)
   0x00007fffdba85745 <+53>:    xor    %eax,%eax
   0x00007fffdba85747 <+55>:    mov    %r13,-0x38(%rbx)
   0x00007fffdba8574b <+59>:    mov    %ax,-0xa(%rbx)
   0x00007fffdba8574f <+63>:    mov    %rbx,%rax
   0x00007fffdba85752 <+66>:    mov    %rbp,-0x28(%rbx)
   0x00007fffdba85756 <+70>:    movq   $0x0,-0x18(%rbx)
   0x00007fffdba8575e <+78>:    movl   $0x0,-0x10(%rbx)
   0x00007fffdba85765 <+85>:    mov    %r12,-0x20(%rbx)
   0x00007fffdba85769 <+89>:    movb   $0x0,-0xb(%rbx)
   0x00007fffdba8576d <+93>:    movb   $0x3,-0xc(%rbx)
   0x00007fffdba85771 <+97>:    movq   $0x0,-0x40(%rbx)
   0x00007fffdba85779 <+105>:   pop    %rbx
   0x00007fffdba8577a <+106>:   pop    %rbp
   0x00007fffdba8577b <+107>:   pop    %r12
   0x00007fffdba8577d <+109>:   pop    %r13
   0x00007fffdba8577f <+111>:   pop    %r14
   0x00007fffdba85781 <+113>:   retq   
   0x00007fffdba85782 <+114>:   nopw   0x0(%rax,%rax,1)
   0x00007fffdba85788 <+120>:   cmpq   $0x0,0xb0(%rdi)
   0x00007fffdba85790 <+128>:   je     0x7fffdba857e8 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+216>
   0x00007fffdba85792 <+130>:   lea    0xb0(%rdi),%rdi
   0x00007fffdba85799 <+137>:   xchg   %rbx,(%rdi)
   0x00007fffdba8579c <+140>:   lea    0x18b75(%rip),%rax        # 0x7fffdba9e318 <__itt_notify_sync_acquired_ptr__3_0>
   0x00007fffdba857a3 <+147>:   mov    (%rax),%rax
   0x00007fffdba857a6 <+150>:   mov    %rbx,%r14
   0x00007fffdba857a9 <+153>:   test   %rax,%rax
   0x00007fffdba857ac <+156>:   je     0x7fffdba857b0 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+160>
   0x00007fffdba857ae <+158>:   callq  *%rax
=> 0x00007fffdba857b0 <+160>:   mov    -0x8(%r14),%rax
   0x00007fffdba857b4 <+164>:   mov    %rax,0x88(%rbp)
   0x00007fffdba857bb <+171>:   jmp    0x7fffdba85745 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+53>
   0x00007fffdba857bd <+173>:   nopl   (%rax)
   0x00007fffdba857c0 <+176>:   add    $0x40,%rsi
   0x00007fffdba857c4 <+180>:   xor    %edx,%edx
   0x00007fffdba857c6 <+182>:   mov    $0x1,%edi
   0x00007fffdba857cb <+187>:   callq  *0x18407(%rip)        # 0x7fffdba9dbd8
   0x00007fffdba857d1 <+193>:   lea    0x40(%rax),%rbx
   0x00007fffdba857d5 <+197>:   movq   $0x0,0x10(%rax)
   0x00007fffdba857dd <+205>:   jmpq   0x7fffdba85745 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+53>
   0x00007fffdba857e2 <+210>:   nopw   0x0(%rax,%rax,1)
   0x00007fffdba857e8 <+216>:   xor    %edx,%edx
   0x00007fffdba857ea <+218>:   mov    $0x100,%esi
   0x00007fffdba857ef <+223>:   mov    $0x1,%edi
   0x00007fffdba857f4 <+228>:   callq  *0x183de(%rip)        # 0x7fffdba9dbd8
   0x00007fffdba857fa <+234>:   mov    %rbp,0x10(%rax)
   0x00007fffdba857fe <+238>:   lea    0x40(%rax),%rbx
   0x00007fffdba85802 <+242>:   movq   $0x0,0x38(%rax)
   0x00007fffdba8580a <+250>:   addq   $0x1,0xa8(%rbp)
   0x00007fffdba85812 <+258>:   jmpq   0x7fffdba85745 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+53>
End of assembler dump.
(gdb) p t
$1 = (tbb::task *) 0xffffffffffffffff
(gdb) p my_return_list
$18 = (tbb::task *) 0x0
(gdb) p my_free_list
$19 = (tbb::task *) 0x0

The task, t, is 0xffffffffffffffff, which is highly suspicious. Looking at the machine code:

   0x00007fffdba8579c <+140>:   lea    0x18b75(%rip),%rax        # 0x7fffdba9e318 <__itt_notify_sync_acquired_ptr__3_0>
   0x00007fffdba857a3 <+147>:   mov    (%rax),%rax
   0x00007fffdba857a6 <+150>:   mov    %rbx,%r14
   0x00007fffdba857a9 <+153>:   test   %rax,%rax
   0x00007fffdba857ac <+156>:   je     0x7fffdba857b0 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+160>
   0x00007fffdba857ae <+158>:   callq  *%rax
=> 0x00007fffdba857b0 <+160>:   mov    -0x8(%r14),%rax

To get to the location that segfaults, it seems the branch containing __itt_notify_sync_acquired_ptr__3_0 must have been taken, as there's no jump to it from anywhere else. Looking at the C++ source, this point is reached when if( (t = my_free_list) ) { evaluates false and } else if( my_return_list ) { evaluates true. Which is strange, as t is ~0x0 while my_free_list is 0x0 and my_return_list is 0x0.
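
To make the control flow concrete, here is a heavily simplified paraphrase of the small-task allocation path around scheduler.cpp:357 (illustrative only; the names, types and task layout are approximate, not the actual TBB source):

// Simplified sketch of generic_scheduler::allocate_task's fast paths.
// Not the real TBB code; member names and the task layout are approximate.
#include <atomic>
#include <cstdlib>

struct task { task* next; };

struct scheduler {
    task* my_free_list = nullptr;                // thread-local free list
    std::atomic<task*> my_return_list{nullptr};  // tasks handed back by other threads

    task* allocate_small_task() {
        task* t;
        if ((t = my_free_list)) {
            // Fast path: pop the head of the local free list.
            my_free_list = t->next;
        } else if (my_return_list.load(std::memory_order_relaxed)) {
            // Take the whole return list atomically; this is the branch that
            // emits the ITT "sync acquired" notification seen in the disassembly.
            t = my_return_list.exchange(nullptr, std::memory_order_acquire);
            // In TBB the link is stored just in front of the task object, so this
            // read corresponds to the faulting load at -0x8(%r14) when t is bogus.
            my_free_list = t->next;
        } else {
            // Slow path: allocate fresh storage.
            t = static_cast<task*>(std::malloc(sizeof(task)));
        }
        return t;
    }
};

Note that the exchange zeros my_return_list as a side effect, which would explain why it already reads as 0x0 by the time gdb inspects it, even though t was loaded from it.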

Running this under valgrind.

@stuartarchibald
Contributor Author

stuartarchibald commented Aug 26, 2020

Valgrind says:

==15728== Invalid read of size 8
==15728==    at 0x23D9C7B0: tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*) (scheduler.cpp:357)
==15728==    by 0x23D9563A: tbb::internal::allocate_root_with_context_proxy::allocate(unsigned long) const (task.cpp:64)
==15728==    by 0x23DBFD07: tbb::interface7::internal::delegated_function<parallel_for(void*, char**, unsigned long*, unsigned long*, void*, unsigned long, unsigned long, int)::{lambda()#1} const, void>::operator()() const (in <path>/numba/numba/np/ufunc/tbbpool.cpython-37m-x86_64-linux-gnu.so)
==15728==    by 0x23D9B24F: tbb::interface7::internal::task_arena_base::internal_execute(tbb::interface7::internal::delegate_base&) const (arena.cpp:1054)
==15728==    by 0x23DC060D: parallel_for(void*, char**, unsigned long*, unsigned long*, void*, unsigned long, unsigned long, int) (in <path>/numba/numba/np/ufunc/tbbpool.cpython-37m-x86_64-linux-gnu.so)
==15728==    by 0x6C5FF22D: ???
==15728==    by 0x4: ???
==15728==    by 0x3: ???
==15728==    by 0x235736DF: ???
==15728==    by 0x1FFEFFB277: ???
==15728==    by 0x25E752FF: ???
==15728==    by 0x6EC6A9EF: ???
==15728==  Address 0xfffffffffffffff7 is not stack'd, malloc'd or (recently) free'd
==

gdb is pointing at this:

(gdb) disassemble
<snip>
=> 0x0000000023d9c7b0 <+160>:   mov    -0x8(%r14),%rax
(gdb) info registers 
rax            0x0      0
rbx            0xffffffffffffffff       -1
rcx            0x1ffeffac20     137422154784
rdx            0x0      0
rsi            0x68     104
rdi            0x2afc8eb0       721194672
rbp            0x2afc8e00       0x2afc8e00
rsp            0x1ffeffab90     0x1ffeffab90
r8             0x2b     43
r9             0x31     49
r10            0x0      0
r11            0x0      0
r12            0x0      0
r13            0x1ffeffac20     137422154784
r14            0xffffffffffffffff       -1
r15            0x6      6
rip            0x23d9c7b0       0x23d9c7b0 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+160>
eflags         0x44     [ PF ZF ]
cs             0x0      0
ss             0x0      0
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0

Looks like %r14 holds the tbb::task:

(gdb) p t
$4 = (tbb::task *) 0xffffffffffffffff 

and obviously -0x8(%r14) = 0xfffffffffffffff7, which isn't mapped.

Would hazard a guess at this failing to provide a valid task:
https://github.com/oneapi-src/oneTBB/blob/tbb_2020/src/tbb/scheduler.cpp#L355

I think this line does provide a task and then zeros my_return_list; it's just that the task it provides is nonsense.

(gdb) p this->my_return_list
$23 = (tbb::task *) 0x0 <numba.dynamic.globals.0>

@alexey-katranov

the task, t, is 0xffffffffffffffff which is highly suspicious

Yes, it is really suspicious. The only place where my_free_list can become -1 is the scheduler cleanup, so it seems we are allocating from a scheduler that has already been destroyed. A scheduler can be destroyed when either its thread is destroyed or the market/rml is destroyed. It looks like we have the second case here (because we still have the thread associated with this scheduler), i.e. TBB is in a shutdown state.

Is the main thread alive? Or maybe TBB is unloaded with dlclose?
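
For illustration (hypothetical names, not the actual TBB implementation): scheduler teardown conceptually "plugs" the per-scheduler task lists with an all-ones sentinel, which matches the 0xffffffffffffffff seen in t / %r14 above.

// Hypothetical sketch of the shutdown behaviour described above; names and
// structure are simplified and do not mirror the real TBB code.
#include <cstdint>

struct task;

struct scheduler_lists {
    task* my_free_list   = nullptr;
    task* my_return_list = nullptr;
};

// The all-ones "plugged" sentinel, i.e. (task*)-1 == 0xffffffffffffffff.
inline task* plugged_list() {
    return reinterpret_cast<task*>(~std::uintptr_t{0});
}

// On scheduler destruction the lists are left plugged, so an allocation that
// races with shutdown picks up the sentinel and faults on the first dereference.
void cleanup(scheduler_lists& s) {
    s.my_free_list   = plugged_list();
    s.my_return_list = plugged_list();
}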

@Hardcode84

If it fails only in multiprocess mode, it may be connected to tbbpool fork handling. Does it create new processes via fork+exec or just fork?

@stuartarchibald
Contributor Author

@alexey-katranov Thanks for taking a look at this and providing insight into what could be going on. I'll trace this path and see if anything hints that that is the cause.

@Hardcode84 This is a good point; I tested further. It turns out that the multiprocessing test mode is a red herring: the tests required to trigger this issue happen to be part of the "must be run in serial" section of the test suite. It also turns out that the code handling that serial execution under multiprocessing shuffles the test ordering slightly compared to the non-multiprocessing order, and it is that specific ordering which triggers the problem.

The good news is that, after much debugging, I have a reproducer:

from numba import njit, prange
import numpy as np
import multiprocessing

@njit
def foo(x):
    pass

@njit(parallel=True)
def bar():
    x = 0
    # What this is doesn't matter, just needs to have parallel semantics
    # so that TBB loads and something is executed in the thread pool
    for i in prange(3):
        x += i
    return 1

def _main(pool):

    bar()

    pool.map(foo, [None])

    bar()

if __name__ == "__main__":
    ctx = multiprocessing.get_context('spawn')
    pool = ctx.Pool(1)
    _main(pool)
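
For completeness, the TBB threading layer can be requested explicitly when running the reproducer via Numba's NUMBA_THREADING_LAYER environment variable, so a silent fallback to the omp/workqueue layers doesn't mask the crash (assuming the script above is saved as repro.py, a placeholder name):

NUMBA_THREADING_LAYER=tbb python repro.py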

stuartarchibald added a commit to stuartarchibald/numba that referenced this issue Sep 1, 2020
stuartarchibald added a commit to stuartarchibald/numba that referenced this issue Sep 1, 2020
@stuartarchibald
Contributor Author

This was fixed in #6208, closing.
