Segfault in TBB, maybe parfors related. #5973

Closed
stuartarchibald opened this issue Jul 9, 2020 · 9 comments
Labels: bug - segfault (Bugs that cause SIGSEGV, SIGABRT, SIGILL, SIGBUS), threading (Issue involving the threading layers)

Comments

@stuartarchibald
Contributor

Reporting a bug

There's a problem in the branch https://github.com/stuartarchibald/numba/tree/pr_5463_segfault (head is c84bfbd); it's a clone of PR #5463, which is segfaulting on CI. It's reproducible locally with:

./runtests.py -m 1 -v -b -j "2,None,18" --exclude-tags='long_running' numba.tests

and in fact any -m <X> will segfault. However, this:

./runtests.py -v -b -j "2,None,18" --exclude-tags='long_running' numba.tests

will not. The difference is that the former uses the multiprocessing test runner, whereas the latter does not.

Backtrace is:

test_issue4963_globals (numba.tests.test_parfors.TestParfors) ... 
Program received signal SIGSEGV, Segmentation fault.
tbb::internal::generic_scheduler::allocate_task (this=this@entry=0x7fffd60b2400, number_of_bytes=number_of_bytes@entry=104, 
    parent=parent@entry=0x0, context=0x7fffffff8b80) at ../../src/tbb/scheduler.cpp:357
357     ../../src/tbb/scheduler.cpp: No such file or directory.
(gdb) bt
#0  tbb::internal::generic_scheduler::allocate_task (this=this@entry=0x7fffd60b2400, 
    number_of_bytes=number_of_bytes@entry=104, parent=parent@entry=0x0, context=0x7fffffff8b80)
    at ../../src/tbb/scheduler.cpp:357
#1  0x00007fffdbe6363b in tbb::internal::allocate_root_with_context_proxy::allocate (this=0x7fffffff8b78, size=104)
    at ../../src/tbb/task.cpp:64
#2  0x00007fffdbe3cd08 in tbb::interface7::internal::delegated_function<parallel_for(void*, char**, unsigned long*, unsigned long*, void*, unsigned long, unsigned long, int)::{lambda()#1} const, void>::operator()() const ()
   from <numba_path>/numba/numba/np/ufunc/tbbpool.cpython-37m-x86_64-linux-gnu.so
#3  0x00007fffdbe69250 in tbb::interface7::internal::task_arena_base::internal_execute (this=0x7fffffff8f50, d=...)
    at ../../src/tbb/arena.cpp:1054
#4  0x00007fffdbe3d60e in parallel_for(void*, char**, unsigned long*, unsigned long*, void*, unsigned long, unsigned long, int)
    () from <numba_path>/numba/numba/np/ufunc/tbbpool.cpython-37m-x86_64-linux-gnu.so
#5  0x00007ffff7e0c17e in numba::tests::test_parfors::TestParfors::test_issue4963_globals::$3clocals$3e::test_impl$24118 ()
#6  0x00007ffff7e0c2fc in cpython::numba::tests::test_parfors::TestParfors::test_issue4963_globals::$3clocals$3e::test_impl$24118 ()
#7  0x000055555568be05 in cfunction_call_varargs (kwargs=0x0, args=args@entry=0x0, func=0x7fffd1d9ac80)
    at /tmp/build/80754af9/python_1585235023510/work/Objects/call.c:757
stuartarchibald added the bug - segfault and threading labels on Aug 12, 2020
@jakirkham
Contributor

FWIW TBB caused us some issues with aarch64 builds in conda-forge recently as well.

@stuartarchibald
Contributor Author

Potential leak: opencv/opencv#6843

@stuartarchibald
Contributor Author

Related? https://community.intel.com/t5/Intel-oneAPI-Threading-Building/Segmentation-Fault-caused-by-parallel-for/m-p/1085795#M13548 However, I thought we'd already been down the route of getting TBB to build with -flifetime-dse=1 etc.

@stuartarchibald
Contributor Author

I don't think it's the above; this may be memory corruption.

The program segfaults in this source as follows:

Program received signal SIGSEGV, Segmentation fault.
tbb::internal::generic_scheduler::allocate_task (this=this@entry=0x7fffd548ce00, number_of_bytes=number_of_bytes@entry=104, 
    parent=parent@entry=0x0, context=0x7fffffff8cb0) at ../../src/tbb/scheduler.cpp:357
357     ../../src/tbb/scheduler.cpp: No such file or directory.
(gdb) bt
#0  tbb::internal::generic_scheduler::allocate_task (this=this@entry=0x7fffd548ce00, 
    number_of_bytes=number_of_bytes@entry=104, parent=parent@entry=0x0, context=0x7fffffff8cb0)
    at ../../src/tbb/scheduler.cpp:357
#1  0x00007fffdba7e63b in tbb::internal::allocate_root_with_context_proxy::allocate (this=0x7fffffff8ca8, size=104)
    at ../../src/tbb/task.cpp:64
#2  0x00007fffdba55d08 in tbb::interface7::internal::delegated_function<parallel_for(void*, char**, unsigned long*, unsigned long*, void*, unsigned long, unsigned long, int)::{lambda()#1} const, void>::operator()() const ()
   from <path>/numba/numba/np/ufunc/tbbpool.cpython-37m-x86_64-linux-gnu.so
#3  0x00007fffdba84250 in tbb::interface7::internal::task_arena_base::internal_execute (this=0x7fffffff9080, d=...)
    at ../../src/tbb/arena.cpp:1054
#4  0x00007fffdba5660e in parallel_for(void*, char**, unsigned long*, unsigned long*, void*, unsigned long, unsigned long, int)
    () from <path>/numba/numba/np/ufunc/tbbpool.cpython-37m-x86_64-linux-gnu.so
#5  0x00007fffa207e255 in ?? ()
#6  0x0000000000000005 in ?? ()
#7  0x0000000000000004 in ?? ()
#8  0x00007fffdbe83730 in ?? ()
#9  0x00007fffffff9308 in ?? ()
#10 0x0000000000006e11 in ?? ()
#11 0x000055556ee3b2d0 in ?? ()
#12 0x000055556d9d8aa0 in ?? ()
#13 0x00007fffffff9300 in ?? ()
#14 0x00007fffffff9248 in ?? ()
#15 0x000055556d9d8ae0 in ?? ()
#16 0x000055556ee3b300 in ?? ()
#17 0x000055556c2d2960 in ?? ()
#18 0x0000000000000009 in ?? ()
#19 0x000000000000000b in ?? ()
#20 0x0000000000000000 in ?? ()
(gdb) disassemble
Dump of assembler code for function tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*):
   0x00007fffdba85710 <+0>:     push   %r14
   0x00007fffdba85712 <+2>:     push   %r13
   0x00007fffdba85714 <+4>:     mov    %rcx,%r13
   0x00007fffdba85717 <+7>:     push   %r12
   0x00007fffdba85719 <+9>:     mov    %rdx,%r12
   0x00007fffdba8571c <+12>:    push   %rbp
   0x00007fffdba8571d <+13>:    mov    %rdi,%rbp
   0x00007fffdba85720 <+16>:    push   %rbx
   0x00007fffdba85721 <+17>:    cmp    $0xc0,%rsi
   0x00007fffdba85728 <+24>:    ja     0x7fffdba857c0 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+176>
   0x00007fffdba8572e <+30>:    mov    0x88(%rdi),%rbx
   0x00007fffdba85735 <+37>:    test   %rbx,%rbx
   0x00007fffdba85738 <+40>:    je     0x7fffdba85788 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+120>
   0x00007fffdba8573a <+42>:    mov    -0x8(%rbx),%rax
   0x00007fffdba8573e <+46>:    mov    %rax,0x88(%rdi)
   0x00007fffdba85745 <+53>:    xor    %eax,%eax
   0x00007fffdba85747 <+55>:    mov    %r13,-0x38(%rbx)
   0x00007fffdba8574b <+59>:    mov    %ax,-0xa(%rbx)
   0x00007fffdba8574f <+63>:    mov    %rbx,%rax
   0x00007fffdba85752 <+66>:    mov    %rbp,-0x28(%rbx)
   0x00007fffdba85756 <+70>:    movq   $0x0,-0x18(%rbx)
   0x00007fffdba8575e <+78>:    movl   $0x0,-0x10(%rbx)
   0x00007fffdba85765 <+85>:    mov    %r12,-0x20(%rbx)
   0x00007fffdba85769 <+89>:    movb   $0x0,-0xb(%rbx)
   0x00007fffdba8576d <+93>:    movb   $0x3,-0xc(%rbx)
   0x00007fffdba85771 <+97>:    movq   $0x0,-0x40(%rbx)
   0x00007fffdba85779 <+105>:   pop    %rbx
   0x00007fffdba8577a <+106>:   pop    %rbp
   0x00007fffdba8577b <+107>:   pop    %r12
   0x00007fffdba8577d <+109>:   pop    %r13
   0x00007fffdba8577f <+111>:   pop    %r14
   0x00007fffdba85781 <+113>:   retq   
   0x00007fffdba85782 <+114>:   nopw   0x0(%rax,%rax,1)
   0x00007fffdba85788 <+120>:   cmpq   $0x0,0xb0(%rdi)
   0x00007fffdba85790 <+128>:   je     0x7fffdba857e8 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+216>
   0x00007fffdba85792 <+130>:   lea    0xb0(%rdi),%rdi
   0x00007fffdba85799 <+137>:   xchg   %rbx,(%rdi)
   0x00007fffdba8579c <+140>:   lea    0x18b75(%rip),%rax        # 0x7fffdba9e318 <__itt_notify_sync_acquired_ptr__3_0>
   0x00007fffdba857a3 <+147>:   mov    (%rax),%rax
   0x00007fffdba857a6 <+150>:   mov    %rbx,%r14
   0x00007fffdba857a9 <+153>:   test   %rax,%rax
   0x00007fffdba857ac <+156>:   je     0x7fffdba857b0 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+160>
   0x00007fffdba857ae <+158>:   callq  *%rax
=> 0x00007fffdba857b0 <+160>:   mov    -0x8(%r14),%rax
   0x00007fffdba857b4 <+164>:   mov    %rax,0x88(%rbp)
   0x00007fffdba857bb <+171>:   jmp    0x7fffdba85745 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+53>
   0x00007fffdba857bd <+173>:   nopl   (%rax)
   0x00007fffdba857c0 <+176>:   add    $0x40,%rsi
   0x00007fffdba857c4 <+180>:   xor    %edx,%edx
   0x00007fffdba857c6 <+182>:   mov    $0x1,%edi
   0x00007fffdba857cb <+187>:   callq  *0x18407(%rip)        # 0x7fffdba9dbd8
   0x00007fffdba857d1 <+193>:   lea    0x40(%rax),%rbx
   0x00007fffdba857d5 <+197>:   movq   $0x0,0x10(%rax)
   0x00007fffdba857dd <+205>:   jmpq   0x7fffdba85745 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+53>
   0x00007fffdba857e2 <+210>:   nopw   0x0(%rax,%rax,1)
   0x00007fffdba857e8 <+216>:   xor    %edx,%edx
   0x00007fffdba857ea <+218>:   mov    $0x100,%esi
   0x00007fffdba857ef <+223>:   mov    $0x1,%edi
   0x00007fffdba857f4 <+228>:   callq  *0x183de(%rip)        # 0x7fffdba9dbd8
   0x00007fffdba857fa <+234>:   mov    %rbp,0x10(%rax)
   0x00007fffdba857fe <+238>:   lea    0x40(%rax),%rbx
   0x00007fffdba85802 <+242>:   movq   $0x0,0x38(%rax)
   0x00007fffdba8580a <+250>:   addq   $0x1,0xa8(%rbp)
   0x00007fffdba85812 <+258>:   jmpq   0x7fffdba85745 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+53>
End of assembler dump.
(gdb) p t
$1 = (tbb::task *) 0xffffffffffffffff
(gdb) p my_return_list
$18 = (tbb::task *) 0x0
(gdb) p my_free_list
$19 = (tbb::task *) 0x0

The task, t, is 0xffffffffffffffff, which is highly suspicious. Looking at the machine code:

   0x00007fffdba8579c <+140>:   lea    0x18b75(%rip),%rax        # 0x7fffdba9e318 <__itt_notify_sync_acquired_ptr__3_0>
   0x00007fffdba857a3 <+147>:   mov    (%rax),%rax
   0x00007fffdba857a6 <+150>:   mov    %rbx,%r14
   0x00007fffdba857a9 <+153>:   test   %rax,%rax
   0x00007fffdba857ac <+156>:   je     0x7fffdba857b0 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+160>
   0x00007fffdba857ae <+158>:   callq  *%rax
=> 0x00007fffdba857b0 <+160>:   mov    -0x8(%r14),%rax

To get to the location that segfaults, it seems the branch containing __itt_notify_sync_acquired_ptr__3_0 must have been taken, as there's no jump to it from anywhere else. Looking at the C++ source, this point is reached when if( (t = my_free_list) ) { evaluates false and } else if( my_return_list ) { evaluates true. Which is strange, as t is ~0x0 while my_free_list is 0x0 and my_return_list is 0x0.
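
To make the control flow concrete, here is a heavily simplified paraphrase of the small-task allocation path around scheduler.cpp:357 (illustrative only; the names, types and task layout are approximate, not the actual TBB source):

// Simplified sketch of generic_scheduler::allocate_task's fast paths.
// Not the real TBB code; member names and the task layout are approximate.
#include <atomic>
#include <cstdlib>

struct task { task* next; };

struct scheduler {
    task* my_free_list = nullptr;                // thread-local free list
    std::atomic<task*> my_return_list{nullptr};  // tasks handed back by other threads

    task* allocate_small_task() {
        task* t;
        if ((t = my_free_list)) {
            // Fast path: pop the head of the local free list.
            my_free_list = t->next;
        } else if (my_return_list.load(std::memory_order_relaxed)) {
            // Take the whole return list atomically; this is the branch that
            // emits the ITT "sync acquired" notification seen in the disassembly.
            t = my_return_list.exchange(nullptr, std::memory_order_acquire);
            // In TBB the link is stored just in front of the task object, so this
            // read corresponds to the faulting load at -0x8(%r14) when t is bogus.
            my_free_list = t->next;
        } else {
            // Slow path: allocate fresh storage.
            t = static_cast<task*>(std::malloc(sizeof(task)));
        }
        return t;
    }
};

Note that the exchange zeros my_return_list as a side effect, which would explain why it already reads as 0x0 by the time gdb inspects it, even though t was loaded from it.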

Running this under valgrind.

@stuartarchibald
Contributor Author

stuartarchibald commented Aug 26, 2020

Valgrind says:

==15728== Invalid read of size 8
==15728==    at 0x23D9C7B0: tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*) (scheduler.cpp:357)
==15728==    by 0x23D9563A: tbb::internal::allocate_root_with_context_proxy::allocate(unsigned long) const (task.cpp:64)
==15728==    by 0x23DBFD07: tbb::interface7::internal::delegated_function<parallel_for(void*, char**, unsigned long*, unsigned long*, void*, unsigned long, unsigned long, int)::{lambda()#1} const, void>::operator()() const (in <path>/numba/numba/np/ufunc/tbbpool.cpython-37m-x86_64-linux-gnu.so)
==15728==    by 0x23D9B24F: tbb::interface7::internal::task_arena_base::internal_execute(tbb::interface7::internal::delegate_base&) const (arena.cpp:1054)
==15728==    by 0x23DC060D: parallel_for(void*, char**, unsigned long*, unsigned long*, void*, unsigned long, unsigned long, int) (in <path>/numba/numba/np/ufunc/tbbpool.cpython-37m-x86_64-linux-gnu.so)
==15728==    by 0x6C5FF22D: ???
==15728==    by 0x4: ???
==15728==    by 0x3: ???
==15728==    by 0x235736DF: ???
==15728==    by 0x1FFEFFB277: ???
==15728==    by 0x25E752FF: ???
==15728==    by 0x6EC6A9EF: ???
==15728==  Address 0xfffffffffffffff7 is not stack'd, malloc'd or (recently) free'd
==

gdb is pointing at this:

(gdb) disassemble
<snip>
=> 0x0000000023d9c7b0 <+160>:   mov    -0x8(%r14),%rax
(gdb) info registers 
rax            0x0      0
rbx            0xffffffffffffffff       -1
rcx            0x1ffeffac20     137422154784
rdx            0x0      0
rsi            0x68     104
rdi            0x2afc8eb0       721194672
rbp            0x2afc8e00       0x2afc8e00
rsp            0x1ffeffab90     0x1ffeffab90
r8             0x2b     43
r9             0x31     49
r10            0x0      0
r11            0x0      0
r12            0x0      0
r13            0x1ffeffac20     137422154784
r14            0xffffffffffffffff       -1
r15            0x6      6
rip            0x23d9c7b0       0x23d9c7b0 <tbb::internal::generic_scheduler::allocate_task(unsigned long, tbb::task*, tbb::task_group_context*)+160>
eflags         0x44     [ PF ZF ]
cs             0x0      0
ss             0x0      0
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0

Looks like %r14 holds the tbb::task:

(gdb) p t
$4 = (tbb::task *) 0xffffffffffffffff 

and obviously -0x8(%r14) = 0xfffffffffffffff7, which isn't mapped.

Would hazard a guess at this failing to provide a valid task:
https://github.com/oneapi-src/oneTBB/blob/tbb_2020/src/tbb/scheduler.cpp#L355

I think this line does provide a task and then zeros my_return_list; it's just that the task it provides is nonsense.

(gdb) p this->my_return_list
$23 = (tbb::task *) 0x0 <numba.dynamic.globals.0>

@alexey-katranov

the task, t, is 0xffffffffffffffff which is highly suspicious

Yes, it is really suspicious. The only place where my_free_list can become -1 is the scheduler cleanup, so it seems we are allocating from a scheduler that has already been destroyed. A scheduler can be destroyed when either its thread is destroyed or the market/rml is destroyed. It looks like we have the second case here (because we still have the thread associated with this scheduler), i.e. TBB is in a shutdown state.

Is the main thread alive? Or maybe TBB is unloaded with dlclose?
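
For illustration (hypothetical names, not the actual TBB implementation): scheduler teardown conceptually "plugs" the per-scheduler task lists with an all-ones sentinel, which matches the 0xffffffffffffffff seen in t / %r14 above.

// Hypothetical sketch of the shutdown behaviour described above; names and
// structure are simplified and do not mirror the real TBB code.
#include <cstdint>

struct task;

struct scheduler_lists {
    task* my_free_list   = nullptr;
    task* my_return_list = nullptr;
};

// The all-ones "plugged" sentinel, i.e. (task*)-1 == 0xffffffffffffffff.
inline task* plugged_list() {
    return reinterpret_cast<task*>(~std::uintptr_t{0});
}

// On scheduler destruction the lists are left plugged, so an allocation that
// races with shutdown picks up the sentinel and faults on the first dereference.
void cleanup(scheduler_lists& s) {
    s.my_free_list   = plugged_list();
    s.my_return_list = plugged_list();
}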

@Hardcode84

If it fails only in multiprocess mode, it may be connected to tbbpool fork handling. Does it create new processes via fork+exec or just fork?

@stuartarchibald
Contributor Author

@alexey-katranov Thanks for taking a look at this and providing insight into what could be going on. I'll trace this path and see if anything hints that that is the cause.

@Hardcode84 This is a good point; I tested further. It turns out that the multiprocessing test mode is a red herring: the tests required to trigger this issue happen to be part of the "must be run in serial" section of the test suite. It also turns out that the code handling that serial execution under multiprocessing shuffles the test ordering slightly compared to the non-multiprocessing order, and it is that specific ordering which triggers the problem.

The good news is that, after much debugging, I have a reproducer:

from numba import njit, prange
import numpy as np
import multiprocessing

@njit
def foo(x):
    pass

@njit(parallel=True)
def bar():
    x = 0
    # What this is doesn't matter, just needs to have parallel semantics
    # so that TBB loads and something is executed in the thread pool
    for i in prange(3):
        x += i
    return 1

def _main(pool):

    bar()

    pool.map(foo, [None])

    bar()

if __name__ == "__main__":
    ctx = multiprocessing.get_context('spawn')
    pool = ctx.Pool(1)
    _main(pool)
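
For completeness, the TBB threading layer can be requested explicitly when running the reproducer via Numba's NUMBA_THREADING_LAYER environment variable, so a silent fallback to the omp/workqueue layers doesn't mask the crash (assuming the script above is saved as repro.py, a placeholder name):

NUMBA_THREADING_LAYER=tbb python repro.py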

stuartarchibald added a commit to stuartarchibald/numba that referenced this issue Sep 1, 2020
stuartarchibald added a commit to stuartarchibald/numba that referenced this issue Sep 1, 2020
@stuartarchibald
Contributor Author

This was fixed in #6208, closing.
