Segfault in TBB, maybe parfors related. #5973
Comments
FWIW TBB caused us some issues with |
Potential leak: opencv/opencv#6843 |
Related? https://community.intel.com/t5/Intel-oneAPI-Threading-Building/Segmentation-Fault-caused-by-parallel-for/m-p/1085795#M13548 I thought however we'd been down the route of getting TBB to build with |
Don't think it's the above, think this may be corruption. The program segfaults in this source as follows:
the task,
to get to the location that segfaults, seems like the branch containing Running this under valgrind. |
Valgrind says:
gdb is pointing at this:
looks like
and obviously
|
Yes, it is really suspicious. The only place where Is the main thread alive? or maybe TBB is unloaded with dlclose? |
If it fails only in multiprocess mode it may be connected to tbbpool fork handling. Does it create new processes via |
@alexey-katranov Thanks for taking a look at this and providing insight into what could be going on. I'll trace this path and see if anything hints that that is the cause.

@Hardcode84 This is a good point, I tested this further. It turns out that the multiprocessing test mode is a red herring: the tests required to trigger this issue just happen to be part of the "must be run in serial" section of the testing. It also turns out that the code doing that serial execution in multiprocessing mode shuffles the test ordering slightly compared to the non-multiprocessing order, and it is that specific order that triggers the problem. The good news is that, after much debugging, I have a reproducer:

from numba import njit, prange
import numpy as np
import multiprocessing

@njit
def foo(x):
    pass

@njit(parallel=True)
def bar():
    x = 0
    # What this is doesn't matter, just needs to have parallel semantics
    # so that TBB loads and something is executed in the thread pool
    for i in prange(3):
        x += i
    return 1

def _main(pool):
    bar()
    pool.map(foo, [None])
    bar()

if __name__ == "__main__":
    ctx = multiprocessing.get_context('spawn')
    pool = ctx.Pool(1)
    _main(pool)
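When iterating on a crashing reproducer like the one above, it can help to run it in a fresh interpreter and check whether the process was killed by a signal. The following is a minimal sketch using only the standard library; the helper name `run_isolated` is hypothetical and not part of Numba or its test runner, and the signal detection relies on the POSIX convention that a negative return code means the child died from that signal.

```python
import signal
import subprocess
import sys


def run_isolated(py_args, timeout=60):
    """Run `python -X faulthandler <py_args>` in a fresh process.

    Returns (returncode, signal_name_or_None, stderr_text).
    `run_isolated` is a hypothetical helper name used for illustration.
    """
    # -X faulthandler makes the child dump a Python traceback to stderr
    # when it receives a fatal signal such as SIGSEGV.
    cmd = [sys.executable, "-X", "faulthandler", *py_args]
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)

    # On POSIX, subprocess reports death-by-signal as a negative return code.
    killed_by = None
    if proc.returncode < 0:
        killed_by = signal.Signals(-proc.returncode).name
    return proc.returncode, killed_by, proc.stderr
```

With the reproducer saved to a file (the name `repro.py` is an assumption), `run_isolated(["repro.py"])` would report `SIGSEGV` as the second element if the crash occurs, and the captured stderr would hold whatever traceback `faulthandler` managed to write.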
As title.
This was fixed in #6208, closing. |
Reporting a bug
There's a problem in this branch https://github.com/stuartarchibald/numba/tree/pr_5463_segfault, head is c84bfbd, it's a clone of PR #5463 which is segfaulting on CI. It's reproducible locally with:
and in fact any -m <X> will segfault. However, this: will not. The difference is whether the multiprocessing test runner is used (it is not in the latter case).
Backtrace is: