Segfault in TClingCallbacks::findInGlobalModuleIndex() #44659
assign core
type root
New categories assigned: core @Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks
A new Issue was created by @makortel. @rappoccio, @antoniovilela, @Dr15Jones, @makortel, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here
FYI @pcanal @vgvassilev I'm not sure whether this is related to the other random crashes we're seeing. I don't recall seeing it before.
It says that it crashes within the
The argument passed to Process::GetEnv is a hard-coded string in this case. There is some manipulation in there, but still I don't see anything obvious leading to the crash :(
Do you know if LLVM uses the POSIX |
Fair point. It is |
Of course this observation probably means we have something else calling |
Maybe related, and probably a clue: segfault on el8_amd64_gcc12 in CMSSW_14_1_NONLTO_X_2024-04-20-1100, RelVal 12634.0 step2:
This latest crash seems to be due to a 'corrupted' |
Another possible occurrence: RelVal 12634.0 in CMSSW_14_1_CLANG_X_2024-08-06-2300:
Are we in a multithreaded environment? I do not think |
Yes, all of these are multithreaded.
Good catch! Indeed C |
@vgvassilev @makortel This still happens: slc7_amd64_gcc12/CMSSW_14_2_X_2024-09-12-1100.
The POSIX call ::getenv is not guaranteed to be thread-safe, while C++11 made std::getenv thread-safe. This resolves bugs when using LLVM in a multithreaded environment, similar to cms-sw/cmssw#44659
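For illustration, here is a minimal sketch of one common mitigation. It assumes the crash comes from one thread modifying the environment (for example with the POSIX `setenv`) while another thread reads it. C++11 only guarantees that `std::getenv` does not cause a data race as long as nothing modifies the environment concurrently. The `EnvSnapshot` class and `count_matching_reads` function are hypothetical and are not part of ROOT, LLVM, or CMSSW. They copy the needed variables once, while the program is still single-threaded, so worker threads never call `getenv` at all.

```cpp
#include <cstdlib>
#include <map>
#include <optional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not part of ROOT, LLVM, or CMSSW). It copies the values
// of selected environment variables once, while the program is still
// single-threaded; afterwards worker threads read only this immutable copy.
class EnvSnapshot {
public:
  explicit EnvSnapshot(const std::vector<std::string>& names) {
    for (const auto& name : names) {
      // Copy the value: the pointer returned by getenv may be invalidated by a
      // later setenv/putenv, so it must not be stored.
      if (const char* value = std::getenv(name.c_str())) {
        values_.emplace(name, value);
      }
    }
  }

  // Safe to call from many threads: values_ is never modified after construction.
  std::optional<std::string> get(const std::string& name) const {
    auto it = values_.find(name);
    if (it == values_.end()) {
      return std::nullopt;  // variable was unset when the snapshot was taken
    }
    return it->second;
  }

private:
  std::map<std::string, std::string> values_;
};

// Reads `name` from the snapshot concurrently in `nthreads` threads and returns
// how many of them saw `expected`. No worker thread calls getenv.
int count_matching_reads(const EnvSnapshot& env, const std::string& name,
                         const std::string& expected, int nthreads) {
  std::vector<int> hits(nthreads, 0);  // one slot per thread, so no data race
  std::vector<std::thread> workers;
  workers.reserve(nthreads);
  for (int i = 0; i < nthreads; ++i) {
    workers.emplace_back([&env, &name, &expected, &hits, i] {
      const auto value = env.get(name);
      hits[i] = (value && *value == expected) ? 1 : 0;
    });
  }
  for (auto& t : workers) {
    t.join();
  }
  int total = 0;
  for (int h : hits) {
    total += h;
  }
  return total;
}
```

To use it, construct the snapshot in `main` before any worker threads start, for example `const EnvSnapshot env(std::vector<std::string>{"HOME"});`, and pass it to the workers by reference. The design trade-off is deliberate: later changes to the environment are not visible through the snapshot. This approach only helps for code you control. Third-party code that calls `getenv` directly still races with any concurrent `setenv`.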
I've opened a PR against LLVM. Let's see how it goes; I will then backport the patch to ROOT. Would that work?
Following the discussion in llvm/llvm-project#108529 (llvm/llvm-project#108529 (comment) and llvm/llvm-project#108529 (comment) in particular), and quick web searches suggesting glibc's Could the problem be the environment being modified? Already a quick |
Running the step2 of 12861.0 (from the latest report #44659 (comment)) through |
Here is the new issue #46002 |
Can you paste both stack traces?
Being lazy I only recorded the stack trace of the thread that called the |
I do not see what we can do on the ROOT side. Is avoiding |
I think avoiding |
While trying to craft a gdb script to catch
Is this |
I have a FIXME there, but unfortunately I do not see an easy way to get rid of it in the short term: https://github.com/root-project/root/blob/91bb4d73ef984fe8b3327e989e9cdba641078098/core/metacling/src/TCling.cxx#L1442-L1443 Do we run the initialization of ROOT in a multithreaded context?
This happens where |
A PR removing the |
I think we have reached the point where we can close this issue.
+core
This issue is fully signed and ready to be closed. |
Workflow 141.035 segfaulted in CMSSW_14_1_X_2024-04-07-2300 on slc7_amd64_gcc12 with
https://cmssdt.cern.ch/SDT/cgi-bin/logreader/slc7_amd64_gcc12/CMSSW_14_1_X_2024-04-07-2300/pyRelValMatrixLogs/run/141.035_RunDisplacedJet2023C/step2_RunDisplacedJet2023C.log#/