Using debug jemalloc version with CMSSW_12_0_2 #35729
A new Issue was created by @smorovic Srecko Morovic. @Dr15Jones, @perrotta, @dpiparo, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks.
assign core
New categories assigned: core @Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks
Valgrind is the best tool we have for finding memory issues.
@smorovic, regarding your question about using jemalloc-debug: yes, updating config/toolbox/slc7_amd64_gcc900/tools/selected/jemalloc.xml (just change JEMALLOC_BASE to point to the jemalloc-debug install location) and then running scram setup + cmsenv is enough for cmsRun to start using the debug jemalloc.
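The switch described above can be sketched as a small shell helper. This is only an illustration: `set_jemalloc_base` is a hypothetical function, `/path/to/jemalloc-debug` is a placeholder, and the sketch assumes that JEMALLOC_BASE appears in the tool file as an `<environment name="JEMALLOC_BASE" default="..."/>` line, which should be checked against the actual jemalloc.xml before use.

```shell
# Sketch only: point the jemalloc tool definition at a debug install.
# Assumption: the tool file contains a line of the form
#   <environment name="JEMALLOC_BASE" default="..."/>
set_jemalloc_base() {
    tool_xml=$1   # path to jemalloc.xml
    new_base=$2   # install location of jemalloc-debug (site-specific)
    # Keep a .bak copy, then replace only the default value of JEMALLOC_BASE.
    sed -i.bak \
        "s|\(name=\"JEMALLOC_BASE\"[[:space:]]*default=\"\)[^\"]*|\1${new_base}|" \
        "$tool_xml"
}

# Possible use from the top of the release area (paths are placeholders):
#   set_jemalloc_base config/toolbox/slc7_amd64_gcc900/tools/selected/jemalloc.xml /path/to/jemalloc-debug
#   scram setup jemalloc            # re-register the edited tool
#   eval "$(scram runtime -sh)"     # roughly what the cmsenv alias does
#   scram tool info jemalloc        # check that JEMALLOC_BASE changed
```

Editing the file by hand works equally well; the helper only makes the change repeatable, which may matter when the same switch has to be applied on many nodes.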
By the way, cms-sw/cmsdist#7396 should fix the issue of jemalloc and jemalloc-debug being at different versions.
Thank you @smuzaffar. After your PR is merged, will the change be reflected in the next release (e.g. 12_0_3)? The idea is to be able to run the full HLT, on around 200 nodes, with debug jemalloc in case the crashes happen again. They are not easily reproducible: they were sporadic and occurred in bursts every few hours (although the underlying issue could still be present in all instances, with only some probability of crashing). Valgrind is difficult to use in this way. I can try to run it manually for a single process, if it is not too slow to keep up with the rest of the HLT.
It will first go into 12.1.X and then we will backport it to 12.0.X (it will take a couple of days before it is available in 12.0.X IBs).
cms-sw/cmsdist#7399 has been merged in 12.0.X; the next CMSSW 12.0 release will include it.
Thank you, @smuzaffar, for sorting this out.
Hello,
I would like to retool CMSSW with a debug jemalloc version to investigate memory corruption issues seen in the HLT during 15-17 Oct (although the crashes have, as of now, disappeared since Monday morning, after appearing all weekend). The crashes are possibly related to condition data, since they coincided with network issues, but we are not certain about it. Still, we would like to have tools ready for debugging in case they appear again.
Turning on debugging in the allocator was suggested by the frontier experts, who helped with the investigation since many crashes appeared in frontier client code.
I see that CMSSW 12_0_2 (slc7_amd64_gcc900) currently installs jemalloc v5.2.1, but also a debug version 4.5.0:
external+jemalloc+5.2.1-cms
external+jemalloc-debug+4.5.0-cms
Are debug builds suitable for inspecting memory corruption issues? For example, is thread caching disabled (MALLOC_CONF=tcache:false), as described here: https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Find-a-memory-corruption-bug
Since the debug version is older, can it be used in place of 5.2.1? If not, I would like to ask for a debug build of 5.2.1.
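The tcache setting asked about above is a runtime option read from the environment, so it could be tried independently of which build is loaded. A minimal sketch follows, with several assumptions: `junk:true` is an optional extra not mentioned in this thread, `hlt.py` is a placeholder for the real job configuration, and the variable name MALLOC_CONF may differ if the library was built with a symbol prefix.

```shell
# Sketch: runtime options for a corruption hunt, applied via the environment.
# tcache:false disables thread caching; junk:true is an optional extra that
# fills allocated and freed memory with junk bytes (needs fill support in the build).
MALLOC_CONF="tcache:false,junk:true"
export MALLOC_CONF

# Hypothetical invocation; hlt.py stands in for the actual job configuration:
#   cmsRun hlt.py
```

Because the options are taken from the environment of the process, they only apply to jobs started after the export.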
Regarding how to switch the allocator: is it sufficient to modify the XML file config/toolbox/slc7_amd64_gcc900/tools/selected/jemalloc.xml and run scram setup, or is it more involved?

Best regards,
Srecko Morovic