allocating a host buffer in a dev.cc file causes a crash at the end of the job #42414
assign heterogeneous |
A new Issue was created by @fwyzard Andrea Bocci. @Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
Enabling debugging information for the
#ifdef ALPAKA_ACC_GPU_CUDA_ENABLED
      debug_(std::is_same_v<Device, alpaka::DevCpu> and std::is_same_v<Queue, alpaka::QueueCudaRtNonBlocking>)
#else
      debug_(false)
#endif
and instrumenting the
  void freeAllCached() {
    std::scoped_lock lock(mutex_);
    if (debug_)
      std::cout << alpaka::core::demangled<CachingAllocator<Device, Queue, void>> << "::freeAllCached()" << " - start" << std::endl;
    ...
    if (debug_)
      std::cout << alpaka::core::demangled<CachingAllocator<Device, Queue, void>> << "::freeAllCached()" << " - done" << std::endl;
  }
highlights that the problem is caused by
The first time it does not find any blocks to release. |
Running with GDB
shows that the first time it is called by the destructor of the corresponding
The second time it's called by the destructor of the
Both calls are expected. What is unexpected is that the first call does not find any blocks to release, and that the second one does. |
Going through the logs and adding more debugging information shows that there are in fact two instances of the
The first one is initialised by the
A second one is initialised by the call to
At the end of the job, the instance known to the
Instead, it is the second instance that crashes while trying to release its memory blocks:
This starts to make sense: the |
Instances of a
namespace cms::alpakatools {
  template <typename TQueue, typename = std::enable_if_t<alpaka::isQueue<TQueue>>>
  inline CachingAllocator<alpaka_common::DevHost, TQueue>& getHostCachingAllocator() {
    // thread safe initialisation of the host allocator
    CMS_THREAD_SAFE static CachingAllocator<alpaka_common::DevHost, TQueue> allocator(
        host(),
        config::binGrowth,
        config::minBin,
        config::maxBin,
        config::maxCachedBytes,
        config::maxCachedFraction,
        false,   // reuseSameQueueAllocations
        false);  // debug
    // the public interface is thread safe
    return allocator;
  }
}  // namespace cms::alpakatools
The |
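As a minimal sketch of the pattern used by getHostCachingAllocator() above: a function-local static inside an inline function template is meant to give one shared instance per template argument. The names DemoAllocator, DemoQueue, getDemoAllocator and callersShareOneInstance are hypothetical stand-ins, not CMSSW or alpaka types. Within a single program built by one host compiler, every caller should see the same object:

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a queue type; not an alpaka type.
struct DemoQueue {};

// Hypothetical stand-in for a caching allocator; it only counts cached bytes.
template <typename TQueue>
class DemoAllocator {
public:
  void cache(int bytes) {
    std::scoped_lock lock(mutex_);
    cachedBytes_ += bytes;
  }
  int cachedBytes() const {
    std::scoped_lock lock(mutex_);
    return cachedBytes_;
  }

private:
  mutable std::mutex mutex_;
  int cachedBytes_ = 0;
};

// Same shape as the accessor above: the function-local static is
// initialised once, in a thread-safe way, on the first call.
template <typename TQueue>
inline DemoAllocator<TQueue>& getDemoAllocator() {
  static DemoAllocator<TQueue> allocator;
  return allocator;
}

// Two calls with the same template argument are expected to return the
// same object, so bytes cached through one reference are visible
// through the other.
inline bool callersShareOneInstance() {
  auto& first = getDemoAllocator<DemoQueue>();
  auto& second = getDemoAllocator<DemoQueue>();
  assert(&first == &second);
  int const before = second.cachedBytes();
  first.cache(128);
  return second.cachedBytes() == before + 128;
}
```

This only shows the intended behaviour. If one translation unit emits the static with local linkage, that unit gets its own private copy, so memory cached through one copy is not known to the other.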
Looking for those instances in the shared library shows something unexpected:
gcc-nm -A -C -l lib/el8_amd64_gcc11/*.so | grep 'cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()::allocator'
lib/el8_amd64_gcc11/libHeterogeneousCoreAlpakaServicesCudaAsync.so:00000000000175a0 B guard variable for cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()::allocator
lib/el8_amd64_gcc11/libHeterogeneousCoreAlpakaServicesCudaAsync.so:00000000000174c0 B cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()::allocator
lib/el8_amd64_gcc11/pluginHeterogeneousCoreAlpakaTestPluginsPortableCudaAsync.so:0000000000ccbb20 b guard variable for cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()::allocator
lib/el8_amd64_gcc11/pluginHeterogeneousCoreAlpakaTestPluginsPortableCudaAsync.so:0000000000d01260 B guard variable for cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()::allocator
lib/el8_amd64_gcc11/pluginHeterogeneousCoreAlpakaTestPluginsPortableCudaAsync.so:0000000000ccbb40 b cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()::allocator
lib/el8_amd64_gcc11/pluginHeterogeneousCoreAlpakaTestPluginsPortableCudaAsync.so:0000000000d01180 B cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()::allocator
|
Looking inside the individual
find tmp/ -name '*.o' | xargs gcc-nm -A -C -l | grep 'cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()::allocator' | sed -e's#cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()#...#'
tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestHelperClass.cc.o:00000000 W guard variable for ...::allocator
tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestHelperClass.cc.o:00000000 W ...::allocator
tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlpakaGlobalProducer.cc.o:00000000 W guard variable for ...::allocator
tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlpakaGlobalProducer.cc.o:00000000 W ...::allocator
tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlpakaProducer.cc.o:00000000 W guard variable for ...::allocator
tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlpakaProducer.cc.o:00000000 W ...::allocator
tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlpakaGlobalProducerOffset.cc.o:00000000 W guard variable for ...::allocator
tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlpakaGlobalProducerOffset.cc.o:00000000 W ...::allocator
tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlgo.dev.cc.o:0000000000000000 b guard variable for ...::allocator
tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlgo.dev.cc.o:0000000000000020 b ...::allocator
tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlpakaStreamProducer.cc.o:00000000 W guard variable for ...::allocator
tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlpakaStreamProducer.cc.o:00000000 W ...::allocator
tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaServices/src/alpaka/HeterogeneousCoreAlpakaServicesCudaAsync/AlpakaService.cc.o:00000000 W guard variable for ...::allocator
tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaServices/src/alpaka/HeterogeneousCoreAlpakaServicesCudaAsync/AlpakaService.cc.o:00000000 W ...::allocator
All |
The $ /data/cmssw/el8_amd64_gcc11/external/gcc/11.4.1-30ebdc301ebd200f2ae0e3d880258e65/bin/c++ -c -DGNU_GCC -D_GNU_SOURCE -DEIGEN_DONT_PARALLELIZE -DTBB_USE_GLIBCXX_VERSION=110401 -DTBB_SUPPRESS_DEPRECATED_MESSAGES -DTBB_PREVIEW_RESUMABLE_TASKS=1 -DTBB_PREVIEW_TASK_GROUP_EXTENSIONS=1 -DBOOST_SPIRIT_THREADSAFE -DPHOENIX_THREADSAFE -DBOOST_MATH_DISABLE_STD_FPCLASSIFY -DBOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX -DCMSSW_GIT_HASH='CMSSW_13_2_0_pre3' -DPROJECT_NAME='CMSSW' -DPROJECT_VERSION='CMSSW_13_2_0_pre3' -I/data/user/fwyzard/repro/CMSSW_13_2_0_pre3/src -I/data/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_2_0_pre3/src -I/data/cmssw/el8_amd64_gcc11/external/alpaka/develop-20230621-9e2225ac6c979464a40749ef9d1e0331/include -I/data/cmssw/el8_amd64_gcc11/external/pcre/8.43-bd2b09f5d686f0f36e748ce001d315ad/include -isystem/data/cmssw/el8_amd64_gcc11/external/boost/1.80.0-5305613b2f750cf1a05dcadf0d672647/include -I/data/cmssw/el8_amd64_gcc11/external/bz2lib/1.0.6-24b287d9981341b8441eb85733326b1a/include -I/data/cmssw/el8_amd64_gcc11/external/cuda/11.8.0-9f0af0f4206be7b705fe550319c49a11/include -I/data/cmssw/el8_amd64_gcc11/external/libuuid/2.34-f7577986509a353c203144983884d697/include -isystem/data/cmssw/el8_amd64_gcc11/lcg/root/6.26.11-50eed3272fcfa103ebe9cf3182b98eb9/include -isystem/data/cmssw/el8_amd64_gcc11/external/tbb/v2021.8.0-7e31093a7b4a477d01bc3946dd0bf612/include -I/data/cmssw/el8_amd64_gcc11/external/xz/5.2.5-56c8544f64e9d56c1108fbe00c3ecb67/include -I/data/cmssw/el8_amd64_gcc11/external/zlib/1.2.11-a365170a889b785ec23815da2b99d7d1/include -I/data/cmssw/el8_amd64_gcc11/external/eigen/82dd3710dac619448f50331c1d6a35da673f764a-f9c27fce684e89466e2ef07869cd264d/include/eigen3 -I/data/cmssw/el8_amd64_gcc11/external/fmt/8.0.1-89199f97a8c166a965017c69137de0d0/include -I/data/cmssw/el8_amd64_gcc11/external/md5/1.0.0-6bede1cf43db82355b3835c81f384d05/include -I/data/cmssw/el8_amd64_gcc11/external/tinyxml2/6.2.0-f05bc085db13b8b4b752c87703ff413d/include -O2 -pthread -pipe 
-Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -std=c++17 -ftree-vectorize -Werror=array-bounds -Werror=format-contains-nul -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -fuse-ld=bfd -msse3 -felide-constructors -fmessage-length=0 -Wall -Wno-non-template-friend -Wno-long-long -Wreturn-type -Wextra -Wpessimizing-move -Wclass-memaccess -Wno-cast-function-type -Wno-unused-but-set-parameter -Wno-ignored-qualifiers -Wno-deprecated-copy -Wno-unused-parameter -Wunused -Wparentheses -Wno-deprecated -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=unused-label -Werror=address -Werror=format -Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing -Werror=unused-but-set-variable -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Werror=return-local-addr -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes -Wno-psabi -Wno-error=unused-variable -DALPAKA_DEFAULT_HOST_MEMORY_ALIGNMENT=128 -DALPAKA_ACC_GPU_CUDA_ENABLED -DALPAKA_HOST_ONLY -DBOOST_DISABLE_ASSERTS -flto -fipa-icf -flto-odr-type-merging -fno-fat-lto-objects -Wodr -fPIC -MMD -MF tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlpakaProducer.cc.d /data/user/fwyzard/repro/CMSSW_13_2_0_pre3/src/HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaProducer.cc -o tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlpakaProducer.cc.o
$ gcc-nm -C tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlpakaProducer.cc.o | grep 'cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()::allocator'
00000000 W guard variable for cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()::allocator
00000000 W cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()::allocator The same $ /data/cmssw/el8_amd64_gcc11/external/gcc/11.4.1-30ebdc301ebd200f2ae0e3d880258e65/bin/c++ -c -DGNU_GCC -D_GNU_SOURCE -DEIGEN_DONT_PARALLELIZE -DTBB_USE_GLIBCXX_VERSION=110401 -DTBB_SUPPRESS_DEPRECATED_MESSAGES -DTBB_PREVIEW_RESUMABLE_TASKS=1 -DTBB_PREVIEW_TASK_GROUP_EXTENSIONS=1 -DBOOST_SPIRIT_THREADSAFE -DPHOENIX_THREADSAFE -DBOOST_MATH_DISABLE_STD_FPCLASSIFY -DBOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX -DCMSSW_GIT_HASH='CMSSW_13_2_0_pre3' -DPROJECT_NAME='CMSSW' -DPROJECT_VERSION='CMSSW_13_2_0_pre3' -I/data/user/fwyzard/repro/CMSSW_13_2_0_pre3/src -I/data/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_2_0_pre3/src -I/data/cmssw/el8_amd64_gcc11/external/alpaka/develop-20230621-9e2225ac6c979464a40749ef9d1e0331/include -I/data/cmssw/el8_amd64_gcc11/external/pcre/8.43-bd2b09f5d686f0f36e748ce001d315ad/include -isystem/data/cmssw/el8_amd64_gcc11/external/boost/1.80.0-5305613b2f750cf1a05dcadf0d672647/include -I/data/cmssw/el8_amd64_gcc11/external/bz2lib/1.0.6-24b287d9981341b8441eb85733326b1a/include -I/data/cmssw/el8_amd64_gcc11/external/cuda/11.8.0-9f0af0f4206be7b705fe550319c49a11/include -I/data/cmssw/el8_amd64_gcc11/external/libuuid/2.34-f7577986509a353c203144983884d697/include -isystem/data/cmssw/el8_amd64_gcc11/lcg/root/6.26.11-50eed3272fcfa103ebe9cf3182b98eb9/include -isystem/data/cmssw/el8_amd64_gcc11/external/tbb/v2021.8.0-7e31093a7b4a477d01bc3946dd0bf612/include -I/data/cmssw/el8_amd64_gcc11/external/xz/5.2.5-56c8544f64e9d56c1108fbe00c3ecb67/include -I/data/cmssw/el8_amd64_gcc11/external/zlib/1.2.11-a365170a889b785ec23815da2b99d7d1/include -I/data/cmssw/el8_amd64_gcc11/external/eigen/82dd3710dac619448f50331c1d6a35da673f764a-f9c27fce684e89466e2ef07869cd264d/include/eigen3 -I/data/cmssw/el8_amd64_gcc11/external/fmt/8.0.1-89199f97a8c166a965017c69137de0d0/include 
-I/data/cmssw/el8_amd64_gcc11/external/md5/1.0.0-6bede1cf43db82355b3835c81f384d05/include -I/data/cmssw/el8_amd64_gcc11/external/tinyxml2/6.2.0-f05bc085db13b8b4b752c87703ff413d/include -O2 -pthread -pipe -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -std=c++17 -ftree-vectorize -Werror=array-bounds -Werror=format-contains-nul -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -fuse-ld=bfd -msse3 -felide-constructors -fmessage-length=0 -Wall -Wno-non-template-friend -Wno-long-long -Wreturn-type -Wextra -Wpessimizing-move -Wclass-memaccess -Wno-cast-function-type -Wno-unused-but-set-parameter -Wno-ignored-qualifiers -Wno-deprecated-copy -Wno-unused-parameter -Wunused -Wparentheses -Wno-deprecated -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=unused-label -Werror=address -Werror=format -Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing -Werror=unused-but-set-variable -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Werror=return-local-addr -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes -Wno-psabi -Wno-error=unused-variable -DALPAKA_DEFAULT_HOST_MEMORY_ALIGNMENT=128 -DALPAKA_ACC_GPU_CUDA_ENABLED -DALPAKA_HOST_ONLY -DBOOST_DISABLE_ASSERTS -fPIC -MMD -MF tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlpakaProducer.cc.d /data/user/fwyzard/repro/CMSSW_13_2_0_pre3/src/HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlpakaProducer.cc -o tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlpakaProducer.cc.o
$ gcc-nm -C tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlpakaProducer.cc.o | grep 'cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()::allocator'
0000000000000000 u guard variable for cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()::allocator
0000000000000000 u cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()::allocator |
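To help read the nm listings above, here is a small sketch that decodes the symbol-type letters that appear in them (W, u, B, b), following the conventions documented for GNU nm. The function names describeNmType and isLocalToObject are hypothetical helpers written for this illustration, and only the letters seen in this thread are described:

```cpp
#include <string_view>

// Describe the one-letter symbol type printed by GNU nm.
// Only the letters that appear in the listings above are covered.
constexpr std::string_view describeNmType(char type) {
  switch (type) {
    case 'W':
      return "weak symbol (global)";
    case 'u':
      return "unique global symbol (GNU extension)";
    case 'B':
      return "global symbol in the BSS section";
    case 'b':
      return "local symbol in the BSS section";
    default:
      return "not covered by this sketch";
  }
}

// GNU nm prints local symbols in lowercase, except for the special
// global kinds u, v and w. A local symbol is private to its object
// file, so it cannot be merged with a definition from another one.
constexpr bool isLocalToObject(char type) {
  bool const lowercase = (type >= 'a' and type <= 'z');
  return lowercase and type != 'u' and type != 'v' and type != 'w';
}

static_assert(isLocalToObject('b'));
static_assert(not isLocalToObject('W'));
static_assert(not isLocalToObject('u'));
```

Read this way, the W and u entries are global definitions that can be merged across object files, while the b entries from TestAlgo.dev.cc.o are private to that object.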
The $ /data/cmssw/el8_amd64_gcc11/external/cuda/11.8.0-9f0af0f4206be7b705fe550319c49a11/bin/nvcc -x cu -MMD -MF tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlgo.dev.cc.d -dc -DGNU_GCC -D_GNU_SOURCE -DEIGEN_DONT_PARALLELIZE -DTBB_USE_GLIBCXX_VERSION=110401 -DTBB_SUPPRESS_DEPRECATED_MESSAGES -DTBB_PREVIEW_RESUMABLE_TASKS=1 -DTBB_PREVIEW_TASK_GROUP_EXTENSIONS=1 -DBOOST_SPIRIT_THREADSAFE -DPHOENIX_THREADSAFE -DBOOST_MATH_DISABLE_STD_FPCLASSIFY -DBOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX -DCMSSW_GIT_HASH='CMSSW_13_2_0_pre3' -DPROJECT_NAME='CMSSW' -DPROJECT_VERSION='CMSSW_13_2_0_pre3' -I/data/user/fwyzard/repro/CMSSW_13_2_0_pre3/src -I/data/cmssw/el8_amd64_gcc11/cms/cmssw/CMSSW_13_2_0_pre3/src -I/data/cmssw/el8_amd64_gcc11/external/alpaka/develop-20230621-9e2225ac6c979464a40749ef9d1e0331/include -I/data/cmssw/el8_amd64_gcc11/external/pcre/8.43-bd2b09f5d686f0f36e748ce001d315ad/include -I/data/cmssw/el8_amd64_gcc11/external/boost/1.80.0-5305613b2f750cf1a05dcadf0d672647/include -I/data/cmssw/el8_amd64_gcc11/external/bz2lib/1.0.6-24b287d9981341b8441eb85733326b1a/include -I/data/cmssw/el8_amd64_gcc11/external/cuda/11.8.0-9f0af0f4206be7b705fe550319c49a11/include -I/data/cmssw/el8_amd64_gcc11/external/libuuid/2.34-f7577986509a353c203144983884d697/include -I/data/cmssw/el8_amd64_gcc11/lcg/root/6.26.11-50eed3272fcfa103ebe9cf3182b98eb9/include -I/data/cmssw/el8_amd64_gcc11/external/tbb/v2021.8.0-7e31093a7b4a477d01bc3946dd0bf612/include -I/data/cmssw/el8_amd64_gcc11/external/xz/5.2.5-56c8544f64e9d56c1108fbe00c3ecb67/include -I/data/cmssw/el8_amd64_gcc11/external/zlib/1.2.11-a365170a889b785ec23815da2b99d7d1/include -I/data/cmssw/el8_amd64_gcc11/external/eigen/82dd3710dac619448f50331c1d6a35da673f764a-f9c27fce684e89466e2ef07869cd264d/include/eigen3 -I/data/cmssw/el8_amd64_gcc11/external/fmt/8.0.1-89199f97a8c166a965017c69137de0d0/include 
-I/data/cmssw/el8_amd64_gcc11/external/md5/1.0.0-6bede1cf43db82355b3835c81f384d05/include -I/data/cmssw/el8_amd64_gcc11/external/tinyxml2/6.2.0-f05bc085db13b8b4b752c87703ff413d/include --diag-suppress 20014 -std=c++17 -O3 --generate-line-info --source-in-ptx --display-error-number --expt-relaxed-constexpr --extended-lambda -gencode arch=compute_60,code=[sm_60,compute_60] -gencode arch=compute_70,code=[sm_70,compute_70] -gencode arch=compute_75,code=[sm_75,compute_75] -Wno-deprecated-gpu-targets -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored --cudart shared -DALPAKA_DEFAULT_HOST_MEMORY_ALIGNMENT=128 -DALPAKA_ACC_GPU_CUDA_ENABLED -UALPAKA_HOST_ONLY --compiler-options '-O2 -pthread -pipe -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -ftree-vectorize -Werror=array-bounds -Werror=format-contains-nul -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -fuse-ld=bfd -msse3 -felide-constructors -fmessage-length=0 -Wall -Wno-non-template-friend -Wno-long-long -Wreturn-type -Wextra -Wpessimizing-move -Wclass-memaccess -Wno-cast-function-type -Wno-unused-but-set-parameter -Wno-ignored-qualifiers -Wno-deprecated-copy -Wno-unused-parameter -Wunused -Wparentheses -Wno-deprecated -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=unused-label -Werror=address -Werror=format -Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing -Werror=unused-but-set-variable -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Werror=return-local-addr -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes -Wno-psabi -Wno-error=unused-variable -DALPAKA_DEFAULT_HOST_MEMORY_ALIGNMENT=128 -DALPAKA_ACC_GPU_CUDA_ENABLED -DALPAKA_HOST_ONLY -DBOOST_DISABLE_ASSERTS -std=c++17 -fPIC ' 
/data/user/fwyzard/repro/CMSSW_13_2_0_pre3/src/HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlgo.dev.cc -o tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlgo.dev.cc.o
$ gcc-nm -C tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaTest/plugins/HeterogeneousCoreAlpakaTestPluginsPortableCudaAsync/alpaka/TestAlgo.dev.cc.o | grep 'cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()::allocator'
0000000000000000 b guard variable for cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()::allocator
0000000000000020 b cms::alpakatools::getHostCachingAllocator<alpaka::uniform_cuda_hip::detail::QueueUniformCudaHipRt<alpaka::ApiCudaRt, false>, void>()::allocator
Enabling host-side LTO produces non-working CUDA programs, but for the sake of argument it can be tested here:
This produces no symbols at all for the |
Trying to write a much simpler reproducer:
|
I think this is a minimal reproducer for the underlying problem.
|
Looking at the intermediate files produced by NVCC and compiled by GCC, the code from |
Looks like this could be a workaround for the issue:
diff --git a/HeterogeneousCore/AlpakaInterface/interface/CachingAllocator.h b/HeterogeneousCore/AlpakaInterface/interface/CachingAllocator.h
index dfda1ee3d7e2..1a9a7d8fe070 100644
--- a/HeterogeneousCore/AlpakaInterface/interface/CachingAllocator.h
+++ b/HeterogeneousCore/AlpakaInterface/interface/CachingAllocator.h
@@ -83,9 +83,11 @@ namespace cms::alpakatools {
*/
template <typename TDev,
- typename TQueue,
- typename = std::enable_if_t<alpaka::isDevice<TDev> and alpaka::isQueue<TQueue>>>
+ typename TQueue>
class CachingAllocator {
+ static_assert(alpaka::isDevice<TDev>, "");
+ static_assert(alpaka::isQueue<TQueue>, "");
+
public:
#ifdef ALPAKA_ACC_GPU_CUDA_ENABLED
friend class alpaka_cuda_async::AlpakaService; |
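The change in the diff above can be sketched in isolation as follows. The traits isDemoDevice and isDemoQueue and the classes SfinaeAllocator and AssertAllocator are hypothetical stand-ins for alpaka::isDevice, alpaka::isQueue and CachingAllocator. The sketch only shows the language-level difference between the two forms; it does not reproduce the NVCC behaviour itself:

```cpp
#include <type_traits>

// Hypothetical device and queue types with matching traits.
struct DemoDevice {};
struct DemoQueue {};

template <typename T>
inline constexpr bool isDemoDevice = std::is_same_v<T, DemoDevice>;
template <typename T>
inline constexpr bool isDemoQueue = std::is_same_v<T, DemoQueue>;

// Before: the constraint is a defaulted third template parameter, so
// the full type name carries an extra `void` argument.
template <typename TDev,
          typename TQueue,
          typename = std::enable_if_t<isDemoDevice<TDev> and isDemoQueue<TQueue>>>
class SfinaeAllocator {};

// After: only two template parameters; the constraint is checked
// inside the class body with static_assert.
template <typename TDev, typename TQueue>
class AssertAllocator {
  static_assert(isDemoDevice<TDev>, "TDev must be a device type");
  static_assert(isDemoQueue<TQueue>, "TQueue must be a queue type");
};

// The two-argument spelling of the SFINAE form is the same type as the
// three-argument one ending in `void`, which matches the trailing
// `void` in the demangled names earlier in this thread.
static_assert(std::is_same_v<SfinaeAllocator<DemoDevice, DemoQueue>,
                             SfinaeAllocator<DemoDevice, DemoQueue, void>>);
```

One trade-off: with static_assert an invalid instantiation becomes a hard compile error rather than a substitution failure, so the class can no longer be used to select between overloads or specialisations.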
I've submitted a bug report to NVIDIA: https://developer.nvidia.com/nvidia_bug/4216808 . |
+heterogeneous
This issue was fixed by the PRs that link to this issue, listed above. |
@cmsbuild, please close |
This issue is fully signed and ready to be closed. |
Calling cms::alpakatools::make_host_buffer<T>(queue) in a .dev.cc file compiled for the CUDA back-end causes a crash at the end of the job:
A simple reproducer is