Apparent incompatibility with current ATLAS GPU environment #1462

pcanal · 2024-10-22T15:06:42Z

When running on lxplus-gpu we observed a crash within one of Celeritas test checking Thrust functionality.

With Celeritas using for example commit fe52da7 or commit aab88dd
and

export G4PATH=/cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/Geant4
export ATLAS_LOCAL_ROOT_BASE="/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase"
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh

asetup AthSimulation,local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt,2023-10-15T1800

cmake -DVDT_ROOT=/cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/LCG_106_ATLAS_13/vdt/0.4.4/x86_64-el9-gcc13-opt/ -DCELERITAS_USE_Geant4=OFF -DCELERITAS_BUILD_TESTS=ON -DCELERITAS_USE_CUDA=ON -DCELERITAS_USE_ROOT=ON -DCMAKE_CUDA_ARCHITECTURES=75 -G Ninja -DCMAKE_CUDA_FLAGS="-g -G" ../celeritas

we get a failure in test/celeritas/track_TrackSort

status: Celeritas core state initialization complete

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x7fff6b4f9cd0 (agent_radix_sort_downsweep.cuh:293)

Thread 1 "track_TrackSort" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 4, grid 2307, block (0,0,0), thread (32,0,0), device 0, sm 0, warp 0, lane 0]
0x00007fff6b4f9a50 in cub::CUB_200302_750_NS::AgentRadixSortDownsweep<cub::CUB_200302_750_NS::AgentRadixSortDownsweepPolicy<512, 23, unsigned int, (cub::CUB_200302_750_NS::BlockLoadAlgorithm)3, (cub::CUB_200302_750_NS::CacheLoadModifier)0, (cub::CUB_200302_750_NS::RadixRankAlgorithm)2, (cub::CUB_200302_750_NS::BlockScanAlgorithm)2, 7, cub::CUB_200302_750_NS::RegBoundScaling<512, 23, unsigned int> >, false, unsigned int, unsigned int, unsigned int, cub::CUB_200302_750_NS::detail::identity_decomposer_t>::ScatterKeys<false> (this=0x7fffd7fffb58, twiddled_keys=..., relative_bin_offsets=..., ranks=..., valid_items=6240)
    at /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/agent/agent_radix_sort_downsweep.cuh:293
293	            temp_storage.keys_and_offsets.exchange_keys[ranks[ITEM]] = twiddled_keys[ITEM];
(cuda-gdb) p ranks
$2 = (@local int (& @local)[23]) <error reading variable: Error: read_local_memory(0, 0, 0, 0): failed to read local memory at address 0x7fffd7fff8f8 size 92, error=CUDBG_ERROR_INVALID_ADDRESS, error message=
>
(cuda-gdb) p twiddled_keys
$3 = (@local _ZN3cub17CUB_200302_750_NS23AgentRadixSortDownsweepINS0_29AgentRadixSortDownsweepPolicyILi512ELi23EjLNS0_18BlockLoadAlgorithmE3ELNS0_17CacheLoadModifierE0ELNS0_18RadixRankAlgorithmE2ELNS0_18BlockScanAlgorithmE2ELi7ENS0_15RegBoundScalingILi512ELi23EjEEEELb0EjjjNS0_6detail21identity_decomposer_tEE16bit_ordered_typeE (& @local)[23]) <error reading variable: Error: read_local_memory(0, 0, 0, 0): failed to read local memory at address 0x7fffd7fffaa4 size 92, error=CUDBG_ERROR_INVALID_ADDRESS, error message=

compute-sanitize reports

========= Invalid __global__ write of size 4 bytes
=========     at ScatterKeys<false>+0x323d0 in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/agent/agent_radix_sort_downsweep.cuh:293
=========     by thread (32,0,0) in block (0,0,0)
=========     Address 0x7f2a612882f8 is out of bounds
=========     and is 2391310585 bytes after the nearest allocation at 0x7f29d2a00000 of size 512 bytes
=========     Device Frame:ProcessTile<false>+0x33710 in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/agent/agent_radix_sort_downsweep.cuh:635
=========     Device Frame:ProcessRegion+0x363d0 in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/agent/agent_radix_sort_downsweep.cuh:820
=========     Device Frame:void cub::CUB_200302_750_NS::DeviceRadixSortDownsweepKernel<cub::CUB_200302_750_NS::DeviceRadixSortPolicy<unsigned int, unsigned int, unsigned int>::Policy900, (bool)0, (bool)0, unsigned int, unsign
ed int, unsigned int, cub::CUB_200302_750_NS::detail::identity_decomposer_t>(const T4 *, T4 *, const T5 *, T5 *, T6 *, T6, int, int, cub::CUB_200302_750_NS::GridEvenShare<T6>, T7)+0x36420 in /cvmfs/projects.cern.ch/lcg/releas
es/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/agent/agent_radix_sort_downsweep.cuh:339
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x2f167f]
=========                in /lib64/libcuda.so.1
=========     Host Frame: [0x15803]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/cuda/12.4.1-5bf34/x86_64-el9-gcc13-opt/lib64/libcudart.so.12
=========     Host Frame:cudaLaunchKernel [0x75230]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/cuda/12.4.1-5bf34/x86_64-el9-gcc13-opt/lib64/libcudart.so.12
=========     Host Frame:cudaError cudaLaunchKernel<char>(char const*, dim3, dim3, void**, unsigned long, CUstream_st*) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cuda_runtim
e.h:216 [0xd772fa]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:__device_stub__ZN3cub17CUB_200302_750_NS30DeviceRadixSortDownsweepKernelINS0_21DeviceRadixSortPolicyIjjjE9Policy900ELb0ELb0EjjjNS0_6detail21identity_decomposer_tEEEvPKT2_PS7_PKT3_PSB_PT4_SF_iiNS0_13Gr
idEvenShareISF_EET5_(unsigned int const*, unsigned int*, unsigned int const*, unsigned int*, unsigned int*, unsigned int, int, int, cub::CUB_200302_750_NS::GridEvenShare<unsigned int>&, cub::CUB_200302_750_NS::detail::identit
y_decomposer_t&) in /tmp/pcanal/tmpxft_00233a7e_00000000-6_TrackSortUtils.cudafe1.stub.c:55 [0xd75a4d]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:void cub::CUB_200302_750_NS::__wrapper__device_stub_DeviceRadixSortDownsweepKernel<cub::CUB_200302_750_NS::DeviceRadixSortPolicy<unsigned int, unsigned int, unsigned int>::Policy900, false, false, uns
igned int, unsigned int, unsigned int, cub::CUB_200302_750_NS::detail::identity_decomposer_t>(unsigned int const*&, unsigned int*&, unsigned int const*&, unsigned int*&, unsigned int*&, unsigned int&, int&, int&, cub::CUB_200
302_750_NS::GridEvenShare<unsigned int>&, cub::CUB_200302_750_NS::detail::identity_decomposer_t&) in /tmp/pcanal/tmpxft_00233a7e_00000000-6_TrackSortUtils.cudafe1.stub.c:58 [0xd75abe]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:void cub::CUB_200302_750_NS::DeviceRadixSortDownsweepKernel<cub::CUB_200302_750_NS::DeviceRadixSortPolicy<unsigned int, unsigned int, unsigned int>::Policy900, false, false, unsigned int, unsigned int
, unsigned int, cub::CUB_200302_750_NS::detail::identity_decomposer_t>(unsigned int const*, unsigned int*, unsigned int const*, unsigned int*, unsigned int*, unsigned int, int, int, cub::CUB_200302_750_NS::GridEvenShare<unsigned int>, cub::CUB_200302_750_NS::detail::identity_decomposer_t) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/device/dispatch/dispatch_radix_sort.cuh:303 [0xd792a8]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:cudaError thrust::THRUST_200302_750_NS::cuda_cub::launcher::triple_chevron::doit_host<void (*)(unsigned int const*, unsigned int*, unsigned int const*, unsigned int*, unsigned int*, unsigned int, int, int, cub::CUB_200302_750_NS::GridEvenShare<unsigned int>, cub::CUB_200302_750_NS::detail::identity_decomposer_t), unsigned int const*, unsigned int*, unsigned int const*, unsigned int*, unsigned int*, unsigned int, int, int, cub::CUB_200302_750_NS::GridEvenShare<unsigned int>, cub::CUB_200302_750_NS::detail::identity_decomposer_t>(void (*)(unsigned int const*, unsigned int*, unsigned int const*, unsigned int*, unsigned int*, unsigned int, int, int, cub::CUB_200302_750_NS::GridEvenShare<unsigned int>, cub::CUB_200302_750_NS::detail::identity_decomposer_t), unsigned int const* const&, unsigned int* const&, unsigned int const* const&, unsigned int* const&, unsigned int* const&, unsigned int const&, int const&, int const&, cub::CUB_200302_750_NS::GridEvenShare<unsigned int> const&, cub::CUB_200302_750_NS::detail::identity_decomposer_t const&) const in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/thrust/system/cuda/detail/core/triple_chevron_launch.h:70 [0xd8df37]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:cudaError cub::CUB_200302_750_NS::DeviceRadixSort::SortPairs<unsigned int, unsigned int, int>(void*, unsigned long&, cub::CUB_200302_750_NS::DoubleBuffer<unsigned int>&, cub::CUB_200302_750_NS::DoubleBuffer<unsigned int>&, int, int, int, CUstream_st*) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/device/device_radix_sort.cuh:806 [0xd82956]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:cudaError thrust::THRUST_200302_750_NS::cuda_cub::__radix_sort::dispatch<cuda::std::__4::integral_constant<bool, true>, thrust::THRUST_200302_750_NS::less<unsigned int> >::doit<unsigned int, unsigned int, long>(void*, unsigned long&, cub::CUB_200302_750_NS::DoubleBuffer<unsigned int>&, cub::CUB_200302_750_NS::DoubleBuffer<unsigned int>&, long, CUstream_st*) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/thrust/system/cuda/detail/sort.h:279 [0xd7d01f]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:void thrust::THRUST_200302_750_NS::cuda_cub::__radix_sort::radix_sort<cuda::std::__4::integral_constant<bool, true>, thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base>, unsigned int, unsigned int, long, thrust::THRUST_200302_750_NS::less<unsigned int> >(thrust::THRUST_200302_750_NS::cuda_cub::execution_policy<thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base> >&, unsigned int*, unsigned int*, long, thrust::THRUST_200302_750_NS::less<unsigned int>) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/thrust/system/cuda/detail/sort.h:367 [0xd7cb08]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:thrust::THRUST_200302_750_NS::cuda_cub::__smart_sort::enable_if_primitive_sort<unsigned int*, thrust::THRUST_200302_750_NS::less<unsigned int> >::type thrust::THRUST_200302_750_NS::cuda_cub::__smart_sort::smart_sort<cuda::std::__4::integral_constant<bool, true>, cuda::std::__4::integral_constant<bool, false>, thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base>, unsigned int*, thrust::THRUST_200302_750_NS::device_ptr<unsigned int>, thrust::THRUST_200302_750_NS::less<unsigned int> >(thrust::THRUST_200302_750_NS::cuda_cub::execution_policy<thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base> >&, unsigned int*, unsigned int*, thrust::THRUST_200302_750_NS::device_ptr<unsigned int>, thrust::THRUST_200302_750_NS::less<unsigned int>) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/thrust/system/cuda/detail/sort.h:461 [0xd7c674]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:void thrust::THRUST_200302_750_NS::cuda_cub::sort_by_key<thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base>, unsigned int*, thrust::THRUST_200302_750_NS::device_ptr<unsigned int>, thrust::THRUST_200302_750_NS::less<unsigned int> >(thrust::THRUST_200302_750_NS::cuda_cub::execution_policy<thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base> >&, unsigned int*, unsigned int*, thrust::THRUST_200302_750_NS::device_ptr<unsigned int>, thrust::THRUST_200302_750_NS::less<unsigned int>) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/thrust/system/cuda/detail/sort.h:551 [0xd7c166]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
[1:25](https://celeritasproject.slack.com/archives/DSUBD0FLM/p1729319158624659)
=========     Host Frame:void thrust::THRUST_200302_750_NS::cuda_cub::sort_by_key<thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base>, unsigned int*, thrust::THRUST_200302_750_NS::device_ptr<unsigned int> >(thrust::THRUST_200302_750_NS::cuda_cub::execution_policy<thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base> >&, unsigned int*, unsigned int*, thrust::THRUST_200302_750_NS::device_ptr<unsigned int>) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/thrust/system/cuda/detail/sort.h:621 [0xd7b15c]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:void thrust::THRUST_200302_750_NS::sort_by_key<thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base>, unsigned int*, thrust::THRUST_200302_750_NS::device_ptr<unsigned int> >(thrust::THRUST_200302_750_NS::detail::execution_policy_base<thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base> > const&, unsigned int*, unsigned int*, thrust::THRUST_200302_750_NS::device_ptr<unsigned int>) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/thrust/detail/sort.inl:102 [0xd79535]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:void celeritas::detail::(anonymous namespace)::sort_impl<celeritas::OpaqueId<celeritas::ActionInterface, unsigned int>, unsigned int>(celeritas::Collection<unsigned int, (celeritas::Ownership)1, (celeritas::MemSpace)1, celeritas::OpaqueId<celeritas::Thread_, unsigned int> > const&, celeritas::ObserverPtr<celeritas::OpaqueId<celeritas::ActionInterface, unsigned int> const, (celeritas::MemSpace)1>, celeritas::OpaqueId<celer
itas::Stream_, unsigned int>) in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/celeritas/src/celeritas/track/detail/TrackSortUtils.cu:98 [0xd78310]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:celeritas::detail::sort_tracks(celeritas::CoreStateData<(celeritas::Ownership)1, (celeritas::MemSpace)1> const&, celeritas::TrackOrder) in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/celeritas/src/celeritas/track/detail/TrackSortUtils.cu:155 [0xd738e1]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:celeritas::SortTracksAction::step(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&) const in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/celeritas/src/celeritas/track/SortTracksAction.cc:121 [0xc2aa05]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:void celeritas::ActionSequence::step<(celeritas::MemSpace)1>(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&) in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/celeritas/src/celeritas/global/ActionSequence.cc:130 [0xae7110]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:celeritas::Stepper<(celeritas::MemSpace)1>::operator()() in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/celeritas/src/celeritas/global/Stepper.cc:130 [0xb21ef9]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:celeritas::Stepper<(celeritas::MemSpace)1>::operator()(celeritas::Span<celeritas::Primary const, 18446744073709551615ul>) in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/celeritas/src/celeritas/global/Stepper.cc:166 [0xb22272]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:celeritas::test::TestTrackSortActionIdEm3Stepper_device_is_sorted_Test::TestBody() in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/celeritas/test/celeritas/track/TrackSort.test.cc:394 [0x43529]

The text was updated successfully, but these errors were encountered:

pcanal · 2024-10-22T15:07:54Z

So the problem is weird and “seem” to indicate violation of ODR (i.e. an inconsistent build). On the debugger when we get to the point the ‘bad’ function is about to be called:

    at /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/agent/agent_radix_sort_downsweep.cuh:635
635             ScatterKeys<FULL_TILE>(keys, relative_bin_offsets, ranks, valid_items);

(and when we use up to get back to that frame), all the variable have correct values.

when we step into the ‘bad’ function and immediately get to:

    at /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/agent/agent_radix_sort_downsweep.cuh:291
291             for (int ITEM = 0; ITEM < ITEMS_PER_THREAD; ++ITEM)

then everything is screwed

[1:49](https://celeritasproject.slack.com/archives/DSUBD0FLM/p1729536561770539)
(cuda-gdb) p &keys
$46 = (@local _ZN3cub17CUB_200302_750_NS23AgentRadixSortDownsweepINS0_29AgentRadixSortDownsweepPolicyILi256ELi47EjLNS0_18BlockLoadAlgorithmE3ELNS0_17CacheLoadModifierE0ELNS0_18RadixRankAlgorithmE1ELNS0_18BlockScanAlgorithmE2ELi6ENS0_15RegBoundScalingILi256ELi47EjEEEELb0EjjjNS0_6detail21identity_decomposer_tEE16bit_ordered_typeE (*)[47]) 0xfffa5c
(cuda-gdb) p keys
$47 = {18, 18, 18, 18, 18, 18, 18, 18, 4294967295 <repeats 39 times>}
(cuda-gdb) down
    at /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/agent/agent_radix_sort_downsweep.cuh:291
291             for (int ITEM = 0; ITEM < ITEMS_PER_THREAD; ++ITEM)
(cuda-gdb) p &twiddled_keys
$48 = (@local _ZN3cub17CUB_200302_750_NS23AgentRadixSortDownsweepINS0_29AgentRadixSortDownsweepPolicyILi256ELi47EjLNS0_18BlockLoadAlgorithmE3ELNS0_17CacheLoadModifierE0ELNS0_18RadixRankAlgorithmE1ELNS0_18BlockScanAlgorithmE2ELi6ENS0_15RegBoundScalingILi256ELi47EjEEEELb0EjjjNS0_6detail21identity_decomposer_tEE16bit_ordered_typeE (*)[47]) 0x2c06f02605ac33
(cuda-gdb) p twiddled_keys
$49 = (@local _ZN3cub17CUB_200302_750_NS23AgentRadixSortDownsweepINS0_29AgentRadixSortDownsweepPolicyILi256ELi47EjLNS0_18BlockLoadAlgorithmE3ELNS0_17CacheLoadModifierE0ELNS0_18RadixRankAlgorithmE1ELNS0_18BlockScanAlgorithmE2ELi6ENS0_15RegBoundScalingILi256ELi47EjEEEELb0EjjjNS0_6detail21identity_decomposer_tEE16bit_ordered_typeE (& @local)[47]) <error reading variable: Error: read_local_memory(0, 0, 0, 0): failed to read local memory at address 0x7fffd7fffa5c size 188, error=CUDBG_ERROR_INVALID_ADDRESS, error message=

esseivaju · 2024-10-22T16:59:18Z

I hit the same error on Perlmutter with the build AthSimulation,local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt,2024-10-18T1700

16:52:37 -------- EEEE ------- G4Exception-START -------- EEEE -------
16:52:37 *** G4Exception : celer0004
16:52:37       issued by : Thrust GPU library
16:52:37 parallel_for: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered

pcanal · 2024-10-22T17:11:56Z

I am making a build not using standalone versions of gcc13 and cuda 12.4.1 to get another data point.

esseivaju · 2024-10-22T19:01:28Z

I only get the error running with Athena+Celeritas. I tried building Celeritas within the Athena environment and all the unit tests succeeded.

asetup AthSimulation,local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt,2024-10-18T1700
cmake -DVDT_ROOT=/cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/LCG_106_ATLAS_13/vdt/0.4.4/x86_64-el9-gcc13-opt/ -DCELERITAS_USE_Geant4=OFF -DCELERITAS_BUILD_TESTS=ON -DCELERITAS_USE_CUDA=ON -DCELERITAS_USE_ROOT=ON -DCMAKE_CUDA_ARCHITECTURES=80 -G Ninja -DCMAKE_CUDA_FLAGS="-g -G" -DCMAKE_EXPORT_COMPILE_COMMANDS=ON ../celeritas

EDIT: TrackSort test is disabled without Geant4

sethrj · 2024-10-22T19:32:16Z

Have we tried a build without VecGeom just to rule out RDC weirdness?

esseivaju · 2024-10-22T21:07:36Z

WIth Geant4, 29 tests fail. compute-sanitizer for the TrackSort test shows multiple errors.

Stack overflow

========= Stack overflow
=========     at celeritas::SimTrackView::operator =(const celeritas::SimTrackInitializer &)+0x50 in /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/celeritas/track/SimTrackView.hh:148
=========     by thread (5,0,0) in block (0,0,0)
=========     Device Frame:celeritas::detail::InitTracksExecutor::operator ()(celeritas::OpaqueId<celeritas::Thread_, unsigned int>) const+0x28c0 in /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/celeritas/track/detail/InitTracksExecutor.hh:111
=========     Device Frame:launch_kernel_impl<InitTracksExecutor>+0x2950 in /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/corecel/sys/detail/KernelLauncherImpl.device.hh:36
=========     Device Frame:void celeritas::detail::<unnamed>::launch_action_impl<celeritas::detail::InitTracksExecutor, (bool)1>(celeritas::Range<celeritas::OpaqueId<celeritas::Thread_, unsigned int>>, T1)+0x2970 in /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/c
orecel/sys/detail/KernelLauncherImpl.device.hh:48
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x33255f]
=========                in /.singularity.d/libs/libcuda.so.1
=========     Host Frame: [0x15803]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/cuda/12.4.1-5bf34/x86_64-el9-gcc13-opt/lib64/libcudart.so.12
=========     Host Frame:cudaLaunchKernel [0x75230]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/cuda/12.4.1-5bf34/x86_64-el9-gcc13-opt/lib64/libcudart.so.12
=========     Host Frame:void celeritas::detail::(anonymous namespace)::launch_action_impl<celeritas::detail::InitTracksExecutor, true>(celeritas::Range<celeritas::OpaqueId<celeritas::Thread_, unsigned int> >, celeritas::detail::InitTracksExecutor) in /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/corecel/sys/detail/KernelLauncherImpl.device.hh:47 [0x54e0a2]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::InitializeTracksAction::step_impl(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&, unsigned int) const in /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/celeritas/track/InitializeTracksAction.cu:33 [0
x54dff7]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::InitializeTracksAction::step(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&) const [0x54cae2]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:void celeritas::ActionSequence::step<(celeritas::MemSpace)1>(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&) [0x3cc071]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::Stepper<(celeritas::MemSpace)1>::operator()() [0x3e2ee9]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::test::TestTrackPartitionEm3Stepper_device_is_partitioned_Test::TestBody() [0x21ff9]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/celeritas/track_TrackSort
=========     Host Frame:void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/g
test.cc:2657 [0x55fbe]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::Test::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2686 [0x4607d]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::TestInfo::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2845 [0x46234]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::TestSuite::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2977 [0x46366]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::internal::UnitTestImpl::RunAllTests() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:5890 [0x4c0eb]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) in /data/bmorgan/athena-buildroot/1810-24-gpu/build/
AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2657 [0x5643e]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::UnitTest::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:5455 [0x464d5]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:celeritas::testdetail::test_main(int, char**) [0x43805]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/libtestcel_harness.so
=========     Host Frame:__libc_start_call_main [0x2958f]
=========                in /lib64/libc.so.6
=========     Host Frame:__libc_start_main [0x2963f]
=========                in /lib64/libc.so.6
=========     Host Frame:_start [0x1ebc4]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/celeritas/track_TrackSort

launch failure

========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaStreamSynchronize.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame: [0x481c35]
=========                in /.singularity.d/libs/libcuda.so.1
=========     Host Frame:cudaStreamSynchronize [0x74ffa]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/cuda/12.4.1-5bf34/x86_64-el9-gcc13-opt/lib64/libcudart.so.12
=========     Host Frame:celeritas::StatusChecker::launch_impl(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&, celeritas::StatusCheckStateData<(celeritas::Ownership)1, (celeritas::MemSpace)1> const&) const in /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/celeritas/track/StatusChecker.cu:39 [0x55158d]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:void celeritas::StatusChecker::step<(celeritas::MemSpace)1>(celeritas::OpaqueId<celeritas::ActionInterface, unsigned int>, celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&) const [0x5502fd]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:void celeritas::ActionSequence::step<(celeritas::MemSpace)1>(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&) [0x3cc09f]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::Stepper<(celeritas::MemSpace)1>::operator()() [0x3e2ee9]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::test::TestTrackPartitionEm3Stepper_device_is_partitioned_Test::TestBody() [0x21ff9]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/celeritas/track_TrackSort
=========     Host Frame:void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2657 [0x55fbe]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::Test::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2686 [0x4607d]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::TestInfo::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2845 [0x46234]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::TestSuite::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2977 [0x46366]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::internal::UnitTestImpl::RunAllTests() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:5890 [0x4c0eb]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) in /data/bmorgan/athena-buildroot/1810-24-gpu/build/
AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2657 [0x5643e]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::UnitTest::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:5455 [0x464d5]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:celeritas::testdetail::test_main(int, char**) [0x43805]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/libtestcel_harness.so
=========     Host Frame:__libc_start_call_main [0x2958f]
=========                in /lib64/libc.so.6
=========     Host Frame:__libc_start_main [0x2963f]
=========                in /lib64/libc.so.6
=========     Host Frame:_start [0x1ebc4]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/celeritas/track_TrackSort

launch failure cudaFree

========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaFreeHost.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame: [0x481c35]
=========                in /.singularity.d/libs/libcuda.so.1
=========     Host Frame:cudaFreeHost [0x5780d]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/cuda/12.4.1-5bf34/x86_64-el9-gcc13-opt/lib64/libcudart.so.12
=========     Host Frame:celeritas::PinnedAllocator<celeritas::OpaqueId<celeritas::Thread_, unsigned int> >::deallocate(celeritas::OpaqueId<celeritas::Thread_, unsigned int>*, unsigned long) [0x3e6922]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::CoreState<(celeritas::MemSpace)1>::~CoreState() [0x3d2f82]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::Stepper<(celeritas::MemSpace)1>::~Stepper() [0x3e35c1]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::test::TestTrackPartitionEm3Stepper_device_is_partitioned_Test::TestBody() [clone .cold] [0x1de88]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/celeritas/track_TrackSort
=========     Host Frame:void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2657 [0x55fbe]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::Test::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2686 [0x4607d]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::TestInfo::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2845 [0x46234]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::TestSuite::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2977 [0x46366]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::internal::UnitTestImpl::RunAllTests() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:5890 [0x4c0eb]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) in /data/bmorgan/athena-buildroot/1810-24-gpu/build/
AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2657 [0x5643e]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::UnitTest::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:5455 [0x464d5]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:celeritas::testdetail::test_main(int, char**) [0x43805]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/libtestcel_harness.so
=========     Host Frame:__libc_start_call_main [0x2958f]
=========                in /lib64/libc.so.6
=========     Host Frame:__libc_start_main [0x2963f]
=========                in /lib64/libc.so.6
=========     Host Frame:_start [0x1ebc4]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/celeritas/track_TrackSort

sethrj · 2024-10-22T21:18:32Z

@esseivaju youre sure you're building with the right cuda arch?

esseivaju · 2024-10-22T21:33:53Z

yes, building with CUDA_ARCH=80. This should also be included in the cvmfs build (@drbenmorgan ). I am using cuda 12.6 to compile Celeritas, the cvmfs build was 12.4 I think but it shouldn't matter since this is an independent build, asetup is only used for picking up Celeritas dependencies.

esseivaju · 2024-10-23T00:07:43Z

I rebuilt AthSimulation + AthSimulationExternals on Perlmutter, but that made no difference. Building Celeritas within the AthSimulationExternal environment leads to the same failures in Athena and the Celeritas unit tests. I also see invalid read as reported by Philippe

invalid memory access

========= Invalid __global__ read of size 4 bytes
=========     at _adaptorINS0_10device_ptrIjEEPjjNS0_8cuda_cub3tagENS0_27random_access_traversal_tagENS0_16device_referenceIjEElE7advanceEl+0x4300 in /usr/local/cuda/include/cub/thread/thread_load.cuh:289
=========     by thread (64,0,0) in block (2,0,0)
=========     Address 0x16e0006904 is out of bounds
=========     and is 6 bytes after the nearest allocation at 0x16e0006400 of size 1279 bytes
=========     Device Frame:_core_access7advanceINS0_10device_ptrIjEEEEvRT_NS5_15difference_typeE+0x4300 in /usr/local/cuda/include/cub/thread/thread_load.cuh:354
=========     Device Frame:thrust20THRUST_200500_800_NS15iterator_facadeINS0_10device_ptrIjEEjNS0_8cuda_cub3tagENS0_27random_access_traversal_tagENS0_16device_referenceIjEElEpLEl+0x4300 in /usr/local/cuda/include/cub/iterator/cache_modified_input_iterator.cuh:217
=========     Device Frame:27random_access_traversal_tagENS0_16device_referenceIjEElEET_RKNS0_15iterator_facadeIS9_T0_T1_T2_T3_T4_EENS9_15difference_typeE+0x42c0 in /usr/local/cuda/include/cub/block/block_load.cuh:86
=========     Device Frame:erator_facadeINS0_10device_ptrIjEEjNS0_8cuda_cub3tagENS0_27random_access_traversal_tagENS0_16device_referenceIjEElEixEl+0x42c0 in /usr/local/cuda/include/cub/block/block_load.cuh:854
=========     Device Frame:_10device_ptrIjEEjEEE12consume_tileILb1EEEvii+0x42c0 in /usr/local/cuda/include/cub/block/block_load.cuh:1484
=========     Device Frame:_+0x42c0 in /usr/local/cuda/include/cub/agent/agent_select_if.cuh:760
=========     Device Frame:EvRNS1_12cross_systemIT_T0_EET1_T2_+0x4290 in /usr/local/cuda/include/cub/agent/agent_select_if.cuh:898
=========     Device Frame:_cub3tagENS0_6system3cpp6detail3tagEPKdEEvPT_PT0_T1_+0x1540 in /usr/local/cuda/include/cub/agent/agent_select_if.cuh:934
=========     Device Frame:void cub::CUB_200500_800_NS::DeviceSelectSweepKernel<cub::CUB_200500_800_NS::detail::device_select_policy_hub<unsigned int, cub::CUB_200500_800_NS::NullType, int, (bool)0, (bool)1>::Policy900, unsigned int *, cub::CUB_200500_800_NS::NullType *, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, int *, cub::CUB_200500_800_NS::ScanTileState<int, (bool)1>, celeritas::detail::IsNotInactive, cub::CUB_200500_800_NS::NullType, int, (bool)1>(T2, T3, T4, T5, T6, T7, T8, T9, int, cub::CUB_200500_800_NS::detail::vsmem_t)+0x10 in /usr/local/cuda/include/cub/device/dispatch/dispatch_select_if.cuh:230
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x33255f]
=========                in /.singularity.d/libs/libcuda.so.1
=========     Host Frame: [0x15aa7]
=========                in /usr/local/cuda/lib64/libcudart.so.12
=========     Host Frame:cudaLaunchKernel [0x759f0]
=========                in /usr/local/cuda/lib64/libcudart.so.12
=========     Host Frame:void cub::CUB_200500_800_NS::DeviceSelectSweepKernel<cub::CUB_200500_800_NS::detail::device_select_policy_hub<unsigned int, cub::CUB_200500_800_NS::NullType, int, false, true>::Policy900, unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, int*, cub::CUB_200500_800_NS::ScanTileState<int, true>, celeritas::detail::IsNotInactive, cub::CUB_200500_800_NS::NullType, int, true>(unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, int*, cub::CUB_200500_800_NS::ScanTileState<int, true>, celeritas::detail::IsNotInactive, cub::CUB_200500_800_NS::NullType, int, int, cub::CUB_200500_800_NS::detail::vsmem_t) [0x59f53d]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:cudaError thrust::THRUST_200500_800_NS::cuda_cub::launcher::triple_chevron::doit_host<void (*)(unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, int*, cub::CUB_200500_800_NS::ScanTileState<int, true>, celeritas::detail::IsNotInactive, cub::CUB_200500_800_NS::NullType, int, int, cub::CUB_200500_800_NS::detail::vsmem_t), unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, int*, cub::CUB_200500_800_NS::ScanTileState<int, true>, celeritas::detail::IsNotInactive, cub::CUB_200500_800_NS::NullType, int, int, cub::CUB_200500_800_NS::detail::vsmem_t>(void (*)(unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, int*, cub::CUB_200500_800_NS::ScanTileState<int, true>, celeritas::detail::IsNotInactive, cub::CUB_200500_800_NS::NullType, int, int, cub::CUB_200500_800_NS::detail::vsmem_t), unsigned int* const&, cub::CUB_200500_800_NS::NullType* const&, thrust::THRUST_200500_800_NS::device_ptr<unsigned int> const&, int* const&, cub::CUB_200500_800_NS::ScanTileState<int, true> const&, celeritas::detail::IsNotInactive const&, cub::CUB_200500_800_NS::NullType const&, int const&, int const&, cub::CUB_200500_800_NS::detail::vsmem_t const&) const [clone .isra.0] [0x5a0643]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:thrust::THRUST_200500_800_NS::cuda_cub::detail::DispatchPartitionIf<thrust::THRUST_200500_800_NS::detail::execute_with_allocator<thrust::THRUST_200500_800_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200500_800_NS::cuda_cub::execute_on_stream_nosync_base>, unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, celeritas::detail::IsNotInactive, int>::dispatch(thrust::THRUST_200500_800_NS::cuda_cub::execution_policy<thrust::THRUST_200500_800_NS::detail::execute_with_allocator<thrust::THRUST_200500_800_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200500_800_NS::cuda_cub::execute_on_stream_nosync_base> >&, void*, unsigned long&, unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, celeritas::detail::IsNotInactive, int, unsigned long&) [0x5a27cc]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:unsigned long thrust::THRUST_200500_800_NS::cuda_cub::detail::partition<thrust::THRUST_200500_800_NS::detail::execute_with_allocator<thrust::THRUST_200500_800_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200500_800_NS::cuda_cub::execute_on_stream_nosync_base>, unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, celeritas::detail::IsNotInactive>(thrust::THRUST_200500_800_NS::cuda_cub::execution_policy<thrust::THRUST_200500_800_NS::detail::execute_with_allocator<thrust::THRUST_200500_800_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200500_800_NS::cuda_cub::execute_on_stream_nosync_base> >&, unsigned int*, unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, celeritas::detail::IsNotInactive) [0x5a93f3]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:void celeritas::detail::(anonymous namespace)::partition_impl<celeritas::detail::IsNotInactive>(celeritas::Collection<unsigned int, (celeritas::Ownership)1, (celeritas::MemSpace)1, celeritas::OpaqueId<celeritas::Thread_, unsigned int> > const&, celeritas::detail::IsNotInactive&&, celeritas::OpaqueId<celeritas::Stream_, unsigned int>) [0x5a96db]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:celeritas::detail::sort_tracks(celeritas::CoreStateData<(celeritas::Ownership)1, (celeritas::MemSpace)1> const&, celeritas::TrackOrder) [0x59f3be]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:celeritas::SortTracksAction::step(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&) const [0x4cf49b]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:void celeritas::ActionSequence::step<(celeritas::MemSpace)1>(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&) [0x4247a1]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:celeritas::Stepper<(celeritas::MemSpace)1>::operator()() [0x43be89]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:celeritas::test::TestTrackPartitionEm3Stepper_device_is_partitioned_Test::TestBody() [0x221d9]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/test/celeritas/track_TrackSort
=========     Host Frame:void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) in /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/GoogleTest/googletest/src/gtest.cc:2657 [0x55fbe]
=========                in /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::Test::Run() in /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/GoogleTest/googletest/src/gtest.cc:2686 [0x4607d]
=========                in /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::TestInfo::Run() in /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/GoogleTest/googletest/src/gtest.cc:2845 [0x46234]
=========                in /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::TestSuite::Run() in /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/GoogleTest/googletest/src/gtest.cc:2977 [0x46366]
=========                in /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::internal::UnitTestImpl::RunAllTests() in /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/GoogleTest/googletest/src/gtest.cc:5890 [0x4c0eb]
=========                in /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) in /pscratch/sd/e/esseivaj/celer-athena/externals_bu
ild/src/GoogleTest/googletest/src/gtest.cc:2657 [0x5643e]
=========                in /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::UnitTest::Run() in /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/GoogleTest/googletest/src/gtest.cc:5455 [0x464d5]
=========                in /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:celeritas::testdetail::test_main(int, char**) [0x44d25]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/test/libtestcel_harness.so
=========     Host Frame:__libc_start_call_main [0x2958f]
=========                in /lib64/libc.so.6
=========     Host Frame:__libc_start_main [0x2963f]
=========                in /lib64/libc.so.6
=========     Host Frame:_start [0x1ec34]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/test/celeritas/track_TrackSort

If I compile with nvcc -g -G ... I hit the stack overflow and launch failure. Without the -g -G, I hit the global read error.

pcanal · 2024-10-23T02:31:11Z

My build ended up with the following failures:

         58 - geocel/vg/Vecgeom:CmseTest.* (Failed)
        176 - celeritas/geo/Geometry (Failed)
        250 - celeritas/track/TrackSort (Failed)

(updated to remove test that failed due to an unrelated local mis-configuration resulting in not finding data files)
but the TrackSort failure is different

[ RUN      ] TestTrackSortActionIdEm3Stepper.device_is_sorted
unknown file: Failure
C++ exception with description "radix_sort: failed on 2nd step: cudaErrorInvalidValue: invalid argument" thrown in the test body.
Writing diagnostic output because test failed

amandalund · 2024-10-23T07:57:00Z

I've been seeing the same TrackSort failure for a while on A100 with cuda 11.6.2:

[ RUN      ] TestTrackSortActionIdEm3Stepper.device_is_sorted
status: Celeritas core setup complete
info: Executing actions with additional debug checking
status: Celeritas core state initialization complete
warning: Cuda API error detected: cudaLaunchKernel returned (0x1)

warning: Cuda API error detected: cudaPeekAtLastError returned (0x1)

warning: Cuda API error detected: cudaGetLastError returned (0x1)

unknown file: Failure
C++ exception with description "radix_sort: failed on 2nd step: cudaErrorInvalidValue: invalid argument" thrown in the test body.
Writing diagnostic output because test failed

sethrj · 2024-10-23T08:20:30Z

Digging into all this a little.... NVIDIA/cub#545 ?

pcanal · 2024-10-23T12:41:29Z

It does seems to be exactly that. In the build used for #1462 (comment), VecGeom was built with CUDA architecture 70 while Celeritas was built with 80. Rebuilding VecGeom with 80 makes celeritas/track/TrackSort work correctly (Celeritas build without VecGeom also works).

pcanal · 2024-10-23T15:10:49Z

Using multiple (consistent) architecture: CMAKE_CUDA_ARCHITECTURES="70;80" also works (confirmed), so I am not yet sure why it fails on lxplus.

sethrj · 2024-10-23T15:33:00Z

So back to the drawing board? 😞

pcanal · 2024-10-23T21:51:34Z

@esseivaju figured out that the lxplus libraries were not build as they were meant to be and were indeed using the wrong cuda architecture.

sethrj · 2024-10-23T22:45:14Z

Y'all are the best!!

esseivaju · 2024-10-24T00:43:12Z

Celeritas unit test no longer crash because of gpu-related issues, however athsimulation still does. This is the output of running athena.py through compute-sanitizer athena-compute-sanitizer.log. Note that AthSimulation was built with LTO disabled (-DATLAS_GEANT4_USE_LTO=OFF).

The compute-sanitizer output doesn't show a thrust kernel failing even though the exception reported by Celeritas suggests that it is:

EVNTtoHITS 01:03:57 GPUOffload           INFO GPUOffload: End of Event
EVNTtoHITS 01:03:57 LocalTransporter.cc:256: info: Transporting 994 tracks from event 1 with Celeritas
EVNTtoHITS 01:03:58
EVNTtoHITS 01:03:58 -------- EEEE ------- G4Exception-START -------- EEEE -------
EVNTtoHITS 01:03:58 *** G4Exception : celer0004
EVNTtoHITS 01:03:58       issued by : Thrust GPU library
EVNTtoHITS 01:03:58 parallel_for: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
EVNTtoHITS 01:03:58 *** Fatal Exception *** core dump ***
EVNTtoHITS 01:03:58 G4Track (0x9dbdcd10) - track ID = 2, parent ID = 1
EVNTtoHITS 01:03:58  Particle type : e- - creator process : hIoni, creator model : model_DeltaElectron
EVNTtoHITS 01:03:58  Kinetic energy : 176.852 keV - Momentum direction : (1.04445e-314,-0.861732,-0.255076)
EVNTtoHITS 01:03:58  Step length : 0 fm  - total energy deposit : 0 eV
EVNTtoHITS 01:03:58  Pre-step point : (-473.08,-154.837,396.919) - Physical volume : Atlas::Atlas (Air)
EVNTtoHITS 01:03:58  - defined by : not available
EVNTtoHITS 01:03:58  Post-step point : (-473.08,-154.837,396.919) - Physical volume : Atlas::Atlas (Air)
EVNTtoHITS 01:03:58  - defined by : not available
EVNTtoHITS 01:03:58  *** Note: Step information might not be properly updated.
EVNTtoHITS 01:03:58
EVNTtoHITS 01:03:58 -------- EEEE -------- G4Exception-END --------- EEEE -------

As mentioned on Slack, I don't think there is code generated for sm_52 architectures as the command below shows no match.

cuobjdump output

esseivaj@perlmutter:login01:/pscratch/sd/e/esseivaj/celer-athena> rg --include-zero -c -e 'sm_52' --pre cuobjdump -g '*.so' -g '*.a' athsim_build externals_build 2>/dev/null
externals_build/External/VecGeom/CMakeFiles/VecGeomBuild/lib/libvecgeomcuda.so:0
externals_build/src/Celeritas-build/lib/liborange.a:0
externals_build/src/Celeritas-build/lib/libgeocel.a:0
externals_build/External/VecGeom/CMakeFiles/VecGeomBuild/lib/libvecgeomcuda_static.a:0
externals_build/src/Celeritas-build/lib/libcorecel.a:0
externals_build/src/Celeritas-build/lib/libcorecel_final.a:0
externals_build/src/Celeritas-build/lib/libgeocel_final.a:0
externals_build/src/Celeritas-build/lib/libceleritas_final.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/liborange.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/libgeocel.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/libcorecel.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/libcorecel_final.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/libgeocel_final.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/libceleritas_final.a:0
externals_build/src/Celeritas-build/lib/libceleritas.a:0
externals_build/src/Celeritas-build/lib/liborange_final.a:0
externals_build/src/Celeritas-build/lib/libaccel.a:0
externals_build/src/Celeritas-build/lib/libaccel_final.a:0
externals_build/src/VecGeom-build/libvecgeomcuda.so:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/libaccel.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/liborange_final.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/libaccel_final.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/libceleritas.a:0
externals_build/src/VecGeom-build/libvecgeomcuda_static.a:0
athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4Lib.so:0
athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.so:0
externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda.so:0
externals_build/x86_64-el9-gcc13-opt/lib/libaccel.a:0
externals_build/x86_64-el9-gcc13-opt/lib/libaccel_final.a:0
externals_build/x86_64-el9-gcc13-opt/lib/liborange_final.a:0
externals_build/x86_64-el9-gcc13-opt/lib/libceleritas_final.a:0
externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda_static.a:0
externals_build/x86_64-el9-gcc13-opt/lib/libcorecel.a:0
externals_build/x86_64-el9-gcc13-opt/lib/libgeocel_final.a:0
externals_build/x86_64-el9-gcc13-opt/lib/libcorecel_final.a:0
externals_build/x86_64-el9-gcc13-opt/lib/libgeocel.a:0
externals_build/x86_64-el9-gcc13-opt/lib/libceleritas.a:0
externals_build/x86_64-el9-gcc13-opt/lib/liborange.a:0

sethrj · 2024-10-24T08:04:03Z

@esseivaju @pcanal @drbenmorgan Have we tried building with -DCELERITAS_DEVICE_DEBUG=ON?

Also I remembered with #1433 we're actually not defaulting to using the same settings for GPU/CPU: perhaps we should use setup.track_order = TrackOrder::init_charge for the CPU case and see if we reproduce the failure?

amandalund · 2024-10-24T08:35:01Z

I've also sometimes seen errors like that (CUDA error: an illegal memory access was encountered) when running on more complex geometries without increasing the CUDA heap/stack size enough.

sethrj · 2024-10-24T08:50:01Z

That's a good point, we ought to try running this geometry in standalone to see.

esseivaju · 2024-10-24T09:19:59Z

That the GDML you dump during the hackathon right? We should be able to quickly test that

sethrj · 2024-10-24T10:00:16Z

Yeah I've uploaded it and just ran on wildstyle with the regression harness; seems to work fine 👀

esseivaju · 2024-10-25T00:14:01Z

I tried turning on -DCELERITAS_DEVICE_DEBUG=ON or setting setup.track_order = TrackOrder::none but it doesn't make a difference. compute-sanitizer still reports the same global read error and weird exceptions such as:

EVNTtoHITS 23:56:14 -------- EEEE ------- G4Exception-START -------- EEEE -------
EVNTtoHITS 23:56:14 *** G4Exception : celer0004
EVNTtoHITS 23:56:14       issued by : /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/corecel/sys/KernelAttributes.hh:75
EVNTtoHITS 23:56:14 Celeritas CUDA error: an illegal memory access was encountered

*** G4Exception : celer0004
      issued by : /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/corecel/sys/KernelAttributes.hh:75
Celeritas CUDA error: unspecified launch failure

I also tried the regression problem, and they work fine but they complete much faster than the old tilecal with 2 modules we were using before.

Something must be still wrong with the build with Athena...

sethrj · 2024-10-25T04:52:38Z

Damn. Thanks Julien, this is a mystery indeed...

esseivaju · 2024-10-25T23:07:16Z

No luck with increasing the stack and heap size. I ran athena through cuda-gdb and that's the output

CUDA error:

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x963ab9b0  __nv_static_53__016fd2f1_25_InitializeTracksAction_cu_88c0b313_98394__ZN9celeritas6detail64_GLOBAL__N__016fd2f1_25_InitializeTracksAction_cu_88c0b313_9839418launch_action_implINS0_18InitTracksExecutorELb1EEEvNS_5RangeINS_8OpaqueIdINS_7Thread_EjEEEET_

Thread 1 "athena.py" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 22, grid 185102, block (0,0,0), thread (224,0,0), device 0, sm 0, warp 7, lane 0]
0x00000000963ab9c0 in __nv_static_53__016fd2f1_25_InitializeTracksAction_cu_88c0b313_98394__ZN9celeritas6detail64_GLOBAL__N__016fd2f1_25_InitializeTracksAction_cu_88c0b313_9839418launch_action_implINS0_18InitTracksExecutorELb1EEEvNS_5RangeINS_8OpaqueIdINS_7Thread_EjEEEET_<<<(4,1,1),(256,1,1)>>> ()

Host stack trace and exception: (it fails on the first CUDA runtime API call after the failing kernel so it's not very important)

EVNTtoHITS 22:57:50 -------- EEEE ------- G4Exception-START -------- EEEE -------
EVNTtoHITS 22:57:50 *** G4Exception : celer0004
EVNTtoHITS 22:57:50       issued by : Thrust GPU library
EVNTtoHITS 22:57:50 parallel_for: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered

#9  0x00007fffceda2f45 in libcudart_static_141dba5462e92d2cffd1abc474df476c510a3a8c () from /pscratch/sd/e/esseivaj/celer-athena/athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.so
#10 0x00007fffcee07ddd in cudaStreamSynchronize () from /pscratch/sd/e/esseivaj/celer-athena/athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.so
#11 0x00007fffced9be19 in cub::CUB_200500_800_NS::SyncStream (stream=0x1) at /usr/local/cuda/include/cub/util_device.cuh:503
#12 thrust::THRUST_200500_800_NS::cuda_cub::synchronize_stream<thrust::THRUST_200500_800_NS::cuda_cub::par_t> (policy=...) at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:100
#13 thrust::THRUST_200500_800_NS::cuda_cub::synchronize_stream_optional<thrust::THRUST_200500_800_NS::cuda_cub::par_t> (policy=...) at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:119
#14 thrust::THRUST_200500_800_NS::cuda_cub::synchronize_optional<thrust::THRUST_200500_800_NS::cuda_cub::execution_policy<thrust::THRUST_200500_800_NS::cuda_cub::par_t> > (policy=...) at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:133
#15 thrust::THRUST_200500_800_NS::cuda_cub::parallel_for<thrust::THRUST_200500_800_NS::cuda_cub::par_t, thrust::THRUST_200500_800_NS::cuda_cub::__fill::functor<thrust::THRUST_200500_800_NS::device_ptr<celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int> >, celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int> >, unsigned long> (policy=..., f=..., count=1024) at /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h:63
#16 thrust::THRUST_200500_800_NS::cuda_cub::fill_n<thrust::THRUST_200500_800_NS::cuda_cub::par_t, thrust::THRUST_200500_800_NS::device_ptr<celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int> >, unsigned long, celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int> > (policy=..., value=..., count=1024, first=...) at /usr/local/cuda/include/thrust/system/cuda/detail/fill.h:77
#17 thrust::THRUST_200500_800_NS::fill_n<thrust::THRUST_200500_800_NS::cuda_cub::par_t, thrust::THRUST_200500_800_NS::device_ptr<celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int> >, unsigned long, celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int> > (
    value=..., n=1024, first=..., exec=...) at /usr/local/cuda/include/thrust/detail/fill.inl:55
#18 celeritas::Filler<celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int>, (celeritas::MemSpace)1>::fill_device_impl (this=this@entry=0x7fffffff5d88, data=...)
    at /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/corecel/data/Filler.device.t.hh:36
#19 0x00007fffceb287d1 in celeritas::Filler<celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int>, (celeritas::MemSpace)1>::operator() (data=..., this=0x7fffffff5d88)
    at /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/corecel/data/Filler.hh:49
#20 celeritas::fill<celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int>, (celeritas::Ownership)1, (celeritas::MemSpace)1, celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int> > (value=..., col=0x7da5c690)
    at /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/corecel/data/CollectionAlgorithms.hh:33
#21 celeritas::InitializeTracksAction::step_impl<(celeritas::MemSpace)1> (core_state=..., core_params=..., this=0x91b87670) at /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/celeritas/track/InitializeTracksAction.cc:91
#22 celeritas::InitializeTracksAction::step (this=0x91b87670, params=..., state=...) at /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/celeritas/track/InitializeTracksAction.cc:43

instruction causing the crash (multiply and add):

(cuda-gdb)  disas $pc,+16
Dump of assembler code from 0x963ab9c0 to 0x963ab9d0:
=> 0x00000000963ab9c0 <__nv_static_53__016fd2f1_25_InitializeTracksAction_cu_88c0b313_98394__ZN9celeritas6detail64_GLOBAL__N__016fd2f1_25_InitializeTracksAction_cu_88c0b313_9839418launch_action_implINS0_18InitTracksExecutorELb1EEEvNS_5RangeINS_8OpaqueIdINS_7Thread_EjEEEET_+4160>:      IMAD.WIDE.U32 R2, R17, 0x4, R10

I haven't linked the instruction to the line of code yet... Since this block should be executing on the first step and it's successful, maybe there is an actual bug in InitTracksExecutor?

sethrj · 2024-10-26T12:01:53Z

Gosh. Next steps: can we get ROOT (or even hepmc3?) enabled so we can write out the primaries? Can we try testing with ORANGE?

esseivaju · 2024-10-26T17:57:35Z

ORANGE fails at

/pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/orange/OrangeTrackView.hh:1160:
celeritas: internal assertion failed: lev < this->level()

in the MscApplier kernel.

sethrj · 2024-10-28T10:30:39Z

Well that's not good either...

sethrj · 2024-10-29T11:40:39Z

@esseivaju reports a failure with ORANGE and the Bpip geometry but at least we get one event or so:

EVNTtoHITS 18:56:46 AthenaEventLoopMgr   INFO   ===>>>  start processing event #1, run #999999 0 events processed so far  <<<===
EVNTtoHITS 18:56:50 GPUOffload           INFO GPUOffload: Begin of Event 1 tid = -1
EVNTtoHITS 18:56:50 GPUOffload           INFO GPUOffload: End of Event
EVNTtoHITS 18:56:50 LocalTransporter.cc:256: info: Transporting 13 tracks from event 1 with Celeritas
EVNTtoHITS 18:56:50 AthenaEventLoopMgr   INFO   ===>>>  done processing event #1, run #999999 1 events processed so far  <<<===
EVNTtoHITS 18:56:50 AthenaEventLoopMgr   INFO   ===>>>  start processing event #2, run #999999 1 events processed so far  <<<===
EVNTtoHITS 18:56:50 GPUOffload           INFO GPUOffload: Begin of Event 2 tid = -1
EVNTtoHITS 18:56:50 GPUOffload           INFO GPUOffload: End of Event
EVNTtoHITS 18:56:50 LocalTransporter.cc:256: info: Transporting 16 tracks from event 2 with Celeritas
EVNTtoHITS 18:56:51 /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/orange/OrangeTrackView.hh:1160:
EVNTtoHITS 18:56:51 celeritas: internal assertion failed: lev < this->level()
EVNTtoHITS 18:56:51
EVNTtoHITS 18:56:51 -------- EEEE ------- G4Exception-START -------- EEEE -------
EVNTtoHITS 18:56:51 *** G4Exception : celer0004
EVNTtoHITS 18:56:51       issued by : /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/celeritas/global/ActionSequence.cc:112
EVNTtoHITS 18:56:51 Celeritas CUDA error: unspecified launch failure

@esseivaju can you verify whether vecgeom (debug off) works with the beampipe?

esseivaju · 2024-10-29T17:11:11Z

I still see CUDA memory-related issues with the Bpipe+VecGeom. I also have a dump of the Bpipe Gdml that I'll send on Slack since I can't push to https://gitlab.cern.ch/bmorgan/atlassim-6635/-/tree/main?ref_type=heads

sethrj · 2024-10-30T10:52:58Z

Does ATLAS have a "pretend everything is a solid block of aluminum" option??? Are there any weird potential "thread local" behavior outside of the usual geant4 that could be causing issues?

esseivaju · 2024-10-31T00:51:43Z

Does ATLAS have a "pretend everything is a solid block of aluminum" option???

Nothing simpler than the beam pipe we're already using.

Are there any weird potential "thread local" behavior outside of the usual geant4 that could be causing issues?

Atlas has a lot of custom code for interfacing with Geant4. We're running single-threaded here so thread_local shouldn't be causing issues

sethrj · 2024-10-31T13:24:21Z

From @esseivaju regarding @amandalund 's suggestion to run a single track slot, still getting a failure on the very first step:

EVNTtoHITS 00:38:59 GPUOffload           INFO GPUOffload: Begin of Event 1 tid = -1
EVNTtoHITS 00:38:59 LocalTransporter.cc:257: info: Transporting 1 tracks (4.76501 MeV cumulative kinetic energy) from event 1 with Celeritas
EVNTtoHITS 00:39:00
EVNTtoHITS 00:39:00 -------- EEEE ------- G4Exception-START -------- EEEE -------
EVNTtoHITS 00:39:00 *** G4Exception : celer0003
EVNTtoHITS 00:39:00       issued by : celeritas/global/ActionSequence.cc:112
========= Invalid __global__ read of size 8 bytes
=========     at void celeritas::detail::<unnamed>::launch_action_impl<celeritas::detail::InitTracksExecutor, (bool)1>(celeritas::Range<celeritas::OpaqueId<celeritas::Thread_, unsigned int>>, T1)+0x6af0
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x18 is out of bounds
=========     and is 8732540904 bytes before the nearest allocation at 0x208800000 of size 10027264 bytes

This has got to be some sort of linking issue...

sethrj · 2024-11-01T12:54:43Z

SASS looks effectively the same between standalone/Celeritas ATLAS:

esseivaj@perlmutter:login26:/pscratch/sd/e/esseivaj/celer-athena> diff -EZbw --suppress-common-lines inittracks_standalone inittracks_athsim
1c1
<                 Function : __nv_static_54__7c5e5389_25_InitializeTracksAction_cu_1d15004e_575166__ZN9celeritas6detail65_GLOBAL__N__7c5e5389_25_InitializeTracksAction_cu_1d15004e_57516618launch_action_implINS0_18InitTracksExecutorELb1EEEvNS_5RangeINS_8OpaqueIdINS_7Thread_EjEEEET_
---
>                 Function : __nv_static_54__7c5e5389_25_InitializeTracksAction_cu_fe2be81f_973355__ZN9celeritas6detail65_GLOBAL__N__7c5e5389_25_InitializeTracksAction_cu_fe2be81f_97335518launch_action_implINS0_18InitTracksExecutorELb1EEEvNS_5RangeINS_8OpaqueIdINS_7Thread_EjEEEET_
29c29
<         /*00d0*/                   ULDC.64 UR4, c[0x0][0x0] ;                        /* 0x0000000000047ab9 */
---
>         /*00d0*/                   ULDC.64 UR4, c[0x2][0x0] ;                                        /* 0x0080000000047ab9 */

taken from celeritas/build-ndebug/lib64/libceleritas_static.a and athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.so.

Build with shared libraries seems to fail due to the "split" RDC libraries:

[7064/7070] Generating ../../x86_64-el9-gcc13-opt/lib/libAtlasGeant4.components
FAILED: x86_64-el9-gcc13-opt/lib/libAtlasGeant4.components /pscratch/sd/e/esseivaj/celer-athena/athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.components
cd /pscratch/sd/e/esseivaj/celer-athena/athsim_build/Simulation/AtlasGeant4 && /pscratch/sd/e/esseivaj/celer-athena/athsim_build/CMakeFiles/atlas_build_run.sh /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/bin/listcomponents --output /pscratch/sd/e/esseivaj/celer-athena/athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.components libAtlasGeant4.so
/pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/bin/listcomponents: symbol lookup error: /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/lib/libcorecel.so: undefined symbol: __cudaRegisterLinkedBinary_d0380511_9_Filler_cu_b666e28f_701951

seems like this is a postprocessing tool to check the libraries for missing symbols (like -Wl,-z,defs).

sethrj · 2024-11-01T20:10:15Z

@esseivaju traced the failure to the first access of member data inside the BVH class in the BVH navigator:

	//## File "/pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/include/VecGeom/base/BVH.h", line 264 inlined at "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/geocel/vg/detail/BVHNavigator.hh", line 65
	//## File "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/geocel/vg/detail/BVHNavigator.hh", line 65 inlined at "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/geocel/vg/VecgeomTrackView.hh", line 228
	//## File "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/geocel/vg/VecgeomTrackView.hh", line 228 inlined at "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/celeritas/track/detail/InitTracksExecutor.hh", line 137
	//## File "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/celeritas/track/detail/InitTracksExecutor.hh", line 137 inlined at "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/corecel/sys/detail/KernelLauncherImpl.device.hh", line 36
	//## File "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/corecel/sys/detail/KernelLauncherImpl.device.hh", line 36 inlined at "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/corecel/sys/detail/KernelLauncherImpl.device.hh", line 48
	//## File "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/corecel/sys/detail/KernelLauncherImpl.device.hh", line 48
        /*1030*/                   LD.E.64 R10, [R38.64+0x18] ;
        /*1040*/                   IMAD.WIDE.U32 R2, R17, 0x4, R10 ;
        /*1050*/                   LD.E R2, [R2.64] ;
        /*1060*/                   BSSY B8, `(.L_x_11) ;
        /*1070*/                   ISETP.GT.AND P0, PT, R2, -0x1, PT ;
        /*1080*/               @P0 BRA `(.L_x_12) ;

This suggests it's related to the BVH manager's store,

namespace vecgeom {
inline namespace VECGEOM_IMPL_NAMESPACE {
inline std::vector<BVH *> hBVH;
#ifdef VECGEOM_ENABLE_CUDA
inline __device__ BVH *dBVH;
#endif
class BVHManager {
public:
  VECCORE_ATT_HOST_DEVICE
  static BVH const *GetBVH(int id)
  {
#ifdef VECCORE_CUDA_DEVICE_COMPILATION
    return &cuda::dBVH[id];
#else
    return hBVH[id];
#endif
  }

so there's inline __device__ data being shared across translation units (and library boundaries).

Possibly related documentation:

CUDA docs for inline variables say it's OK to have inline device variables in RDC mode
LLVM discussion of ELF issues with device inline from Reduce string duplication from assertion messages #1338

sethrj · 2024-11-03T12:59:06Z

Just to clarify: the failure occurs in vecgeom::cuda::BVH::LevelLocate at the first access of a data member, fNChild[0], where &fNChild - this == 0x18 (three 8-byte pointers precede it in the class definition). The failing error is Address 0x18 is out of bounds and is occurring at LD.E.64 R10, [R38.64+0x18] which means that the temporary R38.64 is a null pointer. So conclusively cuda::dBVH is null for reasons unknown, only when compiling through ATLAS.

sethrj · 2024-11-04T18:07:03Z

Confirmed by @esseivaju with #1481:

EVNTtoHITS 16:47:14 /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/geocel/vg/VecgeomParams.cc:482: error: VecGeom CUDA may not be correctly linked or initialized (BVH device pointers are null or inconsistent: nullptr from Celeritas device kernel, nullptr from Celeritas runtime symbol, unavailable from VecGeom runtime symbol)

The change to use inline __device__ pointers rather than device functions was performed in 683fa167 and 8bb8eeeb by @bernhardmgruber... perhaps we need to bring in the Nvidia big guns to help us debug why the inline device symbols aren't working across translation units, and why the underlying device pointers are failing only in the ATLAS framework.

esseivaju · 2024-11-04T19:37:32Z

I've collected the various compiler and linker invocation from build logs here.

esseivaju · 2024-11-11T19:43:50Z

I have a build of AthSimulation working with Celeritas GPU offload. Required changes are:

Fix linking Athena with VecGeom+CUDA #1489
VecGeom CMake patch from @pcanal to only build static library instead of always building a final vecgeomcuda shared lib
Remove Geant4's VecGeom dependency in AtlasExternals from -DGEANT4_USE_USOLIDS:STRING=CONS;POLYCONE. This causes Geant4 to pick up the final libvecgeomcuda.a libraries which conflicts in the final device link step of libAtlasG4.so (temporary hack)

sethrj · 2024-11-12T12:50:17Z

To preserve the history of how Philippe and Julien figured this out:

@pcanal Thursday at 20:15
It could still be an link line order issue. What is the failing link line (the list an order of libraries is what’s important)?

@esseivaju Thursday at 21:32

FAILED: CMakeFiles/geocel_final.dir/cmake_device_link.o
/usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/cvmfs/sft.cern.ch/lcg/releases/gcc/13.1.0-b3d18/x86_64-el9/bin/g++ -O2 -g -DNDEBUG "--generate-code=arch=compute_80,code=[compute_80,sm_80]" /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas-build/lib/libgeocel.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas-build/lib/libcorecel.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda_static.a -Xnvlink --suppress-stack-size-warning -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink src/geocel/CMakeFiles/geocel_final.dir/CMakeFiles/geocel_emptyfile.cu.o -o CMakeFiles/geocel_final.dir/cmake_device_link.o  lib/libgeocel.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4persistency.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4run.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4event.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4tracking.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4processes.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4digits_hits.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4analysis.a  /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4track.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4geometry.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4graphics_reps.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4materials.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4zlib.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4particles.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4intercoms.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4global.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4ptl.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvgdml.a lib/libcorecel.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda_static.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeom.a -lrt -lpthread -ldl -lcudadevrt -lcudart
nvlink error   : Undefined reference to '_ZN7vecgeom4cuda4dBVHE' in '/pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas-build/lib/libgeocel.a:RaytraceImager.cu.o'

@pcanal Thursday at 21:32
Try:

nm -A CMakeFiles/geocel_final.dir/cmake_device_link.o  lib/libgeocel.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4persistency.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4run.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4event.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4tracking.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4processes.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4digits_hits.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4analysis.a  /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4track.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4geometry.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4graphics_reps.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4materials.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4zlib.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4particles.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4intercoms.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4global.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4ptl.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvgdml.a lib/libcorecel.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda_static.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeom.a | grep _ZN7vecgeom4cuda4dBVHE

@esseivaju Thursday at 21:36

nm: 'CMakeFiles/geocel_final.dir/cmake_device_link.o': No such file
lib/libgeocel.a:VecgeomSetup.cu.o:                 U _ZN7vecgeom4cuda4dBVHE
/pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda.a:BVHManager.cu.o:0000000000000010 B _ZN7vecgeom4cuda4dBVHE
/pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda_static.a:BVHManager.cu.o:0000000000000010 B _ZN7vecgeom4cuda4dBVHE

@pcanal Thursday at 21:40
this is indeed odd ;(

@esseivaju Thursday at 21:48

For reference, here’s the diff causing that problem:

diff --git a/VecGeom/management/BVHManager.h b/VecGeom/management/BVHManager.h
index c760ec2d7..d3a061dd1 100644
--- a/VecGeom/management/BVHManager.h
+++ b/VecGeom/management/BVHManager.h
@@ -17,7 +17,7 @@ namespace vecgeom {
 inline namespace VECGEOM_IMPL_NAMESPACE {
 inline std::vector<BVH *> hBVH;
 #ifdef VECGEOM_ENABLE_CUDA
-inline __device__ BVH *dBVH;
+extern __device__ BVH *dBVH;
 #endif

 // Macro allowing downstream codes to use GetDeviceBVH
diff --git a/source/BVHManager.cu b/source/BVHManager.cu
index 8c03d103b..2a740594a 100644
--- a/source/BVHManager.cu
+++ b/source/BVHManager.cu
@@ -10,6 +10,9 @@ using vecgeom::cxx::CudaCheckError;

 namespace vecgeom {
 inline namespace cuda {
+
+__device__ BVH *dBVH;
+
 void *AllocateDeviceBVHBuffer(size_t n)
 {
   BVH *ptr = nullptr;

@esseivaju Friday at 18:29

libvecgeomcuda.a is a final library (i.e. includes a dlink step) and libvecgeomcuda_static.a doesn’t include a dlink step. Isn’t that a problem if libvecgeomcuda.a is linked to libgeocel_final.a which also includes a dlink step? (edited)

@esseivaju Friday at 19:21

Similarly, libvecgeomcuda.a is included when dlinking the final atlas shared lib, my understanding is that we should only link to libvecgeomcuda_static.a

@pcanal Friday at 20:01

Yes, I think you are right. This is what I am trying to address with the first commit of https://github.com/celeritas-project/celeritas/pull/1489/commits but despites all the other (then necessary) commits, I will have some issues when shared library are also part of the mix.

A priori that first commit could solve the problem (in conjunction with the other commit, I think the fully static build works). (but the fully shared is broken 😞 ).

@esseivaju Friday at 20:04

I am using that PR when building Celeritas and AthSimulation. I still see libvecgeomcuda.a (and libvecgeomcuda_static.a) being picked up by the final atlas lib. Everything links successfully but we still have the kernel crash (edited)

@pcanal Friday at 20:38
I assume that Atlas as its own handling of the Rdc (I.e. I can’t be using the modified cmake fragment in Celeritas) and so their own fragment may need tweaking to avoid link against both libvecgeomcuda.a and libvecgeomcuda_static.a

@esseivaju Friday at 21:01

It might come from Geant4 since they depend on VecGeom and would need the final library :thinking_face:

@esseivaju Saturday at 01:29

That was it! Removing -DGEANT4_USE_USOLIDS:STRING=CONS;POLYCONE when compiling Geant4 removes the VecGeom dependency and no longer pulls in libvecgeomcuda.a so it’s able to run a step on GPU now with #1489. It fails shortly after in the sensitive detectors but at least the GPU works. Without SD, the 1000 events finish without crash

05:20:19 LocalTransporter.cc:257: info: Transporting 611 tracks (21870.4 MeV cumulative kinetic energy) from event 1 with Celeritas
05:20:19 HitProcessor.cc:188: debug: Processing 12 hits
05:20:20  0x7f2af824c62c TileGeoG4SDCalc::BirkLaw(G4Step const*) const /pscratch/sd/e/esseivaj/celer-athena/athena/TileCalorimeter/TileG4/TileGeoG4SD/src/TileGeoG4SDCalc.cc:882:89   + 0x5c [/pscratch/sd/e/esseivaj/celer-athena/athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.so D[0x131c62c]]
05:20:21  0x7f2af824df15 TileGeoG4SDCalc::MakePmtEdepTime(G4Step const*, TileHitData&, double&) const /pscratch/sd/e/esseivaj/celer-athena/athena/TileCalorimeter/TileG4/TileGeoG4SD/src/TileGeoG4SDCalc.cc:441:59   + 0x3b5 [/pscratch/sd/e/esseivaj/celer-athena/athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.so D[0x131df15]]
05:20:21  0x7f2af82477a2 TileGeoG4SD::ProcessHits(G4Step*, G4TouchableHistory*) /pscratch/sd/e/esseivaj/celer-athena/athena/TileCalorimeter/TileG4/TileGeoG4SD/src/TileGeoG4SD.cc:68:3   + 0x82 [/pscratch/sd/e/esseivaj/celer-athena/athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.so D[0x13177a2]]
05:20:21  0x7f2af911a821 celeritas::detail::HitProcessor::operator()(celeritas::DetectorStepOutput const&) const /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/accel/detail/HitProcessor.cc:193:35   + 0x7f2af911a821 [/pscratch/sd/e/esseivaj/celer-athena/athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.so D[0x21ea821]]
(edited)

sethrj · 2024-11-14T13:24:37Z

Next steps for @drbenmorgan are to rebuild atlas externals:

Use the pending VecGeom update
Disable vecgeom/usolids in Geant4 (and we should try understand if/when they are used)
Update Celeritas to the latest develop

drbenmorgan · 2024-11-14T13:28:48Z

Thanks @sethrj, I'll get these done by early next week, pending the full VecGeom fix, but I can get other bits in place.

esseivaju · 2024-11-20T18:35:41Z

Can we close this now that #1489 and the VecGeom patch are merged?

pcanal self-assigned this Oct 22, 2024

sethrj mentioned this issue Oct 23, 2024

ATLAS integration #1144

Open

13 tasks

sethrj mentioned this issue Oct 23, 2024

Improve error checking and diagnostic output for managing GPU #1464

Merged

sethrj changed the title ~~Apparent incompatibility with current ATLAS GPU environement~~ Apparent incompatibility with current ATLAS GPU environment Oct 24, 2024

sethrj added bug Something isn't working external Dependencies and framework-oriented features labels Nov 1, 2024

sethrj mentioned this issue Nov 1, 2024

Use kernel launchers instead of macros #1477

Merged

sethrj mentioned this issue Nov 4, 2024

Check VecGeom BVH device pointers at setup and run time #1481

Merged

esseivaju linked a pull request Nov 20, 2024 that will close this issue

Fix linking Athena with VecGeom+CUDA #1489

Merged

sethrj closed this as completed Nov 20, 2024

Apparent incompatibility with current ATLAS GPU environment #1462

Apparent incompatibility with current ATLAS GPU environment #1462

Comments

pcanal commented Oct 22, 2024 • edited Loading

pcanal commented Oct 22, 2024

esseivaju commented Oct 22, 2024

pcanal commented Oct 22, 2024

esseivaju commented Oct 22, 2024 • edited Loading

sethrj commented Oct 22, 2024

esseivaju commented Oct 22, 2024

sethrj commented Oct 22, 2024

esseivaju commented Oct 22, 2024

esseivaju commented Oct 23, 2024 • edited Loading

pcanal commented Oct 23, 2024 • edited Loading

amandalund commented Oct 23, 2024

sethrj commented Oct 23, 2024

pcanal commented Oct 23, 2024

pcanal commented Oct 23, 2024 • edited Loading

sethrj commented Oct 23, 2024

pcanal commented Oct 23, 2024

sethrj commented Oct 23, 2024

esseivaju commented Oct 24, 2024 • edited Loading

sethrj commented Oct 24, 2024

amandalund commented Oct 24, 2024

sethrj commented Oct 24, 2024

esseivaju commented Oct 24, 2024

sethrj commented Oct 24, 2024

esseivaju commented Oct 25, 2024 • edited Loading

sethrj commented Oct 25, 2024

esseivaju commented Oct 25, 2024 • edited Loading

sethrj commented Oct 26, 2024

esseivaju commented Oct 26, 2024

sethrj commented Oct 28, 2024

sethrj commented Oct 29, 2024

esseivaju commented Oct 29, 2024

sethrj commented Oct 30, 2024

esseivaju commented Oct 31, 2024

sethrj commented Oct 31, 2024

sethrj commented Nov 1, 2024

sethrj commented Nov 1, 2024 • edited Loading

sethrj commented Nov 3, 2024

sethrj commented Nov 4, 2024

esseivaju commented Nov 4, 2024

esseivaju commented Nov 11, 2024 • edited Loading

sethrj commented Nov 12, 2024

sethrj commented Nov 14, 2024

drbenmorgan commented Nov 14, 2024

esseivaju commented Nov 20, 2024

pcanal commented Oct 22, 2024 •

edited

Loading

esseivaju commented Oct 22, 2024 •

edited

Loading

esseivaju commented Oct 23, 2024 •

edited

Loading

pcanal commented Oct 23, 2024 •

edited

Loading

pcanal commented Oct 23, 2024 •

edited

Loading

esseivaju commented Oct 24, 2024 •

edited

Loading

esseivaju commented Oct 25, 2024 •

edited

Loading

esseivaju commented Oct 25, 2024 •

edited

Loading

sethrj commented Nov 1, 2024 •

edited

Loading

esseivaju commented Nov 11, 2024 •

edited

Loading