Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apparent incompatibility with current ATLAS GPU environment #1462

Closed
Tracked by #1144
pcanal opened this issue Oct 22, 2024 · 44 comments · Fixed by #1489
Closed
Tracked by #1144

Apparent incompatibility with current ATLAS GPU environment #1462

pcanal opened this issue Oct 22, 2024 · 44 comments · Fixed by #1489
Assignees
Labels
bug Something isn't working external Dependencies and framework-oriented features

Comments

@pcanal
Copy link
Contributor

pcanal commented Oct 22, 2024

When running on lxplus-gpu we observed a crash within one of Celeritas test checking Thrust functionality.

With Celeritas using for example commit fe52da7 or commit aab88dd
and

export G4PATH=/cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/Geant4
export ATLAS_LOCAL_ROOT_BASE="/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase"
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh

asetup AthSimulation,local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt,2023-10-15T1800

cmake -DVDT_ROOT=/cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/LCG_106_ATLAS_13/vdt/0.4.4/x86_64-el9-gcc13-opt/ -DCELERITAS_USE_Geant4=OFF -DCELERITAS_BUILD_TESTS=ON -DCELERITAS_USE_CUDA=ON -DCELERITAS_USE_ROOT=ON -DCMAKE_CUDA_ARCHITECTURES=75 -G Ninja -DCMAKE_CUDA_FLAGS="-g -G" ../celeritas

we get a failure in test/celeritas/track_TrackSort

status: Celeritas core state initialization complete

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x7fff6b4f9cd0 (agent_radix_sort_downsweep.cuh:293)

Thread 1 "track_TrackSort" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 4, grid 2307, block (0,0,0), thread (32,0,0), device 0, sm 0, warp 0, lane 0]
0x00007fff6b4f9a50 in cub::CUB_200302_750_NS::AgentRadixSortDownsweep<cub::CUB_200302_750_NS::AgentRadixSortDownsweepPolicy<512, 23, unsigned int, (cub::CUB_200302_750_NS::BlockLoadAlgorithm)3, (cub::CUB_200302_750_NS::CacheLoadModifier)0, (cub::CUB_200302_750_NS::RadixRankAlgorithm)2, (cub::CUB_200302_750_NS::BlockScanAlgorithm)2, 7, cub::CUB_200302_750_NS::RegBoundScaling<512, 23, unsigned int> >, false, unsigned int, unsigned int, unsigned int, cub::CUB_200302_750_NS::detail::identity_decomposer_t>::ScatterKeys<false> (this=0x7fffd7fffb58, twiddled_keys=..., relative_bin_offsets=..., ranks=..., valid_items=6240)
    at /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/agent/agent_radix_sort_downsweep.cuh:293
293	            temp_storage.keys_and_offsets.exchange_keys[ranks[ITEM]] = twiddled_keys[ITEM];
(cuda-gdb) p ranks
$2 = (@local int (& @local)[23]) <error reading variable: Error: read_local_memory(0, 0, 0, 0): failed to read local memory at address 0x7fffd7fff8f8 size 92, error=CUDBG_ERROR_INVALID_ADDRESS, error message=
>
(cuda-gdb) p twiddled_keys
$3 = (@local _ZN3cub17CUB_200302_750_NS23AgentRadixSortDownsweepINS0_29AgentRadixSortDownsweepPolicyILi512ELi23EjLNS0_18BlockLoadAlgorithmE3ELNS0_17CacheLoadModifierE0ELNS0_18RadixRankAlgorithmE2ELNS0_18BlockScanAlgorithmE2ELi7ENS0_15RegBoundScalingILi512ELi23EjEEEELb0EjjjNS0_6detail21identity_decomposer_tEE16bit_ordered_typeE (& @local)[23]) <error reading variable: Error: read_local_memory(0, 0, 0, 0): failed to read local memory at address 0x7fffd7fffaa4 size 92, error=CUDBG_ERROR_INVALID_ADDRESS, error message=

compute-sanitize reports

========= Invalid __global__ write of size 4 bytes
=========     at ScatterKeys<false>+0x323d0 in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/agent/agent_radix_sort_downsweep.cuh:293
=========     by thread (32,0,0) in block (0,0,0)
=========     Address 0x7f2a612882f8 is out of bounds
=========     and is 2391310585 bytes after the nearest allocation at 0x7f29d2a00000 of size 512 bytes
=========     Device Frame:ProcessTile<false>+0x33710 in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/agent/agent_radix_sort_downsweep.cuh:635
=========     Device Frame:ProcessRegion+0x363d0 in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/agent/agent_radix_sort_downsweep.cuh:820
=========     Device Frame:void cub::CUB_200302_750_NS::DeviceRadixSortDownsweepKernel<cub::CUB_200302_750_NS::DeviceRadixSortPolicy<unsigned int, unsigned int, unsigned int>::Policy900, (bool)0, (bool)0, unsigned int, unsign
ed int, unsigned int, cub::CUB_200302_750_NS::detail::identity_decomposer_t>(const T4 *, T4 *, const T5 *, T5 *, T6 *, T6, int, int, cub::CUB_200302_750_NS::GridEvenShare<T6>, T7)+0x36420 in /cvmfs/projects.cern.ch/lcg/releas
es/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/agent/agent_radix_sort_downsweep.cuh:339
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x2f167f]
=========                in /lib64/libcuda.so.1
=========     Host Frame: [0x15803]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/cuda/12.4.1-5bf34/x86_64-el9-gcc13-opt/lib64/libcudart.so.12
=========     Host Frame:cudaLaunchKernel [0x75230]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/cuda/12.4.1-5bf34/x86_64-el9-gcc13-opt/lib64/libcudart.so.12
=========     Host Frame:cudaError cudaLaunchKernel<char>(char const*, dim3, dim3, void**, unsigned long, CUstream_st*) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cuda_runtim
e.h:216 [0xd772fa]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:__device_stub__ZN3cub17CUB_200302_750_NS30DeviceRadixSortDownsweepKernelINS0_21DeviceRadixSortPolicyIjjjE9Policy900ELb0ELb0EjjjNS0_6detail21identity_decomposer_tEEEvPKT2_PS7_PKT3_PSB_PT4_SF_iiNS0_13Gr
idEvenShareISF_EET5_(unsigned int const*, unsigned int*, unsigned int const*, unsigned int*, unsigned int*, unsigned int, int, int, cub::CUB_200302_750_NS::GridEvenShare<unsigned int>&, cub::CUB_200302_750_NS::detail::identit
y_decomposer_t&) in /tmp/pcanal/tmpxft_00233a7e_00000000-6_TrackSortUtils.cudafe1.stub.c:55 [0xd75a4d]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:void cub::CUB_200302_750_NS::__wrapper__device_stub_DeviceRadixSortDownsweepKernel<cub::CUB_200302_750_NS::DeviceRadixSortPolicy<unsigned int, unsigned int, unsigned int>::Policy900, false, false, uns
igned int, unsigned int, unsigned int, cub::CUB_200302_750_NS::detail::identity_decomposer_t>(unsigned int const*&, unsigned int*&, unsigned int const*&, unsigned int*&, unsigned int*&, unsigned int&, int&, int&, cub::CUB_200
302_750_NS::GridEvenShare<unsigned int>&, cub::CUB_200302_750_NS::detail::identity_decomposer_t&) in /tmp/pcanal/tmpxft_00233a7e_00000000-6_TrackSortUtils.cudafe1.stub.c:58 [0xd75abe]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:void cub::CUB_200302_750_NS::DeviceRadixSortDownsweepKernel<cub::CUB_200302_750_NS::DeviceRadixSortPolicy<unsigned int, unsigned int, unsigned int>::Policy900, false, false, unsigned int, unsigned int
, unsigned int, cub::CUB_200302_750_NS::detail::identity_decomposer_t>(unsigned int const*, unsigned int*, unsigned int const*, unsigned int*, unsigned int*, unsigned int, int, int, cub::CUB_200302_750_NS::GridEvenShare<unsigned int>, cub::CUB_200302_750_NS::detail::identity_decomposer_t) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/device/dispatch/dispatch_radix_sort.cuh:303 [0xd792a8]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:cudaError thrust::THRUST_200302_750_NS::cuda_cub::launcher::triple_chevron::doit_host<void (*)(unsigned int const*, unsigned int*, unsigned int const*, unsigned int*, unsigned int*, unsigned int, int, int, cub::CUB_200302_750_NS::GridEvenShare<unsigned int>, cub::CUB_200302_750_NS::detail::identity_decomposer_t), unsigned int const*, unsigned int*, unsigned int const*, unsigned int*, unsigned int*, unsigned int, int, int, cub::CUB_200302_750_NS::GridEvenShare<unsigned int>, cub::CUB_200302_750_NS::detail::identity_decomposer_t>(void (*)(unsigned int const*, unsigned int*, unsigned int const*, unsigned int*, unsigned int*, unsigned int, int, int, cub::CUB_200302_750_NS::GridEvenShare<unsigned int>, cub::CUB_200302_750_NS::detail::identity_decomposer_t), unsigned int const* const&, unsigned int* const&, unsigned int const* const&, unsigned int* const&, unsigned int* const&, unsigned int const&, int const&, int const&, cub::CUB_200302_750_NS::GridEvenShare<unsigned int> const&, cub::CUB_200302_750_NS::detail::identity_decomposer_t const&) const in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/thrust/system/cuda/detail/core/triple_chevron_launch.h:70 [0xd8df37]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:cudaError cub::CUB_200302_750_NS::DeviceRadixSort::SortPairs<unsigned int, unsigned int, int>(void*, unsigned long&, cub::CUB_200302_750_NS::DoubleBuffer<unsigned int>&, cub::CUB_200302_750_NS::DoubleBuffer<unsigned int>&, int, int, int, CUstream_st*) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/device/device_radix_sort.cuh:806 [0xd82956]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:cudaError thrust::THRUST_200302_750_NS::cuda_cub::__radix_sort::dispatch<cuda::std::__4::integral_constant<bool, true>, thrust::THRUST_200302_750_NS::less<unsigned int> >::doit<unsigned int, unsigned int, long>(void*, unsigned long&, cub::CUB_200302_750_NS::DoubleBuffer<unsigned int>&, cub::CUB_200302_750_NS::DoubleBuffer<unsigned int>&, long, CUstream_st*) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/thrust/system/cuda/detail/sort.h:279 [0xd7d01f]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:void thrust::THRUST_200302_750_NS::cuda_cub::__radix_sort::radix_sort<cuda::std::__4::integral_constant<bool, true>, thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base>, unsigned int, unsigned int, long, thrust::THRUST_200302_750_NS::less<unsigned int> >(thrust::THRUST_200302_750_NS::cuda_cub::execution_policy<thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base> >&, unsigned int*, unsigned int*, long, thrust::THRUST_200302_750_NS::less<unsigned int>) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/thrust/system/cuda/detail/sort.h:367 [0xd7cb08]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:thrust::THRUST_200302_750_NS::cuda_cub::__smart_sort::enable_if_primitive_sort<unsigned int*, thrust::THRUST_200302_750_NS::less<unsigned int> >::type thrust::THRUST_200302_750_NS::cuda_cub::__smart_sort::smart_sort<cuda::std::__4::integral_constant<bool, true>, cuda::std::__4::integral_constant<bool, false>, thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base>, unsigned int*, thrust::THRUST_200302_750_NS::device_ptr<unsigned int>, thrust::THRUST_200302_750_NS::less<unsigned int> >(thrust::THRUST_200302_750_NS::cuda_cub::execution_policy<thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base> >&, unsigned int*, unsigned int*, thrust::THRUST_200302_750_NS::device_ptr<unsigned int>, thrust::THRUST_200302_750_NS::less<unsigned int>) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/thrust/system/cuda/detail/sort.h:461 [0xd7c674]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:void thrust::THRUST_200302_750_NS::cuda_cub::sort_by_key<thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base>, unsigned int*, thrust::THRUST_200302_750_NS::device_ptr<unsigned int>, thrust::THRUST_200302_750_NS::less<unsigned int> >(thrust::THRUST_200302_750_NS::cuda_cub::execution_policy<thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base> >&, unsigned int*, unsigned int*, thrust::THRUST_200302_750_NS::device_ptr<unsigned int>, thrust::THRUST_200302_750_NS::less<unsigned int>) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/thrust/system/cuda/detail/sort.h:551 [0xd7c166]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
[1:25](https://celeritasproject.slack.com/archives/DSUBD0FLM/p1729319158624659)
=========     Host Frame:void thrust::THRUST_200302_750_NS::cuda_cub::sort_by_key<thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base>, unsigned int*, thrust::THRUST_200302_750_NS::device_ptr<unsigned int> >(thrust::THRUST_200302_750_NS::cuda_cub::execution_policy<thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base> >&, unsigned int*, unsigned int*, thrust::THRUST_200302_750_NS::device_ptr<unsigned int>) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/thrust/system/cuda/detail/sort.h:621 [0xd7b15c]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:void thrust::THRUST_200302_750_NS::sort_by_key<thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base>, unsigned int*, thrust::THRUST_200302_750_NS::device_ptr<unsigned int> >(thrust::THRUST_200302_750_NS::detail::execution_policy_base<thrust::THRUST_200302_750_NS::detail::execute_with_allocator<thrust::THRUST_200302_750_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200302_750_NS::cuda_cub::execute_on_stream_nosync_base> > const&, unsigned int*, unsigned int*, thrust::THRUST_200302_750_NS::device_ptr<unsigned int>) in /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/thrust/detail/sort.inl:102 [0xd79535]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:void celeritas::detail::(anonymous namespace)::sort_impl<celeritas::OpaqueId<celeritas::ActionInterface, unsigned int>, unsigned int>(celeritas::Collection<unsigned int, (celeritas::Ownership)1, (celeritas::MemSpace)1, celeritas::OpaqueId<celeritas::Thread_, unsigned int> > const&, celeritas::ObserverPtr<celeritas::OpaqueId<celeritas::ActionInterface, unsigned int> const, (celeritas::MemSpace)1>, celeritas::OpaqueId<celer
itas::Stream_, unsigned int>) in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/celeritas/src/celeritas/track/detail/TrackSortUtils.cu:98 [0xd78310]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:celeritas::detail::sort_tracks(celeritas::CoreStateData<(celeritas::Ownership)1, (celeritas::MemSpace)1> const&, celeritas::TrackOrder) in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/celeritas/src/celeritas/track/detail/TrackSortUtils.cu:155 [0xd738e1]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:celeritas::SortTracksAction::step(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&) const in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/celeritas/src/celeritas/track/SortTracksAction.cc:121 [0xc2aa05]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:void celeritas::ActionSequence::step<(celeritas::MemSpace)1>(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&) in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/celeritas/src/celeritas/global/ActionSequence.cc:130 [0xae7110]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:celeritas::Stepper<(celeritas::MemSpace)1>::operator()() in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/celeritas/src/celeritas/global/Stepper.cc:130 [0xb21ef9]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:celeritas::Stepper<(celeritas::MemSpace)1>::operator()(celeritas::Span<celeritas::Primary const, 18446744073709551615ul>) in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/celeritas/src/celeritas/global/Stepper.cc:166 [0xb22272]
=========                in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/build-celeritas-cuda-debug/lib64/libceleritas.so
=========     Host Frame:celeritas::test::TestTrackSortActionIdEm3Stepper_device_is_sorted_Test::TestBody() in /afs/cern.ch/user/p/pcanal/atlas-on-eos/celeritas/celeritas/test/celeritas/track/TrackSort.test.cc:394 [0x43529]
@pcanal
Copy link
Contributor Author

pcanal commented Oct 22, 2024

So the problem is weird and “seem” to indicate violation of ODR (i.e. an inconsistent build). On the debugger when we get to the point the ‘bad’ function is about to be called:

    at /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/agent/agent_radix_sort_downsweep.cuh:635
635             ScatterKeys<FULL_TILE>(keys, relative_bin_offsets, ranks, valid_items);

(and when we use up to get back to that frame), all the variable have correct values.

when we step into the ‘bad’ function and immediately get to:

    at /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/agent/agent_radix_sort_downsweep.cuh:291
291             for (int ITEM = 0; ITEM < ITEMS_PER_THREAD; ++ITEM)

then everything is screwed

[1:49](https://celeritasproject.slack.com/archives/DSUBD0FLM/p1729536561770539)
(cuda-gdb) p &keys
$46 = (@local _ZN3cub17CUB_200302_750_NS23AgentRadixSortDownsweepINS0_29AgentRadixSortDownsweepPolicyILi256ELi47EjLNS0_18BlockLoadAlgorithmE3ELNS0_17CacheLoadModifierE0ELNS0_18RadixRankAlgorithmE1ELNS0_18BlockScanAlgorithmE2ELi6ENS0_15RegBoundScalingILi256ELi47EjEEEELb0EjjjNS0_6detail21identity_decomposer_tEE16bit_ordered_typeE (*)[47]) 0xfffa5c
(cuda-gdb) p keys
$47 = {18, 18, 18, 18, 18, 18, 18, 18, 4294967295 <repeats 39 times>}
(cuda-gdb) down
    at /cvmfs/projects.cern.ch/lcg/releases/cuda/12.4.1/x86_64-linux/targets/x86_64-linux/include/cub/agent/agent_radix_sort_downsweep.cuh:291
291             for (int ITEM = 0; ITEM < ITEMS_PER_THREAD; ++ITEM)
(cuda-gdb) p &twiddled_keys
$48 = (@local _ZN3cub17CUB_200302_750_NS23AgentRadixSortDownsweepINS0_29AgentRadixSortDownsweepPolicyILi256ELi47EjLNS0_18BlockLoadAlgorithmE3ELNS0_17CacheLoadModifierE0ELNS0_18RadixRankAlgorithmE1ELNS0_18BlockScanAlgorithmE2ELi6ENS0_15RegBoundScalingILi256ELi47EjEEEELb0EjjjNS0_6detail21identity_decomposer_tEE16bit_ordered_typeE (*)[47]) 0x2c06f02605ac33
(cuda-gdb) p twiddled_keys
$49 = (@local _ZN3cub17CUB_200302_750_NS23AgentRadixSortDownsweepINS0_29AgentRadixSortDownsweepPolicyILi256ELi47EjLNS0_18BlockLoadAlgorithmE3ELNS0_17CacheLoadModifierE0ELNS0_18RadixRankAlgorithmE1ELNS0_18BlockScanAlgorithmE2ELi6ENS0_15RegBoundScalingILi256ELi47EjEEEELb0EjjjNS0_6detail21identity_decomposer_tEE16bit_ordered_typeE (& @local)[47]) <error reading variable: Error: read_local_memory(0, 0, 0, 0): failed to read local memory at address 0x7fffd7fffa5c size 188, error=CUDBG_ERROR_INVALID_ADDRESS, error message=

@pcanal pcanal self-assigned this Oct 22, 2024
@esseivaju
Copy link
Contributor

I hit the same error on Perlmutter with the build AthSimulation,local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt,2024-10-18T1700

16:52:37 -------- EEEE ------- G4Exception-START -------- EEEE -------
16:52:37 *** G4Exception : celer0004
16:52:37       issued by : Thrust GPU library
16:52:37 parallel_for: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered

@pcanal
Copy link
Contributor Author

pcanal commented Oct 22, 2024

I am making a build not using standalone versions of gcc13 and cuda 12.4.1 to get another data point.

@esseivaju
Copy link
Contributor

esseivaju commented Oct 22, 2024

I only get the error running with Athena+Celeritas. I tried building Celeritas within the Athena environment and all the unit tests succeeded.

asetup AthSimulation,local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt,2024-10-18T1700
cmake -DVDT_ROOT=/cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/LCG_106_ATLAS_13/vdt/0.4.4/x86_64-el9-gcc13-opt/ -DCELERITAS_USE_Geant4=OFF -DCELERITAS_BUILD_TESTS=ON -DCELERITAS_USE_CUDA=ON -DCELERITAS_USE_ROOT=ON -DCMAKE_CUDA_ARCHITECTURES=80 -G Ninja -DCMAKE_CUDA_FLAGS="-g -G" -DCMAKE_EXPORT_COMPILE_COMMANDS=ON ../celeritas

EDIT: TrackSort test is disabled without Geant4

@sethrj
Copy link
Member

sethrj commented Oct 22, 2024

Have we tried a build without VecGeom just to rule out RDC weirdness?

@esseivaju
Copy link
Contributor

WIth Geant4, 29 tests fail. compute-sanitizer for the TrackSort test shows multiple errors.

Stack overflow
========= Stack overflow
=========     at celeritas::SimTrackView::operator =(const celeritas::SimTrackInitializer &)+0x50 in /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/celeritas/track/SimTrackView.hh:148
=========     by thread (5,0,0) in block (0,0,0)
=========     Device Frame:celeritas::detail::InitTracksExecutor::operator ()(celeritas::OpaqueId<celeritas::Thread_, unsigned int>) const+0x28c0 in /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/celeritas/track/detail/InitTracksExecutor.hh:111
=========     Device Frame:launch_kernel_impl<InitTracksExecutor>+0x2950 in /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/corecel/sys/detail/KernelLauncherImpl.device.hh:36
=========     Device Frame:void celeritas::detail::<unnamed>::launch_action_impl<celeritas::detail::InitTracksExecutor, (bool)1>(celeritas::Range<celeritas::OpaqueId<celeritas::Thread_, unsigned int>>, T1)+0x2970 in /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/c
orecel/sys/detail/KernelLauncherImpl.device.hh:48
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x33255f]
=========                in /.singularity.d/libs/libcuda.so.1
=========     Host Frame: [0x15803]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/cuda/12.4.1-5bf34/x86_64-el9-gcc13-opt/lib64/libcudart.so.12
=========     Host Frame:cudaLaunchKernel [0x75230]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/cuda/12.4.1-5bf34/x86_64-el9-gcc13-opt/lib64/libcudart.so.12
=========     Host Frame:void celeritas::detail::(anonymous namespace)::launch_action_impl<celeritas::detail::InitTracksExecutor, true>(celeritas::Range<celeritas::OpaqueId<celeritas::Thread_, unsigned int> >, celeritas::detail::InitTracksExecutor) in /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/corecel/sys/detail/KernelLauncherImpl.device.hh:47 [0x54e0a2]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::InitializeTracksAction::step_impl(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&, unsigned int) const in /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/celeritas/track/InitializeTracksAction.cu:33 [0
x54dff7]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::InitializeTracksAction::step(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&) const [0x54cae2]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:void celeritas::ActionSequence::step<(celeritas::MemSpace)1>(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&) [0x3cc071]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::Stepper<(celeritas::MemSpace)1>::operator()() [0x3e2ee9]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::test::TestTrackPartitionEm3Stepper_device_is_partitioned_Test::TestBody() [0x21ff9]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/celeritas/track_TrackSort
=========     Host Frame:void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/g
test.cc:2657 [0x55fbe]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::Test::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2686 [0x4607d]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::TestInfo::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2845 [0x46234]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::TestSuite::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2977 [0x46366]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::internal::UnitTestImpl::RunAllTests() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:5890 [0x4c0eb]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) in /data/bmorgan/athena-buildroot/1810-24-gpu/build/
AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2657 [0x5643e]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::UnitTest::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:5455 [0x464d5]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:celeritas::testdetail::test_main(int, char**) [0x43805]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/libtestcel_harness.so
=========     Host Frame:__libc_start_call_main [0x2958f]
=========                in /lib64/libc.so.6
=========     Host Frame:__libc_start_main [0x2963f]
=========                in /lib64/libc.so.6
=========     Host Frame:_start [0x1ebc4]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/celeritas/track_TrackSort
launch failure
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaStreamSynchronize.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame: [0x481c35]
=========                in /.singularity.d/libs/libcuda.so.1
=========     Host Frame:cudaStreamSynchronize [0x74ffa]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/cuda/12.4.1-5bf34/x86_64-el9-gcc13-opt/lib64/libcudart.so.12
=========     Host Frame:celeritas::StatusChecker::launch_impl(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&, celeritas::StatusCheckStateData<(celeritas::Ownership)1, (celeritas::MemSpace)1> const&) const in /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/celeritas/track/StatusChecker.cu:39 [0x55158d]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:void celeritas::StatusChecker::step<(celeritas::MemSpace)1>(celeritas::OpaqueId<celeritas::ActionInterface, unsigned int>, celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&) const [0x5502fd]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:void celeritas::ActionSequence::step<(celeritas::MemSpace)1>(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&) [0x3cc09f]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::Stepper<(celeritas::MemSpace)1>::operator()() [0x3e2ee9]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::test::TestTrackPartitionEm3Stepper_device_is_partitioned_Test::TestBody() [0x21ff9]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/celeritas/track_TrackSort
=========     Host Frame:void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2657 [0x55fbe]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::Test::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2686 [0x4607d]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::TestInfo::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2845 [0x46234]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::TestSuite::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2977 [0x46366]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::internal::UnitTestImpl::RunAllTests() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:5890 [0x4c0eb]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) in /data/bmorgan/athena-buildroot/1810-24-gpu/build/
AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2657 [0x5643e]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::UnitTest::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:5455 [0x464d5]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:celeritas::testdetail::test_main(int, char**) [0x43805]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/libtestcel_harness.so
=========     Host Frame:__libc_start_call_main [0x2958f]
=========                in /lib64/libc.so.6
=========     Host Frame:__libc_start_main [0x2963f]
=========                in /lib64/libc.so.6
=========     Host Frame:_start [0x1ebc4]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/celeritas/track_TrackSort
launch failure cudaFree
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaFreeHost.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame: [0x481c35]
=========                in /.singularity.d/libs/libcuda.so.1
=========     Host Frame:cudaFreeHost [0x5780d]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/sw/lcg/releases/cuda/12.4.1-5bf34/x86_64-el9-gcc13-opt/lib64/libcudart.so.12
=========     Host Frame:celeritas::PinnedAllocator<celeritas::OpaqueId<celeritas::Thread_, unsigned int> >::deallocate(celeritas::OpaqueId<celeritas::Thread_, unsigned int>*, unsigned long) [0x3e6922]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::CoreState<(celeritas::MemSpace)1>::~CoreState() [0x3d2f82]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::Stepper<(celeritas::MemSpace)1>::~Stepper() [0x3e35c1]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/lib64/libceleritas.so
=========     Host Frame:celeritas::test::TestTrackPartitionEm3Stepper_device_is_partitioned_Test::TestBody() [clone .cold] [0x1de88]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/celeritas/track_TrackSort
=========     Host Frame:void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2657 [0x55fbe]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::Test::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2686 [0x4607d]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::TestInfo::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2845 [0x46234]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::TestSuite::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2977 [0x46366]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::internal::UnitTestImpl::RunAllTests() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:5890 [0x4c0eb]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) in /data/bmorgan/athena-buildroot/1810-24-gpu/build/
AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:2657 [0x5643e]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::UnitTest::Run() in /data/bmorgan/athena-buildroot/1810-24-gpu/build/AthSimulationExternals/src/GoogleTest/googletest/src/gtest.cc:5455 [0x464d5]
=========                in /cvmfs/atlas-nightlies.cern.ch/repo/sw/local/simulation/main_AthSimulation_x86_64-el9-gcc13-opt/2024-10-18T1700/AthSimulationExternals/25.0.20/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:celeritas::testdetail::test_main(int, char**) [0x43805]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/libtestcel_harness.so
=========     Host Frame:__libc_start_call_main [0x2958f]
=========                in /lib64/libc.so.6
=========     Host Frame:__libc_start_main [0x2963f]
=========                in /lib64/libc.so.6
=========     Host Frame:_start [0x1ebc4]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celer-build/test/celeritas/track_TrackSort

@sethrj
Copy link
Member

sethrj commented Oct 22, 2024

@esseivaju youre sure you're building with the right cuda arch?

@esseivaju
Copy link
Contributor

yes, building with CUDA_ARCH=80. This should also be included in the cvmfs build (@drbenmorgan ). I am using cuda 12.6 to compile Celeritas, the cvmfs build was 12.4 I think but it shouldn't matter since this is an independent build, asetup is only used for picking up Celeritas dependencies.

@esseivaju
Copy link
Contributor

esseivaju commented Oct 23, 2024

I rebuilt AthSimulation + AthSimulationExternals on Perlmutter, but that made no difference. Building Celeritas within the AthSimulationExternal environment leads to the same failures in Athena and the Celeritas unit tests. I also see invalid read as reported by Philippe

invalid memory access
========= Invalid __global__ read of size 4 bytes
=========     at _adaptorINS0_10device_ptrIjEEPjjNS0_8cuda_cub3tagENS0_27random_access_traversal_tagENS0_16device_referenceIjEElE7advanceEl+0x4300 in /usr/local/cuda/include/cub/thread/thread_load.cuh:289
=========     by thread (64,0,0) in block (2,0,0)
=========     Address 0x16e0006904 is out of bounds
=========     and is 6 bytes after the nearest allocation at 0x16e0006400 of size 1279 bytes
=========     Device Frame:_core_access7advanceINS0_10device_ptrIjEEEEvRT_NS5_15difference_typeE+0x4300 in /usr/local/cuda/include/cub/thread/thread_load.cuh:354
=========     Device Frame:thrust20THRUST_200500_800_NS15iterator_facadeINS0_10device_ptrIjEEjNS0_8cuda_cub3tagENS0_27random_access_traversal_tagENS0_16device_referenceIjEElEpLEl+0x4300 in /usr/local/cuda/include/cub/iterator/cache_modified_input_iterator.cuh:217
=========     Device Frame:27random_access_traversal_tagENS0_16device_referenceIjEElEET_RKNS0_15iterator_facadeIS9_T0_T1_T2_T3_T4_EENS9_15difference_typeE+0x42c0 in /usr/local/cuda/include/cub/block/block_load.cuh:86
=========     Device Frame:erator_facadeINS0_10device_ptrIjEEjNS0_8cuda_cub3tagENS0_27random_access_traversal_tagENS0_16device_referenceIjEElEixEl+0x42c0 in /usr/local/cuda/include/cub/block/block_load.cuh:854
=========     Device Frame:_10device_ptrIjEEjEEE12consume_tileILb1EEEvii+0x42c0 in /usr/local/cuda/include/cub/block/block_load.cuh:1484
=========     Device Frame:_+0x42c0 in /usr/local/cuda/include/cub/agent/agent_select_if.cuh:760
=========     Device Frame:EvRNS1_12cross_systemIT_T0_EET1_T2_+0x4290 in /usr/local/cuda/include/cub/agent/agent_select_if.cuh:898
=========     Device Frame:_cub3tagENS0_6system3cpp6detail3tagEPKdEEvPT_PT0_T1_+0x1540 in /usr/local/cuda/include/cub/agent/agent_select_if.cuh:934
=========     Device Frame:void cub::CUB_200500_800_NS::DeviceSelectSweepKernel<cub::CUB_200500_800_NS::detail::device_select_policy_hub<unsigned int, cub::CUB_200500_800_NS::NullType, int, (bool)0, (bool)1>::Policy900, unsigned int *, cub::CUB_200500_800_NS::NullType *, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, int *, cub::CUB_200500_800_NS::ScanTileState<int, (bool)1>, celeritas::detail::IsNotInactive, cub::CUB_200500_800_NS::NullType, int, (bool)1>(T2, T3, T4, T5, T6, T7, T8, T9, int, cub::CUB_200500_800_NS::detail::vsmem_t)+0x10 in /usr/local/cuda/include/cub/device/dispatch/dispatch_select_if.cuh:230
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x33255f]
=========                in /.singularity.d/libs/libcuda.so.1
=========     Host Frame: [0x15aa7]
=========                in /usr/local/cuda/lib64/libcudart.so.12
=========     Host Frame:cudaLaunchKernel [0x759f0]
=========                in /usr/local/cuda/lib64/libcudart.so.12
=========     Host Frame:void cub::CUB_200500_800_NS::DeviceSelectSweepKernel<cub::CUB_200500_800_NS::detail::device_select_policy_hub<unsigned int, cub::CUB_200500_800_NS::NullType, int, false, true>::Policy900, unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, int*, cub::CUB_200500_800_NS::ScanTileState<int, true>, celeritas::detail::IsNotInactive, cub::CUB_200500_800_NS::NullType, int, true>(unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, int*, cub::CUB_200500_800_NS::ScanTileState<int, true>, celeritas::detail::IsNotInactive, cub::CUB_200500_800_NS::NullType, int, int, cub::CUB_200500_800_NS::detail::vsmem_t) [0x59f53d]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:cudaError thrust::THRUST_200500_800_NS::cuda_cub::launcher::triple_chevron::doit_host<void (*)(unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, int*, cub::CUB_200500_800_NS::ScanTileState<int, true>, celeritas::detail::IsNotInactive, cub::CUB_200500_800_NS::NullType, int, int, cub::CUB_200500_800_NS::detail::vsmem_t), unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, int*, cub::CUB_200500_800_NS::ScanTileState<int, true>, celeritas::detail::IsNotInactive, cub::CUB_200500_800_NS::NullType, int, int, cub::CUB_200500_800_NS::detail::vsmem_t>(void (*)(unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, int*, cub::CUB_200500_800_NS::ScanTileState<int, true>, celeritas::detail::IsNotInactive, cub::CUB_200500_800_NS::NullType, int, int, cub::CUB_200500_800_NS::detail::vsmem_t), unsigned int* const&, cub::CUB_200500_800_NS::NullType* const&, thrust::THRUST_200500_800_NS::device_ptr<unsigned int> const&, int* const&, cub::CUB_200500_800_NS::ScanTileState<int, true> const&, celeritas::detail::IsNotInactive const&, cub::CUB_200500_800_NS::NullType const&, int const&, int const&, cub::CUB_200500_800_NS::detail::vsmem_t const&) const [clone .isra.0] [0x5a0643]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:thrust::THRUST_200500_800_NS::cuda_cub::detail::DispatchPartitionIf<thrust::THRUST_200500_800_NS::detail::execute_with_allocator<thrust::THRUST_200500_800_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200500_800_NS::cuda_cub::execute_on_stream_nosync_base>, unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, celeritas::detail::IsNotInactive, int>::dispatch(thrust::THRUST_200500_800_NS::cuda_cub::execution_policy<thrust::THRUST_200500_800_NS::detail::execute_with_allocator<thrust::THRUST_200500_800_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200500_800_NS::cuda_cub::execute_on_stream_nosync_base> >&, void*, unsigned long&, unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, celeritas::detail::IsNotInactive, int, unsigned long&) [0x5a27cc]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:unsigned long thrust::THRUST_200500_800_NS::cuda_cub::detail::partition<thrust::THRUST_200500_800_NS::detail::execute_with_allocator<thrust::THRUST_200500_800_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200500_800_NS::cuda_cub::execute_on_stream_nosync_base>, unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, celeritas::detail::IsNotInactive>(thrust::THRUST_200500_800_NS::cuda_cub::execution_policy<thrust::THRUST_200500_800_NS::detail::execute_with_allocator<thrust::THRUST_200500_800_NS::mr::allocator<char, celeritas::AsyncMemoryResource<void*> >, thrust::THRUST_200500_800_NS::cuda_cub::execute_on_stream_nosync_base> >&, unsigned int*, unsigned int*, cub::CUB_200500_800_NS::NullType*, thrust::THRUST_200500_800_NS::device_ptr<unsigned int>, celeritas::detail::IsNotInactive) [0x5a93f3]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:void celeritas::detail::(anonymous namespace)::partition_impl<celeritas::detail::IsNotInactive>(celeritas::Collection<unsigned int, (celeritas::Ownership)1, (celeritas::MemSpace)1, celeritas::OpaqueId<celeritas::Thread_, unsigned int> > const&, celeritas::detail::IsNotInactive&&, celeritas::OpaqueId<celeritas::Stream_, unsigned int>) [0x5a96db]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:celeritas::detail::sort_tracks(celeritas::CoreStateData<(celeritas::Ownership)1, (celeritas::MemSpace)1> const&, celeritas::TrackOrder) [0x59f3be]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:celeritas::SortTracksAction::step(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&) const [0x4cf49b]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:void celeritas::ActionSequence::step<(celeritas::MemSpace)1>(celeritas::CoreParams const&, celeritas::CoreState<(celeritas::MemSpace)1>&) [0x4247a1]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:celeritas::Stepper<(celeritas::MemSpace)1>::operator()() [0x43be89]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/lib64/libceleritas.so
=========     Host Frame:celeritas::test::TestTrackPartitionEm3Stepper_device_is_partitioned_Test::TestBody() [0x221d9]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/test/celeritas/track_TrackSort
=========     Host Frame:void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) in /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/GoogleTest/googletest/src/gtest.cc:2657 [0x55fbe]
=========                in /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::Test::Run() in /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/GoogleTest/googletest/src/gtest.cc:2686 [0x4607d]
=========                in /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::TestInfo::Run() in /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/GoogleTest/googletest/src/gtest.cc:2845 [0x46234]
=========                in /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::TestSuite::Run() in /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/GoogleTest/googletest/src/gtest.cc:2977 [0x46366]
=========                in /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::internal::UnitTestImpl::RunAllTests() in /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/GoogleTest/googletest/src/gtest.cc:5890 [0x4c0eb]
=========                in /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) in /pscratch/sd/e/esseivaj/celer-athena/externals_bu
ild/src/GoogleTest/googletest/src/gtest.cc:2657 [0x5643e]
=========                in /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:testing::UnitTest::Run() in /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/GoogleTest/googletest/src/gtest.cc:5455 [0x464d5]
=========                in /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/lib/libgtest.so.1.13.0
=========     Host Frame:celeritas::testdetail::test_main(int, char**) [0x44d25]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/test/libtestcel_harness.so
=========     Host Frame:__libc_start_call_main [0x2958f]
=========                in /lib64/libc.so.6
=========     Host Frame:__libc_start_main [0x2963f]
=========                in /lib64/libc.so.6
=========     Host Frame:_start [0x1ec34]
=========                in /pscratch/sd/e/esseivaj/celer-athena/celeritas/build-ndebug/test/celeritas/track_TrackSort

If I compile with nvcc -g -G ... I hit the stack overflow and launch failure. Without the -g -G, I hit the global read error.

@pcanal
Copy link
Contributor Author

pcanal commented Oct 23, 2024

My build ended up with the following failures:

         58 - geocel/vg/Vecgeom:CmseTest.* (Failed)
        176 - celeritas/geo/Geometry (Failed)
        250 - celeritas/track/TrackSort (Failed)

(updated to remove test that failed due to an unrelated local mis-configuration resulting in not finding data files)
but the TrackSort failure is different

[ RUN      ] TestTrackSortActionIdEm3Stepper.device_is_sorted
unknown file: Failure
C++ exception with description "radix_sort: failed on 2nd step: cudaErrorInvalidValue: invalid argument" thrown in the test body.
Writing diagnostic output because test failed

@sethrj sethrj mentioned this issue Oct 23, 2024
13 tasks
@amandalund
Copy link
Contributor

I've been seeing the same TrackSort failure for a while on A100 with cuda 11.6.2:

[ RUN      ] TestTrackSortActionIdEm3Stepper.device_is_sorted
status: Celeritas core setup complete
info: Executing actions with additional debug checking
status: Celeritas core state initialization complete
warning: Cuda API error detected: cudaLaunchKernel returned (0x1)

warning: Cuda API error detected: cudaPeekAtLastError returned (0x1)

warning: Cuda API error detected: cudaGetLastError returned (0x1)

unknown file: Failure
C++ exception with description "radix_sort: failed on 2nd step: cudaErrorInvalidValue: invalid argument" thrown in the test body.
Writing diagnostic output because test failed

@sethrj
Copy link
Member

sethrj commented Oct 23, 2024

Digging into all this a little.... NVIDIA/cub#545 ?

@pcanal
Copy link
Contributor Author

pcanal commented Oct 23, 2024

It does seems to be exactly that. In the build used for #1462 (comment), VecGeom was built with CUDA architecture 70 while Celeritas was built with 80. Rebuilding VecGeom with 80 makes celeritas/track/TrackSort work correctly (Celeritas build without VecGeom also works).

@pcanal
Copy link
Contributor Author

pcanal commented Oct 23, 2024

Using multiple (consistent) architecture: CMAKE_CUDA_ARCHITECTURES="70;80" also works (confirmed), so I am not yet sure why it fails on lxplus.

@sethrj
Copy link
Member

sethrj commented Oct 23, 2024

So back to the drawing board? 😞

@pcanal
Copy link
Contributor Author

pcanal commented Oct 23, 2024

@esseivaju figured out that the lxplus libraries were not build as they were meant to be and were indeed using the wrong cuda architecture.

@sethrj
Copy link
Member

sethrj commented Oct 23, 2024

Y'all are the best!!

@esseivaju
Copy link
Contributor

esseivaju commented Oct 24, 2024

Celeritas unit test no longer crash because of gpu-related issues, however athsimulation still does. This is the output of running athena.py through compute-sanitizer athena-compute-sanitizer.log. Note that AthSimulation was built with LTO disabled (-DATLAS_GEANT4_USE_LTO=OFF).

The compute-sanitizer output doesn't show a thrust kernel failing even though the exception reported by Celeritas suggests that it is:

EVNTtoHITS 01:03:57 GPUOffload           INFO GPUOffload: End of Event
EVNTtoHITS 01:03:57 LocalTransporter.cc:256: info: Transporting 994 tracks from event 1 with Celeritas
EVNTtoHITS 01:03:58
EVNTtoHITS 01:03:58 -------- EEEE ------- G4Exception-START -------- EEEE -------
EVNTtoHITS 01:03:58 *** G4Exception : celer0004
EVNTtoHITS 01:03:58       issued by : Thrust GPU library
EVNTtoHITS 01:03:58 parallel_for: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
EVNTtoHITS 01:03:58 *** Fatal Exception *** core dump ***
EVNTtoHITS 01:03:58 G4Track (0x9dbdcd10) - track ID = 2, parent ID = 1
EVNTtoHITS 01:03:58  Particle type : e- - creator process : hIoni, creator model : model_DeltaElectron
EVNTtoHITS 01:03:58  Kinetic energy : 176.852 keV - Momentum direction : (1.04445e-314,-0.861732,-0.255076)
EVNTtoHITS 01:03:58  Step length : 0 fm  - total energy deposit : 0 eV
EVNTtoHITS 01:03:58  Pre-step point : (-473.08,-154.837,396.919) - Physical volume : Atlas::Atlas (Air)
EVNTtoHITS 01:03:58  - defined by : not available
EVNTtoHITS 01:03:58  Post-step point : (-473.08,-154.837,396.919) - Physical volume : Atlas::Atlas (Air)
EVNTtoHITS 01:03:58  - defined by : not available
EVNTtoHITS 01:03:58  *** Note: Step information might not be properly updated.
EVNTtoHITS 01:03:58
EVNTtoHITS 01:03:58 -------- EEEE -------- G4Exception-END --------- EEEE -------

As mentioned on Slack, I don't think there is code generated for sm_52 architectures as the command below shows no match.

cuobjdump output
esseivaj@perlmutter:login01:/pscratch/sd/e/esseivaj/celer-athena> rg --include-zero -c -e 'sm_52' --pre cuobjdump -g '*.so' -g '*.a' athsim_build externals_build 2>/dev/null
externals_build/External/VecGeom/CMakeFiles/VecGeomBuild/lib/libvecgeomcuda.so:0
externals_build/src/Celeritas-build/lib/liborange.a:0
externals_build/src/Celeritas-build/lib/libgeocel.a:0
externals_build/External/VecGeom/CMakeFiles/VecGeomBuild/lib/libvecgeomcuda_static.a:0
externals_build/src/Celeritas-build/lib/libcorecel.a:0
externals_build/src/Celeritas-build/lib/libcorecel_final.a:0
externals_build/src/Celeritas-build/lib/libgeocel_final.a:0
externals_build/src/Celeritas-build/lib/libceleritas_final.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/liborange.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/libgeocel.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/libcorecel.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/libcorecel_final.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/libgeocel_final.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/libceleritas_final.a:0
externals_build/src/Celeritas-build/lib/libceleritas.a:0
externals_build/src/Celeritas-build/lib/liborange_final.a:0
externals_build/src/Celeritas-build/lib/libaccel.a:0
externals_build/src/Celeritas-build/lib/libaccel_final.a:0
externals_build/src/VecGeom-build/libvecgeomcuda.so:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/libaccel.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/liborange_final.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/libaccel_final.a:0
externals_build/External/Celeritas/CMakeFiles/CeleritasBuild/lib/libceleritas.a:0
externals_build/src/VecGeom-build/libvecgeomcuda_static.a:0
athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4Lib.so:0
athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.so:0
externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda.so:0
externals_build/x86_64-el9-gcc13-opt/lib/libaccel.a:0
externals_build/x86_64-el9-gcc13-opt/lib/libaccel_final.a:0
externals_build/x86_64-el9-gcc13-opt/lib/liborange_final.a:0
externals_build/x86_64-el9-gcc13-opt/lib/libceleritas_final.a:0
externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda_static.a:0
externals_build/x86_64-el9-gcc13-opt/lib/libcorecel.a:0
externals_build/x86_64-el9-gcc13-opt/lib/libgeocel_final.a:0
externals_build/x86_64-el9-gcc13-opt/lib/libcorecel_final.a:0
externals_build/x86_64-el9-gcc13-opt/lib/libgeocel.a:0
externals_build/x86_64-el9-gcc13-opt/lib/libceleritas.a:0
externals_build/x86_64-el9-gcc13-opt/lib/liborange.a:0

@sethrj
Copy link
Member

sethrj commented Oct 24, 2024

@esseivaju @pcanal @drbenmorgan Have we tried building with -DCELERITAS_DEVICE_DEBUG=ON?

Also I remembered with #1433 we're actually not defaulting to using the same settings for GPU/CPU: perhaps we should use setup.track_order = TrackOrder::init_charge for the CPU case and see if we reproduce the failure?

@amandalund
Copy link
Contributor

I've also sometimes seen errors like that (CUDA error: an illegal memory access was encountered) when running on more complex geometries without increasing the CUDA heap/stack size enough.

@sethrj
Copy link
Member

sethrj commented Oct 24, 2024

That's a good point, we ought to try running this geometry in standalone to see.

@sethrj sethrj changed the title Apparent incompatibility with current ATLAS GPU environement Apparent incompatibility with current ATLAS GPU environment Oct 24, 2024
@esseivaju
Copy link
Contributor

That the GDML you dump during the hackathon right? We should be able to quickly test that

@sethrj
Copy link
Member

sethrj commented Oct 24, 2024

Yeah I've uploaded it and just ran on wildstyle with the regression harness; seems to work fine 👀

@esseivaju
Copy link
Contributor

esseivaju commented Oct 25, 2024

I tried turning on -DCELERITAS_DEVICE_DEBUG=ON or setting setup.track_order = TrackOrder::none but it doesn't make a difference. compute-sanitizer still reports the same global read error and weird exceptions such as:

EVNTtoHITS 23:56:14 -------- EEEE ------- G4Exception-START -------- EEEE -------
EVNTtoHITS 23:56:14 *** G4Exception : celer0004
EVNTtoHITS 23:56:14       issued by : /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/corecel/sys/KernelAttributes.hh:75
EVNTtoHITS 23:56:14 Celeritas CUDA error: an illegal memory access was encountered
*** G4Exception : celer0004
      issued by : /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/corecel/sys/KernelAttributes.hh:75
Celeritas CUDA error: unspecified launch failure

I also tried the regression problem, and they work fine but they complete much faster than the old tilecal with 2 modules we were using before.

Something must be still wrong with the build with Athena...

@sethrj
Copy link
Member

sethrj commented Oct 25, 2024

Damn. Thanks Julien, this is a mystery indeed...

@esseivaju
Copy link
Contributor

esseivaju commented Oct 25, 2024

No luck with increasing the stack and heap size. I ran athena through cuda-gdb and that's the output

CUDA error:

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x963ab9b0  __nv_static_53__016fd2f1_25_InitializeTracksAction_cu_88c0b313_98394__ZN9celeritas6detail64_GLOBAL__N__016fd2f1_25_InitializeTracksAction_cu_88c0b313_9839418launch_action_implINS0_18InitTracksExecutorELb1EEEvNS_5RangeINS_8OpaqueIdINS_7Thread_EjEEEET_

Thread 1 "athena.py" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 22, grid 185102, block (0,0,0), thread (224,0,0), device 0, sm 0, warp 7, lane 0]
0x00000000963ab9c0 in __nv_static_53__016fd2f1_25_InitializeTracksAction_cu_88c0b313_98394__ZN9celeritas6detail64_GLOBAL__N__016fd2f1_25_InitializeTracksAction_cu_88c0b313_9839418launch_action_implINS0_18InitTracksExecutorELb1EEEvNS_5RangeINS_8OpaqueIdINS_7Thread_EjEEEET_<<<(4,1,1),(256,1,1)>>> ()

Host stack trace and exception: (it fails on the first CUDA runtime API call after the failing kernel so it's not very important)

EVNTtoHITS 22:57:50 -------- EEEE ------- G4Exception-START -------- EEEE -------
EVNTtoHITS 22:57:50 *** G4Exception : celer0004
EVNTtoHITS 22:57:50       issued by : Thrust GPU library
EVNTtoHITS 22:57:50 parallel_for: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
#9  0x00007fffceda2f45 in libcudart_static_141dba5462e92d2cffd1abc474df476c510a3a8c () from /pscratch/sd/e/esseivaj/celer-athena/athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.so
#10 0x00007fffcee07ddd in cudaStreamSynchronize () from /pscratch/sd/e/esseivaj/celer-athena/athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.so
#11 0x00007fffced9be19 in cub::CUB_200500_800_NS::SyncStream (stream=0x1) at /usr/local/cuda/include/cub/util_device.cuh:503
#12 thrust::THRUST_200500_800_NS::cuda_cub::synchronize_stream<thrust::THRUST_200500_800_NS::cuda_cub::par_t> (policy=...) at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:100
#13 thrust::THRUST_200500_800_NS::cuda_cub::synchronize_stream_optional<thrust::THRUST_200500_800_NS::cuda_cub::par_t> (policy=...) at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:119
#14 thrust::THRUST_200500_800_NS::cuda_cub::synchronize_optional<thrust::THRUST_200500_800_NS::cuda_cub::execution_policy<thrust::THRUST_200500_800_NS::cuda_cub::par_t> > (policy=...) at /usr/local/cuda/include/thrust/system/cuda/detail/util.h:133
#15 thrust::THRUST_200500_800_NS::cuda_cub::parallel_for<thrust::THRUST_200500_800_NS::cuda_cub::par_t, thrust::THRUST_200500_800_NS::cuda_cub::__fill::functor<thrust::THRUST_200500_800_NS::device_ptr<celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int> >, celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int> >, unsigned long> (policy=..., f=..., count=1024) at /usr/local/cuda/include/thrust/system/cuda/detail/parallel_for.h:63
#16 thrust::THRUST_200500_800_NS::cuda_cub::fill_n<thrust::THRUST_200500_800_NS::cuda_cub::par_t, thrust::THRUST_200500_800_NS::device_ptr<celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int> >, unsigned long, celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int> > (policy=..., value=..., count=1024, first=...) at /usr/local/cuda/include/thrust/system/cuda/detail/fill.h:77
#17 thrust::THRUST_200500_800_NS::fill_n<thrust::THRUST_200500_800_NS::cuda_cub::par_t, thrust::THRUST_200500_800_NS::device_ptr<celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int> >, unsigned long, celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int> > (
    value=..., n=1024, first=..., exec=...) at /usr/local/cuda/include/thrust/detail/fill.inl:55
#18 celeritas::Filler<celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int>, (celeritas::MemSpace)1>::fill_device_impl (this=this@entry=0x7fffffff5d88, data=...)
    at /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/corecel/data/Filler.device.t.hh:36
#19 0x00007fffceb287d1 in celeritas::Filler<celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int>, (celeritas::MemSpace)1>::operator() (data=..., this=0x7fffffff5d88)
    at /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/corecel/data/Filler.hh:49
#20 celeritas::fill<celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int>, (celeritas::Ownership)1, (celeritas::MemSpace)1, celeritas::OpaqueId<celeritas::TrackSlot_, unsigned int> > (value=..., col=0x7da5c690)
    at /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/corecel/data/CollectionAlgorithms.hh:33
#21 celeritas::InitializeTracksAction::step_impl<(celeritas::MemSpace)1> (core_state=..., core_params=..., this=0x91b87670) at /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/celeritas/track/InitializeTracksAction.cc:91
#22 celeritas::InitializeTracksAction::step (this=0x91b87670, params=..., state=...) at /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/celeritas/track/InitializeTracksAction.cc:43

instruction causing the crash (multiply and add):

(cuda-gdb)  disas $pc,+16
Dump of assembler code from 0x963ab9c0 to 0x963ab9d0:
=> 0x00000000963ab9c0 <__nv_static_53__016fd2f1_25_InitializeTracksAction_cu_88c0b313_98394__ZN9celeritas6detail64_GLOBAL__N__016fd2f1_25_InitializeTracksAction_cu_88c0b313_9839418launch_action_implINS0_18InitTracksExecutorELb1EEEvNS_5RangeINS_8OpaqueIdINS_7Thread_EjEEEET_+4160>:      IMAD.WIDE.U32 R2, R17, 0x4, R10

I haven't linked the instruction to the line of code yet... Since this block should be executing on the first step and it's successful, maybe there is an actual bug in InitTracksExecutor?

@sethrj
Copy link
Member

sethrj commented Oct 26, 2024

Gosh. Next steps: can we get ROOT (or even hepmc3?) enabled so we can write out the primaries? Can we try testing with ORANGE?

@esseivaju
Copy link
Contributor

ORANGE fails at

/pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/orange/OrangeTrackView.hh:1160:
celeritas: internal assertion failed: lev < this->level()

in the MscApplier kernel.

@sethrj
Copy link
Member

sethrj commented Oct 28, 2024

Well that's not good either...

@sethrj
Copy link
Member

sethrj commented Oct 29, 2024

@esseivaju reports a failure with ORANGE and the Bpip geometry but at least we get one event or so:

EVNTtoHITS 18:56:46 AthenaEventLoopMgr   INFO   ===>>>  start processing event #1, run #999999 0 events processed so far  <<<===
EVNTtoHITS 18:56:50 GPUOffload           INFO GPUOffload: Begin of Event 1 tid = -1
EVNTtoHITS 18:56:50 GPUOffload           INFO GPUOffload: End of Event
EVNTtoHITS 18:56:50 LocalTransporter.cc:256: info: Transporting 13 tracks from event 1 with Celeritas
EVNTtoHITS 18:56:50 AthenaEventLoopMgr   INFO   ===>>>  done processing event #1, run #999999 1 events processed so far  <<<===
EVNTtoHITS 18:56:50 AthenaEventLoopMgr   INFO   ===>>>  start processing event #2, run #999999 1 events processed so far  <<<===
EVNTtoHITS 18:56:50 GPUOffload           INFO GPUOffload: Begin of Event 2 tid = -1
EVNTtoHITS 18:56:50 GPUOffload           INFO GPUOffload: End of Event
EVNTtoHITS 18:56:50 LocalTransporter.cc:256: info: Transporting 16 tracks from event 2 with Celeritas
EVNTtoHITS 18:56:51 /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/orange/OrangeTrackView.hh:1160:
EVNTtoHITS 18:56:51 celeritas: internal assertion failed: lev < this->level()
EVNTtoHITS 18:56:51
EVNTtoHITS 18:56:51 -------- EEEE ------- G4Exception-START -------- EEEE -------
EVNTtoHITS 18:56:51 *** G4Exception : celer0004
EVNTtoHITS 18:56:51       issued by : /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas/src/celeritas/global/ActionSequence.cc:112
EVNTtoHITS 18:56:51 Celeritas CUDA error: unspecified launch failure

@esseivaju can you verify whether vecgeom (debug off) works with the beampipe?

@esseivaju
Copy link
Contributor

I still see CUDA memory-related issues with the Bpipe+VecGeom. I also have a dump of the Bpipe Gdml that I'll send on Slack since I can't push to https://gitlab.cern.ch/bmorgan/atlassim-6635/-/tree/main?ref_type=heads

@sethrj
Copy link
Member

sethrj commented Oct 30, 2024

Does ATLAS have a "pretend everything is a solid block of aluminum" option??? Are there any weird potential "thread local" behavior outside of the usual geant4 that could be causing issues?

@esseivaju
Copy link
Contributor

Does ATLAS have a "pretend everything is a solid block of aluminum" option???

Nothing simpler than the beam pipe we're already using.

Are there any weird potential "thread local" behavior outside of the usual geant4 that could be causing issues?

Atlas has a lot of custom code for interfacing with Geant4. We're running single-threaded here so thread_local shouldn't be causing issues

@sethrj
Copy link
Member

sethrj commented Oct 31, 2024

From @esseivaju regarding @amandalund 's suggestion to run a single track slot, still getting a failure on the very first step:

EVNTtoHITS 00:38:59 GPUOffload           INFO GPUOffload: Begin of Event 1 tid = -1
EVNTtoHITS 00:38:59 LocalTransporter.cc:257: info: Transporting 1 tracks (4.76501 MeV cumulative kinetic energy) from event 1 with Celeritas
EVNTtoHITS 00:39:00
EVNTtoHITS 00:39:00 -------- EEEE ------- G4Exception-START -------- EEEE -------
EVNTtoHITS 00:39:00 *** G4Exception : celer0003
EVNTtoHITS 00:39:00       issued by : celeritas/global/ActionSequence.cc:112
========= Invalid __global__ read of size 8 bytes
=========     at void celeritas::detail::<unnamed>::launch_action_impl<celeritas::detail::InitTracksExecutor, (bool)1>(celeritas::Range<celeritas::OpaqueId<celeritas::Thread_, unsigned int>>, T1)+0x6af0
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x18 is out of bounds
=========     and is 8732540904 bytes before the nearest allocation at 0x208800000 of size 10027264 bytes

This has got to be some sort of linking issue...

@sethrj sethrj added bug Something isn't working external Dependencies and framework-oriented features labels Nov 1, 2024
@sethrj
Copy link
Member

sethrj commented Nov 1, 2024

SASS looks effectively the same between standalone/Celeritas ATLAS:

esseivaj@perlmutter:login26:/pscratch/sd/e/esseivaj/celer-athena> diff -EZbw --suppress-common-lines inittracks_standalone inittracks_athsim
1c1
<                 Function : __nv_static_54__7c5e5389_25_InitializeTracksAction_cu_1d15004e_575166__ZN9celeritas6detail65_GLOBAL__N__7c5e5389_25_InitializeTracksAction_cu_1d15004e_57516618launch_action_implINS0_18InitTracksExecutorELb1EEEvNS_5RangeINS_8OpaqueIdINS_7Thread_EjEEEET_
---
>                 Function : __nv_static_54__7c5e5389_25_InitializeTracksAction_cu_fe2be81f_973355__ZN9celeritas6detail65_GLOBAL__N__7c5e5389_25_InitializeTracksAction_cu_fe2be81f_97335518launch_action_implINS0_18InitTracksExecutorELb1EEEvNS_5RangeINS_8OpaqueIdINS_7Thread_EjEEEET_
29c29
<         /*00d0*/                   ULDC.64 UR4, c[0x0][0x0] ;                        /* 0x0000000000047ab9 */
---
>         /*00d0*/                   ULDC.64 UR4, c[0x2][0x0] ;                                        /* 0x0080000000047ab9 */

taken from celeritas/build-ndebug/lib64/libceleritas_static.a and athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.so.


Build with shared libraries seems to fail due to the "split" RDC libraries:

[7064/7070] Generating ../../x86_64-el9-gcc13-opt/lib/libAtlasGeant4.components
FAILED: x86_64-el9-gcc13-opt/lib/libAtlasGeant4.components /pscratch/sd/e/esseivaj/celer-athena/athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.components
cd /pscratch/sd/e/esseivaj/celer-athena/athsim_build/Simulation/AtlasGeant4 && /pscratch/sd/e/esseivaj/celer-athena/athsim_build/CMakeFiles/atlas_build_run.sh /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/bin/listcomponents --output /pscratch/sd/e/esseivaj/celer-athena/athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.components libAtlasGeant4.so
/pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/bin/listcomponents: symbol lookup error: /pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/lib/libcorecel.so: undefined symbol: __cudaRegisterLinkedBinary_d0380511_9_Filler_cu_b666e28f_701951

seems like this is a postprocessing tool to check the libraries for missing symbols (like -Wl,-z,defs).

@sethrj
Copy link
Member

sethrj commented Nov 1, 2024

@esseivaju traced the failure to the first access of member data inside the BVH class in the BVH navigator:

	//## File "/pscratch/sd/e/esseivaj/celer-athena/install/AthSimulationExternals/22.0.0/InstallArea/x86_64-el9-gcc13-opt/include/VecGeom/base/BVH.h", line 264 inlined at "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/geocel/vg/detail/BVHNavigator.hh", line 65
	//## File "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/geocel/vg/detail/BVHNavigator.hh", line 65 inlined at "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/geocel/vg/VecgeomTrackView.hh", line 228
	//## File "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/geocel/vg/VecgeomTrackView.hh", line 228 inlined at "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/celeritas/track/detail/InitTracksExecutor.hh", line 137
	//## File "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/celeritas/track/detail/InitTracksExecutor.hh", line 137 inlined at "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/corecel/sys/detail/KernelLauncherImpl.device.hh", line 36
	//## File "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/corecel/sys/detail/KernelLauncherImpl.device.hh", line 36 inlined at "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/corecel/sys/detail/KernelLauncherImpl.device.hh", line 48
	//## File "/pscratch/sd/e/esseivaj/celer-athena/celeritas/src/corecel/sys/detail/KernelLauncherImpl.device.hh", line 48
        /*1030*/                   LD.E.64 R10, [R38.64+0x18] ;
        /*1040*/                   IMAD.WIDE.U32 R2, R17, 0x4, R10 ;
        /*1050*/                   LD.E R2, [R2.64] ;
        /*1060*/                   BSSY B8, `(.L_x_11) ;
        /*1070*/                   ISETP.GT.AND P0, PT, R2, -0x1, PT ;
        /*1080*/               @P0 BRA `(.L_x_12) ; 

This suggests it's related to the BVH manager's store,

namespace vecgeom {
inline namespace VECGEOM_IMPL_NAMESPACE {
inline std::vector<BVH *> hBVH;
#ifdef VECGEOM_ENABLE_CUDA
inline __device__ BVH *dBVH;
#endif
class BVHManager {
public:
  VECCORE_ATT_HOST_DEVICE
  static BVH const *GetBVH(int id)
  {
#ifdef VECCORE_CUDA_DEVICE_COMPILATION
    return &cuda::dBVH[id];
#else
    return hBVH[id];
#endif
  }

so there's inline __device__ data being shared across translation units (and library boundaries).

Possibly related documentation:

@sethrj
Copy link
Member

sethrj commented Nov 3, 2024

Just to clarify: the failure occurs in vecgeom::cuda::BVH::LevelLocate at the first access of a data member, fNChild[0], where &fNChild - this == 0x18 (three 8-byte pointers precede it in the class definition). The failing error is Address 0x18 is out of bounds and is occurring at LD.E.64 R10, [R38.64+0x18] which means that the temporary R38.64 is a null pointer. So conclusively cuda::dBVH is null for reasons unknown, only when compiling through ATLAS.

@sethrj
Copy link
Member

sethrj commented Nov 4, 2024

Confirmed by @esseivaju with #1481:

EVNTtoHITS 16:47:14 /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/geocel/vg/VecgeomParams.cc:482: error: VecGeom CUDA may not be correctly linked or initialized (BVH device pointers are null or inconsistent: nullptr from Celeritas device kernel, nullptr from Celeritas runtime symbol, unavailable from VecGeom runtime symbol)

The change to use inline __device__ pointers rather than device functions was performed in 683fa167 and 8bb8eeeb by @bernhardmgruber... perhaps we need to bring in the Nvidia big guns to help us debug why the inline device symbols aren't working across translation units, and why the underlying device pointers are failing only in the ATLAS framework.

@esseivaju
Copy link
Contributor

I've collected the various compiler and linker invocation from build logs here.

@esseivaju
Copy link
Contributor

esseivaju commented Nov 11, 2024

I have a build of AthSimulation working with Celeritas GPU offload. Required changes are:

  • Fix linking Athena with VecGeom+CUDA #1489
  • VecGeom CMake patch from @pcanal to only build static library instead of always building a final vecgeomcuda shared lib
  • Remove Geant4's VecGeom dependency in AtlasExternals from -DGEANT4_USE_USOLIDS:STRING=CONS;POLYCONE. This causes Geant4 to pick up the final libvecgeomcuda.a libraries which conflicts in the final device link step of libAtlasG4.so (temporary hack)

@sethrj
Copy link
Member

sethrj commented Nov 12, 2024

To preserve the history of how Philippe and Julien figured this out:


@pcanal Thursday at 20:15
It could still be an link line order issue. What is the failing link line (the list an order of libraries is what’s important)?


@esseivaju Thursday at 21:32

FAILED: CMakeFiles/geocel_final.dir/cmake_device_link.o
/usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/cvmfs/sft.cern.ch/lcg/releases/gcc/13.1.0-b3d18/x86_64-el9/bin/g++ -O2 -g -DNDEBUG "--generate-code=arch=compute_80,code=[compute_80,sm_80]" /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas-build/lib/libgeocel.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas-build/lib/libcorecel.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda_static.a -Xnvlink --suppress-stack-size-warning -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink src/geocel/CMakeFiles/geocel_final.dir/CMakeFiles/geocel_emptyfile.cu.o -o CMakeFiles/geocel_final.dir/cmake_device_link.o  lib/libgeocel.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4persistency.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4run.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4event.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4tracking.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4processes.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4digits_hits.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4analysis.a  /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4track.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4geometry.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4graphics_reps.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4materials.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4zlib.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4particles.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4intercoms.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4global.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4ptl.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvgdml.a lib/libcorecel.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda_static.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeom.a -lrt -lpthread -ldl -lcudadevrt -lcudart
nvlink error   : Undefined reference to '_ZN7vecgeom4cuda4dBVHE' in '/pscratch/sd/e/esseivaj/celer-athena/externals_build/src/Celeritas-build/lib/libgeocel.a:RaytraceImager.cu.o'

@pcanal Thursday at 21:32
Try:

nm -A CMakeFiles/geocel_final.dir/cmake_device_link.o  lib/libgeocel.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4persistency.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4run.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4event.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4tracking.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4processes.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4digits_hits.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4analysis.a  /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4track.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4geometry.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4graphics_reps.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4materials.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4zlib.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4particles.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4intercoms.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4global.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libG4ptl.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvgdml.a lib/libcorecel.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda_static.a /pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeom.a | grep _ZN7vecgeom4cuda4dBVHE

@esseivaju Thursday at 21:36

nm: 'CMakeFiles/geocel_final.dir/cmake_device_link.o': No such file
lib/libgeocel.a:VecgeomSetup.cu.o:                 U _ZN7vecgeom4cuda4dBVHE
/pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda.a:BVHManager.cu.o:0000000000000010 B _ZN7vecgeom4cuda4dBVHE
/pscratch/sd/e/esseivaj/celer-athena/externals_build/x86_64-el9-gcc13-opt/lib/libvecgeomcuda_static.a:BVHManager.cu.o:0000000000000010 B _ZN7vecgeom4cuda4dBVHE

@pcanal Thursday at 21:40
this is indeed odd ;(


@esseivaju Thursday at 21:48

For reference, here’s the diff causing that problem:

diff --git a/VecGeom/management/BVHManager.h b/VecGeom/management/BVHManager.h
index c760ec2d7..d3a061dd1 100644
--- a/VecGeom/management/BVHManager.h
+++ b/VecGeom/management/BVHManager.h
@@ -17,7 +17,7 @@ namespace vecgeom {
 inline namespace VECGEOM_IMPL_NAMESPACE {
 inline std::vector<BVH *> hBVH;
 #ifdef VECGEOM_ENABLE_CUDA
-inline __device__ BVH *dBVH;
+extern __device__ BVH *dBVH;
 #endif

 // Macro allowing downstream codes to use GetDeviceBVH
diff --git a/source/BVHManager.cu b/source/BVHManager.cu
index 8c03d103b..2a740594a 100644
--- a/source/BVHManager.cu
+++ b/source/BVHManager.cu
@@ -10,6 +10,9 @@ using vecgeom::cxx::CudaCheckError;

 namespace vecgeom {
 inline namespace cuda {
+
+__device__ BVH *dBVH;
+
 void *AllocateDeviceBVHBuffer(size_t n)
 {
   BVH *ptr = nullptr;

@esseivaju Friday at 18:29

libvecgeomcuda.a is a final library (i.e. includes a dlink step) and libvecgeomcuda_static.a doesn’t include a dlink step. Isn’t that a problem if libvecgeomcuda.a is linked to libgeocel_final.a which also includes a dlink step? (edited)


@esseivaju Friday at 19:21

Similarly, libvecgeomcuda.a is included when dlinking the final atlas shared lib, my understanding is that we should only link to libvecgeomcuda_static.a


@pcanal Friday at 20:01

Yes, I think you are right. This is what I am trying to address with the first commit of https://github.com/celeritas-project/celeritas/pull/1489/commits but despites all the other (then necessary) commits, I will have some issues when shared library are also part of the mix.

A priori that first commit could solve the problem (in conjunction with the other commit, I think the fully static build works). (but the fully shared is broken 😞 ).


@esseivaju Friday at 20:04

I am using that PR when building Celeritas and AthSimulation. I still see libvecgeomcuda.a (and libvecgeomcuda_static.a) being picked up by the final atlas lib. Everything links successfully but we still have the kernel crash (edited)


@pcanal Friday at 20:38
I assume that Atlas as its own handling of the Rdc (I.e. I can’t be using the modified cmake fragment in Celeritas) and so their own fragment may need tweaking to avoid link against both libvecgeomcuda.a and libvecgeomcuda_static.a


@esseivaju Friday at 21:01

It might come from Geant4 since they depend on VecGeom and would need the final library :thinking_face:


@esseivaju Saturday at 01:29

That was it! Removing -DGEANT4_USE_USOLIDS:STRING=CONS;POLYCONE when compiling Geant4 removes the VecGeom dependency and no longer pulls in libvecgeomcuda.a so it’s able to run a step on GPU now with #1489. It fails shortly after in the sensitive detectors but at least the GPU works. Without SD, the 1000 events finish without crash

05:20:19 LocalTransporter.cc:257: info: Transporting 611 tracks (21870.4 MeV cumulative kinetic energy) from event 1 with Celeritas
05:20:19 HitProcessor.cc:188: debug: Processing 12 hits
05:20:20  0x7f2af824c62c TileGeoG4SDCalc::BirkLaw(G4Step const*) const /pscratch/sd/e/esseivaj/celer-athena/athena/TileCalorimeter/TileG4/TileGeoG4SD/src/TileGeoG4SDCalc.cc:882:89   + 0x5c [/pscratch/sd/e/esseivaj/celer-athena/athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.so D[0x131c62c]]
05:20:21  0x7f2af824df15 TileGeoG4SDCalc::MakePmtEdepTime(G4Step const*, TileHitData&, double&) const /pscratch/sd/e/esseivaj/celer-athena/athena/TileCalorimeter/TileG4/TileGeoG4SD/src/TileGeoG4SDCalc.cc:441:59   + 0x3b5 [/pscratch/sd/e/esseivaj/celer-athena/athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.so D[0x131df15]]
05:20:21  0x7f2af82477a2 TileGeoG4SD::ProcessHits(G4Step*, G4TouchableHistory*) /pscratch/sd/e/esseivaj/celer-athena/athena/TileCalorimeter/TileG4/TileGeoG4SD/src/TileGeoG4SD.cc:68:3   + 0x82 [/pscratch/sd/e/esseivaj/celer-athena/athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.so D[0x13177a2]]
05:20:21  0x7f2af911a821 celeritas::detail::HitProcessor::operator()(celeritas::DetectorStepOutput const&) const /pscratch/sd/e/esseivaj/celer-athena/celeritas/src/accel/detail/HitProcessor.cc:193:35   + 0x7f2af911a821 [/pscratch/sd/e/esseivaj/celer-athena/athsim_build/x86_64-el9-gcc13-opt/lib/libAtlasGeant4.so D[0x21ea821]]
(edited)

@sethrj
Copy link
Member

sethrj commented Nov 14, 2024

Next steps for @drbenmorgan are to rebuild atlas externals:

  • Use the pending VecGeom update
  • Disable vecgeom/usolids in Geant4 (and we should try understand if/when they are used)
  • Update Celeritas to the latest develop

@drbenmorgan
Copy link
Contributor

Thanks @sethrj, I'll get these done by early next week, pending the full VecGeom fix, but I can get other bits in place.

@esseivaju
Copy link
Contributor

Can we close this now that #1489 and the VecGeom patch are merged?

@esseivaju esseivaju linked a pull request Nov 20, 2024 that will close this issue
@sethrj sethrj closed this as completed Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working external Dependencies and framework-oriented features
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants