Add atomics for floating point types. #286

sleeepyjack · 2022-06-22T19:37:55Z

This PR is a draft to add support for float/double atomics.

Please review and let me know what is missing.
Unfortunately, the diff between the old and new codegen output is a mess due to the reordering of operations.

Also rolls back #282 and fixes #279

include/cuda/std/detail/libcxx/include/support/atomic/atomic_cuda_generated.h

codegen/codegen.cpp

wmaxey · 2022-06-22T22:05:43Z

I'm overall happy with these changes. I'll start up CI, and since Windows is not having issues I don't expect issues elsewhere either.

wmaxey · 2022-06-23T20:05:50Z

include/cuda/std/detail/libcxx/include/atomic

@@ -1239,7 +1239,7 @@ _LIBCUDACXX_INLINE_VISIBILITY void __cxx_atomic_wait(__cxx_atomic_impl<_Tp, _Sco
 }

 // general atomic<T>/atomic_ref<T>
-template <class _Tp, int _Sco = 0, bool = is_integral<_Tp>::value && !is_same<_Tp, bool>::value>
+template <class _Tp, int _Sco = 0, bool = (is_integral<_Tp>::value || is_floating_point<_Tp>::value) && !is_same<_Tp, bool>::value>


Because we are enabling the arithmetic operators in cuda::std::atomics as well, we need to extend the tests to exercise floating point.

These tests live in tests/std/atomics/ rather than tests/cuda/atomics.
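For readers following the hunk above, here is a minimal sketch of how a defaulted bool template parameter can select an arithmetic-capable specialization. The names has_arithmetic_ops and atomic_sketch are hypothetical and only mirror the shape of the condition in the diff; this is not the libcu++ class hierarchy.

```cpp
#include <type_traits>

// Hypothetical trait mirroring the condition in the diff:
// integral or floating point, but never bool.
template <class T>
struct has_arithmetic_ops
    : std::integral_constant<bool,
                             (std::is_integral<T>::value ||
                              std::is_floating_point<T>::value) &&
                                 !std::is_same<T, bool>::value> {};

// Primary template: types without arithmetic operators (bool, pointers, structs).
template <class T, bool = has_arithmetic_ops<T>::value>
struct atomic_sketch {
  static constexpr bool arithmetic = false;
};

// Partial specialization picked when the defaulted bool evaluates to true.
template <class T>
struct atomic_sketch<T, true> {
  static constexpr bool arithmetic = true;
};
```

With this shape, atomic_sketch<float> and atomic_sketch<int> land in the arithmetic specialization, while atomic_sketch<bool> stays in the primary template.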

Is .upstream-tests/test/std/atomics/atomics.types.operations/atomics.types.operations.arith the right directory to extend with the fp arithmetic tests?

Ah, I figure I need to add some more tests under the std/atomics directory. Basically, I need to mirror all of the tests for integral types.

Yeah, the integral tests would be a good start. Replicating the tests and doing the dispatch yourself for float/double would probably be sufficient. I don't think figuring out how to fit bitwise/arithmetic dispatches into the atomics_helpers there would be worth the effort.

sleeepyjack · 2022-06-24T19:29:23Z

Whups, while writing some more tests, I just stumbled over the fp min/max problem, i.e. not having specific instructions for them. I'm gonna adjust codegen.cpp so it emits CAS loop specializations.
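As an illustration of the CAS loop idea, here is a minimal host-side sketch of fetch_max built on compare_exchange_weak. It uses std::atomic as a stand-in for cuda::std::atomic, and fetch_max_cas is a hypothetical helper name; it is not the specialization that codegen.cpp emits.

```cpp
#include <atomic>

// Hypothetical helper for illustration; not the code emitted by codegen.cpp.
// Atomically replaces the stored value with max(stored, val) and returns
// the value that was stored before the call.
template <class T>
T fetch_max_cas(std::atomic<T>& a, T val,
                std::memory_order order = std::memory_order_seq_cst) {
  T old = a.load(std::memory_order_relaxed);
  // Stop when old < val no longer holds (nothing to write) or when the
  // exchange succeeds. On failure, compare_exchange_weak refreshes old
  // with the value currently stored.
  while (old < val &&
         !a.compare_exchange_weak(old, val, order, std::memory_order_relaxed)) {
  }
  return old;
}
```

Note that a NaN on either side makes old < val false, so this sketch writes nothing in that case. Whether that is the desired NaN behavior is a separate design decision.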

wmaxey · 2022-06-24T23:10:34Z

Whups, while writing some more tests, I just stumbled over the fp min/max problem, i.e. not having specific instructions for them. I'm gonna adjust codegen.cpp so it emits CAS loop specializations.

Hmm, yeah, I'll take a second pass over this component when I make fixes for the current issues as well. There's some overlap here with another problem, #279.

sleeepyjack · 2022-06-28T16:00:23Z

After extending the tests, I get some weird errors that I have not yet been able to track down:

********************
FAIL: libcu++ :: std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_sub_explicit.pass.cpp (137 of 1215)
******************** TEST 'libcu++ :: std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_sub_explicit.pass.cpp' FAILED ********************
Command: ['/usr/local/cuda/bin/nvcc', '-o', '/workspaces/libcudacxx/build/test/std/atomics/atomics.types.operations/atomics.types.operations.req/Output/atomic_fetch_sub_explicit.pass.cpp.o', '-x', 'cu', '/workspaces/libcudacxx/.upstream-tests/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_sub_explicit.pass.cpp', '-c', '-v', '-ftemplate-depth=270', '-std=c++17', '-include', '/workspaces/libcudacxx/test/support/nasty_macros.h', '-I/workspaces/libcudacxx/include', '-D__STDC_FORMAT_MACROS', '-D__STDC_LIMIT_MACROS', '-D__STDC_CONSTANT_MACROS', '-Xcompiler', '-fno-exceptions', '-Xcompiler', '-fno-rtti', '-D_LIBCUDACXX_NO_RTTI', '-I/workspaces/libcudacxx/test/support', '-include', '/workspaces/libcudacxx/test/force_include.h', '-I/workspaces/libcudacxx/include', '--extended-lambda', '-gencode=arch=compute_61,code=sm_61', '-Xcudafe', '--display_error_number', '-Werror', 'all-warnings', '-Xcompiler', '-Wall', '-Xcompiler', '-Wextra', '-Xcompiler', '-Werror', '-Xcompiler', '-Wno-literal-suffix', '-Xcompiler', '-Wno-unused-parameter', '-Xcompiler', '-Wno-deprecated-declarations', '-Xcompiler', '-Wno-noexcept-type', '-Xcompiler', '-Wno-unused-function', '-D_LIBCUDACXX_DISABLE_PRAGMA_GCC_SYSTEM_HEADER', '-c']
Exit Code: 1
Standard Error:
--
#$ _NVVM_BRANCH_=nvvm
#$ _SPACE_= 
#$ _CUDART_=cudart
#$ _HERE_=/usr/local/cuda/bin
#$ _THERE_=/usr/local/cuda/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_DIR_=targets/x86_64-linux
#$ TOP=/usr/local/cuda/bin/..
#$ NVVMIR_LIBRARY_DIR=/usr/local/cuda/bin/../nvvm/libdevice
#$ LD_LIBRARY_PATH=/usr/local/cuda/bin/../lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
#$ PATH=/usr/local/cuda/bin/../nvvm/bin:/usr/local/cuda/bin:/vscode/vscode-server/bin/linux-x64/30d9c6cd9483b2cc586687151bcbcd635f373630/bin/remote-cli:/conda/bin:/conda/condabin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
#$ INCLUDES="-I/usr/local/cuda/bin/../targets/x86_64-linux/include"  
#$ LIBRARIES=  "-L/usr/local/cuda/bin/../targets/x86_64-linux/lib/stubs" "-L/usr/local/cuda/bin/../targets/x86_64-linux/lib"
#$ CUDAFE_FLAGS=
#$ PTXAS_FLAGS=
#$ gcc -std=c++17 -D__CUDA_ARCH__=610 -D__CUDA_ARCH_LIST__=610 -E -x c++  -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__ -D__CUDACC_EXTENDED_LAMBDA__  -fno-exceptions -fno-rtti -Wall -Wextra -Werror -Wno-literal-suffix -Wno-unused-parameter -Wno-deprecated-declarations -Wno-noexcept-type -Wno-unused-function -I"/workspaces/libcudacxx/include" -I"/workspaces/libcudacxx/test/support" -I"/workspaces/libcudacxx/include" "-I/usr/local/cuda/bin/../targets/x86_64-linux/include"    -D "__STDC_FORMAT_MACROS" -D "__STDC_LIMIT_MACROS" -D "__STDC_CONSTANT_MACROS" -D "_LIBCUDACXX_NO_RTTI" -D "_LIBCUDACXX_DISABLE_PRAGMA_GCC_SYSTEM_HEADER" -D__CUDACC_VER_MAJOR__=11 -D__CUDACC_VER_MINOR__=7 -D__CUDACC_VER_BUILD__=64 -D__CUDA_API_VER_MAJOR__=11 -D__CUDA_API_VER_MINOR__=7 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -include "/workspaces/libcudacxx/test/support/nasty_macros.h" -include "/workspaces/libcudacxx/test/force_include.h" -Werror -m64 "/workspaces/libcudacxx/.upstream-tests/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_sub_explicit.pass.cpp" -o "/tmp/tmpxft_000013bf_00000000-7_atomic_fetch_sub_explicit.pass.cpp1.ii" 
#$ cicc --c++17 --gnu_version=110200 --promote_warnings --display_error_number --orig_src_file_name "/workspaces/libcudacxx/.upstream-tests/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_sub_explicit.pass.cpp" --orig_src_path_name "/workspaces/libcudacxx/.upstream-tests/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_sub_explicit.pass.cpp" --allow_managed --pending_instantiations=270 --extended-lambda  --display_error_number  -arch compute_61 -m64 --no-version-ident -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name "tmpxft_000013bf_00000000-3_atomic_fetch_sub_explicit.pass.fatbin.c" -tused --gen_module_id_file --module_id_file_name "/tmp/tmpxft_000013bf_00000000-4_atomic_fetch_sub_explicit.pass.module_id" --gen_c_file_name "/tmp/tmpxft_000013bf_00000000-6_atomic_fetch_sub_explicit.pass.cudafe1.c" --stub_file_name "/tmp/tmpxft_000013bf_00000000-6_atomic_fetch_sub_explicit.pass.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_000013bf_00000000-6_atomic_fetch_sub_explicit.pass.cudafe1.gpu"  "/tmp/tmpxft_000013bf_00000000-7_atomic_fetch_sub_explicit.pass.cpp1.ii" -o "/tmp/tmpxft_000013bf_00000000-6_atomic_fetch_sub_explicit.pass.ptx"
#$ ptxas --warning-as-error -arch=sm_61 -m64  "/tmp/tmpxft_000013bf_00000000-6_atomic_fetch_sub_explicit.pass.ptx"  -o "/tmp/tmpxft_000013bf_00000000-8_atomic_fetch_sub_explicit.pass.cubin" 
#$ fatbinary -64 --cicc-cmdline="-ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 " "--image3=kind=elf,sm=61,file=/tmp/tmpxft_000013bf_00000000-8_atomic_fetch_sub_explicit.pass.cubin" --embedded-fatbin="/tmp/tmpxft_000013bf_00000000-3_atomic_fetch_sub_explicit.pass.fatbin.c" 
#$ rm /tmp/tmpxft_000013bf_00000000-3_atomic_fetch_sub_explicit.pass.fatbin
#$ gcc -std=c++17 -D__CUDA_ARCH_LIST__=610 -E -x c++ -D__CUDACC__ -D__NVCC__ -D__CUDACC_EXTENDED_LAMBDA__  -fno-exceptions -fno-rtti -Wall -Wextra -Werror -Wno-literal-suffix -Wno-unused-parameter -Wno-deprecated-declarations -Wno-noexcept-type -Wno-unused-function -I"/workspaces/libcudacxx/include" -I"/workspaces/libcudacxx/test/support" -I"/workspaces/libcudacxx/include" "-I/usr/local/cuda/bin/../targets/x86_64-linux/include"    -D "__STDC_FORMAT_MACROS" -D "__STDC_LIMIT_MACROS" -D "__STDC_CONSTANT_MACROS" -D "_LIBCUDACXX_NO_RTTI" -D "_LIBCUDACXX_DISABLE_PRAGMA_GCC_SYSTEM_HEADER" -D__CUDACC_VER_MAJOR__=11 -D__CUDACC_VER_MINOR__=7 -D__CUDACC_VER_BUILD__=64 -D__CUDA_API_VER_MAJOR__=11 -D__CUDA_API_VER_MINOR__=7 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -include "/workspaces/libcudacxx/test/support/nasty_macros.h" -include "/workspaces/libcudacxx/test/force_include.h" -Werror -m64 "/workspaces/libcudacxx/.upstream-tests/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_sub_explicit.pass.cpp" -o "/tmp/tmpxft_000013bf_00000000-5_atomic_fetch_sub_explicit.pass.cpp4.ii" 
#$ cudafe++ --c++17 --gnu_version=110200 --promote_warnings --display_error_number --orig_src_file_name "/workspaces/libcudacxx/.upstream-tests/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_sub_explicit.pass.cpp" --orig_src_path_name "/workspaces/libcudacxx/.upstream-tests/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_sub_explicit.pass.cpp" --allow_managed --pending_instantiations=270 --extended-lambda  --display_error_number --m64 --parse_templates --gen_c_file_name "/tmp/tmpxft_000013bf_00000000-6_atomic_fetch_sub_explicit.pass.cudafe1.cpp" --stub_file_name "tmpxft_000013bf_00000000-6_atomic_fetch_sub_explicit.pass.cudafe1.stub.c" --module_id_file_name "/tmp/tmpxft_000013bf_00000000-4_atomic_fetch_sub_explicit.pass.module_id" "/tmp/tmpxft_000013bf_00000000-5_atomic_fetch_sub_explicit.pass.cpp4.ii" 
#$ gcc -std=c++17 -D__CUDA_ARCH__=610 -D__CUDA_ARCH_LIST__=610 -c -x c++  -DCUDA_DOUBLE_MATH_FUNCTIONS -fno-exceptions -fno-rtti -Wall -Wextra -Werror -Wno-literal-suffix -Wno-unused-parameter -Wno-deprecated-declarations -Wno-noexcept-type -Wno-unused-function -I"/workspaces/libcudacxx/include" -I"/workspaces/libcudacxx/test/support" -I"/workspaces/libcudacxx/include" "-I/usr/local/cuda/bin/../targets/x86_64-linux/include"   -Werror  -ftemplate-depth-270 -m64 "/tmp/tmpxft_000013bf_00000000-6_atomic_fetch_sub_explicit.pass.cudafe1.cpp" -o "/workspaces/libcudacxx/build/test/std/atomics/atomics.types.operations/atomics.types.operations.req/Output/atomic_fetch_sub_explicit.pass.cpp.o" 
/workspaces/libcudacxx/include/cuda/std/detail/libcxx/include/support/atomic/atomic_base.h: In instantiation of 'cuda::std::__4::__detail::__host::__cxx_atomic_underlying_t<_Tp> cuda::std::__4::__detail::__host::__cxx_atomic_fetch_sub(_Tp*, _Td, cuda::std::__4::memory_order) [with _Tp = volatile cuda::std::__4::__detail::__host::__cxx_atomic_base_impl<float, 0>; _Td = float; cuda::std::__4::__detail::__host::__cxx_atomic_underlying_t<_Tp> = float; cuda::std::__4::memory_order = cuda::std::__4::memory_order]':
/workspaces/libcudacxx/include/cuda/std/detail/libcxx/include/support/atomic/atomic_cuda.h:284:38:   required from '_Tp cuda::std::__4::__detail::__cxx_atomic_fetch_sub(volatile cuda::std::__4::__detail::__cxx_atomic_base_heterogeneous_impl<_Tp, _Sco, _Ref>*, _Tp, cuda::std::__4::memory_order) [with _Tp = float; int _Sco = 0; bool _Ref = false; cuda::std::__4::memory_order = cuda::std::__4::memory_order]'
/workspaces/libcudacxx/include/cuda/std/detail/libcxx/include/atomic:1516:32:   required from '_Tp cuda::std::__4::__atomic_base<_Tp, _Sco, true>::fetch_sub(_Tp, cuda::std::__4::memory_order) [with _Tp = float; int _Sco = 0; cuda::std::__4::memory_order = cuda::std::__4::memory_order]'
/workspaces/libcudacxx/include/cuda/std/detail/libcxx/include/atomic:2288:22:   required from 'typename cuda::std::__4::enable_if<((cuda::std::__4::is_integral<_Tp>::value && (! cuda::std::__4::is_same<_Tp, bool>::value)) || cuda::std::__4::is_floating_point<_Tp>::value), _Tp>::type cuda::std::__4::atomic_fetch_sub_explicit(cuda::std::__4::atomic<_Tp>*, _Tp, cuda::std::__4::memory_order) [with _Tp = float; typename cuda::std::__4::enable_if<((cuda::std::__4::is_integral<_Tp>::value && (! cuda::std::__4::is_same<_Tp, bool>::value)) || cuda::std::__4::is_floating_point<_Tp>::value), _Tp>::type = float; cuda::std::__4::memory_order = cuda::std::__4::memory_order]'
/workspaces/libcudacxx/.upstream-tests/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_sub_explicit.pass.cpp:49:38:   required from 'void TestFn<T, Selector, <anonymous> >::operator()() const [with T = float; Selector = local_memory_selector; cuda::std::__4::__detail::thread_scope <anonymous> = cuda::std::__4::__detail::thread_scope_system]'
/workspaces/libcudacxx/.upstream-tests/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_helpers.h:74:40:   required from 'void TestEachFloatingPointType<TestFunctor, Selector, Scope>::operator()() const [with TestFunctor = TestFn; Selector = local_memory_selector; cuda::std::__4::__detail::thread_scope Scope = cuda::std::__4::__detail::thread_scope_system]'
/workspaces/libcudacxx/.upstream-tests/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_sub_explicit.pass.cpp:95:61:   required from here
/workspaces/libcudacxx/include/cuda/std/detail/libcxx/include/support/atomic/atomic_base.h:136:26: error: operand type 'volatile float*' is incompatible with argument 1 of '__atomic_fetch_sub'
  136 |   return __atomic_fetch_sub(__a_tmp, __delta * __skip_v,
      |        ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                  
/workspaces/libcudacxx/include/cuda/std/detail/libcxx/include/support/atomic/atomic_base.h: In instantiation of 'cuda::std::__4::__detail::__host::__cxx_atomic_underlying_t<_Tp> cuda::std::__4::__detail::__host::__cxx_atomic_fetch_sub(_Tp*, _Td, cuda::std::__4::memory_order) [with _Tp = volatile cuda::std::__4::__detail::__host::__cxx_atomic_base_impl<double, 0>; _Td = double; cuda::std::__4::__detail::__host::__cxx_atomic_underlying_t<_Tp> = double; cuda::std::__4::memory_order = cuda::std::__4::memory_order]':
/workspaces/libcudacxx/include/cuda/std/detail/libcxx/include/support/atomic/atomic_cuda.h:284:38:   required from '_Tp cuda::std::__4::__detail::__cxx_atomic_fetch_sub(volatile cuda::std::__4::__detail::__cxx_atomic_base_heterogeneous_impl<_Tp, _Sco, _Ref>*, _Tp, cuda::std::__4::memory_order) [with _Tp = double; int _Sco = 0; bool _Ref = false; cuda::std::__4::memory_order = cuda::std::__4::memory_order]'
/workspaces/libcudacxx/include/cuda/std/detail/libcxx/include/atomic:1516:32:   required from '_Tp cuda::std::__4::__atomic_base<_Tp, _Sco, true>::fetch_sub(_Tp, cuda::std::__4::memory_order) [with _Tp = double; int _Sco = 0; cuda::std::__4::memory_order = cuda::std::__4::memory_order]'
/workspaces/libcudacxx/include/cuda/std/detail/libcxx/include/atomic:2288:22:   required from 'typename cuda::std::__4::enable_if<((cuda::std::__4::is_integral<_Tp>::value && (! cuda::std::__4::is_same<_Tp, bool>::value)) || cuda::std::__4::is_floating_point<_Tp>::value), _Tp>::type cuda::std::__4::atomic_fetch_sub_explicit(cuda::std::__4::atomic<_Tp>*, _Tp, cuda::std::__4::memory_order) [with _Tp = double; typename cuda::std::__4::enable_if<((cuda::std::__4::is_integral<_Tp>::value && (! cuda::std::__4::is_same<_Tp, bool>::value)) || cuda::std::__4::is_floating_point<_Tp>::value), _Tp>::type = double; cuda::std::__4::memory_order = cuda::std::__4::memory_order]'
/workspaces/libcudacxx/.upstream-tests/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_sub_explicit.pass.cpp:49:38:   required from 'void TestFn<T, Selector, <anonymous> >::operator()() const [with T = double; Selector = local_memory_selector; cuda::std::__4::__detail::thread_scope <anonymous> = cuda::std::__4::__detail::thread_scope_system]'
/workspaces/libcudacxx/.upstream-tests/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_helpers.h:75:41:   required from 'void TestEachFloatingPointType<TestFunctor, Selector, Scope>::operator()() const [with TestFunctor = TestFn; Selector = local_memory_selector; cuda::std::__4::__detail::thread_scope Scope = cuda::std::__4::__detail::thread_scope_system]'
/workspaces/libcudacxx/.upstream-tests/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_fetch_sub_explicit.pass.cpp:95:61:   required from here
/workspaces/libcudacxx/include/cuda/std/detail/libcxx/include/support/atomic/atomic_base.h:136:26: error: operand type 'volatile double*' is incompatible with argument 1 of '__atomic_fetch_sub'
# --error 0x1 --
--

Compilation failed unexpectedly!
********************

@wmaxey do you know what this could be?

wmaxey · 2022-06-29T21:00:43Z

After extending the tests, I get some weird errors that I am yet unable to track down:

...

@wmaxey do you know what this could be?

I am not encountering this on your latest, let me try with some other configs. Maybe some changes I had made on top of your branch fixed this.

sleeepyjack · 2022-06-30T11:18:52Z

I am not encountering this on your latest, let me try with some other configs. Maybe some changes I had made on top of your branch fixed this.

For reference: I am running CUDA Toolkit (CTK) 11.7.0 with gcc 11.2 in an Ubuntu 20.04 container. I only have a Pascal card available (sm_61).

wmaxey · 2022-07-06T23:39:09Z

We'll need to break out float add/sub into CAS loops. On MSVC we brutally cast to long*/int*, which causes invalid results.

I can make these changes on top of your changes.
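To make the failure mode concrete, here is a sketch contrasting integer addition on the raw storage of a float with a CAS loop that adds in floating point. It assumes the delta is converted to an integer before the add; the exact MSVC code path is not shown in this thread. All helper names are hypothetical, and std::atomic stands in for the host backend.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");

// Copy the object representation between float and a 32-bit integer.
inline std::uint32_t float_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}
inline float bits_to_float(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Models an integer add applied to the float's storage (assumed behavior):
// the delta is truncated to an integer and added to the bit pattern.
inline float add_via_integer_view(float value, float delta) {
  return bits_to_float(float_bits(value) + static_cast<std::uint32_t>(delta));
}

// Hypothetical CAS-loop fetch_add that performs the addition in floating point.
template <class T>
T fetch_add_cas(std::atomic<T>& a, T delta,
                std::memory_order order = std::memory_order_seq_cst) {
  T old = a.load(std::memory_order_relaxed);
  // On failure old is refreshed, so old + delta is recomputed each iteration.
  while (!a.compare_exchange_weak(old, old + delta, order,
                                  std::memory_order_relaxed)) {
  }
  return old;
}
```

Under these assumptions, add_via_integer_view(1.0f, 2.0f) only nudges the mantissa of 1.0f and does not produce 3.0f, whereas fetch_add_cas stores the expected sum.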

wmaxey · 2022-07-07T00:42:38Z

I've added a patch that fixes a few issues, let me know if this resolves your problems as well.

sleeepyjack · 2022-07-07T17:34:59Z

We'll need to break out float add/sub into CAS loops.

Ah, I didn't think of the host side. Good catch!

I've added a patch that fixes a few issues,

Thanks a lot! This fixes most of the previously failing tests.

********************
********************
Failed Tests (4):
  libcu++ :: cuda/bad_atomic_alignment.pass.cpp
  libcu++ :: std/atomics/atomics.types.generic/atomic_copyable.pass.cpp
  libcu++ :: std/atomics/atomics.types.generic/integral_ref.pass.cpp
  libcu++ :: std/atomics/atomics.types.generic/integral_ref_constness.pass.cpp


Testing Time: 419.02s
  Unsupported      :  101
  Passed           : 1091
  Expectedly Failed:   19
  Failed           :    4

real    7m1.338s
user    46m8.690s
sys     8m27.502s
################################################################################
Score: 99.63%

The last four tests that still fail throw the exact same error: cudaErrorInvalidAddressSpace: operation not supported on global/shared address space.

wmaxey · 2022-07-07T21:40:08Z

The last four tests that still fail throw the exact same error: cudaErrorInvalidAddressSpace: operation not supported on global/shared address space.

That's a known issue for Pascal. I do not know the cause, but believe it may have something to do with an unsupported size operand.

I think the only remaining thing would be tests similar to atomics.types.generic/integral.pass.cpp. We don't have CAS/ld/st coverage for floating point types. Which I'm positive works, but we should be complete. :)
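A rough sketch of what CAS/load/store coverage for a floating point type could look like. It uses std::atomic on the host as a stand-in, and check_load_store_cas is a hypothetical name; the tests added to the PR follow the libcxx test layout instead.

```cpp
#include <atomic>

// Hypothetical check; returns true when store, load, exchange, and
// compare_exchange_strong behave as expected for T.
template <class T>
bool check_load_store_cas() {
  std::atomic<T> obj(T(0));

  obj.store(T(1));
  bool ok = (obj.load() == T(1));

  ok = ok && (obj.exchange(T(2)) == T(1));

  // A mismatched expected value must fail and report the stored value.
  T expected = T(3);
  ok = ok && !obj.compare_exchange_strong(expected, T(4));
  ok = ok && (expected == T(2)) && (obj.load() == T(2));

  // With the refreshed expected value, the exchange must succeed.
  ok = ok && obj.compare_exchange_strong(expected, T(4));
  ok = ok && (obj.load() == T(4));
  return ok;
}
```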

sleeepyjack · 2022-07-08T12:10:24Z

I think the only remaining thing would be tests similar to atomics.types.generic/integral.pass.cpp.

/done

Had to break them out into separate files, although this introduces a lot of duplicate code.
We could do something like if constexpr (std::is_integral<T>::value) to mask out those operations in the integral_* tests that aren't available for fp types, but this would require C++17.
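A minimal sketch of that if constexpr idea, assuming C++17 and using std::atomic as a stand-in. The function name is hypothetical; the shared part runs for every type, and the bitwise part is compiled only for integral types.

```cpp
#include <atomic>
#include <type_traits>

// Hypothetical combined test body (requires C++17 for if constexpr).
template <class T>
bool check_exchange_and_bitwise() {
  std::atomic<T> a(T(1));

  // Operations available for both integral and floating point types.
  bool ok = (a.exchange(T(5)) == T(1)) && (a.load() == T(5));

  // Bitwise operations exist only for integral types; for float/double
  // this branch is discarded and never instantiated.
  if constexpr (std::is_integral<T>::value) {
    ok = ok && (a.fetch_and(T(4)) == T(5)) && (a.load() == T(4));
  }
  return ok;
}
```

The trade-off is the one stated above: less duplication, at the cost of raising the minimum language standard for those tests.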

wmaxey · 2022-07-08T17:53:39Z

Thanks for all the effort Daniel! I'll see if there are any issues with the changes again, and if not I think it's okay to merge. Though I might remove some comments that documented the 'unsigned' nature of min/max.

jrhemstad · 2022-07-11T14:54:17Z

Would be nice to get @griwes to review this as well.

wmaxey · 2022-07-11T16:02:21Z

https://builds4u.nvidia.com/dvs/#/change/3154615863024800.1?eventType=Virtual&dvs_showStaging=on
https://scbuilds4u/dvs/#/change/3154615539432407.1?eventType=Virtual

DVS is clean, but SC-DVS found ICEs that I am able to repro on VC129, will try to figure out what's going on there.

wmaxey · 2022-07-11T16:02:55Z

C:\sbf\libcudacxx\.upstream-tests\test\std\atomics\atomics.types.operations\atomics.types.operations.wait\../atomics.types.operations.req/atomic_helpers.h(87): note: while compiling class template member function 'void TestEachAtomicType<TestFn,shared_memory_selector,cuda::std::__4::__detail::thread_scope_system>::operator ()(void) const'
C:\sbf\libcudacxx\.upstream-tests\test\std\atomics\atomics.types.operations\atomics.types.operations.wait\atomic_wait.pass.cpp(91): note: see reference to function template instantiation 'void TestEachAtomicType<TestFn,shared_memory_selector,cuda::std::__4::__detail::thread_scope_system>::operator ()(void) const' being compiled
C:\sbf\libcudacxx\.upstream-tests\test\std\atomics\atomics.types.operations\atomics.types.operations.wait\atomic_wait.pass.cpp(91): note: see reference to class template instantiation 'TestEachAtomicType<TestFn,shared_memory_selector,cuda::std::__4::__detail::thread_scope_system>' being compiled
INTERNAL COMPILER ERROR in 'C:\msbuild\2019\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64\cl.exe'
    Please choose the Technical Support command on the Visual C++
    Help menu, or open the Technical Support help file for more information
nvcc error   : 'cl' died with status 0xC0000005 (ACCESS_VIOLATION)
# --error 0xc0000005 --

griwes

In general this looks good to me; most comments I have are in tests, so I'm going to give this a 👍 and leave it up to @wmaxey as to whether these should gate landing this PR or not. The one comment I have in the actual change is rather minor.

griwes · 2022-07-11T22:22:12Z

.upstream-tests/test/cuda/atomics/atomic.ext/atomic_fetch_max.pass.cpp

@@ -60,14 +60,58 @@ struct TestFn {
  }
 };

+template <template<typename, typename> typename Selector, cuda::thread_scope ThreadScope>
+struct TestFn<int, Selector, ThreadScope> {


This specializes specifically for int - shouldn't it, instead, be specialized for any signed integral type?

Specifically, I'm attempting to get a guarantee that signed math is working as expected. It would be completely fair to split it into unsigned and signed specializations. Perhaps more tests for this API are needed. ;)
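To show why a signed test case is worth having, here is a small sketch comparing the same two bit patterns as signed and as unsigned 32-bit values. The helper names are hypothetical and no atomics are involved; it only illustrates how the comparison flips.

```cpp
#include <cstdint>

// Maximum when both operands are compared as signed values.
inline std::int32_t max_signed(std::int32_t a, std::int32_t b) {
  return a < b ? b : a;
}

// Maximum when the same bit patterns are compared as unsigned values.
inline std::uint32_t max_unsigned_view(std::int32_t a, std::int32_t b) {
  const std::uint32_t ua = static_cast<std::uint32_t>(a);
  const std::uint32_t ub = static_cast<std::uint32_t>(b);
  return ua < ub ? ub : ua;
}
```

For -1 and 4, the signed comparison picks 4, while the unsigned view picks the bit pattern of -1 (0xFFFFFFFF). A test that only uses non-negative values cannot tell the two apart.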

griwes · 2022-07-11T22:22:31Z

.upstream-tests/test/cuda/atomics/atomic.ext/atomic_fetch_min.pass.cpp

@@ -60,14 +60,59 @@ struct TestFn {
  }
 };

+template <template<typename, typename> typename Selector, cuda::thread_scope ThreadScope>
+struct TestFn<int, Selector, ThreadScope> {


Same as the comment above.

griwes · 2022-07-11T22:23:28Z

.upstream-tests/test/cuda/atomics/atomic.ext/atomic_helpers.h

+    __host__ __device__
+    void operator()() const {
+        TestFunctor<float, Selector, Scope>()();
+        TestFunctor<double, Selector, Scope>()();


Should we also have a host-only call to TestFunctor<long double, Selector, Scope>()() here?
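One way this could look, as a sketch only: guard the long double instantiation so that it is compiled for the host pass alone. The example is simplified to a one-parameter functor (the real helpers take a selector and a scope as well), the names are hypothetical, and it relies on __CUDA_ARCH__ being defined only during device compilation.

```cpp
// Counts how many times the hypothetical functor below was invoked.
inline int& invocation_count() {
  static int n = 0;
  return n;
}

// Hypothetical test functor standing in for the real TestFn.
template <class T>
struct CountingFunctor {
  void operator()() const { ++invocation_count(); }
};

// Simplified variant of the dispatch helper with a host-only long double call.
template <template <class> class TestFunctor>
struct TestEachFloatingPointTypeSketch {
  void operator()() const {
    TestFunctor<float>()();
    TestFunctor<double>()();
#if !defined(__CUDA_ARCH__)
    // Only compiled when not targeting a device architecture.
    TestFunctor<long double>()();
#endif
  }
};
```

Compiled as plain host C++, the sketch invokes the functor three times; in a device compilation pass the long double call would be dropped.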

griwes · 2022-07-11T22:27:12Z

.upstream-tests/test/std/atomics/atomics.types.generic/floating_point.pass.cpp

+
+int main(int, char**)
+{
+    // this test would instantiate more cases than just the ones below


The integral tests here instantiate the test functions for all integer types. Here we only instantiate for two floating point types. It should be fine to remove this comment and have all combinations of scopes and memory selectors actually tested below.

griwes · 2022-07-11T22:27:55Z

.upstream-tests/test/std/atomics/atomics.types.generic/floating_point_ref.pass.cpp

+
+int main(int, char**)
+{
+    // this test would instantiate more cases than just the ones below


Same comment as in the non-ref version of this test.

griwes · 2022-07-11T22:28:23Z

.upstream-tests/test/std/atomics/atomics.types.generic/floating_point_ref_constness.pass.cpp

+
+int main(int, char**)
+{
+    // this test would instantiate more cases than just the ones below


And once again here.

griwes · 2022-07-11T22:29:14Z

...ests/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_helpers.h

+    __host__ __device__
+    void operator()() const {
+        TestFunctor<float, Selector, Scope>()();
+        TestFunctor<double, Selector, Scope>()();


Same comment as earlier: on the host there should also be a call for long double.

griwes · 2022-07-11T22:29:49Z

...ests/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_helpers.h

+    __host__ __device__
+    void operator()() const {
+        TestFunctor<float, Selector, Scope>()();
+        TestFunctor<double, Selector, Scope>()();


And one more time here.

griwes · 2022-07-11T22:39:06Z

include/cuda/std/detail/libcxx/include/atomic

+typedef atomic<float>  atomic_float;
+typedef atomic<double> atomic_double;


I don't think these two are in the standard. The other ones are there mainly for C interop; do we want these two here? If we do, we should also have a (possibly host-only) atomic_long_double.

I'm going to say we probably don't want these. We'll need to open the door eventually for a half and just diverge further with what's available on H/D.

miscco · 2022-07-12T06:59:58Z

.upstream-tests/test/cuda/atomics/atomic.ext/atomic_fetch_max.pass.cpp

+        Selector<A, constructor_initializer> sel;
+        A & t = *sel.construct();
+        t = int(-1);
+        assert(t.fetch_max(4) == int(-1));


This does not connect with the comment.

We are still testing a smaller int versus a larger threshold. Why is this changed and could we update the comment?

If this does some horrible conversion to unsigned magic, we should test that explicitly and keep the basic 3 vs 2 test.

There's no horrible conversion, it's just specifically testing int types. The cast is, in truth, unnecessary.

miscco · 2022-07-12T07:04:16Z

.upstream-tests/test/cuda/atomics/atomic.ext/atomic_fetch_min.pass.cpp

+        Selector<volatile A, constructor_initializer> sel;
+        volatile A & t = *sel.construct();
+        t = int(5);
+        assert(t.fetch_min(-1) == int(5));


This is changing the values of the test, when we actually want to only add volatile. I would either keep them the same or change them consistently throughout the file.

We did want to change the test in this case, but there is greater value in making a proper unsigned/signed split test so that we can guarantee that add/sub/max/min are behaving correctly.

miscco · 2022-07-12T07:09:33Z

.upstream-tests/test/std/atomics/atomics.types.generic/floating_point.pass.cpp

+    assert(obj == T(1));
+    assert(obj.load() == T(1));
+    assert(obj.load(cuda::std::memory_order_acquire) == T(1));
+    assert(obj.exchange(T(2)) == T(1));


I know this is the libcxx style, but could we at least add some newlines between the different pieces of functionality being tested?

Reading a gazillion consecutive lines drains a lot of brain power.

wmaxey · 2022-07-15T03:00:22Z

include/cuda/std/detail/libcxx/include/atomic

-template <typename _Tp, int _Sco,
-          typename _Base = typename conditional<__cxx_is_always_lock_free<_Tp>::__value,
+template <typename _Tp, int _Sco>
+struct __cxx_atomic_impl_conditional {


Something about 'using' is cursed. This is very reminiscent of several tuple fixes.

DVS results will be posted soon, but this fixed builds on all the compilers I was able to get repros on.
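For context, a sketch of the general pattern of moving a type selection into a struct with a nested type. The thread does not pin down why the original form triggered the ICE, so this only shows the shape of the workaround. Every name here is hypothetical, and small_enough merely stands in for __cxx_is_always_lock_free.

```cpp
#include <type_traits>

// Two hypothetical backends to choose between.
template <class T> struct lock_free_impl { static constexpr int kind = 1; };
template <class T> struct locked_impl    { static constexpr int kind = 2; };

// Hypothetical stand-in for the real lock-free trait.
template <class T>
struct small_enough : std::integral_constant<bool, (sizeof(T) <= 8)> {};

// The selection lives in a struct with a nested type instead of an alias
// or a defaulted template parameter.
template <class T>
struct impl_conditional {
  typedef typename std::conditional<small_enough<T>::value,
                                    lock_free_impl<T>,
                                    locked_impl<T> >::type type;
};

// Consumers derive from the selected backend.
template <class T>
struct atomic_impl_sketch : impl_conditional<T>::type {};

// A payload too large for the hypothetical lock-free path.
struct big_payload { char bytes[64]; };
```

The typename in front of std::conditional is required because the nested type is dependent.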

wmaxey · 2022-07-16T01:59:02Z

@griwes I've made the tests more straightforward, all signed types now just get extra testing and there isn't some strange int-only overload.

@sleeepyjack I'd like to see the atomic_float/double removed. That just opens up questions about long double I guess.

wmaxey · 2022-07-20T04:14:22Z

Latest changes made sure that bitwise types are a superset of arithmetic types. I did some work to refactor the ref/non-ref classes as well, they should be easier to maintain in the future.

miscco · 2022-07-20T06:34:23Z

include/cuda/std/detail/libcxx/include/atomic

+    __atomic_base_storage(_Storage&& __a) _NOEXCEPT : __a_(forward<_Storage>(__a)) {}
+};
+
+template <class _Tp, bool _Cq, typename _Storage>


Could we get a more descriptive name instead of _Cq?

I'll push an update with _Cq->_ConstQualified

miscco · 2022-07-20T06:38:57Z

include/cuda/std/detail/libcxx/include/atomic


    _LIBCUDACXX_INLINE_VISIBILITY _LIBCUDACXX_CONSTEXPR
-    __atomic_base_ref(_Tp& __a) _NOEXCEPT : __a_(__a) {}
+    __atomic_base_core(_Storage&& __a) _NOEXCEPT : __atomic_base_storage<_Tp, _Storage>(forward<_Storage>(__a)) {}


Technically we are required to qualify forward, as it is a non-ugly function name.

Good catch.
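A small sketch of the general hazard behind this remark: an unqualified call to a non-reserved name inside a library template is subject to argument-dependent lookup, so a user overload can win. The namespaces and the name pass_along are hypothetical; the sketch uses a deduced call rather than forward with an explicit template argument.

```cpp
namespace lib {

// Library function with an ordinary (non-reserved) name.
template <class T>
int pass_along(T&&) { return 1; }

// Unqualified call: ADL may add overloads from the argument's namespace.
template <class T>
int call_unqualified(T&& t) { return pass_along(static_cast<T&&>(t)); }

// Qualified call: always resolves to lib::pass_along.
template <class T>
int call_qualified(T&& t) { return lib::pass_along(static_cast<T&&>(t)); }

}  // namespace lib

namespace user {

struct widget {};

// A user function that happens to share the library's name. As a
// non-template exact match it is preferred over the library template.
inline int pass_along(widget&) { return 2; }

}  // namespace user
```

Given a user::widget lvalue w, lib::call_unqualified(w) ends up in user::pass_along, while lib::call_qualified(w) stays inside the library.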

miscco · 2022-07-20T06:41:46Z

include/cuda/std/detail/libcxx/include/atomic

-    __atomic_base(const __atomic_base&) = delete;
-    __atomic_base(__atomic_base&&) = delete;
+    __atomic_base_storage() = default;
+    __atomic_base_storage(const __atomic_base_storage&) = default;


I am having some trouble parsing the differences between the various classes. AFAICT the only difference is whether the special member functions are deleted or defaulted.

Given that the implementation of the classes is considerable, would it make sense to just derive from a single base class that does this for us, like optional does?

Are you referring to all the __atomic_base classes? The main difference is const qualifiers. atomic_ref has to allow value updates through const.
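A minimal sketch of the const point, using a hypothetical wrapper around a pointer to std::atomic. It is not the libcu++ atomic_ref implementation; it only shows that const on a reference-like wrapper does not make the referenced object const.

```cpp
#include <atomic>

// Hypothetical reference-like wrapper for illustration.
template <class T>
class ref_sketch {
  std::atomic<T>* target_;  // the wrapper owns nothing, it only points

 public:
  explicit ref_sketch(std::atomic<T>& target) : target_(&target) {}

  // const member functions can still modify the referenced object:
  // constness applies to the wrapper (the pointer), not to the pointee.
  void store(T v) const { target_->store(v); }
  T load() const { return target_->load(); }
};
```

With std::atomic<int> x(1) and const ref_sketch<int> r(x), calling r.store(7) compiles and updates x, which is the behavior a value-owning atomic must not offer through const.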

sleeepyjack · 2022-07-22T00:45:25Z

Wooohoo, first contribution merged ☑️

Add atomics for floating point types.

f5fbfe9

wmaxey self-assigned this Jun 22, 2022

wmaxey added the bug: functional Does not work as intended. label Jun 22, 2022

wmaxey added this to the 1.9.0 milestone Jun 22, 2022

wmaxey reviewed Jun 22, 2022

include/cuda/std/detail/libcxx/include/support/atomic/atomic_cuda_generated.h (outdated)

codegen/codegen.cpp (outdated)

wmaxey reviewed Jun 22, 2022

codegen/codegen.cpp (outdated)

Use '&&' instead of 'and' for MSVC compatibility.

339ce25

wmaxey reviewed Jun 23, 2022

sleeepyjack added 3 commits June 28, 2022 15:50

Add fetch_min/fetch_max CAS loop specializations for fp types.

ba812a4

Extend tests for fp atomic support.

4868264

Make sure __half atomics do not compile. (not yet supported)

0380363

sleeepyjack mentioned this pull request Jun 30, 2022

[FEA] Migrate from cuda::atomic to cuda::atomic_ref NVIDIA/cuCollections#183

Open

Add fixes and support for atomic_max/min, add CAS loops for host floating point math

5a179b1

Fix C++11 support by downgrading use of is_floating_point_v

a38b731

Add tests for fp atomic CAS/load/store.

75a4c3c

wmaxey added 2 commits July 8, 2022 13:06

Add comment headers for fp tests, block less than SM60.

4039a15

Modify documentation to reflect that atomic_[min/max] respect sign.

d16cbbe

jrhemstad requested a review from griwes July 11, 2022 14:54

griwes approved these changes Jul 11, 2022

miscco reviewed Jul 12, 2022

Replace using with struct, some type strangeness involving float is happening on MSVC that seems to cause on internal compiler error

e0cdb4f

wmaxey reviewed Jul 15, 2022

wmaxey added 3 commits July 15, 2022 12:27

Fix missing typename in __cxx_atomic_impl_conditional thing

742cedd

Make atomic_min/max tests a little more logical

9d30b82

Fix min/max and flag as small types ignore sign.

7a9ddc4

wmaxey requested a review from griwes July 16, 2022 01:48

sleeepyjack and others added 4 commits July 18, 2022 08:36

Remove atomic_float/double typedefs.

cc30e92

atomic_helpers requires 3 arguments for input functors, create a proxy that dispatches to the correct partial overload.

dde76d0

Replace bitwise test with arithmetic for floating point types

cd027ab

Explode bitwise/arithmetic atomics into their own classes and conditionally include the correct base class

4e5c0ae

wmaxey added the testing: internal ci passed Passed internal NVIDIA CI (DVS). label Jul 20, 2022

miscco reviewed Jul 20, 2022

Add cuda/std|std namespace to use of forward in atomic.

61ff7c9

wmaxey added the enhancement New feature or request. label Jul 20, 2022

griwes approved these changes Jul 21, 2022

wmaxey merged commit e489e9b into NVIDIA:main Jul 21, 2022

sleeepyjack deleted the feature/fp_atomics branch July 22, 2022 00:45

bdice mentioned this pull request Aug 1, 2022

Fix atomic operations on NaN values rapidsai/cudf#11420

Merged

