Conversation
Looks good overall pending the first review comment; please ping me when you un-WIP this.
#endif
}
template<class _Tp, int _Sco>
__host__ __device__ inline _Tp __cxx_atomic_load(__cxx_atomic_base_impl_default<_Tp, _Sco> const volatile* __a, int __order) {
#ifdef __CUDA_ARCH__
    return __cxx_atomic_alignment_unwrap(detail::__atomic_load_n_cuda(&__a->__a_value, __order, detail::__scope_tag<_Sco>()));
#else
-   return __cxx_atomic_alignment_unwrap(::std::atomic_load_explicit(&__a->__a_value, (::std::memory_order)__order));
+   alignas(_Tp) unsigned char __buf[sizeof(_Tp)];
@griwes it looks like I tore off another alignment unwrap here. Are these char* output buffers a code smell? This was done to fix initialization warnings.
Why not use __atomic_load_n here instead?
__atomic_load_n does not work for non-integral non-pointer types: https://gcc.godbolt.org/z/osrcs1
Maybe I'm mistaken in some way about its usage?
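For context, here is a minimal standalone sketch of the distinction, assuming the GCC/Clang __atomic builtins (widget and load_relaxed are illustrative names, not library code): __atomic_load_n only accepts integral and pointer operands, while the generic __atomic_load can load any trivially copyable type into a caller-provided buffer, which is what the alignas(_Tp) char buffer in the diff above is for.

#include <cstring>

struct widget { int a; int b; };  // non-integral, non-pointer payload (hypothetical type)

inline widget load_relaxed(widget const* p) {
    // __atomic_load_n(p, __ATOMIC_RELAXED);  // ill-formed: the _n form requires
    //                                        // an integral or pointer operand
    alignas(widget) unsigned char buf[sizeof(widget)];  // aligned output buffer, as in the
                                                        // diff; avoids uninitialized warnings
    __atomic_load(p, reinterpret_cast<widget*>(buf), __ATOMIC_RELAXED);
    widget result;
    ::std::memcpy(&result, buf, sizeof(result));        // re-type the bytes without aliasing UB
    return result;
}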
Force-pushed from 149fcee to 264a9f2.
Testing on a GV100 looks good: no failures, with 1 unsupported test. I'll kick off a CI run now. I've replaced the …; I'm unsure if that is the correct thing to do, however.
#endif
}
template<class _Tp, int _Sco>
__host__ __device__ inline _Tp* __cxx_atomic_fetch_add(__cxx_atomic_base_impl_default<_Tp*, _Sco> volatile* __a, ptrdiff_t __delta, int __order) {
#ifdef __CUDA_ARCH__
    return detail::__atomic_fetch_add_cuda(&__a->__a_value, __delta, __order, detail::__scope_tag<_Sco>());
#else
-   return ::std::atomic_fetch_add_explicit(&__a->__a_value, __delta, (::std::memory_order)__order);
+   return __atomic_fetch_add(&__a->__a_value, __delta * __skip_amt<_Tp*>::value, __order);
It doesn't seem right to me that we should need the skip amount in this layer. The layer below should be doing that.
Basically, every deviation between the __CUDA_ARCH__ side and this side looks like a bug to me.
> It doesn't seem right to me that we should need the skip amount in this layer. The layer below should be doing that.

I'd agree, but there is no layer at the compiler-intrinsic level for GCC; at that point, incrementing by sizeof(_Tp) is necessary (see the sketch below). https://github.com/NVIDIA/libcudacxx/blob/main/libcxx/include/atomic#L846

> Basically, every deviation between the __CUDA_ARCH__ side and this side looks like a bug to me.

@griwes, @jrhemstad, and I had a meeting today about how we could resolve some of this with a better platform-layering framework. There are some neat ideas on the table for making this nesting-doll structure a bit cleaner. It would help to know ahead of time which deviations you consider wrong.
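To make the byte-scaling concrete, here is a minimal sketch assuming the GCC-style builtin (fetch_add_ptr is a hypothetical name, not the library's entry point): the generic __atomic_fetch_add performs raw byte arithmetic on pointer objects, so advancing by __delta elements requires multiplying by sizeof(_Tp) at this layer, which is what __skip_amt supplies.

#include <cstddef>

template <class _Tp>
_Tp* fetch_add_ptr(_Tp** target, ::std::ptrdiff_t delta, int order) {
    // The builtin adds raw bytes to the stored pointer value, so element
    // deltas must be scaled by the element size before the call.
    return __atomic_fetch_add(target, delta * static_cast<::std::ptrdiff_t>(sizeof(_Tp)), order);
}

By contrast, the device path in the diff passes __delta unscaled to detail::__atomic_fetch_add_cuda, which is exactly the kind of asymmetry being flagged above.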
I think this looks good.
…/sub; make msvc sfinae a tad less repetitive
…clude to <atomic>
Force-pushed from 264a9f2 to 7d802a6.
Atomic decouple builds clean on CI. SC: 29322243.2