-
Notifications
You must be signed in to change notification settings - Fork 187
Conversation
include/cuda/std/detail/libcxx/include/support/atomic/atomic_cuda_generated.h
Outdated
Show resolved
Hide resolved
I'm overall happy with these changes. I'll start up CI, and since Windows is not having issues I don't expect issues elsewhere either. |
@@ -1239,7 +1239,7 @@ _LIBCUDACXX_INLINE_VISIBILITY void __cxx_atomic_wait(__cxx_atomic_impl<_Tp, _Sco | |||
} | |||
|
|||
// general atomic<T>/atomic_ref<T> | |||
template <class _Tp, int _Sco = 0, bool = is_integral<_Tp>::value && !is_same<_Tp, bool>::value> | |||
template <class _Tp, int _Sco = 0, bool = (is_integral<_Tp>::value || is_floating_point<_Tp>::value) && !is_same<_Tp, bool>::value> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we are enabling the arithmetic operators in cuda::std::atomics
as well we would need to extend the tests to exercise floating point.
These tests live in tests/std/atomics/
rather than tests/cuda/atomics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.upstream-tests/test/std/atomics/atomics.types.operations/atomics.types.operations.arith
is this the right directory to extend the fp arithmetic tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I figure I need to add some more tests under the std/atomics
directory. Basically I need to mirror all tests over integral types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the integral tests would be a good start. Replicating the tests and doing the dispatch yourself for float/double would probably be sufficient. I don't think figuring out how to fit bitwise/arithmetic dispatches into the atomics_helpers there would be worth the effort.
Whups, while writing some more tests, I just stumbled over the fp min/max problem, i.e. not having specific instructions for them. I'm gonna adjust |
Hmm, yeah, I'll take a second pass over this component when I make fixes for the current issues as well. There's some overlap here with another problem. #279 |
After extending the tests, I get some weird errors that I am yet unable to track down:
@wmaxey do you know what this could be? |
I am not encountering this on your latest, let me try with some other configs. Maybe some changes I had made on top of your branch fixed this. |
For reference: I am running CUDA CTK 11.7.0 with gcc 11.2 in an ubuntu20.04 container. I only have a Pascal card available (sm_61). |
We'll need to break out float add/sub into CAS loops. On MSVC we brutally cast to long*/int* which causes invalid results. I can make these changes on top of your changes. |
I've added a patch that fixes a few issues, let me know if this resolves your problems as well. |
Ah, I didn't think of the host side. Good catch!
Thanks a lot! This fixes most of the previously failing tests.
The last four tests that still fail throw the exact same error: |
That's a known issue for Pascal. I do not know the cause, but believe it may have something to do with an unsupported size operand. I think the only remaining thing would be tests similar to |
/done Had to break them out into separate files although this introduces a lot of duplicate code. |
Thanks for all the effort Daniel! I'll see if there's any issues with the changes again and if not I think it's okay to merge. Though I might remove some comments that documented the 'unsigned' nature of min/max. |
Would be nice to get @griwes to review this as well. |
https://builds4u.nvidia.com/dvs/#/change/3154615863024800.1?eventType=Virtual&dvs_showStaging=on DVS is clean, but SC-DVS found ICEs that I am able to repro on VC129, will try to figure out what's going on there. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general this looks good to me; most comments I have are in tests, so I'm going to give this a 👍 and leave it up to @wmaxey as to whether these should gate landing this PR or not. The one comment I have in the actual change is rather minor.
@@ -60,14 +60,58 @@ struct TestFn { | |||
} | |||
}; | |||
|
|||
template <template<typename, typename> typename Selector, cuda::thread_scope ThreadScope> | |||
struct TestFn<int, Selector, ThreadScope> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This specializes specifically for int
- shouldn't it, instead, be specialized for any signed integral type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, I'm attempting to get a guarantee that signed math is working as expected. It would be completely fair to split it into unsigned and signed specializations. Perhaps more tests for this API are needed. ;)
@@ -60,14 +60,59 @@ struct TestFn { | |||
} | |||
}; | |||
|
|||
template <template<typename, typename> typename Selector, cuda::thread_scope ThreadScope> | |||
struct TestFn<int, Selector, ThreadScope> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as the comment above.
__host__ __device__ | ||
void operator()() const { | ||
TestFunctor<float, Selector, Scope>()(); | ||
TestFunctor<double, Selector, Scope>()(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also have a host-only call to TestFunctor<long double, Selector, Scope>()()
here?
.upstream-tests/test/std/atomics/atomics.types.generic/floating_point.pass.cpp
Show resolved
Hide resolved
|
||
int main(int, char**) | ||
{ | ||
// this test would instantiate more cases than just the ones below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The integral tests here instantiate the test functions for all integer types. Here we only instantiate for two floating point types. It should be fine to remove this comment and have all combinations of scopes and memory selectors actually tested below.
|
||
int main(int, char**) | ||
{ | ||
// this test would instantiate more cases than just the ones below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as in the non-ref version of this test.
|
||
int main(int, char**) | ||
{ | ||
// this test would instantiate more cases than just the ones below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And once again here.
__host__ __device__ | ||
void operator()() const { | ||
TestFunctor<float, Selector, Scope>()(); | ||
TestFunctor<double, Selector, Scope>()(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as earlier - on the host here should also be a call for long double
.
__host__ __device__ | ||
void operator()() const { | ||
TestFunctor<float, Selector, Scope>()(); | ||
TestFunctor<double, Selector, Scope>()(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And one more time here.
typedef atomic<float> atomic_float; | ||
typedef atomic<double> atomic_double; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these two are in the standard. The other ones are there mainly for C interop; do we want these two here? If we do, we should also have a (possibly host-only) atomic_long_double
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to say we probably don't want these. We'll need to open the door eventually for a half and just diverge further with what's available on H/D.
Selector<A, constructor_initializer> sel; | ||
A & t = *sel.construct(); | ||
t = int(-1); | ||
assert(t.fetch_max(4) == int(-1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not connect with the comment.
We are still testing a smaller int
versus a larger threshold. Why is this changed and could we update the comment?
If this does some horrible conversion to unsigned magic we should test that explicitly and keep the basic 3 vs 2 test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no horrible conversion, it's just specifically testing int types. The cast is, in truth, unnecessary.
Selector<volatile A, constructor_initializer> sel; | ||
volatile A & t = *sel.construct(); | ||
t = int(5); | ||
assert(t.fetch_min(-1) == int(5)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is changing the values of the test, when we actually want to only add volatile. I would either keep them the same of change them consistently throughout the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did want to change the test in this case, but there is greater value in making a proper unsigned/signed split test so that we can guarantee that add/sub/max/min are behaving correctly.
.upstream-tests/test/std/atomics/atomics.types.generic/floating_point.pass.cpp
Show resolved
Hide resolved
assert(obj == T(1)); | ||
assert(obj.load() == T(1)); | ||
assert(obj.load(cuda::std::memory_order_acquire) == T(1)); | ||
assert(obj.exchange(T(2)) == T(1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is the libcxx style, but could we at least add some newlines between the different functionality that is tested.
Reading a gazillion consecutive lines drains a lot of brain power
…appening on MSVC that seems to cause on internal compiler error
template <typename _Tp, int _Sco, | ||
typename _Base = typename conditional<__cxx_is_always_lock_free<_Tp>::__value, | ||
template <typename _Tp, int _Sco> | ||
struct __cxx_atomic_impl_conditional { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something about using
is cursed. This is very reminiscent of several tuple
fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DVS results will be posted soon, but this fixed builds on all the compilers I was able to get repros on.
@griwes I've made the tests more straightforward, all signed types now just get extra testing and there isn't some strange int only overload. @sleeepyjack I'd like to see the |
…y that dispatches to the correct partial overload.
…onally include the correct base class
Latest changes made sure that bitwise types are a superset of arithmetic types. I did some work to refactor the ref/non-ref classes as well, they should be easier to maintain in the future. |
__atomic_base_storage(_Storage&& __a) _NOEXCEPT : __a_(forward<_Storage>(__a)) {} | ||
}; | ||
|
||
template <class _Tp, bool _Cq, typename _Storage> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we get a more descriptive name instead of _Cq
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll push an update with _Cq
->_ConstQualified
|
||
_LIBCUDACXX_INLINE_VISIBILITY _LIBCUDACXX_CONSTEXPR | ||
__atomic_base_ref(_Tp& __a) _NOEXCEPT : __a_(__a) {} | ||
__atomic_base_core(_Storage&& __a) _NOEXCEPT : __atomic_base_storage<_Tp, _Storage>(forward<_Storage>(__a)) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically we are required to qualify forward
as it is a non ugly function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
__atomic_base(const __atomic_base&) = delete; | ||
__atomic_base(__atomic_base&&) = delete; | ||
__atomic_base_storage() = default; | ||
__atomic_base_storage(const __atomic_base_storage&) = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am having some troubles correctly parsing the difference between the various classes. AFAICT the only difference is whether the special member functions are deleted / defaulted.
Given that the implementation of the classes is considerable, would it make sense to just derive from a single base class that does this for us, like optional does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to all the __atomic_base
classes? The main difference is const qualifiers. atomic_ref
has to allow value updates through const.
Wooohoo, first contribution merged ☑️ |
This PR is a draft to add support for float/double atomics.
Please review and let me know what is missing.
Unfortunately, the diff between the old and new codegen output is a mess due to the reordering of operations.
Also rolls back #282 and fixes #279