This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Variadic tuple preparation #1310

Merged
merged 10 commits into from
Nov 5, 2020

Conversation

andrewcorrigan
Contributor

Some simplifications preparing Thrust for a variadic tuple implementation (some day... NVIDIA/cccl#695). Other changes would require a bit more coordination and can come separately, assuming these sorts of changes are now mergeable.

With -DTHRUST_DEVICE_SYSTEM=CPP I get:

100% tests passed, 0 tests failed out of 151

Total Test time (real) = 108.10 sec

@alliepiper
Collaborator

Changes LGTM. I'll start some testing soon, and I also want to check the impact of the new recursive templates on compile times.

@andrewcorrigan
Contributor Author

andrewcorrigan commented Oct 12, 2020

Just to make sure I follow, are you just referring to or_ and and_? Perhaps we could use a fold expression if C++17 is available.
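
For readers following along, here is a minimal sketch of the two styles being compared. The names and_recursive, and_fold, and or_fold are hypothetical and are not Thrust's actual and_/or_ implementation; the sketch only assumes that each condition exposes a static ::value member, as std::true_type does.

```cpp
#include <type_traits>

// Hypothetical names for illustration only; not Thrust's implementation.

// C++11 style: the primary template handles the empty pack, and the
// partial specialization peels off one condition per instantiation.
template <typename... Conditions>
struct and_recursive : std::true_type {};

template <typename First, typename... Rest>
struct and_recursive<First, Rest...>
  : std::integral_constant<bool, First::value && and_recursive<Rest...>::value> {};

#if __cplusplus >= 201703L
// C++17 style: a single instantiation using a unary fold.
// An empty pack folds to true for && and to false for ||.
template <typename... Conditions>
struct and_fold : std::bool_constant<(Conditions::value && ...)> {};

template <typename... Conditions>
struct or_fold : std::bool_constant<(Conditions::value || ...)> {};
#endif

static_assert(and_recursive<std::true_type, std::true_type>::value, "all conditions hold");
static_assert(!and_recursive<std::true_type, std::false_type>::value, "one condition fails");
```

The recursive form instantiates one class template per condition, while the fold form instantiates one in total, which is the compile-time concern raised above.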

@alliepiper
Collaborator

Those, but also others like the get_type implementation. Past projects I've worked on have benefited from doing some manual unrolling of recursive templates to keep compile times in check. Switching to C++17 folds when possible is another solution.

But I'll run some tests before we put too much work into fixing this; the overhead may be negligible in these cases.

@andrewcorrigan
Contributor Author

Gotcha. Regarding get_type, variadic type_list isn't actually needed for variadic tuple. Initially, when I was running the tests, I thought its use of null_type was contributing to some errors I encountered, but I was mistaken. Therefore, if recursive get_type does cause any regressions I'd be fine with removing [6ff6c6d].
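
As a rough illustration of the kind of recursion under discussion, a recursive get_type over a variadic type_list could look like the sketch below. This is an assumed, simplified shape written for this thread, not the code from the commit mentioned above.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical, simplified sketch; not the code from the PR.
template <typename... Ts>
struct type_list {};

// Primary template is left undefined, so an out-of-range index is a compile error.
template <std::size_t I, typename List>
struct get_type;

// Base case: index 0 selects the head of the list.
template <typename Head, typename... Tail>
struct get_type<0, type_list<Head, Tail...>>
{
  using type = Head;
};

// Recursive case: drop the head and decrement the index.
// Looking up element I therefore costs I + 1 instantiations.
template <std::size_t I, typename Head, typename... Tail>
struct get_type<I, type_list<Head, Tail...>>
  : get_type<I - 1, type_list<Tail...>> {};

static_assert(std::is_same<get_type<1, type_list<int, float, char>>::type, float>::value,
              "index 1 is float");
```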

@alliepiper added the "testing: gpuCI in progress" and "testing: gpuCI passed" labels and removed the "testing: gpuCI in progress" label on Oct 13, 2020
@alliepiper
Collaborator

DVS CL 29194398.

@alliepiper added the "testing: internal ci in progress" label (currently testing on internal NVIDIA CI, DVS) on Oct 13, 2020
@alliepiper
Collaborator

alliepiper commented Oct 14, 2020

Good news and bad news.

Good news: This doesn't significantly impact compile times.

Bad news: This doesn't build cleanly on our internal CI. Can you take a look at this error?

/dvs/p4/build/sw/gpgpu/thrust/thrust/iterator/detail/zip_iterator.inl(74): error: no instance of function template "thrust::detail::tuple_host_device_transform" matches the argument list
            argument types are: (const thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, thrust::detail::dereference_iterator)
          detected during:
            instantiation of "thrust::detail::zip_iterator_base<IteratorTuple>::type::reference thrust::zip_iterator<IteratorTuple>::dereference() const [with IteratorTuple=thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>]" 
/dvs/p4/build/sw/gpgpu/thrust/thrust/iterator/iterator_facade.h(128): here
            instantiation of "Facade::reference thrust::iterator_core_access::dereference(const Facade &) [with Facade=thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>>]" 
/dvs/p4/build/sw/gpgpu/thrust/thrust/iterator/iterator_facade.h(310): here
            instantiation of "thrust::iterator_facade<Derived, Value, System, Traversal, Reference, Difference>::reference thrust::iterator_facade<Derived, Value, System, Traversal, Reference, Difference>::operator*() const [with Derived=thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>>, Value=thrust::tuple<float, float, float, float, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, System=thrust::device_system_tag, Traversal=thrust::random_access_traversal_tag, Reference=thrust::detail::tuple_of_iterator_references<thrust::device_reference<float>, thrust::device_reference<float>, thrust::device_reference<float>, thrust::device_reference<float>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Difference=signed long]" 
/dvs/p4/build/sw/gpgpu/thrust/thrust/iterator/iterator_facade.h(328): here
            instantiation of "thrust::iterator_facade<Derived, Value, System, Traversal, Reference, Difference>::reference thrust::iterator_facade<Derived, Value, System, Traversal, Reference, Difference>::operator[](thrust::iterator_facade<Derived, Value, System, Traversal, Reference, Difference>::difference_type) const [with Derived=thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>>, Value=thrust::tuple<float, float, float, float, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, System=thrust::device_system_tag, Traversal=thrust::random_access_traversal_tag, Reference=thrust::detail::tuple_of_iterator_references<thrust::device_reference<float>, thrust::device_reference<float>, thrust::device_reference<float>, thrust::device_reference<float>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Difference=signed long]" 
/dvs/p4/build/sw/gpgpu/thrust/thrust/system/cuda/detail/for_each.h(58): here
            instantiation of "void thrust::cuda_cub::for_each_f<Input, UnaryOp>::operator()(Size) [with Input=thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>>, UnaryOp=thrust::detail::wrapped_function<arbitrary_functor1, void>, Size=long]" 
/dvs/p4/build/sw/gpgpu/thrust/thrust/system/cuda/detail/parallel_for.h(97): here
            [ 6 instantiation contexts not shown ]
            instantiation of "void thrust::cuda_cub::parallel_for(thrust::cuda_cub::execution_policy<Derived> &, F, Size) [with Derived=thrust::cuda_cub::tag, F=thrust::cuda_cub::for_each_f<thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>>, thrust::detail::wrapped_function<arbitrary_functor1, void>>, Size=signed long]" 
/dvs/p4/build/sw/gpgpu/thrust/thrust/system/cuda/detail/for_each.h(82): here
            instantiation of "Input thrust::cuda_cub::for_each_n(thrust::cuda_cub::execution_policy<Derived> &, Input, Size, UnaryOp) [with Derived=thrust::cuda_cub::tag, Input=thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>>, Size=signed long, UnaryOp=arbitrary_functor1]" 
/dvs/p4/build/sw/gpgpu/thrust/thrust/system/cuda/detail/for_each.h(104): here
            instantiation of "Input thrust::cuda_cub::for_each(thrust::cuda_cub::execution_policy<Derived> &, Input, Input, UnaryOp) [with Derived=thrust::cuda_cub::tag, Input=thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>>, UnaryOp=arbitrary_functor1]" 
/dvs/p4/build/sw/gpgpu/thrust/thrust/detail/for_each.inl(44): here
            instantiation of "InputIterator thrust::for_each(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>>, UnaryFunction=arbitrary_functor1]" 
/dvs/p4/build/sw/gpgpu/thrust/thrust/detail/for_each.inl(58): here
            instantiation of "InputIterator thrust::for_each(InputIterator, InputIterator, UnaryFunction) [with InputIterator=thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::detail::normal_iterator<thrust::device_ptr<float>>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>>, UnaryFunction=arbitrary_functor1]" 
/dvs/p4/build/sw/gpgpu/thrust/examples//arbitrary_transformation.cu(83): here

@alliepiper removed the "testing: internal ci in progress" and "testing: gpuCI passed" labels on Oct 14, 2020
@andrewcorrigan
Contributor Author

Sorry, I should've tested with nvcc as well. I've reproduced this and will fix it asap.

@andrewcorrigan
Contributor Author

andrewcorrigan commented Oct 14, 2020

It's this commit [53a1d25] specifically. I suspect that this is an nvcc bug. If I introduce an intermediate type called tuple_meta_transform_WAR_NVCC, nvcc appears to handle the template specialization correctly. I don't think this intermediate type should be necessary, however.

#ifdef __NVCC__
template<typename Tuple,
         template<typename> class UnaryMetaFunction,
         typename IndexSequence>
  struct tuple_meta_transform_WAR_NVCC;

template<typename Tuple,
         template<typename> class UnaryMetaFunction,
         size_t... Is>
  struct tuple_meta_transform_WAR_NVCC<Tuple, UnaryMetaFunction, thrust::index_sequence<Is...>>
{
  typedef thrust::tuple<
    typename UnaryMetaFunction<typename thrust::tuple_element<Is,Tuple>::type>::type...
  > type;
};

template<typename Tuple,
         template<typename> class UnaryMetaFunction>
  struct tuple_meta_transform
{
  typedef typename tuple_meta_transform_WAR_NVCC<Tuple, UnaryMetaFunction, thrust::make_index_sequence<thrust::tuple_size<Tuple>::value>>::type type;
};
#else
template<typename Tuple,
         template<typename> class UnaryMetaFunction,
         typename IndexSequence = thrust::make_index_sequence<thrust::tuple_size<Tuple>::value>>
  struct tuple_meta_transform;

template<typename Tuple,
         template<typename> class UnaryMetaFunction,
         size_t... Is>
  struct tuple_meta_transform<Tuple, UnaryMetaFunction, thrust::index_sequence<Is...>>
{
  typedef thrust::tuple<
    typename UnaryMetaFunction<typename thrust::tuple_element<Is,Tuple>::type>::type...
  > type;
};
#endif

@andrewcorrigan
Contributor Author

andrewcorrigan commented Oct 14, 2020

What do you think the best approach would be? Guard the workaround with __NVCC__ (or some equivalent)? Or should we just use the __NVCC__-guarded version with all compilers?

@alliepiper
Collaborator

Just be sure to leave a comment near the WAR explaining why it's needed.

@andrewcorrigan
Contributor Author

andrewcorrigan commented Oct 14, 2020

I don't think a reproducer would be too difficult to construct.

I reproduced the error(s) with nvcc 11.0 and gcc 9.2 (on CentOS 7). ctest is currently running; so far so good. Would you rather I commit the fix on top of the existing commits? Or should I redo [53a1d25] as a single clean commit with the workaround included from the beginning?

Also, does this look ok?

// Introduce an intermediate type, tuple_meta_transform_WAR_NVCC,
// rather than directly specializing tuple_meta_transform with the
// default argument IndexSequence = thrust::make_index_sequence<thrust::tuple_size<Tuple>::value>,
// to work around an nvcc 11.0 compiler bug.
template<typename Tuple,
         template<typename> class UnaryMetaFunction,
         typename IndexSequence>
  struct tuple_meta_transform_WAR_NVCC;

template<typename Tuple,
         template<typename> class UnaryMetaFunction,
         size_t... Is>
  struct tuple_meta_transform_WAR_NVCC<Tuple, UnaryMetaFunction, thrust::index_sequence<Is...>>
{
  typedef thrust::tuple<
    typename UnaryMetaFunction<typename thrust::tuple_element<Is,Tuple>::type>::type...
  > type;
};

template<typename Tuple,
         template<typename> class UnaryMetaFunction>
  struct tuple_meta_transform
{
  typedef typename tuple_meta_transform_WAR_NVCC<Tuple, UnaryMetaFunction, thrust::make_index_sequence<thrust::tuple_size<Tuple>::value>>::type type;
};
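
The two-layer pattern above can also be tried outside Thrust. The sketch below is a hypothetical analogue that swaps in std::tuple and std::index_sequence (C++14) and uses made-up names (tuple_meta_transform_impl, make_pointer). It only illustrates the structure of the workaround and is not a reproducer for the nvcc error.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical std::tuple analogue of the workaround; not Thrust code.

// Intermediate type: takes the index sequence as an explicit parameter.
template <typename Tuple,
          template <typename> class UnaryMetaFunction,
          typename IndexSequence>
struct tuple_meta_transform_impl;

template <typename Tuple,
          template <typename> class UnaryMetaFunction,
          std::size_t... Is>
struct tuple_meta_transform_impl<Tuple, UnaryMetaFunction, std::index_sequence<Is...>>
{
  // Apply the metafunction to each element type and rebuild the tuple.
  using type = std::tuple<
    typename UnaryMetaFunction<typename std::tuple_element<Is, Tuple>::type>::type...>;
};

// Public entry point: no defaulted template argument, just forwarding.
template <typename Tuple, template <typename> class UnaryMetaFunction>
struct tuple_meta_transform
{
  using type = typename tuple_meta_transform_impl<
    Tuple, UnaryMetaFunction,
    std::make_index_sequence<std::tuple_size<Tuple>::value>>::type;
};

// Example metafunction (hypothetical): maps T to T*.
template <typename T>
struct make_pointer
{
  using type = T*;
};

static_assert(std::is_same<tuple_meta_transform<std::tuple<int, char>, make_pointer>::type,
                           std::tuple<int*, char*>>::value,
              "each element type is transformed");
```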

@alliepiper
Collaborator

That looks good to me. I'll see if I can locally reproduce this on gcc 9 tomorrow.

Feel free to just push the fix to the top of the current branch. We'll need to squash this branch down to a single commit before integrating it due to some funky internal NVIDIA workflow restrictions, so there's no need to go back and clean up the history.

@andrewcorrigan
Contributor Author

Done. Thanks

@alliepiper added the "testing: gpuCI in progress" label on Oct 15, 2020
@andrewcorrigan
Contributor Author

I tried, but couldn't isolate the problem into a standalone reproducer.

@alliepiper added the "testing: gpuCI passed" label and removed the "testing: gpuCI in progress" label on Oct 16, 2020
@andrewcorrigan
Contributor Author

I pushed changes from #1311.

@alliepiper added the "testing: gpuCI in progress" and "testing: gpuCI passed" labels and removed the "testing: gpuCI passed" and "testing: gpuCI in progress" labels on Oct 16, 2020
@alliepiper
Collaborator

DVS CL: 29265481

@GPUtester
Collaborator

Can one of the admins verify this patch?

@alliepiper
Collaborator

ok to test

@alliepiper added the "testing: internal ci in progress" label on Oct 30, 2020
@alliepiper self-assigned this on Oct 30, 2020
@alliepiper added the "testing: internal ci passed" label and removed the "testing: internal ci in progress" label on Nov 3, 2020
Labels
testing: gpuCI passed (passed gpuCI testing)
testing: internal ci passed (passed internal NVIDIA CI, DVS)

3 participants