This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

NVBug 2341455: reduce fails to compile with complex in CUDA 9.2 #928

Closed
4rzael opened this issue Aug 22, 2018 · 9 comments
Labels
nvbug Has an associated internal NVIDIA NVBug. P1: should have Necessary, but not critical. type: bug: functional Does not work as intended.

Comments

@4rzael

4rzael commented Aug 22, 2018

Hello,

I have been trying to write some code that calls reduce on thrust::complex values, and the compiler gives me an error saying:

cannot pass an argument with a user-provided copy-constructor to a device-side kernel launch

Here is the code:

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/complex.h>
#include <thrust/transform.h>
#include <vector>

using namespace thrust;

void exec() {
    auto v = std::vector<complex<double>>({1.0,1.0,1.0,1.0});
    auto complexZero = complex<double>();
    device_vector<complex<double>> devA(v);
    thrust::reduce(devA.begin(), devA.end(), complexZero, plus<complex<double>>());    
}
int main() {
    exec();
}

And the compilation line: nvcc test.cu -std=c++14 -arch=sm_61 --device-c

  • Environment:
    CUDA 9.2, Ubuntu 18.04, nvcc host compiler: gcc
  • Expected output:
    No compilation error
  • Observed output:
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/cub/device/dispatch/dispatch_reduce.cuh(446): error: cannot pass an argument with a user-provided copy-constructor to a device-side kernel launch
          detected during:
            instantiation of "cudaError_t thrust::cuda_cub::cub::DispatchReduce<InputIteratorT, OutputIteratorT, OffsetT, ReductionOpT, OutputT>::InvokeSingleTile<ActivePolicyT,SingleTileKernelT>(SingleTileKernelT) [with InputIteratorT=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, OutputIteratorT=thrust::complex<double> *, OffsetT=int, ReductionOpT=thrust::plus<thrust::complex<double>>, OutputT=thrust::complex<double>, ActivePolicyT=thrust::cuda_cub::cub::DeviceReducePolicy<thrust::complex<double>, int, thrust::plus<thrust::complex<double>>>::Policy130, SingleTileKernelT=void (*)(thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, thrust::complex<double> *, int, thrust::plus<thrust::complex<double>>, thrust::complex<double>)]" 
(599): here
            instantiation of "cudaError_t thrust::cuda_cub::cub::DispatchReduce<InputIteratorT, OutputIteratorT, OffsetT, ReductionOpT, OutputT>::Invoke<ActivePolicyT>() [with InputIteratorT=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, OutputIteratorT=thrust::complex<double> *, OffsetT=int, ReductionOpT=thrust::plus<thrust::complex<double>>, OutputT=thrust::complex<double>, ActivePolicyT=thrust::cuda_cub::cub::DeviceReducePolicy<thrust::complex<double>, int, thrust::plus<thrust::complex<double>>>::Policy130]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/cub/block/../iterator/../util_device.cuh(332): here
            instantiation of "cudaError_t thrust::cuda_cub::cub::ChainedPolicy<PTX_VERSION, PolicyT, PolicyT>::Invoke(int, FunctorT &) [with PTX_VERSION=130, PolicyT=thrust::cuda_cub::cub::DeviceReducePolicy<thrust::complex<double>, int, thrust::plus<thrust::complex<double>>>::Policy130, FunctorT=thrust::cuda_cub::cub::DispatchReduce<thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, thrust::complex<double> *, int, thrust::plus<thrust::complex<double>>, thrust::complex<double>>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/cub/block/../iterator/../util_device.cuh(315): here
            instantiation of "cudaError_t thrust::cuda_cub::cub::ChainedPolicy<PTX_VERSION, PolicyT, PrevPolicyT>::Invoke(int, FunctorT &) [with PTX_VERSION=200, PolicyT=thrust::cuda_cub::cub::DeviceReducePolicy<thrust::complex<double>, int, thrust::plus<thrust::complex<double>>>::Policy200, PrevPolicyT=thrust::cuda_cub::cub::DeviceReducePolicy<thrust::complex<double>, int, thrust::plus<thrust::complex<double>>>::Policy130, FunctorT=thrust::cuda_cub::cub::DispatchReduce<thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, thrust::complex<double> *, int, thrust::plus<thrust::complex<double>>, thrust::complex<double>>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/cub/block/../iterator/../util_device.cuh(315): here
            instantiation of "cudaError_t thrust::cuda_cub::cub::ChainedPolicy<PTX_VERSION, PolicyT, PrevPolicyT>::Invoke(int, FunctorT &) [with PTX_VERSION=300, PolicyT=thrust::cuda_cub::cub::DeviceReducePolicy<thrust::complex<double>, int, thrust::plus<thrust::complex<double>>>::Policy300, PrevPolicyT=thrust::cuda_cub::cub::DeviceReducePolicy<thrust::complex<double>, int, thrust::plus<thrust::complex<double>>>::Policy200, FunctorT=thrust::cuda_cub::cub::DispatchReduce<thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, thrust::complex<double> *, int, thrust::plus<thrust::complex<double>>, thrust::complex<double>>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/cub/block/../iterator/../util_device.cuh(315): here
            [ 3 instantiation contexts not shown ]
            instantiation of "cudaError_t thrust::cuda_cub::cub::DeviceReduce::Reduce(void *, size_t &, InputIteratorT, OutputIteratorT, int, ReductionOpT, T, cudaStream_t, __nv_bool) [with InputIteratorT=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, OutputIteratorT=thrust::complex<double> *, ReductionOpT=thrust::plus<thrust::complex<double>>, T=thrust::complex<double>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/reduce.h(950): here
            instantiation of "T thrust::cuda_cub::reduce_n(thrust::cuda_cub::execution_policy<Derived> &, InputIt, Size, T, BinaryOp) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, Size=std::ptrdiff_t, T=thrust::complex<double>, BinaryOp=thrust::plus<thrust::complex<double>>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/reduce.h(1010): here
            instantiation of "T thrust::cuda_cub::reduce(thrust::cuda_cub::execution_policy<Derived> &, InputIt, InputIt, T, BinaryOp) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, T=thrust::complex<double>, BinaryOp=thrust::plus<thrust::complex<double>>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/detail/reduce.inl(71): here
            instantiation of "T thrust::reduce(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, InputIterator, T, BinaryFunction) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, T=thrust::complex<double>, BinaryFunction=thrust::plus<thrust::complex<double>>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/detail/reduce.inl(186): here
            instantiation of "T thrust::reduce(InputIterator, InputIterator, T, BinaryFunction) [with InputIterator=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, T=thrust::complex<double>, BinaryFunction=thrust::plus<thrust::complex<double>>]" 
test.cu(13): here

/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/detail/complex/complex.inl(187): error: no instance of overloaded function "thrust::complex<T>::real [with T=double]" matches the argument list
            argument types are: (const thrust::device_reference<thrust::complex<double>>)
          detected during:
            instantiation of "thrust::complex<T> &thrust::complex<T>::operator=(const R &) [with T=double, R=thrust::device_reference<thrust::complex<double>>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/cub/device/dispatch/../../agent/agent_reduce.cuh(290): here
            instantiation of "void thrust::cuda_cub::cub::AgentReduce<AgentReducePolicy, InputIteratorT, OutputIteratorT, OffsetT, ReductionOp>::ConsumeTile<IS_FIRST_TILE,CAN_VECTORIZE>(thrust::cuda_cub::cub::AgentReduce<AgentReducePolicy, InputIteratorT, OutputIteratorT, OffsetT, ReductionOp>::OutputT &, OffsetT, int, thrust::cuda_cub::cub::Int2Type<0>, thrust::cuda_cub::cub::Int2Type<CAN_VECTORIZE>) [with AgentReducePolicy=thrust::cuda_cub::cub::AgentReducePolicy<64, 16, 4, thrust::cuda_cub::cub::BLOCK_REDUCE_WARP_REDUCTIONS, thrust::cuda_cub::cub::LOAD_LDG>, InputIteratorT=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, OutputIteratorT=thrust::complex<double> *, OffsetT=int, ReductionOp=thrust::plus<thrust::complex<double>>, IS_FIRST_TILE=1, CAN_VECTORIZE=0]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/cub/device/dispatch/../../agent/agent_reduce.cuh(322): here
            instantiation of "thrust::cuda_cub::cub::AgentReduce<AgentReducePolicy, InputIteratorT, OutputIteratorT, OffsetT, ReductionOp>::OutputT thrust::cuda_cub::cub::AgentReduce<AgentReducePolicy, InputIteratorT, OutputIteratorT, OffsetT, ReductionOp>::ConsumeRange(thrust::cuda_cub::cub::GridEvenShare<OffsetT> &, thrust::cuda_cub::cub::Int2Type<CAN_VECTORIZE>) [with AgentReducePolicy=thrust::cuda_cub::cub::AgentReducePolicy<64, 16, 4, thrust::cuda_cub::cub::BLOCK_REDUCE_WARP_REDUCTIONS, thrust::cuda_cub::cub::LOAD_LDG>, InputIteratorT=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, OutputIteratorT=thrust::complex<double> *, OffsetT=int, ReductionOp=thrust::plus<thrust::complex<double>>, CAN_VECTORIZE=0]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/cub/device/dispatch/../../agent/agent_reduce.cuh(360): here
            instantiation of "thrust::cuda_cub::cub::AgentReduce<AgentReducePolicy, InputIteratorT, OutputIteratorT, OffsetT, ReductionOp>::OutputT thrust::cuda_cub::cub::AgentReduce<AgentReducePolicy, InputIteratorT, OutputIteratorT, OffsetT, ReductionOp>::ConsumeRange(OffsetT, OffsetT) [with AgentReducePolicy=thrust::cuda_cub::cub::AgentReducePolicy<64, 16, 4, thrust::cuda_cub::cub::BLOCK_REDUCE_WARP_REDUCTIONS, thrust::cuda_cub::cub::LOAD_LDG>, InputIteratorT=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, OutputIteratorT=thrust::complex<double> *, OffsetT=int, ReductionOp=thrust::plus<thrust::complex<double>>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/cub/device/dispatch/dispatch_reduce.cuh(143): here
            instantiation of "void thrust::cuda_cub::cub::DeviceReduceSingleTileKernel<ChainedPolicyT,InputIteratorT,OutputIteratorT,OffsetT,ReductionOpT,OutputT>(InputIteratorT, OutputIteratorT, OffsetT, ReductionOpT, OutputT) [with ChainedPolicyT=thrust::cuda_cub::cub::DeviceReducePolicy<thrust::complex<double>, int, thrust::plus<thrust::complex<double>>>::Policy600, InputIteratorT=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, OutputIteratorT=thrust::complex<double> *, OffsetT=int, ReductionOpT=thrust::plus<thrust::complex<double>>, OutputT=thrust::complex<double>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/cub/device/dispatch/dispatch_reduce.cuh(599): here
            [ 7 instantiation contexts not shown ]
            instantiation of "cudaError_t thrust::cuda_cub::cub::DeviceReduce::Reduce(void *, size_t &, InputIteratorT, OutputIteratorT, int, ReductionOpT, T, cudaStream_t, __nv_bool) [with InputIteratorT=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, OutputIteratorT=thrust::complex<double> *, ReductionOpT=thrust::plus<thrust::complex<double>>, T=thrust::complex<double>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/reduce.h(950): here
            instantiation of "T thrust::cuda_cub::reduce_n(thrust::cuda_cub::execution_policy<Derived> &, InputIt, Size, T, BinaryOp) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, Size=std::ptrdiff_t, T=thrust::complex<double>, BinaryOp=thrust::plus<thrust::complex<double>>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/reduce.h(1010): here
            instantiation of "T thrust::cuda_cub::reduce(thrust::cuda_cub::execution_policy<Derived> &, InputIt, InputIt, T, BinaryOp) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, T=thrust::complex<double>, BinaryOp=thrust::plus<thrust::complex<double>>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/detail/reduce.inl(71): here
            instantiation of "T thrust::reduce(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, InputIterator, T, BinaryFunction) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, T=thrust::complex<double>, BinaryFunction=thrust::plus<thrust::complex<double>>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/detail/reduce.inl(186): here
            instantiation of "T thrust::reduce(InputIterator, InputIterator, T, BinaryFunction) [with InputIterator=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, T=thrust::complex<double>, BinaryFunction=thrust::plus<thrust::complex<double>>]" 
test.cu(13): here

/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/detail/complex/complex.inl(64): error: no suitable conversion function from "const thrust::device_reference<thrust::complex<double>>" to "double" exists
          detected during:
            instantiation of "thrust::complex<T>::complex(const R &) [with T=double, R=thrust::device_reference<thrust::complex<double>>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/cub/device/dispatch/../../agent/agent_reduce.cuh(297): here
            instantiation of "void thrust::cuda_cub::cub::AgentReduce<AgentReducePolicy, InputIteratorT, OutputIteratorT, OffsetT, ReductionOp>::ConsumeTile<IS_FIRST_TILE,CAN_VECTORIZE>(thrust::cuda_cub::cub::AgentReduce<AgentReducePolicy, InputIteratorT, OutputIteratorT, OffsetT, ReductionOp>::OutputT &, OffsetT, int, thrust::cuda_cub::cub::Int2Type<0>, thrust::cuda_cub::cub::Int2Type<CAN_VECTORIZE>) [with AgentReducePolicy=thrust::cuda_cub::cub::AgentReducePolicy<64, 16, 4, thrust::cuda_cub::cub::BLOCK_REDUCE_WARP_REDUCTIONS, thrust::cuda_cub::cub::LOAD_LDG>, InputIteratorT=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, OutputIteratorT=thrust::complex<double> *, OffsetT=int, ReductionOp=thrust::plus<thrust::complex<double>>, IS_FIRST_TILE=1, CAN_VECTORIZE=0]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/cub/device/dispatch/../../agent/agent_reduce.cuh(322): here
            instantiation of "thrust::cuda_cub::cub::AgentReduce<AgentReducePolicy, InputIteratorT, OutputIteratorT, OffsetT, ReductionOp>::OutputT thrust::cuda_cub::cub::AgentReduce<AgentReducePolicy, InputIteratorT, OutputIteratorT, OffsetT, ReductionOp>::ConsumeRange(thrust::cuda_cub::cub::GridEvenShare<OffsetT> &, thrust::cuda_cub::cub::Int2Type<CAN_VECTORIZE>) [with AgentReducePolicy=thrust::cuda_cub::cub::AgentReducePolicy<64, 16, 4, thrust::cuda_cub::cub::BLOCK_REDUCE_WARP_REDUCTIONS, thrust::cuda_cub::cub::LOAD_LDG>, InputIteratorT=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, OutputIteratorT=thrust::complex<double> *, OffsetT=int, ReductionOp=thrust::plus<thrust::complex<double>>, CAN_VECTORIZE=0]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/cub/device/dispatch/../../agent/agent_reduce.cuh(360): here
            instantiation of "thrust::cuda_cub::cub::AgentReduce<AgentReducePolicy, InputIteratorT, OutputIteratorT, OffsetT, ReductionOp>::OutputT thrust::cuda_cub::cub::AgentReduce<AgentReducePolicy, InputIteratorT, OutputIteratorT, OffsetT, ReductionOp>::ConsumeRange(OffsetT, OffsetT) [with AgentReducePolicy=thrust::cuda_cub::cub::AgentReducePolicy<64, 16, 4, thrust::cuda_cub::cub::BLOCK_REDUCE_WARP_REDUCTIONS, thrust::cuda_cub::cub::LOAD_LDG>, InputIteratorT=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, OutputIteratorT=thrust::complex<double> *, OffsetT=int, ReductionOp=thrust::plus<thrust::complex<double>>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/cub/device/dispatch/dispatch_reduce.cuh(143): here
            instantiation of "void thrust::cuda_cub::cub::DeviceReduceSingleTileKernel<ChainedPolicyT,InputIteratorT,OutputIteratorT,OffsetT,ReductionOpT,OutputT>(InputIteratorT, OutputIteratorT, OffsetT, ReductionOpT, OutputT) [with ChainedPolicyT=thrust::cuda_cub::cub::DeviceReducePolicy<thrust::complex<double>, int, thrust::plus<thrust::complex<double>>>::Policy600, InputIteratorT=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, OutputIteratorT=thrust::complex<double> *, OffsetT=int, ReductionOpT=thrust::plus<thrust::complex<double>>, OutputT=thrust::complex<double>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/cub/device/dispatch/dispatch_reduce.cuh(599): here
            [ 7 instantiation contexts not shown ]
            instantiation of "cudaError_t thrust::cuda_cub::cub::DeviceReduce::Reduce(void *, size_t &, InputIteratorT, OutputIteratorT, int, ReductionOpT, T, cudaStream_t, __nv_bool) [with InputIteratorT=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, OutputIteratorT=thrust::complex<double> *, ReductionOpT=thrust::plus<thrust::complex<double>>, T=thrust::complex<double>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/reduce.h(950): here
            instantiation of "T thrust::cuda_cub::reduce_n(thrust::cuda_cub::execution_policy<Derived> &, InputIt, Size, T, BinaryOp) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, Size=std::ptrdiff_t, T=thrust::complex<double>, BinaryOp=thrust::plus<thrust::complex<double>>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/reduce.h(1010): here
            instantiation of "T thrust::cuda_cub::reduce(thrust::cuda_cub::execution_policy<Derived> &, InputIt, InputIt, T, BinaryOp) [with Derived=thrust::cuda_cub::tag, InputIt=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, T=thrust::complex<double>, BinaryOp=thrust::plus<thrust::complex<double>>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/detail/reduce.inl(71): here
            instantiation of "T thrust::reduce(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, InputIterator, T, BinaryFunction) [with DerivedPolicy=thrust::cuda_cub::tag, InputIterator=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, T=thrust::complex<double>, BinaryFunction=thrust::plus<thrust::complex<double>>]" 
/usr/local/cuda-9.2/bin/../targets/x86_64-linux/include/thrust/detail/reduce.inl(186): here
            instantiation of "T thrust::reduce(InputIterator, InputIterator, T, BinaryFunction) [with InputIterator=thrust::detail::normal_iterator<thrust::device_ptr<thrust::complex<double>>>, T=thrust::complex<double>, BinaryFunction=thrust::plus<thrust::complex<double>>]" 
test.cu(13): here

3 errors detected in the compilation of "/tmp/tmpxft_000070ef_00000000-6_test.cpp1.ii".

A transform on the same data compiles and runs perfectly, however.
Am I doing something wrong?
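
For readers unfamiliar with the wording of the first error: "user-provided copy-constructor" is a C++ term for a copy constructor whose body is written out in the class, as opposed to one that is implicit or `= default`. The sketch below shows the distinction in plain host C++ using standard type traits. The two structs are hypothetical stand-ins, not `thrust::complex`, and it is an assumption that the `thrust::complex<double>` kernel argument visible in the trace is the one nvcc objects to; the thread does not say which argument triggers the diagnostic.

```cpp
#include <type_traits>

// Hypothetical stand-in for a complex-like value type (not thrust::complex).
struct user_copy_complex {
    double re;
    double im;
    user_copy_complex(double r = 0.0, double i = 0.0) : re(r), im(i) {}
    // User-provided copy constructor: the body is written by hand,
    // so the type is no longer trivially copy constructible.
    user_copy_complex(const user_copy_complex& other)
        : re(other.re), im(other.im) {}
};

// Same members, but the copy constructor is defaulted on its first
// declaration, which keeps it trivial.
struct defaulted_copy_complex {
    double re;
    double im;
    defaulted_copy_complex(double r = 0.0, double i = 0.0) : re(r), im(i) {}
    defaulted_copy_complex(const defaulted_copy_complex&) = default;
};

// Small helper so the property can be queried by name.
template <class T>
constexpr bool has_trivial_copy() {
    return std::is_trivially_copy_constructible<T>::value;
}

static_assert(!has_trivial_copy<user_copy_complex>(),
              "hand-written copy constructor is not trivial");
static_assert(has_trivial_copy<defaulted_copy_complex>(),
              "defaulted copy constructor stays trivial");
```

Both structs copy the same two doubles; only the way the copy constructor is declared differs, and that is the property the error message names.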

@brycelelbach
Collaborator

Hi, there's not enough information in this bug for it to be actionable. Please provide a minimal and self-contained test case.

See also: https://github.com/brycelelbach/cpp_bug_reporting_guidelines

@4rzael
Author

4rzael commented Aug 22, 2018

Sorry about that. Is this OK now?

@FilipeMaia
Contributor

Pretty sure this is the same as #919

@brycelelbach brycelelbach changed the title from "reduce with thrust vectors: error: cannot pass an argument with a user-provided copy-constructor to a device-side kernel launch" to "NVBug 2341455: reduce fails to compile with complex in CUDA 9.2" Aug 25, 2018
@brycelelbach brycelelbach added this to the Next Next Release milestone Aug 25, 2018
@brycelelbach brycelelbach added type: bug: functional Does not work as intended. unverified Cannot be reproduced or confirmed. triage Needs investigation and classification. nvbug Has an associated internal NVIDIA NVBug. labels Aug 25, 2018
@brycelelbach
Collaborator

@4rzael: yep.

@FilipeMaia I think it is indeed a duplicate.

@griwes
Collaborator

griwes commented Dec 3, 2018

@4rzael Can you please check if what you are trying to do works on the version of Thrust shipped with CUDA 10.0?

@brycelelbach brycelelbach removed triage Needs investigation and classification. unverified Cannot be reproduced or confirmed. labels Dec 4, 2018
@brycelelbach
Collaborator

Fixed as of CUDA 10.0. Please re-open if it's still an issue.

@jjungsprtn

Seeing this issue in CUDA 10.1

@ianrgraham

I also see this in CUDA 10.1. It only appears when I set -arch to anything other than sm_30 when compiling my code.

@luiset83

luiset83 commented Jan 13, 2020

This is still an issue in CUDA 10.1 and CUDA 10.2 when cub or thrust code is compiled into a separate library. Attached is a minimal, self-contained reproducer for Linux. (Requires cmake 3.10 and CUDA 10.1+)

As @ianrgraham commented, this happens when -arch is set to anything other than the default. After extracting the attachment, the cub_bug.sh script uses cmake to build the code with -gencode=arch=compute_70,code=sm_70 twice: first without the bug (no call to reduce for thrust::complex), and then with the reduce call enabled via an ifdef, which produces the corresponding error messages.

Note that there is a standalone thrust::complex reduction done in the exec() function in the test_cuda.cu file that is NOT affected, since it is not part of the library.

cub_thrust_compile_bug.zip

Removing the -arch flag allows the thrust::complex reduction in the library to compile, but this is not desirable.

kwrobot pushed a commit to Kitware/VTK-m that referenced this issue Jul 9, 2020
When you try to call the `Reduce` operation in the CUDA device adapter
with a sufficiently complex iterator type, you get a compile error
that says `error: cannot pass an argument with a user-provided
copy-constructor to a device-side kernel launch`.

This appears to be a bug in either nvcc or Thrust. I believe it is
related to the following reported issues:

* NVIDIA/thrust#928
* NVIDIA/thrust#1044

Work around this problem by making a special condition for calling
`Reduce` with an `ArrayHandleMultiplexer` that calls the generic
algorithm in `DeviceAdapterAlgorithmGeneral` instead of the algorithm in
Thrust.
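
The shape of that workaround (route one problematic type to a generic algorithm, leave everything else on the library path) can be sketched in plain host C++ with tag dispatch. This is an illustration only, not the VTK-m code: every name below (`needs_generic_reduce`, `library_reduce`, `generic_reduce`, `dispatch_reduce`) is hypothetical, `std::accumulate` merely stands in for the Thrust-backed path, and the sketch dispatches on the value type whereas the commit dispatches on the `ArrayHandleMultiplexer` array type.

```cpp
#include <complex>
#include <functional>
#include <numeric>
#include <type_traits>
#include <vector>

// Hypothetical trait: specialize to true_type for types known to hit the
// compile error on the library path.
template <class T>
struct needs_generic_reduce : std::false_type {};

// Assumption for this sketch: treat std::complex<double> as the problem type.
template <>
struct needs_generic_reduce<std::complex<double>> : std::true_type {};

// Stand-in for the library-backed reduction (Thrust in the real code).
template <class It, class T, class Op>
T library_reduce(It first, It last, T init, Op op) {
    return std::accumulate(first, last, init, op);
}

// Generic fallback: a plain left fold with no library machinery.
template <class It, class T, class Op>
T generic_reduce(It first, It last, T init, Op op) {
    for (; first != last; ++first) {
        init = op(init, *first);
    }
    return init;
}

// Overload chosen when the trait is true: use the fallback.
template <class It, class T, class Op>
T reduce_impl(It first, It last, T init, Op op, std::true_type) {
    return generic_reduce(first, last, init, op);
}

// Overload chosen when the trait is false: use the library path.
template <class It, class T, class Op>
T reduce_impl(It first, It last, T init, Op op, std::false_type) {
    return library_reduce(first, last, init, op);
}

// Single entry point; the choice is made at compile time from the trait.
template <class It, class T, class Op>
T dispatch_reduce(It first, It last, T init, Op op) {
    return reduce_impl(first, last, init, op, needs_generic_reduce<T>{});
}
```

Because the selection happens through overload resolution, the library path is never instantiated for the flagged type, which is what lets a workaround of this kind sidestep a compile-time failure rather than a runtime one.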
@alliepiper alliepiper added the P1: should have Necessary, but not critical. label Aug 17, 2021
8 participants