Unable to use transform_output_iterator for output of copy_if with CUDA #1650
I just ran into the same issue (deleted assignment operator of …).
This seems to be fixed on main. @fkallen, @pauleonix, could you verify?
@senior-zero The default constructor was added recently, but as far as I can see, the assignment operator was not added. I still get the same compilation error about it being deleted.
Hello @pauleonix! The original reproducer compiles without issues on main. Could you please provide a reproducer for your issue with the assignment operator?
The code is

```cpp
#include <random>

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/scan.h>
#include <thrust/zip_function.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/transform_output_iterator.h>

void foo(thrust::device_vector<float> const &input,
         thrust::device_vector<float> &output,
         float threshold,
         int interval_size) {
  auto in_iter = thrust::make_zip_iterator(thrust::make_tuple(
      thrust::make_counting_iterator(0),
      thrust::make_transform_iterator(
          input.cbegin(),
          [threshold, interval_size] __device__ (float in) -> int {
            return in > threshold ? interval_size : 0;
          })));
  auto out_iter = thrust::make_transform_output_iterator(
      output.begin(),
      thrust::make_zip_function(
          [threshold, interval_size] __device__ (int, int scan_result) {
            return scan_result == interval_size ? threshold : 0.f;
          }));
  thrust::inclusive_scan(in_iter, in_iter + input.size(), out_iter,
                         [] __device__ (thrust::tuple<int, int> const left,
                                        thrust::tuple<int, int> const right) {
                           auto const distance =
                               thrust::get<0>(right) - thrust::get<0>(left);
                           return thrust::make_tuple(
                               thrust::get<0>(right),
                               (thrust::get<1>(left) > distance)
                                   ? (thrust::get<1>(left) - distance)
                                   : thrust::get<1>(right));
                         });
}

thrust::host_vector<float> generate_data(int size, float threshold) {
  thrust::host_vector<float> data(size);
  std::default_random_engine rng(123456789);
  std::uniform_real_distribution<float> real_dist(0.0f, 1.1f * threshold);
  for (float &val : data) {
    val = real_dist(rng);
  }
  return data;
}

int main() {
  constexpr int N = 1 << 20;
  constexpr int interval_size = 42;
  constexpr float threshold = 42.f;
  auto data = generate_data(N, threshold);
  thrust::device_vector<float> d_data(data);
  thrust::device_vector<float> d_out(N);
  foo(d_data, d_out, threshold, interval_size);
  thrust::host_vector<int> out(d_out);
}
```

I get the same compilation error about the deleted assignment operator, and I freshly cloned Thrust after you asked me to check main.
It seems that this is rather an issue with device lambdas than with transform_output_iterator itself. Digging a bit deeper, the actual issue is in …. You can see that here, where I replaced the device lambda with a struct with a __device__ call operator: https://cuda.godbolt.org/z/5cE6hcrG9
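For reference, here is a minimal sketch of that kind of struct-based replacement for the output transform in the reproducer above. This is my illustration of the workaround, not necessarily identical to the code behind the Godbolt link; the struct name is made up.

```cpp
#include <thrust/tuple.h>
#include <thrust/iterator/transform_output_iterator.h>

// Replacement for the zip_function device lambda on the output side: a plain
// struct with a __device__ call operator, which avoids the extended-lambda
// restrictions entirely.
struct scan_to_output {
  float threshold;
  int interval_size;

  __device__ float operator()(thrust::tuple<int, int> const &scanned) const {
    // Same logic as the original lambda: emit the threshold when the scanned
    // count still covers a full interval, and 0 otherwise.
    return thrust::get<1>(scanned) == interval_size ? threshold : 0.f;
  }
};

// In foo(), the output iterator would then be built without zip_function:
//   auto out_iter = thrust::make_transform_output_iterator(
//       output.begin(), scan_to_output{threshold, interval_size});
```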
Yes, I can confirm that the device lambda is the problem.
As far as I know, extended lambdas are generally not supported, @senior-zero.
Huh, that would be news to me; I always thought Thrust and device lambdas fit together like bread and butter. Although I think I previously observed that the Thrust examples don't use them. Even the …
@senior-zero Thanks, that is interesting. But if I understand it right, the described issues only appear when using …. The issue here is not fixed by adding ….
Either way, instead of creating a new issue I should just mention this under #779, I guess. Then you can close this one.
Indeed. The problem is with the fundamental restrictions on extended lambdas, and there isn't much Thrust can do about it. We are trying to make some minor improvements to at least detect the situations where we know device lambdas will fail and emit a more useful diagnostic (like #1688). You've reminded me that we should give …. We have also added things like …. All of us wish extended lambdas could work better with Thrust 😞.
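For illustration, a minimal sketch of how such a diagnostic can be emitted. The __nv_is_extended_device_lambda_closure_type trait and the __CUDACC_EXTENDED_LAMBDA__ macro are provided by nvcc when compiling with --extended-lambda; the check_functor helper itself is hypothetical and not Thrust's actual diagnostic code.

```cpp
// Hypothetical helper (not part of Thrust): reject extended __device__ lambdas
// up front with a readable message instead of a cryptic deleted-operator error.
template <class F>
void check_functor() {
#if defined(__CUDACC_EXTENDED_LAMBDA__)
  static_assert(!__nv_is_extended_device_lambda_closure_type(F),
                "Extended __device__ lambdas are not supported in this "
                "context; use a functor with a __device__ operator() instead.");
#endif
}
```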
I filed NVIDIA/cccl#1004 in libcu++ to update the other traits that are known to be broken with extended lambdas.
I am closing this issue, as this is an intrinsic issue with NVCC, and libcu++ can now provide feedback when device lambdas are used this way.
Consider the following code which tries to use a transform_output_iterator to duplicate the results of copy_if.
The device version does not compile because of a deleted assignment operator. https://cuda.godbolt.org/z/zd5ajWYsT
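For context, a minimal sketch of the general pattern the issue describes: copy_if writing its output through a transform_output_iterator. This is my reconstruction, not the code behind the Godbolt link; the transform and predicate here are arbitrary plain functors standing in for whatever the original reproducer used, and with a recent Thrust this functor-based version compiles.

```cpp
#include <vector>

#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/iterator/transform_output_iterator.h>

// Transform applied to each value as it is written through the output iterator.
struct scale_by_two {
  __host__ __device__ float operator()(float x) const { return 2.f * x; }
};

// Predicate for copy_if: keep only positive values.
struct is_positive {
  __host__ __device__ bool operator()(float x) const { return x > 0.f; }
};

int main() {
  std::vector<float> h = {-1.f, 2.f, -3.f, 4.f};
  thrust::device_vector<float> in(h.begin(), h.end());
  thrust::device_vector<float> out(in.size());

  // The selected elements are transformed on the fly as copy_if writes them out.
  auto out_iter =
      thrust::make_transform_output_iterator(out.begin(), scale_by_two{});
  thrust::copy_if(in.begin(), in.end(), out_iter, is_positive{});
}
```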