This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Unable to use transform_output_iterator for output of copy_if with CUDA #1650

Closed
fkallen opened this issue Mar 28, 2022 · 14 comments

Labels: P1: should have (Necessary, but not critical.) · thrust · type: bug: functional (Does not work as intended.)
@fkallen
Contributor

fkallen commented Mar 28, 2022

Consider the following code, which tries to use a transform_output_iterator to duplicate the results of copy_if.

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/copy.h>
#include <thrust/iterator/transform_output_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/fill.h>

#include <cassert>
#include <iostream>

int main(){
    thrust::host_vector<int> h_output1(10, 42);
    thrust::host_vector<int> h_output2(10, 42);

    auto h_zippedOutput = thrust::make_zip_iterator(thrust::make_tuple(
        h_output1.begin(),
        h_output2.begin()
    ));

    auto duplicateValue = [] __host__ __device__ (int i){
        return thrust::make_tuple(i,i);
    };

    auto h_outputiter = thrust::make_transform_output_iterator(
        h_zippedOutput,
        duplicateValue
    );

    thrust::fill(h_outputiter, h_outputiter + 10, 0);

    for(int i = 0; i < 10; i++){
        assert(h_output1[i] == 0);
        assert(h_output2[i] == 0);
    }

    thrust::host_vector<int> h_data(10, 5);
    thrust::host_vector<int> h_stencil(10, 0);
    h_stencil[0] = 1;
    h_stencil[7] = 1;

    auto h_outputiterend = thrust::copy_if(
        h_data.data(),
        h_data.data() + 10,
        h_stencil.data(),
        h_outputiter,
        [] __host__ __device__ (int flag){
            return flag > 0;
        }
    );

    assert(thrust::distance(h_outputiter, h_outputiterend) == 2);
    assert(h_output1[0] == 5);
    assert(h_output2[0] == 5);
    assert(h_output1[1] == 5);
    assert(h_output2[1] == 5);

    #if 0 // changing 0 to 1 does not compile

    thrust::device_vector<int> d_data = h_data;
    thrust::device_vector<int> d_stencil = h_stencil;
    thrust::device_vector<int> d_output1 = h_output1;
    thrust::device_vector<int> d_output2 = h_output2;

    auto d_zippedOutput = thrust::make_zip_iterator(thrust::make_tuple(
        d_output1.begin(),
        d_output2.begin()
    ));

    auto d_outputiter = thrust::make_transform_output_iterator(
        d_zippedOutput,
        duplicateValue
    );

    auto d_outputiterend = thrust::copy_if(
        d_data.data(),
        d_data.data() + 10,
        d_stencil.data(),
        d_outputiter,
        [] __host__ __device__ (int flag){
            return flag > 0;
        }
    );

    assert(thrust::distance(d_outputiter, d_outputiterend) == 2);
    h_output1 = d_output1;
    h_output2 = d_output2;
    assert(h_output1[0] == 5);
    assert(h_output2[0] == 5);
    assert(h_output1[1] == 5);
    assert(h_output2[1] == 5);

    #endif
}

The device version does not compile because of a deleted assignment operator: https://cuda.godbolt.org/z/zd5ajWYsT

/opt/compiler-explorer/libs/thrustcub/trunk/thrust/system/cuda/detail/copy_if.h(834): error: function "thrust::transform_output_iterator<UnaryFunction, OutputIterator>::operator=(const thrust::transform_output_iterator<lambda [](int)->thrust::tuple<int, int, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<int>>, thrust::detail::normal_iterator<thrust::device_ptr<int>>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>>> &) [with UnaryFunction=lambda [](int)->thrust::tuple<int, int, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, OutputIterator=thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<int>>, thrust::detail::normal_iterator<thrust::device_ptr<int>>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>>]" (declared implicitly) cannot be referenced -- it is a deleted function
     
@alliepiper alliepiper added type: bug: functional Does not work as intended. P1: should have Necessary, but not critical. labels Apr 4, 2022
@jrhemstad jrhemstad added this to CCCL Aug 11, 2022
@pauleonix

I just ran into the same issue (the deleted assignment operator of thrust::transform_output_iterator) when instantiating thrust::inclusive_scan.

@gevtushenko
Collaborator

This seems to be fixed on main. @fkallen, @pauleonix, could you verify?

@pauleonix

@senior-zero The default constructor was added recently, but as far as I can see, the assignment operator was not. I still get the same compilation error about it being deleted.

@gevtushenko
Collaborator

Hello @pauleonix! The original reproducer compiles without issues on main. Could you please provide a reproducer for your issue with the assignment operator?

@pauleonix

pauleonix commented Dec 6, 2022

The code is

#include <random>

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/scan.h>
#include <thrust/zip_function.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/transform_output_iterator.h>

void foo(thrust::device_vector<float> const &input,
         thrust::device_vector<float> &output,
         float threshold,
         int interval_size) {

    auto in_iter = thrust::make_zip_iterator(thrust::make_tuple(
        thrust::make_counting_iterator(0),
        thrust::make_transform_iterator(
            input.cbegin(),
            [threshold, interval_size]
            __device__ (float in) -> int {
                return in > threshold ? interval_size : 0;
            })));

    auto out_iter = thrust::make_transform_output_iterator(
            output.begin(),
            thrust::make_zip_function(
                [threshold, interval_size]
                __device__ (int, int scan_result) {
                    return scan_result == interval_size ? threshold : 0.f;
                }));

    thrust::inclusive_scan(in_iter, in_iter + input.size(),
                           out_iter,
                           [] __device__ (thrust::tuple<int, int> const left,
                                          thrust::tuple<int, int> const right) {
                               auto const distance = thrust::get<0>(right) - thrust::get<0>(left);
                               return thrust::make_tuple(
                                    thrust::get<0>(right),
                                    (thrust::get<1>(left) > distance) ? (thrust::get<1>(left) - distance)
                                                                      : thrust::get<1>(right));
                           });
}

thrust::host_vector<float> generate_data(int size, float threshold) {
    thrust::host_vector<float> data(size);
    std::default_random_engine rng(123456789);
    std::uniform_real_distribution<float> real_dist(0.0f, 1.1f * threshold);
    for (float &val : data) {
        val = real_dist(rng);
    }
    return data;
}

int main() {
    constexpr int N = 1 << 20;
    constexpr int interval_size = 42;
    constexpr float threshold = 42.f;

    auto data = generate_data(N, threshold);

    thrust::device_vector<float> d_data(data);
    thrust::device_vector<float> d_out(N);
    foo(d_data, d_out, threshold, interval_size);

    thrust::host_vector<int> out(d_out);
}

I get

error: function "thrust::transform_output_iterator<UnaryFunction, OutputIterator>::operator=(const thrust::transform_output_iterator<thrust::zip_function<lambda [](int, int)->float>, thrust::detail::normal_iterator<thrust::device_ptr<float>>> &) [with UnaryFunction=thrust::zip_function<lambda [](int, int)->float>, OutputIterator=thrust::detail::normal_iterator<thrust::device_ptr<float>>]" (declared implicitly) cannot be referenced -- it is a deleted function

and I cloned a fresh copy of Thrust after you asked me to check main.

@miscco
Collaborator

miscco commented Dec 6, 2022

It seems that this is an issue with device lambdas rather than with transform_output_iterator.

Digging a bit deeper, the actual issue is in zip_function, which has a deleted copy assignment operator because the passed-in lambda has one.

You can see that here, where I replaced the device lambda with a struct and a __device__ call operator: https://cuda.godbolt.org/z/5cE6hcrG9

@pauleonix

Yes, I can confirm that the device lambda is the problem.
So should I open a new issue for this? Either to document this restriction on device lambdas for all fancy iterators that can be used for output and have a functor member, or, long term, to get rid of these assignments in the algorithms, if that is possible?

@miscco
Collaborator

miscco commented Dec 6, 2022

As far as I know, extended lambdas are generally not supported. @senior-zero

@pauleonix

Huh, that would be news to me; I always thought Thrust and device lambdas fit together like bread and butter. Although I have previously noticed that the Thrust examples don't use them: even the lambda.cu example uses a placeholder expression instead of an actual lambda. I guess this is just the first time a device lambda has caused me trouble. I just love avoiding the boilerplate of defining functor classes.

@gevtushenko
Collaborator

Some of the discussion on device lambdas can be found here and here.

@pauleonix

@senior-zero Thanks, that is interesting. But if I understand it correctly, the described issues only appear with __device__ lambdas, while __host__ __device__ lambdas are fine because the __host__ part allows deducing the correct result type. And proper diagnostics are now in place for the cases where __device__ lambdas are problematic.

The issue here is not fixed by adding __host__, but it still only happens with extended lambdas, so I guess this is caused by restriction 17?

As described previously, the CUDA compiler replaces an extended __device__ or __host__ __device__ lambda expression with an instance of a placeholder type in the code sent to the host compiler. This placeholder type may define C++ special member functions (e.g. constructor, destructor). As a result, some standard C++ type traits may return different results for the closure type of the extended lambda, in the CUDA frontend compiler versus the host compiler. The following type traits are affected: std::is_trivially_copyable, std::is_trivially_constructible, std::is_trivially_copy_constructible, std::is_trivially_move_constructible, std::is_trivially_destructible.

Either way, instead of creating a new issue I should just mention this under #779, I guess. Then you can close this one.

@jrhemstad
Collaborator

so I guess this is caused by restriction 17?

Indeed.

The problem is with the fundamental restrictions on extended lambdas and there isn't much Thrust can do about it.

We are trying to make some minor improvements to at least detect the situations where we know device lambdas will fail and emit a more useful diagnostic (like #1688). You've reminded me that we should give cuda::std::is_trivially_copyable, cuda::std::is_trivially_constructible, cuda::std::is_trivially_copy_constructible, cuda::std::is_trivially_move_constructible, and cuda::std::is_trivially_destructible the same treatment.

We have also added things like cuda::proclaim_return_type in an effort to circumvent some of the restrictions on extended lambdas.

All of us wish extended lambdas could work better with Thrust 😞.

@jrhemstad
Collaborator

I filed NVIDIA/cccl#1004 in libcu++ to update the other traits that are known to be broken with extended lambdas.

@wmaxey
Member

wmaxey commented Feb 23, 2023

I am closing this issue, as this is an intrinsic limitation of NVCC, and libcu++ can now provide feedback when device lambdas are used this way.

@wmaxey wmaxey closed this as completed Feb 23, 2023
@github-project-automation github-project-automation bot moved this to Done in CCCL Feb 23, 2023