Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

"illegal memory access" when using a custom copy constructor with thrust::transform #1578

Closed
neoblizz opened this issue Dec 9, 2021 · 4 comments

Comments

@neoblizz
Copy link

neoblizz commented Dec 9, 2021

Not sure if this is a bug or intended behavior. I wrote this short working example that shows what I am trying to do;

#include <cstdlib>  // EXIT_SUCCESS

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/discard_iterator.h>

template <typename T>
__global__ void kernel(T t) {
  t.what_is_inside();
}

template <typename op_t>
__global__ void lambda_kernel(op_t op) {
  int idx = threadIdx.x;
  auto discard = op(idx);
}

struct thrust_container {
  // Constructor
  thrust_container() {
    m = std::make_shared<thrust::device_vector<int>>(
        thrust::device_vector<int>(1, 1));
    raw_ptr = nullptr;
  }

  // Copy Constructor
  thrust_container(thrust_container const& rhs) {
    m = rhs.m;
    raw_ptr = rhs.m.get()->data().get();
  }

  __host__ __device__ void what_is_inside() const { printf("%i\n", *raw_ptr); }

 private:
  std::shared_ptr<thrust::device_vector<int>> m;
  int* raw_ptr;
};

void test_copy_ctor(thrust_container& t) {
  // works.
  kernel<<<1, 1>>>(t);
  cudaDeviceSynchronize();

  auto lambda_op = [=] __device__(const int& idx) {
    t.what_is_inside();
    return 0;
  };

  // works.
  lambda_kernel<<<1, 1>>>(lambda_op);
  cudaDeviceSynchronize();

  // fails.
  thrust::transform(
      thrust::device,                          // execution policy
      thrust::make_counting_iterator<int>(0),  // input iterator: first
      thrust::make_counting_iterator<int>(1),  // input iterator: last
      thrust::make_discard_iterator(),         // output iterator: ignore
      lambda_op                                // unary operation
  );
}

int main() {
  thrust_container t;
  test_copy_ctor(t);
  return EXIT_SUCCESS;
}

The goal is to create a self-contained class that includes the shared_ptr/thrust vector within it, but be able to use that class on the GPU. And for that, I wrote a simple copy constructor that allows it to be used within a kernel. It works when I use it in a simple custom kernel or within a lambda operator that is called within a kernel. However, when I use thrust::transform to mimic the same behavior it results in an illegal memory access error. Is that the intended behavior? I was imagining transform shouldn't be doing anything complicated to make this cause the memory access error.

Output

1
1
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  transform: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
Aborted

System information

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Aug_15_21:14:11_PDT_2021
Cuda compilation tools, release 11.4, V11.4.120
Build cuda_11.4.r11.4/compiler.30300941_0

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.6 LTS
Release:        18.04
Codename:       bionic

$ wsl.exe --status
Default Distribution: Ubuntu-18.04
Default Version: 2
Windows Subsystem for Linux was last updated on 10/6/2021
WSL automatic updates are on.
Kernel version: 5.10.60.1
neoblizz added a commit to gunrock/essentials that referenced this issue Dec 9, 2021
@alliepiper
Copy link
Collaborator

Using a std::shared_ptr from device code is not supported. For a function to be executed on the device, it requires a __device__ annotation, and standard library functions do not provide these CUDA specific annotations. Compiling your example yields several warnings about mismatched execution space annotations: https://www.godbolt.org/z/dbnx4EfYj These should be addressed before continuing debugging this, as they are likely causing issues here.

Also be aware that __device__ lambdas are not well supported by Thrust, since they do not provide accurate information about return types, etc when inspected from host contexts. Using a functor is much more reliable given the current restrictions around extended lambda. See this discussion for more info.

You may want to consider having a host-only container that maintains the lifetime of the thrust::device_vector and then just pass a pointer or iterator to the kernel, but trying to using a std::shared_ptr to manage the lifetime across both host and device code will not work.

@neoblizz
Copy link
Author

Also be aware that __device__ lambdas are not well supported by Thrust, since they do not provide accurate information about return types, etc when inspected from host contexts. Using a functor is much more reliable given the current restrictions around extended lambda. See this discussion for more info.

Thank you for this pointer, that was eye-opening.

Using a std::shared_ptr from device code is not supported. For a function to be executed on the device, it requires a __device__ annotation, and standard library functions do not provide these CUDA specific annotations. Compiling your example yields several warnings about mismatched execution space annotations: https://www.godbolt.org/z/dbnx4EfYj These should be addressed before continuing debugging this, as they are likely causing issues here.

source>(49): warning #20011-D: calling a __host__ function("thrust_container::thrust_container(const thrust_container&)") from a __host__ __device__ function("test_copy_ctor(    ::thrust_container &)::[lambda(const int &) (instance 1)]::[lambda(const int &) (instance 1)]") is not allowed

<source>(49): warning #20011-D: calling a __host__ function("std::__shared_count<( ::__gnu_cxx::_Lock_policy)2> ::~__shared_count()") from a __host__ __device__ function("std::__shared_ptr< ::thrust::device_vector<int,  ::thrust::device_allocator<int> > , ( ::__gnu_cxx::_Lock_policy)2> ::~__shared_ptr") is not allowed

I understand that std::shared_ptr and thrust::device_vector (and the likes) are not supported in device code, but the warnings are irrelevant if the device side call never happens. I find it interesting that with the same warnings, the thrust transform operator gives me illegal memory access whereas calling a kernel with the lambda works fine (obviously as long as I do not touch stuff that isn't supported on the device).

I guess my question here is what is thrust::transform doing differently.

You may want to consider having a host-only container that maintains the lifetime of the thrust::device_vector and then just pass a pointer or iterator to the kernel, but trying to using a std::shared_ptr to manage the lifetime across both host and device code will not work.

I have considered this solution as well, was trying to come up with a solution that had only one container.

@fkallen
Copy link
Contributor

fkallen commented Dec 14, 2021

I found this interesting and played around a bit with your code. With CUDA 11.5, it seems to work if you declare your copy-constructor as host-device function, and use the flag --expt-relaxed-constexpr to be able to constexpr construct the shared_ptr . I guess the issue exists because in thrust code path the lambda seems to be copied in device-code (1 copy per thread???) which will probably try to call your copy constructor, but it's host-only. I found this by adding some debug statements to your code.

#include <cstdlib>  // EXIT_SUCCESS

#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

#include <thrust/transform.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/discard_iterator.h>

template <typename T>
__global__ void kernel(T t) {
  t.what_is_inside();
}

template <typename op_t>
__global__ void lambda_kernel(op_t op) {
  int idx = threadIdx.x;
  auto discard = op(idx);
}

struct thrust_container {
  // Constructor
  thrust_container() {
    m = std::make_shared<thrust::device_vector<int>>(
        thrust::device_vector<int>(1, 1));
    raw_ptr = nullptr;
  }

  // Copy Constructor
  __host__ __device__
  thrust_container(thrust_container const& rhs) {
    
    #ifdef __CUDA_ARCH__
    printf("copy constructor device\n");
    raw_ptr = rhs.raw_ptr;
    #else
    printf("copy constructor host\n");
    m = rhs.m;
    raw_ptr = rhs.m.get()->data().get();
    #endif
  }

  __host__ __device__ void what_is_inside() const { printf("%i\n", *raw_ptr); }

 private:
  std::shared_ptr<thrust::device_vector<int>> m;
  int* raw_ptr;
};

void test_copy_ctor(thrust_container& t) {
  // works.
  printf("before kernel\n");
  kernel<<<1, 1>>>(t);
  cudaDeviceSynchronize();
  printf("kernel done\n");

  auto lambda_op = [=] __device__(const int& idx) {
    t.what_is_inside();
    return 0;
  };

  // works.
  printf("before lambda_kernel\n");
  lambda_kernel<<<1, 1>>>(lambda_op);
  cudaDeviceSynchronize();
  printf("lambda_kernel done\n");

  // fails.
  thrust::transform(
      thrust::device,                          // execution policy
      thrust::make_counting_iterator<int>(0),  // input iterator: first
      thrust::make_counting_iterator<int>(1),  // input iterator: last
      thrust::make_discard_iterator(),         // output iterator: ignore
      lambda_op                                // unary operation
  );

  printf("thrust::transform done\n");
}

int main() {
  thrust_container t;
  test_copy_ctor(t);
  return EXIT_SUCCESS;
}

On my machine, this produces the following long output.

before kernel
copy constructor host
1
kernel done
copy constructor host
copy constructor host
before lambda_kernel
copy constructor host
1
lambda_kernel done
copy constructor host
copy constructor host
copy constructor host
copy constructor host
copy constructor host
copy constructor host
copy constructor host
copy constructor host
copy constructor host
copy constructor host
copy constructor host
copy constructor host
copy constructor host
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
copy constructor device
1
thrust::transform done

@neoblizz
Copy link
Author

I guess the issue exists because in thrust code path the lambda seems to be copied in device-code (1 copy per thread???) which will probably try to call your copy constructor, but it's host-only. I found this by adding some debug statements to your code.

Very interesting to see the behavior of thrust::transform with respect to this problem. Thank you for taking a look at this, and @allisonvacanti's feedback on the __device__ lambda stuff. Closing this issue now.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants