This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

warning: calling a __host__ function [...] from a __host__ __device__ function [...] #949

Closed
95A31 opened this issue Jan 6, 2019 · 5 comments


95A31 commented Jan 6, 2019

Hello,

A simple use of thrust::sort causes compile-time warnings and a runtime error.

I get the following warnings:

/usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/sort.h(1507): warning: calling a __host__ function("thrust::detail::aligned_reinterpret_cast<int *, unsigned char *> ") from a __host__ __device__ function("thrust::cuda_cub::__radix_sort::radix_sort< ::thrust::detail::integral_constant<bool, (bool)0> ,  ::thrust::cuda_cub::par_t, int, int, long,  ::thrust::less<int> > ") is not allowed

/usr/local/cuda/bin/../targets/x86_64-linux/include/thrust/system/cuda/detail/sort.h(1510): warning: calling a __host__ function("thrust::detail::aligned_reinterpret_cast<int *, unsigned char *> ") from a __host__ __device__ function("thrust::cuda_cub::__radix_sort::radix_sort< ::thrust::detail::integral_constant<bool, (bool)0> ,  ::thrust::cuda_cub::par_t, int, int, long,  ::thrust::less<int> > ") is not allowed

At runtime the program crashes:

Bus error (core dumped)

This is a MWE:

#include <iostream>
#include <thrust/sort.h>
#include <thrust/execution_policy.h>  // for thrust::device

#define ARRAY_SIZE 10

__global__
void sortingKernel(int* arrayToSort, int arraySize)
{
    int* begin = arrayToSort;
    int* end = arrayToSort + arraySize;

    thrust::sort(thrust::device, begin, end);
}

int main(void)
{
    int arraySize = ARRAY_SIZE;

    int* arrayToSort;
    cudaMallocManaged(&arrayToSort, sizeof(int) * arraySize);
    for(int i = 0; i < arraySize; i += 1)
    {
        arrayToSort[i] = arraySize - i;
    }

    std::cout << "Before: ";
    for(int i = 0; i < arraySize; i += 1)
    {
        std::cout << arrayToSort[i] << " ";
    }
    std::cout << std::endl;

    sortingKernel<<<1,1>>>(arrayToSort, arraySize);
    cudaDeviceSynchronize();

    std::cout << "After: ";
    for(int i = 0; i < arraySize; i += 1)
    {
        std::cout << arrayToSort[i] << " ";
    }
    std::cout << std::endl;

    cudaFree(arrayToSort);
    return 0;
}

I compile it with the following command:

nvcc -arch=sm_35 -rdc=true -g Sort.cu -lcudadevrt -o Sort

The running system is:

OS: Ubuntu Server 18.04 LTS
Kernel version: 4.15.0-43
CUDA version: 10.0
Thrust version: 1.9
Driver version: 410.79
GPU: Tesla K40c

nferenc13 commented Mar 7, 2019

I have tried the same (dynamic parallelism) with thrust::sort and it doesn't work either.

#include <stdio.h>
#include <thrust/reduce.h>
#include <thrust/execution_policy.h>

#define N 10

#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true) {
   if (code != cudaSuccess) {
    fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
    if (abort) exit(code);
   }
}

__global__ void test_kernel( int* data, int size, int * sum ) {
    *sum = thrust::reduce( thrust::seq, data , data + size, (int)0, thrust::plus<int>() );
//    *sum = thrust::reduce( thrust::device, data , data + size, (int)0, thrust::plus<int>() );
}

int main() {
    int *h_data = (int *)malloc( N * sizeof(int) );
    int sum  ;
    for ( int i = 0 ; i < N ; i++ )
        h_data[i] = 1 ;
    int *d_data; gpuErrchk(cudaMalloc((void**)&d_data,     N * sizeof(int) ));
    int *d_sum;  gpuErrchk(cudaMalloc((void**)&d_sum,     sizeof(int) ));
    gpuErrchk(cudaMemcpy(d_data, h_data, N * sizeof(int), cudaMemcpyHostToDevice));
    test_kernel<<< 1, 1 >>>( d_data, N, d_sum );
    gpuErrchk(cudaDeviceSynchronize());
    gpuErrchk(cudaMemcpy(&sum, d_sum, sizeof(int), cudaMemcpyDeviceToHost));
    printf("Sum: %d\n", sum);
    free(h_data) ;
    cudaFree(d_data) ;
    cudaFree(d_sum) ;
    return 0;
}

My system:
Windows 10 64bit
CUDA version: 10.1
Thrust version: 1.9.4
Driver version: 419.35
GPU: GeForce GTX 1050 Ti

@codecircuit

@nferenc13 What do you mean by

(dynamic parallelism)

I think the thrust::device backend is currently not meant to be used within device code (any function declared with the __global__ or __device__ specifier). However, thrust::sort and thrust::reduce with the thrust::seq backend work for me in device code. Moreover, I can compile the code snippet you posted.
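The workaround described above can be sketched as follows. This is a minimal illustration, not an official fix: replacing thrust::device with thrust::seq runs the algorithm serially in the calling thread, which avoids the host-only code paths that trigger the warnings (the kernel and array names mirror the MWE above).

```
#include <thrust/sort.h>
#include <thrust/execution_policy.h>

// Sketch: sorting from device code with the sequential backend.
// thrust::seq executes serially in the calling GPU thread, so no
// dynamic parallelism or host-only helpers are involved.
__global__ void sortingKernel(int* arrayToSort, int arraySize)
{
    thrust::sort(thrust::seq, arrayToSort, arrayToSort + arraySize);
}
```

Note that a serial sort in a single thread can be slow for large arrays; this only sidesteps the compilation issue.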

Collaborator

griwes commented Apr 15, 2019

This should be fixed on master now; please reopen if that's not the case. I don't yet know what CUDA Toolkit version the fix will land in.

@griwes griwes closed this as completed Apr 15, 2019
@WilliamKF

Same issue with thrust::count() on CUDA V9.2.88.

brycelelbach pushed a commit that referenced this issue May 16, 2020
Bug 2422333
Bug 2522259
Bug 2528822
Github #949
Github #973
@samuelpmish

#include <vector>

#include <thrust/sort.h>
#include <thrust/binary_search.h>
#include <thrust/execution_policy.h>
#include <thrust/iterator/counting_iterator.h>

int main() {

  std::vector< uint32_t > keys = {3,1,6,3,2,7,3,2,4,7,1,3,2};
  std::vector< uint32_t > values = {1,7,8,3,2,3,1,3,2,1,4,2,4};
  thrust::stable_sort_by_key(thrust::host, keys.begin(), keys.end(), values.begin());

  std::vector< uint32_t > offsets(10, 0);
  thrust::lower_bound(thrust::host,
    keys.begin(),
    keys.end(),
    thrust::counting_iterator< uint32_t >(0),
    thrust::counting_iterator< uint32_t >(10),
    offsets.begin()
  );

  return 0;

}

These two Thrust calls print roughly 400 lines of warnings:

...
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include\thrust/system/detail/generic/binary_search.inl(67): warning : calling a __host__ function from a __host__ __device__ function is not allowed
...

MSVC 19.24.28314.0 + CUDA 11.0.16 on Windows 10

6 participants