Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Unexpected behaviour when return type is specified for transform iterator. #1299

Closed
trivialfis opened this issue Sep 29, 2020 · 10 comments
Closed
Labels
duplicate Already exists. type: bug: functional Does not work as intended.
Milestone

Comments

@trivialfis
Copy link

trivialfis commented Sep 29, 2020

Platform

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0
$ gcc --version
gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0

Reproduce

Following snippet is an example of creating thrust::transform_iterator with and without specifying return type. When the return type is not specified (the default), the iterator works correctly. But if we supply the return type explicitly, thrust scan generates out of bound iterators.

#include <thrust/scan.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/discard_iterator.h>

// Copied from `thrust::make_transform_iterator`.
template <class AdaptableUnaryFunction, class Iterator>
inline __host__ __device__
thrust::transform_iterator<AdaptableUnaryFunction, Iterator>
make_transform_iterator_good(Iterator it, AdaptableUnaryFunction fun) {
  return thrust::transform_iterator<AdaptableUnaryFunction, Iterator>(it, fun);
}

// Used to help return type deduction on Windows.
template <typename ReturnT, typename Iterator, typename AdaptableUnaryFunction>
__host__ __device__ thrust::transform_iterator<AdaptableUnaryFunction, Iterator, ReturnT>
make_transform_iterator_buggy(Iterator it, AdaptableUnaryFunction func) {
  // From the square root example in `thrust::transform_iterator`.
  return thrust::transform_iterator<AdaptableUnaryFunction, Iterator, ReturnT>(it, func);
}

void TestScan() {
  size_t size = 2150602529;
  {
    // works
    auto key_iter = make_transform_iterator_good(
        thrust::make_counting_iterator<size_t>(0ul),
        [=] __device__(size_t idx) {
          assert(idx < size);
          return idx;
        });
    auto end_it = key_iter + size;
    thrust::inclusive_scan(thrust::device, key_iter, end_it,
                           thrust::make_discard_iterator(),
                           [] __device__(auto a, auto b) { return b; });
  }
  {
    // Assertion error.
    auto key_iter = make_transform_iterator_buggy<size_t>(
        thrust::make_counting_iterator<size_t>(0ul),
        [=] __device__(size_t idx) {
          assert(idx < size);
          return idx;
        });
    auto end_it = key_iter + size;
    thrust::inclusive_scan(thrust::device, key_iter, end_it,
                           thrust::make_discard_iterator(),
                           [] __device__(auto a, auto b) { return b; });
  }
}

int main () {
  TestScan();
}

Original comment is posted at #967 (comment)

@trivialfis trivialfis changed the title Unexpected behaviour when returned type is specified for transform iterator. Unexpected behaviour when return type is specified for transform iterator. Sep 29, 2020
@alliepiper alliepiper added type: bug: functional Does not work as intended. triage Needs investigation and classification. labels Sep 29, 2020
@alliepiper alliepiper added this to the 1.11.0 milestone Sep 29, 2020
@alliepiper
Copy link
Collaborator

Does this work when you change the __device__ lambda to an explicit functor with a __device__ operator()?

Device lambdas are very unreliable when used with generic algorithms like Thrust. See the discussion in #779. I suspect the lambda is the problem since this works when the return type is specified instead of deduced.

@trivialfis
Copy link
Author

@allisonvacanti Thanks for the reply. I will try some workarounds like using functor and host device attributes.

I suspect the lambda is the problem since this works when the return type is specified instead of deduced.

Actually, it works when return type is not specified.

@alliepiper
Copy link
Collaborator

Actually, it works when return type is not specified.

Ah, I see now, I didn't look closely enough before. This bug just got much more interesting!

Related, I'm planning to remove Thrust's current scan implementation in the near future and just switch to CUB's DeviceScan (#1301). That may fix this issue if it persists with a full functor.

@trivialfis
Copy link
Author

@allisonvacanti Update:

Actually it doesn't work either way.

When return type is not specified, it's actually an invalid device function. I checked it with cuda-memcheck. So I changed __device__ into __host__ __device__, integer overflow happened. Functor doesn't work either. Here is the MRE:

#include <thrust/scan.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/discard_iterator.h>

struct KeyIter {
  size_t size;
  size_t __host__ __device__ operator()(size_t idx) {
    assert(idx < size);
    return idx;
  }
};

void TestScan() {
  size_t size = 2150602529;
  {
    auto key_iter = thrust::make_transform_iterator(
        thrust::make_counting_iterator<size_t>(0ul),
        [=] __host__ __device__(size_t idx) {
          assert(idx < size);
          return idx;
        });
    auto end_it = key_iter + size;
    thrust::inclusive_scan(thrust::device, key_iter, end_it,
                           thrust::make_discard_iterator(),
                           [] __device__(auto a, auto b) { return b; });
  }
  {
     auto key_iter = thrust::make_transform_iterator(
        thrust::make_counting_iterator<size_t>(0ul),
        KeyIter{size});
    auto end_it = key_iter + size;
    thrust::inclusive_scan(thrust::device, key_iter, end_it,
                           thrust::make_discard_iterator(),
                           [] __device__(auto a, auto b) { return b; });
  }
}

int main () {
  TestScan();
}

@trivialfis
Copy link
Author

Not sure how it works, but isn't the num_items in cub an 32bit signed integer? https://github.com/NVIDIA/cub/blob/a39e385cc6be20754f859dd266021ab1d88459d3/cub/device/device_scan.cuh#L154

Should I open an issue in cub for switching to size_t?

@griwes
Copy link
Collaborator

griwes commented Oct 6, 2020

No. CUB is consciously using 32 bits there; just switching all the public interfaces to use 64 bit indices causes a perf regression of about 10%-15%, according to a quick benchmark I did some time ago. CUB has a secondary interface that allows specifying the index type explicitly; in fact you can see it invoked on line 168.

As Allison mentioned, we're not using CUB in scans right now, but scan itself should've been fixed by 1d16811...

@trivialfis
Copy link
Author

@griwes Thanks for the clarification.

@alliepiper
Copy link
Collaborator

I've started working on refactoring Thrust to use CUB's scans directly:

#1304
NVIDIA/cub#210

I still need to fix some issues and do more testing, but this looks like this will fix your issue. When I compile your test programs here against that branch and replace the __device__ lambdas with __device__ functors, cuda-memcheck no longer reports any errors.

@alliepiper alliepiper removed the triage Needs investigation and classification. label Oct 7, 2020
@alliepiper
Copy link
Collaborator

Closing as a duplicate since the fundamental issues here are tracked in other bugs:

  1. Consider always discarding __device__ lambdas' results #779: __device__ lambdas are not currently viable in Thrust due to CUDA language restrictions.
  2. Unify Thrust/CUB scan algorithms #1301: The Thrust scan implementation needs to be replaced with the CUB device scan.

@alliepiper alliepiper added the duplicate Already exists. label Oct 7, 2020
@alliepiper
Copy link
Collaborator

Also, NVIDIA/cccl#744 is tracking the 32-bit indexing issues.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
duplicate Already exists. type: bug: functional Does not work as intended.
Projects
None yet
Development

No branches or pull requests

3 participants