[BUG] Compile the template Fails with RTX 4090 #2073

Closed
JieFengWang opened this issue Dec 28, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@JieFengWang

Describe the bug

After installing RAFT via mamba, I tried to compile the template on my machine. The build fails with this error:
"ptxas error : Value of threads per SM for entry ZN4raft9neighbors12experimental10nn_descent6detail17local_join_kernelIiNS3_12InternalID_tIiEEEEvPKT_S9_PK4int2S9_S9_SC_iPK6__halfiPT0_PfiPiSI is out of range. .minnctapersm will be ignored
ptxas fatal : Ptx assembly aborted due to errors
gmake[2]: *** [_deps/raft-build/CMakeFiles/raft_objs.dir/build.make:1813: _deps/raft-build/CMakeFiles/raft_objs.dir/src/raft_runtime/neighbors/cagra_build.cu.o] Error 255"

  • device RTX 4090
  • ubuntu 22.04
  • nvcc 12.2
  • g++ 11.4
  • cmake 3.28

Steps/Code to reproduce bug

cd template
./build.sh

Expected behavior

  • The template should compile successfully.

Environment details (please complete the following information):

  • Environment location: [Bare-metal]
  • Method of RAFT install: [mamba]

Additional context

  • The same steps succeed on a GTX 1080 Ti and an RTX 3090; only the RTX 4090 fails.
JieFengWang added the bug label Dec 28, 2023
@JieFengWang

The error persists. Compiling the template still prints "ptxas error : Value of threads per SM for entry ZN4raft9neighbors12experimental10nn_descent6detail17local_join_kernelIiNS3_12InternalID_tIiEEEEvPKT_S9_PK4int2S9_S9_SC_iPK6__halfiPT0_PfiPiSI is out of range. .minnctapersm will be ignored".

Could this mean that "InternalID_t" is out of range?

template <typename Index_t>
struct InternalID_t;

// InternalID_t uses 1 bit for marking (new or old).
template <>
class InternalID_t<int> {
 private:
  using Index_t = int;
  Index_t id_{std::numeric_limits<Index_t>::max()};

 public:
  __host__ __device__ bool is_new() const { return id_ >= 0; }
  __host__ __device__ Index_t& id_with_flag() { return id_; }
  __host__ __device__ Index_t id() const
  {
    if (is_new()) return id_;
    return -id_ - 1;
  }
  __host__ __device__ void mark_old()
  {
    if (id_ >= 0) id_ = -id_ - 1;
  }
  __host__ __device__ bool operator==(const InternalID_t<int>& other) const
  {
    return id() == other.id();
  }
};
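For readers following the question above, the encoding in the quoted class can be tried on the host. This is a minimal sketch, not RAFT code: `HostInternalID`, `raw()`, and `round_trips` are hypothetical names introduced here, and the `__host__ __device__` qualifiers are dropped so that a plain C++ compiler accepts it. It shows that a non-negative stored value means "new" and that `mark_old()` maps `id` to `-id - 1`, which stays within the range of `int` for every non-negative `id`.

```cpp
#include <cassert>
#include <limits>

// Host-only stand-in for the InternalID_t<int> specialization quoted above.
// HostInternalID is a hypothetical name for illustration, not part of RAFT.
class HostInternalID {
  int id_{std::numeric_limits<int>::max()};

 public:
  explicit HostInternalID(int id) : id_(id) {}

  // A non-negative stored value marks the entry as "new".
  bool is_new() const { return id_ >= 0; }

  // Recover the original ID whether or not the entry was marked old.
  int id() const { return is_new() ? id_ : -id_ - 1; }

  // Mark the entry old by mapping id to -id - 1 (always negative).
  void mark_old()
  {
    if (id_ >= 0) id_ = -id_ - 1;
  }

  // Expose the stored value so the encoding can be inspected.
  int raw() const { return id_; }
};

// Returns true when an ID survives the new -> old transition unchanged.
inline bool round_trips(int id)
{
  HostInternalID x(id);
  if (!x.is_new() || x.id() != id) return false;
  x.mark_old();
  return !x.is_new() && x.id() == id && x.raw() == -id - 1;
}
```

A call such as `assert(round_trips(42));` exercises the round trip. The ptxas message, by contrast, refers to threads per SM and `.minnctapersm`, which are kernel launch-bound settings and not values stored in this class.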

@JieFengWang

OK, fixed.

Change line 694 of /path/to/template/build/_deps/raft-src/cpp/include/raft/neighbors/detail/nn_descent.cuh to

#if (__CUDA_ARCH__) == 750 || (__CUDA_ARCH__) == 860 || (__CUDA_ARCH__) == 890

With this change, the build succeeds.

  • I made this change because of the following comment in the source:
// launch_bounds here denote BLOCK_SIZE = 512 and MIN_BLOCKS_PER_SM = 4
// Per
// https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications,
// MAX_RESIDENT_THREAD_PER_SM = BLOCK_SIZE * BLOCKS_PER_SM = 2048
// For architectures 750 and 860, the values for MAX_RESIDENT_THREAD_PER_SM
// is 1024 and 1536 respectively, which means the bounds don't work anymore

The RTX 4090's MAX_RESIDENT_THREAD_PER_SM is also 1536, and its architecture is 89, so I added || (__CUDA_ARCH__) == 890 to this condition.
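The arithmetic behind that comment can be checked on the host. This is a minimal sketch under stated assumptions: `max_resident_threads_per_sm` and `launch_bounds_fit` are hypothetical helpers introduced here (they do not exist in RAFT), the per-SM limits are the ones listed in the CUDA C++ Programming Guide's technical specifications table, and only the four architectures relevant to this thread are covered.

```cpp
#include <cassert>

// Maximum resident threads per SM, keyed by __CUDA_ARCH__ value.
// Hypothetical lookup for illustration; architectures not listed return 0.
constexpr int max_resident_threads_per_sm(int cuda_arch)
{
  return cuda_arch == 750   ? 1024   // compute capability 7.5
         : cuda_arch == 800 ? 2048   // compute capability 8.0
         : cuda_arch == 860 ? 1536   // compute capability 8.6 (e.g. RTX 3090)
         : cuda_arch == 890 ? 1536   // compute capability 8.9 (e.g. RTX 4090)
                            : 0;     // not covered by this sketch
}

// True when block_size * min_blocks_per_sm threads can be resident on one SM,
// i.e. when the requested launch bounds are satisfiable on that architecture.
constexpr bool launch_bounds_fit(int cuda_arch, int block_size, int min_blocks_per_sm)
{
  return block_size * min_blocks_per_sm <= max_resident_threads_per_sm(cuda_arch);
}

// BLOCK_SIZE = 512 and MIN_BLOCKS_PER_SM = 4 request 2048 resident threads.
static_assert(launch_bounds_fit(800, 512, 4), "2048 threads fit on sm_80");
static_assert(!launch_bounds_fit(750, 512, 4), "2048 > 1024 on sm_75");
static_assert(!launch_bounds_fit(860, 512, 4), "2048 > 1536 on sm_86");
static_assert(!launch_bounds_fit(890, 512, 4), "2048 > 1536 on sm_89");
```

With these numbers, sm_89 falls into the same group as sm_75 and sm_86, which is consistent with the ptxas message about threads per SM being out of range and with adding 890 to the preprocessor condition.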

JieFengWang added a commit to JieFengWang/raft that referenced this issue Jan 3, 2024
JieFengWang added a commit to JieFengWang/raft that referenced this issue Jan 4, 2024
wphicks pushed a commit to wphicks/raft that referenced this issue Jan 5, 2024
rapids-bot bot pushed a commit that referenced this issue Jan 17, 2024
[bug]Fix compile failure on RTX 4090. related issue (#2073)

Authors:
  - JiefengWang (https://github.com/JieFengWang)
  - William Hicks (https://github.com/wphicks)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - William Hicks (https://github.com/wphicks)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #2076
@JieFengWang

Since this PR has been merged, this issue will be closed.
