[BUG] Compiling the template fails with RTX 4090
#2073
Comments
Still an error: when compiling the template, it shows "ptxas error : Value of threads per SM for entry ZN4raft9neighbors12experimental10nn_descent6detail17local_join_kernelIiNS3_12InternalID_tIiEEEEvPKT_S9_PK4int2S9_S9_SC_iPK6__halfiPT0_PfiPiSI is out of range. .minnctapersm will be ignored". Is it the "InternalID_t" that is out of range?
struct InternalID_t;
// InternalID_t uses 1 bit for marking (new or old).
template <>
class InternalID_t<int> {
 private:
  using Index_t = int;
  Index_t id_{std::numeric_limits<Index_t>::max()};

 public:
  __host__ __device__ bool is_new() const { return id_ >= 0; }
  __host__ __device__ Index_t& id_with_flag() { return id_; }
  __host__ __device__ Index_t id() const
  {
    if (is_new()) return id_;
    return -id_ - 1;
  }
  __host__ __device__ void mark_old()
  {
    if (id_ >= 0) id_ = -id_ - 1;
  }
  __host__ __device__ bool operator==(const InternalID_t<int>& other) const
  {
    return id() == other.id();
  }
};
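To make the flag encoding in the snippet above concrete, here is a minimal host-only sketch in plain C++. The class name `InternalIdSketch`, its explicit constructor, and the helper `roundtrip_after_mark_old` are hypothetical and not part of RAFT; the `__host__ __device__` qualifiers are dropped so that an ordinary C++ compiler accepts it. It suggests that marking an ID as old only flips the stored value to a negative number and that the decoded ID is unchanged, so the encoding itself is probably not what ptxas is reporting as "out of range".

```cpp
#include <cassert>
#include <limits>

// Host-only mirror of the InternalID_t<int> specialization quoted above.
// Hypothetical name; the explicit constructor is added for illustration only.
class InternalIdSketch {
 public:
  explicit InternalIdSketch(int id) : id_(id) {}

  // Non-negative stored values mean "new".
  bool is_new() const { return id_ >= 0; }

  // Raw stored value, including the new/old marking.
  int id_with_flag() const { return id_; }

  // Decoded ID: undo the -id - 1 mapping for entries marked old.
  int id() const { return is_new() ? id_ : -id_ - 1; }

  // Mark as old by mapping id to -id - 1 (always negative).
  void mark_old()
  {
    if (id_ >= 0) id_ = -id_ - 1;
  }

 private:
  int id_{std::numeric_limits<int>::max()};
};

// Marks an ID as old and returns the decoded ID, which should equal the input.
inline int roundtrip_after_mark_old(int id)
{
  InternalIdSketch x(id);
  x.mark_old();
  assert(!x.is_new());
  return x.id();
}
```

For example, an ID of 5 is stored as -6 once it is marked old, and `id()` still decodes it to 5.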
OK, fixed. Changing line 694 of the code to #if (__CUDA_ARCH__) == 750 || (__CUDA_ARCH__) == 860 || (__CUDA_ARCH__) == 890 fixes the bug.
// launch_bounds here denote BLOCK_SIZE = 512 and MIN_BLOCKS_PER_SM = 4
// Per
// https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications,
// MAX_RESIDENT_THREAD_PER_SM = BLOCK_SIZE * BLOCKS_PER_SM = 2048
// For architectures 750 and 860, the values for MAX_RESIDENT_THREAD_PER_SM
// is 1024 and 1536 respectively, which means the bounds don't work anymore
I found that the RTX 4090's MAX_RESIDENT_THREAD_PER_SM is also 1536, and its arch is 89, so I added 890 to the condition.
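The arithmetic behind this can be sketched in plain C++. The function names below are hypothetical and not part of RAFT. The limits for 750 and 860 are taken from the quoted comment and the limit for 890 from the RTX 4090 figure reported above; treating every other architecture as allowing 2048 resident threads per SM is a simplifying assumption made only for this illustration.

```cpp
#include <cassert>

// Values from the quoted comment: BLOCK_SIZE = 512, MIN_BLOCKS_PER_SM = 4.
constexpr int kBlockSize      = 512;
constexpr int kMinBlocksPerSm = 4;

// Maximum resident threads per SM for the architectures named in this thread.
// Assumption: any architecture not listed here allows 2048.
constexpr int max_resident_threads_per_sm(int cuda_arch)
{
  return cuda_arch == 750 ? 1024
         : (cuda_arch == 860 || cuda_arch == 890) ? 1536
         : 2048;
}

// The launch bounds are satisfiable only if the requested number of resident
// threads (BLOCK_SIZE * MIN_BLOCKS_PER_SM) does not exceed the per-SM limit.
constexpr bool launch_bounds_fit(int cuda_arch)
{
  return kBlockSize * kMinBlocksPerSm <= max_resident_threads_per_sm(cuda_arch);
}

// Largest number of 512-thread blocks that can be resident on one SM.
inline int max_blocks_per_sm(int cuda_arch)
{
  assert(kBlockSize > 0);
  return max_resident_threads_per_sm(cuda_arch) / kBlockSize;
}

static_assert(launch_bounds_fit(800), "2048 threads fit under the assumed default limit");
static_assert(!launch_bounds_fit(890), "2048 threads exceed the 1536 limit on arch 890");
```

Under these numbers, 512 * 4 = 2048 requested threads exceed the 1536 available on arch 890, which would explain why ptxas rejects the kernel there unless 890 is handled alongside 750 and 860.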
Signed-off-by: jeff <[email protected]>
…sue (rapidsai#2073) Signed-off-by: jeff <[email protected]>
[bug] Fix compile failure on RTX 4090. Related issue (#2073)
Authors:
- JiefengWang (https://github.com/JieFengWang)
- William Hicks (https://github.com/wphicks)
- Corey J. Nolet (https://github.com/cjnolet)
Approvers:
- William Hicks (https://github.com/wphicks)
- Corey J. Nolet (https://github.com/cjnolet)
URL: #2076
Since this PR has been merged, this issue will be closed.
Describe the bug
After installing RAFT via mamba, I tried to compile the template on my machine. The compile fails with the following error:
"ptxas error : Value of threads per SM for entry ZN4raft9neighbors12experimental10nn_descent6detail17local_join_kernelIiNS3_12InternalID_tIiEEEEvPKT_S9_PK4int2S9_S9_SC_iPK6__halfiPT0_PfiPiSI is out of range. .minnctapersm will be ignored
ptxas fatal : Ptx assembly aborted due to errors
gmake[2]: *** [_deps/raft-build/CMakeFiles/raft_objs.dir/build.make:1813: _deps/raft-build/CMakeFiles/raft_objs.dir/src/raft_runtime/neighbors/cagra_build.cu.o] Error 255"
Steps/Code to reproduce bug
cd template
./build.sh