[BUG] pairwise distances and fused L2 1-NN kernels is limited by launch config #221

Closed
mdoijade opened this issue May 7, 2021 · 4 comments
Labels
bug Something isn't working

Comments

@mdoijade
Contributor

mdoijade commented May 7, 2021

Describe the bug
The pairwise distance and fused L2 1-NN kernels, which are based on the contractions_NT class, launch a number of CTAs proportional to the input size.
As a result, they can only be launched when the number of blocks in Y is less than 65536, because CUDA limits the Y dimension of a kernel grid to 65535 blocks.
This severely limits the input size that these kernels can process.
They should be converted to grid-stride kernels that launch a number of CTAs chosen from GPU occupancy heuristics.
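
The grid-stride idea proposed above can be illustrated with a host-side C++ emulation. This is a minimal sketch, not RAFT's implementation: `tile_rows`, `grid_y`, and both function names are hypothetical. On the GPU, the outer loop over `block` would be replaced by the launched CTAs, with `block` and `grid_y` playing the roles of `blockIdx.y` and `gridDim.y`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Largest gridDim.y that CUDA accepts for a kernel launch.
constexpr std::int64_t kMaxGridY = 65535;

// Number of row tiles needed to cover n_rows rows (ceiling division).
std::int64_t num_tiles(std::int64_t n_rows, std::int64_t tile_rows) {
  assert(n_rows >= 0 && tile_rows > 0);
  return (n_rows + tile_rows - 1) / tile_rows;
}

// Emulates a grid-stride launch on the host: block b processes tiles
// b, b + grid_y, b + 2 * grid_y, ... until all tiles are covered.
// Returns how many times each tile was visited.
std::vector<int> visit_tiles_grid_stride(std::int64_t n_rows,
                                         std::int64_t tile_rows,
                                         std::int64_t grid_y) {
  assert(grid_y > 0 && grid_y <= kMaxGridY);
  const std::int64_t tiles = num_tiles(n_rows, tile_rows);
  std::vector<int> visits(static_cast<std::size_t>(tiles), 0);
  for (std::int64_t block = 0; block < grid_y; ++block) {
    for (std::int64_t tile = block; tile < tiles; tile += grid_y) {
      ++visits[static_cast<std::size_t>(tile)];
    }
  }
  return visits;
}
```

Because the grid size is decoupled from the tile count, any `grid_y` within the limit covers every tile exactly once. In a real kernel, `grid_y` could be derived from the number of SMs and a query such as `cudaOccupancyMaxActiveBlocksPerMultiprocessor`, rather than from the input size.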

Steps/Code to reproduce bug
Set n in any of the pairwise distance or fused L2-NN kernels to a size of 65536x64 and run the kernel.
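
A small host-side check shows how such a size can exceed the launch limit. This sketch assumes one CTA per tile of `tile_rows` rows and reads the size as 65536 * 64 rows with a 64-row tile. Both the tile height and that reading are assumptions made for illustration, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// CUDA's upper bound on gridDim.y for a kernel launch.
constexpr std::int64_t kCudaMaxGridDimY = 65535;

// Blocks needed along Y when each CTA handles exactly one tile of tile_rows rows.
std::int64_t blocks_in_y(std::int64_t n_rows, std::int64_t tile_rows) {
  assert(n_rows >= 0 && tile_rows > 0);
  return (n_rows + tile_rows - 1) / tile_rows;
}

// True if a one-CTA-per-tile launch stays within the gridDim.y limit.
bool launch_fits(std::int64_t n_rows, std::int64_t tile_rows) {
  return blocks_in_y(n_rows, tile_rows) <= kCudaMaxGridDimY;
}
```

Under these assumptions, `launch_fits(65536LL * 64, 64)` is false because 65536 blocks are required, while one tile fewer fits.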

@mdoijade mdoijade added the bug Something isn't working label May 7, 2021
@mdoijade
Contributor Author

mdoijade commented May 7, 2021

@teju85 FYI.

@cyy857

cyy857 commented May 17, 2021

Hello, is there any update on this issue?

@mdoijade
Contributor Author

PR #232 addresses this issue.

rapids-bot bot pushed a commit that referenced this issue Jun 2, 2021
This PR addresses issues mentioned in #221
-- Adds a grid-stride-based fusedL2NN kernel, which gives an approximately 1.85x speedup over the previous version of this kernel.
-- Adds grid-stride-based work distribution to the pairwise distance base class so that it works for any input size.

Authors:
  - Mahesh Doijade (https://github.com/mdoijade)

Approvers:
  - Thejaswi. N. S (https://github.com/teju85)
  - Divye Gala (https://github.com/divyegala)
  - Alex Fender (https://github.com/afender)

URL: #232
rapids-bot bot pushed a commit that referenced this issue Jun 11, 2021
This PR addresses issues mentioned in #221
-- Adds a grid-stride-based fusedL2NN kernel, which gives an approximately 1.85x speedup over the previous version of this kernel.
-- Adds grid-stride-based work distribution to the pairwise distance base class so that it works for any input size.

This was submitted to branch-21.06 through PR #232 but was later reverted by #246 due to an intermittent failure.

Authors:
  - Mahesh Doijade (https://github.com/mdoijade)

Approvers:
  - Thejaswi. N. S (https://github.com/teju85)
  - Brad Rees (https://github.com/BradReesWork)

URL: #250
@mdoijade
Contributor Author

Closing this issue, as the fix PR is now merged into branch-21.08.
