Move contractions tiling logic outside of Contractions_NT #837
9029eed to 7222e23
7222e23 to d6a4587
This results in subtle issues with non-square KernelPolicy, as found in fusedL2KNN.
7b93722 to e6976c5
Codecov Report
Base: 87.99% // Head: 87.99% // No change to project coverage

Additional details and impacted files
@@           Coverage Diff            @@
##           branch-23.02   #837  +/- ##
======================================
  Coverage       87.99%   87.99%
======================================
  Files              21       21
  Lines             483      483
======================================
  Hits              425      425
  Misses             58       58
Holding off on merging right away so we can investigate the CI timeouts.
…riksen/raft into wip-move-contractions-tiling-logic
/merge
…pidsai#837)" This reverts commit c58d00a.
The main functionality of Contractions_NT is loading tiles of data into shared memory to enable fast GEMM-like kernels. In practice, this requires keeping track of tiles of data (2D submatrices of a bigger matrix) and distributing the data in each tile over the shared memory and registers of the threads in a thread block.
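Currently, Contractions_NT performs indexing logic for both:
1. distributing the data within a tile over the shared memory and registers of the threads in a thread block, and
2. moving over the tiles of data in a grid-stride loop.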
In this PR, we move functionality 2 out of Contractions_NT: moving over the tiles of data and keeping track of the grid-stride loop is now the responsibility of the calling code.
Splitting these responsibilities is helpful when non-trivial tiling logic is required, as in the upcoming sparseL2NN functionality.
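As a rough illustration of what a caller-side tile loop can look like, here is a minimal host-side C++ sketch of a grid-stride loop over tiles. It is not the code from this PR: `Tile`, `ceil_div`, and `tiles_for_block` are hypothetical names, the row-major tile ordering is an assumption, and `block_id` / `num_blocks` merely stand in for `blockIdx.x` / `gridDim.x` in a real kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile descriptor: top-left corner of a 2D submatrix.
struct Tile {
  std::size_t row_start;
  std::size_t col_start;
};

// Integer division rounded up.
inline std::size_t ceil_div(std::size_t a, std::size_t b) { return (a + b - 1) / b; }

// Enumerate the tiles that one thread block would visit in a grid-stride
// loop over an m x n matrix split into tile_m x tile_n tiles.
// Assumption: tiles are numbered in row-major order.
std::vector<Tile> tiles_for_block(std::size_t m, std::size_t n,
                                  std::size_t tile_m, std::size_t tile_n,
                                  std::size_t block_id, std::size_t num_blocks)
{
  assert(tile_m > 0 && tile_n > 0 && num_blocks > 0);
  const std::size_t tiles_per_row = ceil_div(n, tile_n);
  const std::size_t num_tiles     = ceil_div(m, tile_m) * tiles_per_row;

  std::vector<Tile> visited;
  // Grid-stride loop: start at this block's index, advance by the grid size.
  for (std::size_t t = block_id; t < num_tiles; t += num_blocks) {
    visited.push_back({(t / tiles_per_row) * tile_m,   // first row of the tile
                       (t % tiles_per_row) * tile_n}); // first column of the tile
  }
  return visited;
}
```

In this sketch, the loop body is where the caller would hand each tile's offsets to the within-tile loading logic. Because the loop lives in the calling code, a caller with non-trivial tiling needs (for example, skipping tiles) can change the iteration without touching the loading logic.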
Note: This PR also cleans up one unfortunate wart in the current implementation. Depending on which of the two overloaded constructors was called, the tiling logic was transposed, leading to bugs that were extremely difficult to track down.