[Bug Fix] Support threads_per_head < 64 for wavefront size of 64 #6622
When launching the apply_rotary_pos_half kernel, only a threads_per_head of 64 is supported for a wavefront size of 64. This change adds support for threads_per_head < 64, such as 4, 8, or 16. It also removes the condition that checked for ROCm and the wavefront size. Signed-off-by: Jagadish Krishnamoorthy <[email protected]>
@loadams any comments on this PR?
LGTM, but let's add a unit test to ensure this functionality can be tested on ROCm (and CUDA).
@jagadish-amd - thoughts on adding unit tests for this?
I will add the unit tests. Thanks.
@loadams I have added a test case for the threads_per_head / warp size alignment issue. These are the results. The test run is aborted (as expected) due to the error in the kernel. Not sure if there is a better way to handle this?
@jagadish-amd - this should be fine for now. I believe the only remaining thing for this PR is the CLA agreement. You should just need to reply to it with accept and the company as AMD.
@microsoft-github-policy-service agree company="AMD"
When launching the apply_rotary_pos_half kernel, only a threads_per_head of 64 is supported for a wavefront size of 64.
This change adds support for threads_per_head < 64, such as 4, 8, or 16.
Fixes the issue introduced in #5402
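For readers unfamiliar with the lane arithmetic behind threads_per_head, here is a minimal host-side sketch in Python. It is not taken from the DeepSpeed kernel: the function names `head_groups` and `rotate_half_partner` are hypothetical, and the pairing rule assumes, for simplicity, that the rotary dimension spans the whole head. It only illustrates why a 64-lane wavefront can hold several heads when threads_per_head is 4, 8, or 16, and why any lane exchange has to stay inside a head's own group.

```python
def head_groups(wavefront_size: int, threads_per_head: int) -> list[list[int]]:
    """Partition the lanes of one wavefront into per-head groups.

    Hypothetical helper for illustration only; assumes threads_per_head
    divides wavefront_size evenly.
    """
    if threads_per_head <= 0 or wavefront_size % threads_per_head != 0:
        raise ValueError("threads_per_head must evenly divide wavefront_size")
    return [
        list(range(start, start + threads_per_head))
        for start in range(0, wavefront_size, threads_per_head)
    ]


def rotate_half_partner(lane: int, threads_per_head: int) -> int:
    """Return the lane that `lane` would exchange data with.

    Assumption: the partner sits half a head away, and the index wraps
    inside the head's own group rather than across the whole wavefront.
    """
    group_base = lane - (lane % threads_per_head)   # first lane of this head
    local = lane % threads_per_head                 # position inside the head
    return group_base + (local + threads_per_head // 2) % threads_per_head


if __name__ == "__main__":
    groups = head_groups(wavefront_size=64, threads_per_head=16)
    print(len(groups), "heads per wavefront")
    print("lane 17 pairs with lane", rotate_half_partner(17, 16))
```

With a wavefront size of 64 and threads_per_head of 16, the sketch yields four groups per wavefront, and each lane's partner stays inside its own 16-lane group.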