Readd, optimize and profile memcpy_async-based transform kernel for A100 #2361
Labels
cub
For all items related to CUB
Comments
bernhardmgruber changed the title from "Readd, optimize and profile memcpy_async kernel for A100" to "Readd, optimize and profile memcpy_async-based transform kernel for A100" on Sep 4, 2024
19 tasks
The initial PR for `cub::DeviceTransform` (#2086) had a dedicated kernel using `cg::memcpy_async` for A100. This kernel was removed during review to make way for the performance-validated H100 kernel. We should properly validate the performance of the `cg::memcpy_async` kernel and bring it back.
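
For readers unfamiliar with the approach, here is a minimal sketch of what a `cg::memcpy_async`-based transform kernel could look like. It is not the kernel from #2086. The names `transform_memcpy_async`, `tile_size` and `times_two` are hypothetical, and the layout of one element per thread with a single input is an assumption made for brevity. Each block stages a tile of the input in shared memory with `cg::memcpy_async`, waits for the copy, then applies the functor.

```cuda
#include <cooperative_groups.h>
#include <cooperative_groups/memcpy_async.h>
#include <cuda_runtime.h>

#include <cstdio>

namespace cg = cooperative_groups;

// Hypothetical tile size: one element per thread, one tile per block.
constexpr int tile_size = 256;

// Hypothetical kernel, not the one from the original PR.
template <typename T, typename F>
__global__ void transform_memcpy_async(const T* in, T* out, int n, F f)
{
  __shared__ T tile[tile_size];
  auto block = cg::this_thread_block();

  const int offset = blockIdx.x * tile_size;
  const int valid  = min(tile_size, n - offset); // last tile may be partial

  // Collective copy global -> shared; the count is given in bytes.
  cg::memcpy_async(block, tile, in + offset, sizeof(T) * valid);
  cg::wait(block); // all threads wait until the tile has arrived

  const int i = threadIdx.x;
  if (i < valid)
  {
    out[offset + i] = f(tile[i]);
  }
}

// Example functor (assumption, for illustration only).
struct times_two
{
  __device__ float operator()(float x) const { return 2.0f * x; }
};

int main()
{
  const int n = 1000; // deliberately not a multiple of tile_size
  float* in   = nullptr;
  float* out  = nullptr;
  if (cudaMallocManaged(&in, n * sizeof(float)) != cudaSuccess
      || cudaMallocManaged(&out, n * sizeof(float)) != cudaSuccess)
  {
    std::printf("allocation failed\n");
    return 1;
  }
  for (int i = 0; i < n; ++i)
  {
    in[i] = static_cast<float>(i);
  }

  const int blocks = (n + tile_size - 1) / tile_size;
  transform_memcpy_async<<<blocks, tile_size>>>(in, out, n, times_two{});
  if (cudaDeviceSynchronize() != cudaSuccess)
  {
    std::printf("kernel failed\n");
    return 1;
  }

  // Verify on the host.
  bool ok = true;
  for (int i = 0; i < n; ++i)
  {
    ok = ok && (out[i] == 2.0f * static_cast<float>(i));
  }
  std::printf("%s\n", ok ? "ok" : "mismatch");

  cudaFree(in);
  cudaFree(out);
  return ok ? 0 : 1;
}
```

On A100 (compute capability 8.0) and newer, `cg::memcpy_async` can be hardware-accelerated when source and destination are suitably aligned; elsewhere it falls back to a regular copy. Whether this staging beats direct global loads for a transform is what the profiling requested in this issue would need to establish.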