[QST] How to use splitk in cutlass python? #1295

Closed
Miroier opened this issue Jan 9, 2024 · 4 comments


Miroier commented Jan 9, 2024

What is your question?

Background: A100, nvidia-cutlass 3.3.0.0

When k is much larger than m and n, I want to find a CUTLASS kernel that runs faster than cupy.dot, because cupy.dot does not reach the peak performance of the A100. My problem size is m = n = 128, k = 16384. I used the CUTLASS profiler to find a potentially faster kernel, but I found that there is no way to specify split_k_mode and split_k_slices in cutlass.op.Gemm.
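For readers unfamiliar with split-K, here is a minimal pure-Python reference sketch of what split_k_slices means. It is not the CUTLASS API and not how CUTLASS implements it on the GPU. The k dimension is cut into contiguous slices, each slice produces a partial m x n product, and the partials are summed in a separate reduction step (roughly analogous to a "parallel" split-K mode). The helper names split_k_gemm and output_tiles, and the 128 x 128 output tile size, are hypothetical assumptions for illustration. With m = n = 128 and such a tile, a non-split GEMM launches a single thread block, leaving most of the A100's 108 SMs idle; splitting k = 16384 into 16 slices gives 16 blocks, each reducing over 1024 elements of k.

```python
import math


def split_k_gemm(a, b, split_k_slices):
    """Reference split-K GEMM (hypothetical helper): C = A @ B.

    a is m x k and b is k x n, both lists of lists. The k dimension is cut
    into `split_k_slices` contiguous chunks; each chunk yields a partial
    m x n product (on a GPU each would run on its own thread block), and
    the partials are then summed element-wise into the final C.
    """
    m, k, n = len(a), len(b), len(b[0])
    chunk = math.ceil(k / split_k_slices)
    partials = []
    for s in range(split_k_slices):
        lo, hi = s * chunk, min((s + 1) * chunk, k)
        # Partial product over k-indices [lo, hi); empty slices contribute 0.
        part = [[sum(a[i][p] * b[p][j] for p in range(lo, hi)) for j in range(n)]
                for i in range(m)]
        partials.append(part)
    # Reduction step: sum the partial products element-wise.
    return [[sum(part[i][j] for part in partials) for j in range(n)]
            for i in range(m)]


def output_tiles(m, n, tile_m=128, tile_n=128):
    """Number of output tiles (thread blocks) a non-split GEMM would launch,
    assuming a hypothetical tile_m x tile_n threadblock tile."""
    return math.ceil(m / tile_m) * math.ceil(n / tile_n)


# The problem from this issue: only one output tile without split-K.
m, n, k = 128, 128, 16384
slices = 16
print(output_tiles(m, n))            # blocks without split-K
print(output_tiles(m, n) * slices)   # blocks with 16 split-K slices
print(k // slices)                   # k-elements reduced per block
```

Every slice count gives the same C; only the amount of parallel work changes, at the cost of an extra reduction over the partial products.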

jackkosaian (Contributor) commented

Good catch. Split-K support is currently missing from cutlass.op.Gemm. We will add it soon.

Miroier (Author) commented Jan 9, 2024

If I want to use split-K from Python, is it feasible to write the C++ code first, compile it into a shared library, and then load that library with Python's ctypes? Or do you have a better suggestion?

jackkosaian (Contributor) commented Jan 9, 2024

My suggestion would be to use a tool like Pybind to create Python-C++ bindings for calling a CUTLASS C++ kernel from Python.

For example, you can look at how PyTorch CUDA extensions can be created for a CUTLASS kernel by following this unit test. If you set jit=False here, you can build an offline library called "gemm_mod" by running python setup.py install on the generated setup.py file. You can also inspect the contents of the generated gemm_mod.cu and gemm_mod.cpp to see how the C++-Python bindings are set up.

You could follow a similar pattern by pasting your desired CUTLASS C++ kernel into gemm_mod.cu. You may need to make some slight changes to other parts of the generated files to match your kernels.

mnicely added the feature request and python labels and removed the ? - Needs Triage label on Jan 10, 2024

github-actions bot commented Feb 9, 2024

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
