
Handle Windows TCC/WDDM mode more robustly #206

Open
leofang opened this issue Nov 1, 2024 · 5 comments
Labels: cuda.core (Everything related to the cuda.core module), enhancement (Any code-related improvements), P0 (High priority - Must do!)


leofang (Member) commented Nov 1, 2024

This issue tracks an internal discussion with QA. The following snippet shows why using cuda.core on Windows can currently fail, depending on whether the GPU runs in TCC or WDDM mode:

>>> from cuda import cuda, cudart
>>> print(cudart.cudaGetDevice())
(<cudaError_t.cudaSuccess: 0>, 0)
>>> print(cuda.cuDeviceGetMemPool(0))
(<CUresult.CUDA_ERROR_NOT_SUPPORTED: 801>, <CUmemoryPool 0x0>)

cuda.core currently assumes that the stream-ordered memory allocator (memory pools) is available. However, CUDA on Windows is more complicated than on Linux, because the driver can run in one of two modes:

  • In WDDM mode (the setup used during cuda.core development), things should work just fine.
  • In TCC mode (as reported by QA), the stream-ordered allocator is unsupported: cuDeviceGetMemPool fails with CUDA_ERROR_NOT_SUPPORTED, as in the snippet above.

We need a fallback so that cuda.core remains usable in TCC mode.
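Rather than assuming memory pools exist, cuda.core could ask the device before choosing an allocator. Below is a minimal sketch, assuming cuda-python's runtime bindings (`cudart.cudaDeviceGetAttribute` with `cudaDevAttrMemoryPoolsSupported`). It also assumes that this attribute reports 0 on the TCC devices where `cuDeviceGetMemPool` returns `CUDA_ERROR_NOT_SUPPORTED`. The helper names `select_allocator` and `cudart_mempool_attribute` and the `"mempool"`/`"malloc"` labels are hypothetical, not part of cuda.core:

```python
from typing import Callable, Tuple

# A query takes a device ordinal and returns (ok, value): ok is False if the
# CUDA call itself failed; value is the raw attribute value otherwise.
AttributeQuery = Callable[[int], Tuple[bool, int]]


def cudart_mempool_attribute(device_id: int) -> Tuple[bool, int]:
    """Read cudaDevAttrMemoryPoolsSupported via cuda-python (needs a GPU).

    The import is local so select_allocator() can be exercised without
    cuda-python or a GPU by passing a different query.
    """
    from cuda import cudart

    err, value = cudart.cudaDeviceGetAttribute(
        cudart.cudaDeviceAttr.cudaDevAttrMemoryPoolsSupported, device_id
    )
    return err == cudart.cudaError_t.cudaSuccess, value


def select_allocator(
    device_id: int, query: AttributeQuery = cudart_mempool_attribute
) -> str:
    """Return "mempool" when stream-ordered allocation is available, else "malloc".

    A failed query is treated as "unsupported", so the caller falls back to
    cudaMalloc/cudaFree instead of raising.
    """
    ok, supported = query(device_id)
    return "mempool" if ok and supported else "malloc"
```

On a machine with cuda-python installed, `select_allocator(0)` would query device 0. The returned string is only a stand-in for whatever memory-resource object cuda.core would actually construct.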

leofang added the cuda.core, enhancement, and P0 labels on Nov 1, 2024
leofang added this to the cuda.core beta 2 milestone on Nov 1, 2024
leofang changed the title from "Handle WIndows TCC/WDDM mode more robustly" to "Handle Windows TCC/WDDM mode more robustly" on Nov 1, 2024
jrhemstad commented

Can't we just say the pool is not available on Windows in TCC mode? I don't think we need to go above and beyond to support something the driver doesn't support.

leofang (Member, Author) commented Nov 1, 2024

That is not appropriate, because CUDA does support Windows TCC mode, just not memory pools. Right now cuda.core is not functional at all in TCC mode only because I forgot (😞) that mempools are unavailable there, but we can easily provide a fallback path to make it work (by wrapping cudaMalloc/cudaFree, as suggested in #208).
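A fallback of this kind could look roughly like the following: a memory resource with the same allocate/deallocate shape as a stream-ordered one, but backed by cudaMalloc/cudaFree. This is a minimal sketch, not the design from #208. The class `SyncMemoryResource`, the factory `make_cudart_resource`, and the `allocate(size, stream)`/`deallocate(ptr, size, stream)` signatures are hypothetical. The CUDA calls assume cuda-python's `cudart` bindings, which return tuples whose first element is the error code:

```python
from typing import Any, Callable, Optional


class SyncMemoryResource:
    """Hypothetical fallback memory resource built on cudaMalloc/cudaFree.

    The three callables are injected so the same class can wrap cuda-python
    (see make_cudart_resource) or simple fakes in tests:
      malloc(size) -> device pointer as int
      free(ptr)    -> None
      sync(stream) -> None, blocks until all work on `stream` has finished
    """

    def __init__(
        self,
        malloc: Callable[[int], int],
        free: Callable[[int], None],
        sync: Callable[[Any], None],
    ) -> None:
        self._malloc = malloc
        self._free = free
        self._sync = sync

    def allocate(self, size: int, stream: Optional[Any] = None) -> int:
        # cudaMalloc is not stream-ordered; `stream` is accepted only so the
        # signature matches the mempool-based path.
        return self._malloc(size)

    def deallocate(self, ptr: int, size: int, stream: Optional[Any] = None) -> None:
        # Conservatively wait for work queued on `stream`, which may still
        # read or write `ptr`, before releasing the memory.
        if stream is not None:
            self._sync(stream)
        self._free(ptr)


def make_cudart_resource() -> SyncMemoryResource:
    """Wire SyncMemoryResource to cuda-python's runtime API (needs a GPU)."""
    from cuda import cudart

    def check(err: Any) -> None:
        if err != cudart.cudaError_t.cudaSuccess:
            raise RuntimeError(f"CUDA runtime call failed: {err}")

    def malloc(size: int) -> int:
        err, ptr = cudart.cudaMalloc(size)
        check(err)
        return int(ptr)

    def free(ptr: int) -> None:
        (err,) = cudart.cudaFree(ptr)
        check(err)

    def sync(stream: Any) -> None:
        (err,) = cudart.cudaStreamSynchronize(stream)
        check(err)

    return SyncMemoryResource(malloc, free, sync)
```

Synchronizing the stream before freeing gives up some performance for safety. Because TCC devices cannot use the pool anyway, a simple and correct path seems preferable to a fast one here.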

@jrhemstad I suggest we take this seriously if we want CUDA Mode to succeed: we have many Windows TCC users in the LLM space, and they all hit this issue (it took me only a minute to find these via Google):

leofang (Member, Author) commented Nov 12, 2024

#209 is related (it is the third step we should take in the future to address this issue).

jollylili removed this from the cuda.core beta 2 milestone on Nov 15, 2024

leofang (Member, Author) commented Nov 25, 2024

Another reason TCC matters is that it is the default mode on the GitHub Actions (GHA) Windows GPU runners, e.g.:
https://github.com/cupy-ci-poc/cupy/actions/runs/12004661559/job/33459987560#step:5:15
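Because CI runners may silently use TCC, a job could log each GPU's driver model so that failures like the one above are easy to attribute. Below is a minimal sketch, assuming `nvidia-smi` is on `PATH` and supports the `driver_model.current` query field. That field is Windows-specific and is expected to report `N/A` on Linux. The helper names are hypothetical:

```python
import subprocess
from typing import List


def parse_driver_models(csv_text: str) -> List[str]:
    """Parse `nvidia-smi --query-gpu=driver_model.current --format=csv,noheader`.

    One line per GPU, e.g. "WDDM" or "TCC" on Windows; blank lines are skipped.
    """
    return [line.strip() for line in csv_text.splitlines() if line.strip()]


def query_driver_models() -> List[str]:
    """Run nvidia-smi and return one driver-model string per GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_model.current", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_driver_models(out)


def any_tcc(models: List[str]) -> bool:
    """True if at least one GPU reports the TCC driver model."""
    return any(m.upper() == "TCC" for m in models)
```

A CI step could print `query_driver_models()` at the start of the job, and a test suite could use `any_tcc(...)` to decide whether the cudaMalloc/cudaFree fallback path is the one being exercised.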


3 participants