Handle Windows TCC/WDDM mode more robustly #206

leofang · 2024-11-01T03:48:48Z

This issue tracks an internal discussion with QA. This simple snippet shows why using cuda.core on Windows today might fail, depending on whether the GPU is in TCC or WDDM mode:

>>> from cuda import cuda, cudart
>>> print(cudart.cudaGetDevice())
(<cudaError_t.cudaSuccess: 0>, 0)
>>> print(cuda.cuDeviceGetMemPool(0))
(<CUresult.CUDA_ERROR_NOT_SUPPORTED: 801>, <CUmemoryPool 0x0>)

cuda.core currently assumes the stream-ordered memory allocator is available. However, CUDA on Windows is a bit more complicated than on Linux, since there are two operation modes:

In WDDM mode (which is what was used during cuda.core development), things should work just fine.
In TCC mode (as reported by QA), the stream-ordered memory allocator is unsupported.

We need a fallback path to make cuda.core usable in TCC mode.
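One way to detect the situation up front, instead of waiting for cuDeviceGetMemPool to fail, is to query the device attribute for memory pool support. The following is only a minimal sketch, not what cuda.core does: `AllocatorKind`, `choose_allocator`, and `query_with_cuda_python` are hypothetical names, and the query helper assumes cuda-python is installed and the driver has already been initialized (for example via `cuInit(0)` or any prior runtime call).

```python
from enum import Enum


class AllocatorKind(Enum):
    """Hypothetical tag for the allocation strategy to use."""
    STREAM_ORDERED = "stream_ordered"  # mempool-based (cuMemAllocAsync)
    SYNCHRONOUS = "synchronous"        # plain cudaMalloc/cudaFree


def choose_allocator(query_mempool_support):
    """Pick an allocator from a callable returning (error_code, supported).

    The callable is injected so the decision logic can be exercised
    without a GPU; any non-zero error code is treated as "unsupported".
    """
    err, supported = query_mempool_support()
    if int(err) != 0 or not supported:
        return AllocatorKind.SYNCHRONOUS
    return AllocatorKind.STREAM_ORDERED


def query_with_cuda_python(device_id=0):
    """Query mempool support through cuda-python (assumed to be installed).

    Assumes the driver is already initialized. Returns (CUresult, int).
    """
    from cuda import cuda  # imported lazily so the module loads without CUDA

    attr = cuda.CUdevice_attribute.CU_DEVICE_ATTRIBUTE_MEMORY_POOLS_SUPPORTED
    return cuda.cuDeviceGetAttribute(attr, device_id)
```

On a machine with a GPU, `choose_allocator(query_with_cuda_python)` would be the intended call; on a TCC device the attribute is expected to report that memory pools are unsupported, which selects the synchronous path.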


leofang · 2024-11-01T04:25:20Z

xref: https://forums.developer.nvidia.com/t/cudaerrornotsupported-during-cudamallocasync-on-windows-10-based-azure-vm-with-tesla-t4-gpu/251405/2

jrhemstad · 2024-11-01T14:52:50Z

Can't we just say the pool is not available on Windows in TCC mode? I don't think we need to go above and beyond to support something the driver doesn't support.

leofang · 2024-11-01T15:00:20Z

That would not be appropriate, because CUDA does support Windows TCC mode, just not the mempool. Right now cuda.core is not functional at all in TCC mode, only because I forgot (😞) that mempools are not available there, but we can easily provide a fallback path to make it work (by wrapping cudaMalloc/cudaFree as suggested in #208).
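To make the fallback idea concrete, here is a rough sketch of a memory resource that wraps cudaMalloc/cudaFree. It is an illustration under assumptions, not the design proposed in #208: `SynchronousMemoryResource` and its `allocate`/`deallocate` methods are hypothetical names, and the default path assumes cuda-python's `cudart.cudaMalloc` and `cudart.cudaFree`, which return tuples whose first element is the error code.

```python
class SynchronousMemoryResource:
    """Hypothetical fallback allocator for devices without mempool support.

    The malloc/free callables are injectable so the class can be tested
    without a GPU. By default they are taken from cuda-python's runtime
    bindings, which are assumed to be installed.
    """

    def __init__(self, malloc=None, free=None):
        if malloc is None or free is None:
            from cuda import cudart  # lazy import: only needed on real GPUs

            malloc, free = cudart.cudaMalloc, cudart.cudaFree
        self._malloc = malloc
        self._free = free

    def allocate(self, size):
        """Allocate `size` bytes and return the device pointer."""
        err, ptr = self._malloc(size)
        if int(err) != 0:
            raise MemoryError(f"cudaMalloc({size}) failed: {err!r}")
        return ptr

    def deallocate(self, ptr):
        """Free a pointer previously returned by allocate()."""
        err = self._free(ptr)[0]  # result is a tuple; first item is the error
        if int(err) != 0:
            raise RuntimeError(f"cudaFree failed: {err!r}")
```

Unlike the stream-ordered allocator, this path is not ordered on a stream and cudaFree synchronizes implicitly, so a real implementation would need to decide how to handle pending work on the stream that last used the buffer.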

@jrhemstad I suggest we take this seriously if we want CUDA Mode to succeed, as we have many Windows TCC users in the LLM space, and they all hit this issue (it only took me 1 min to quickly google these):

leofang · 2024-11-12T17:36:58Z

#209 is related (the 3rd step we should take in the future to address this issue).

leofang · 2024-11-25T21:40:18Z

Another reason TCC is important is that it's the default mode on the GHA Windows GPU runner, e.g.:
https://github.com/cupy-ci-poc/cupy/actions/runs/12004661559/job/33459987560#step:5:15

leofang added the cuda.core (Everything related to the cuda.core module), enhancement (Any code-related improvements), and P0 (High priority - Must do!) labels Nov 1, 2024

leofang added this to the cuda.core beta 2 milestone Nov 1, 2024

leofang changed the title ~~Handle WIndows TCC/WDDM mode more robustly~~ Handle Windows TCC/WDDM mode more robustly Nov 1, 2024

jollylili removed this from the cuda.core beta 2 milestone Nov 15, 2024
