
RuntimeError: (0) == (cuModuleLoadDataEx(&hMod, image.c_str(), sizeof(options) / sizeof(*options), options, values)) INTERNAL ASSERT FAILED #215

Closed
jd730 opened this issue Sep 12, 2023 · 3 comments

jd730 commented Sep 12, 2023

Hi,

I installed tutel via python3 -m pip install --user --upgrade git+https://github.com/microsoft/tutel@main.
I am running this test script:

import torch
from tutel.jit_kernels.gating import fast_cumsum_sub_one

matrix = torch.randint(0, 100, (10000, 100), device='cuda')
cumsum_tutel = fast_cumsum_sub_one(matrix, dim=0) + 1

and I am getting this error:

[W custom_kernel.cpp:149] nvrtc: error: invalid value for --gpu-architecture (-arch)
 Failed to use NVRTC for JIT compilation in this Pytorch version, try another approach using CUDA compiler.. (To always disable NVRTC, please: export USE_NVRTC=0)
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    cumsum_tutel = fast_cumsum_sub_one(matrix, dim=0) + 1
  File "/home/jdhwang/.local/lib/python3.8/site-packages/tutel/jit_kernels/gating.py", line 22, in fast_cumsum_sub_one
    return torch.ops.tutel_ops.cumsum(data)
  File "/home/jdhwang/conda/envs/cl/lib/python3.8/site-packages/torch/_ops.py", line 502, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: (0) == (cuModuleLoadDataEx(&hMod, image.c_str(), sizeof(options) / sizeof(*options), options, values)) INTERNAL ASSERT FAILED at "/tmp/pip-req-build-c9h2prbs/tutel/custom/custom_kernel.cpp":205, please report a bug to PyTorch. CHECK_EQ fails.

Following #203, I set export USE_NVRTC=1. I am using an RTX 4090 with torch 2.0.0+cu117 and CUDA 11.7 (nvcc as well).
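For context, judging by the function name and the `+ 1` in the snippet above, `fast_cumsum_sub_one` appears to compute an inclusive cumulative sum minus one. A plain-PyTorch reference for that behavior (a sketch under that assumption; CPU-only, no Tutel or CUDA required), useful for sanity-checking the Tutel kernel once it compiles:

```python
import torch

def cumsum_sub_one_ref(data: torch.Tensor, dim: int = 0) -> torch.Tensor:
    # Assumed reference for tutel's fast_cumsum_sub_one:
    # inclusive cumulative sum along `dim`, minus one.
    return torch.cumsum(data, dim=dim) - 1

# Same shape/range as the failing script, but on CPU.
matrix = torch.randint(0, 100, (10000, 100))

# Adding 1 back should recover the plain inclusive cumsum,
# mirroring the `fast_cumsum_sub_one(matrix, dim=0) + 1` call above.
ref = cumsum_sub_one_ref(matrix, dim=0) + 1
assert torch.equal(ref, torch.cumsum(matrix, dim=0))
```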

@ghostplant (Contributor) commented:

Does either export USE_NVRTC=1 or export USE_NVRTC=0 work? This looks like an environment problem (e.g. multiple CUDA versions), and it is unlikely to happen when CUDA + PyTorch are in a clean Docker container.

@jd730 (Author) commented Sep 12, 2023

Hi @ghostplant, thank you for your quick response. If I set export USE_NVRTC=0, I get:

nvcc fatal   : Unsupported gpu architecture 'compute_89'
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    cumsum_tutel = fast_cumsum_sub_one(matrix, dim=0) + 1
  File "/home/jdhwang/.local/lib/python3.8/site-packages/tutel/jit_kernels/gating.py", line 22, in fast_cumsum_sub_one
    return torch.ops.tutel_ops.cumsum(data)
  File "/home/jdhwang/conda/envs/cl/lib/python3.8/site-packages/torch/_ops.py", line 502, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: (true) == (fp != nullptr) INTERNAL ASSERT FAILED at "/tmp/pip-req-build-c9h2prbs/tutel/custom/custom_kernel.cpp":49, please report a bug to PyTorch. CHECK_EQ fails.

I will try to test in a clean environment, and with CUDA 11.8 as well.
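The `Unsupported gpu architecture 'compute_89'` message is consistent with a toolkit-version mismatch: the RTX 4090 is an Ada Lovelace GPU (compute capability 8.9), and nvcc only accepts `compute_89` starting with CUDA 11.8, while this environment has CUDA 11.7. A minimal sketch of that check (the helper and the version table are illustrative, not part of Tutel or PyTorch):

```python
# Minimum CUDA toolkit version whose nvcc accepts a given
# --gpu-architecture value. Illustrative table, limited to the
# architectures relevant to this thread.
MIN_CUDA_FOR_ARCH = {
    "compute_80": (11, 0),  # Ampere (A100)
    "compute_86": (11, 1),  # Ampere (RTX 30xx)
    "compute_89": (11, 8),  # Ada Lovelace (RTX 4090)
    "compute_90": (11, 8),  # Hopper (H100)
}

def nvcc_supports(cuda_version: tuple, arch: str) -> bool:
    """True if an nvcc from `cuda_version` accepts -arch=<arch>."""
    return cuda_version >= MIN_CUDA_FOR_ARCH[arch]

# CUDA 11.7 rejects compute_89, matching the nvcc error above;
# CUDA 11.8 accepts it, which is why upgrading resolves the issue.
print(nvcc_supports((11, 7), "compute_89"))  # False
print(nvcc_supports((11, 8), "compute_89"))  # True
```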

@jd730 (Author) commented Sep 12, 2023

It works after upgrading torch (2.0.1+cu118), nvcc, and NCCL. Thank you!

@jd730 jd730 closed this as completed Sep 12, 2023