[QST] Value error when importing cudf.pandas #16016

Closed
blue-cat-whale opened this issue Jun 13, 2024 · 7 comments · Fixed by #16067
Labels
bug Something isn't working Python Affects Python cuDF API. question Further information is requested

Comments

@blue-cat-whale

Problem:

(cudf) [root@localhost test_cuda]# python3
Python 3.11.5 (main, Sep 22 2023, 15:34:29) [GCC 8.5.0 20210514 (Red Hat 8.5.0-20)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cudf.pandas
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/share/.virtualenvs/cudf/lib64/python3.11/site-packages/cudf/__init__.py", line 9, in <module>
    _setup_numba()
  File "/usr/local/share/.virtualenvs/cudf/lib64/python3.11/site-packages/cudf/utils/_numba.py", line 120, in _setup_numba
    versions = safe_get_versions()
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/share/.virtualenvs/cudf/lib64/python3.11/site-packages/cudf/utils/_ptxcompiler.py", line 106, in safe_get_versions
    driver_version, runtime_version = get_versions()
                                      ^^^^^^^^^^^^^^
  File "/usr/local/share/.virtualenvs/cudf/lib64/python3.11/site-packages/cudf/utils/_ptxcompiler.py", line 64, in get_versions
    versions = [int(s) for s in cp.stdout.strip().split()]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/share/.virtualenvs/cudf/lib64/python3.11/site-packages/cudf/utils/_ptxcompiler.py", line 64, in <listcomp>
    versions = [int(s) for s in cp.stdout.strip().split()]
                ^^^^^^
ValueError: invalid literal for int() with base 10: b'2024-06-13'

The package was installed with pip install --extra-index-url=https://pypi.nvidia.com cudf-cu12==24.6.*

(cudf) [root@localhost test_cuda]# pip show cudf-cu12
Name: cudf-cu12
Version: 24.6.0
Summary: cuDF - GPU Dataframe
Home-page:
Author: NVIDIA Corporation
Author-email:
License: Apache 2.0
Location: /usr/local/share/.virtualenvs/cudf/lib64/python3.11/site-packages
Requires: cachetools, cuda-python, cupy-cuda12x, fsspec, numba, numpy, nvtx, packaging, pandas, pyarrow, pynvjitlink-cu12, rich, rmm-cu12, typing_extensions
Required-by:

Dependency:

cloudpickle==3.0.0
cuda-python==12.5.0
cudf-cu12==24.6.0
cupy-cuda12x==13.2.0
dask==2024.4.1
dask-expr==1.0.11
dill==0.3.8
fastrlock==0.8.2
filelock==3.13.4
fsspec==2024.6.0
idna==3.7
importlib_metadata==7.1.0
Jinja2==3.1.3
joblib==1.4.0
llvmlite==0.42.0
locket==1.0.0
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
multiprocess==0.70.16
networkx==3.3
numba==0.59.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.1.105
nvtx==0.2.10
packaging==24.1
pandas==2.2.2
partd==1.4.1
patsy==0.5.6
protobuf==4.25.3
psutil==5.9.8
pyarrow==16.1.0
Pygments==2.18.0
pynvjitlink-cu12==0.2.4
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.1
requests==2.31.0
rich==13.7.1
rmm-cu12==24.6.0
scikit-learn==1.4.0
scipy==1.13.0
six==1.16.0
statsmodels==0.14.2
swifter==1.4.0
sympy==1.12
TA-Lib==0.4.28
threadpoolctl==3.4.0
toolz==0.12.1
torch==2.2.2
tqdm==4.66.2
triton==2.2.0
typing_extensions==4.12.2
tzdata==2024.1
urllib3==2.2.1
zipp==3.18.1
@blue-cat-whale blue-cat-whale added the question Further information is requested label Jun 13, 2024
@brandon-b-miller
Contributor

Hi @blue-cat-whale,
Can you run the following code in a Python interpreter and paste the output?

from ctypes import c_int, byref
from numba import cuda
dv = c_int(0)
cuda.cudadrv.driver.driver.cuDriverGetVersion(byref(dv))
drv_major = dv.value // 1000
drv_minor = (dv.value - (drv_major * 1000)) // 10
run_major, run_minor = cuda.runtime.get_version()
print(f'{drv_major} {drv_minor} {run_major} {run_minor}')
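For readers following along: the arithmetic in the snippet above relies on CUDA encoding a version as a single integer, 1000 * major + 10 * minor. A minimal sketch of that decoding on its own, with no GPU required (the helper name `decode_cuda_version` is hypothetical and not part of Numba or cuDF):

```python
def decode_cuda_version(encoded: int) -> tuple[int, int]:
    """Split a CUDA version integer (1000 * major + 10 * minor) into (major, minor)."""
    major = encoded // 1000
    minor = (encoded - major * 1000) // 10
    return major, minor


# 12000 corresponds to CUDA 12.0 and 12010 to CUDA 12.1.
print(decode_cuda_version(12000))  # (12, 0)
print(decode_cuda_version(12010))  # (12, 1)
```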

@Matt711 Matt711 added the 0 - Waiting on Author Waiting for author to respond to review label Jun 13, 2024
@blue-cat-whale
Author

blue-cat-whale commented Jun 14, 2024

(cudf) [root@localhost nn]# python3
Python 3.11.5 (main, Sep 22 2023, 15:34:29) [GCC 8.5.0 20210514 (Red Hat 8.5.0-20)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from ctypes import c_int, byref
>>> from numba import cuda
>>> dv = c_int(0)
>>> cuda.cudadrv.driver.driver.cuDriverGetVersion(byref(dv))
>>> drv_major = dv.value // 1000
>>> drv_minor = (dv.value - (drv_major * 1000)) // 10
>>> run_major, run_minor = cuda.runtime.get_version()
>>> print(f'{drv_major} {drv_minor} {run_major} {run_minor}')
12 0 12 1
>>>

@brandon-b-miller brandon-b-miller removed the 0 - Waiting on Author Waiting for author to respond to review label Jun 14, 2024
@brandon-b-miller
Contributor

Thanks, this is helpful. Something strange is happening while Numba attempts to check the CUDA versions on your system. I would have expected the command above to show more useful output, so it looks like we need to dig a bit deeper to reproduce the issue.

Would you be able to run this small Python script in the failing environment and paste the output?

import sys
import subprocess

NUMBA_CHECK_VERSION_CMD = """\
from ctypes import c_int, byref
from numba import cuda
dv = c_int(0)
cuda.cudadrv.driver.driver.cuDriverGetVersion(byref(dv))
drv_major = dv.value // 1000
drv_minor = (dv.value - (drv_major * 1000)) // 10
run_major, run_minor = cuda.runtime.get_version()
print(f'{drv_major} {drv_minor} {run_major} {run_minor}')
"""

cp = subprocess.run(
    [sys.executable, "-c", NUMBA_CHECK_VERSION_CMD], capture_output=True
)
print(cp.stdout)

@brandon-b-miller brandon-b-miller added the 0 - Waiting on Author Waiting for author to respond to review label Jun 17, 2024
@blue-cat-whale
Author

blue-cat-whale commented Jun 18, 2024

(cudf) [root@localhost test_cuda]# python tmp_0618.py
b'12 0 12 1\n2024-06-19 10:29:28 [DEBUG] After get lib arch_name=cuda lib_name=liborion_client_common.so, version=0, file_path=/root/.orion/lib/cuda4002000/liborion_client_common.so, ret=0.\n2024-06-19 10:29:28 [DEBUG] After get lib arch_name=cuda lib_name=cuda, version=0, file_path=/root/.orion/lib/cuda4002000/libcuda.so, ret=0.\n2024-06-19 10:29:28 [DEBUG] Using group resource 7cbca860-2ac2-4407-9c32-e4503043d0e5\n2024-06-19 10:29:28 [DEBUG] System initialization begin.\n2024-06-19 10:29:28 [DEBUG] Getting Orion resource ...\n2024-06-19 10:29:28 [DEBUG] Checking Unix socket at /var/tmp/orion/comm/orion.sock\n2024-06-19 10:29:28 [DEBUG] Requesting resource through /var/tmp/orion/comm/orion.sock\n2024-06-19 10:29:28 [INFO] Using Orion resource (7cbca860-2ac2-4407-9c32-e4503043d0e5) b103dad8-2472-42e9-bf13-0c95a1aa5b49 : 10.20.154.1:9960/1/1/12000/GPU-00000000-0000-000a-149a-0126e8010100,Allocation_id:b103dad8-2472-42e9-bf13-0c95a1aa5b49\n2024-06-19 10:29:28 [DEBUG] Architecture 66 initialization begin.\n2024-06-19 10:29:28 [INFO] \x1b[33mClient get resource list : 10.20.154.1:9960/1/1/12000/GPU-00000000-0000-000a-149a-0126e8010100,Allocation_id:b103dad8-2472-42e9-bf13-0c95a1aa5b49\x1b[0m\n2024-06-19 10:29:28 [INFO] \x1b[33mRPC mode. Because env ORION_ENABLE_LPC is not 1.\x1b[0m\n2024-06-19 10:29:28 [DEBUG] Skip orionrun initialization.\n2024-06-19 10:29:28 [DEBUG] System initialization is done.\n2024-06-19 10:29:28 [INFO] Releasing Orion resource ...\n'
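The captured bytes suggest why the import fails: the Orion log lines share stdout with the version numbers, so the list comprehension shown in the traceback reaches a timestamp token and `int()` rejects it. A minimal reproduction, assuming a shortened copy of the output above (the sample string is abbreviated for illustration and the variable names are hypothetical):

```python
# Abbreviated copy of the captured stdout: the version line, then one log line.
stdout = b"12 0 12 1\n2024-06-19 10:29:28 [DEBUG] System initialization begin.\n"

error_message = None
try:
    # Same expression as the failing line in the traceback.
    versions = [int(s) for s in stdout.strip().split()]
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The first four tokens convert cleanly; the fifth token is the date from the first log line, which matches the shape of the error in the original report.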

@brandon-b-miller brandon-b-miller removed the 0 - Waiting on Author Waiting for author to respond to review label Jun 18, 2024
@brandon-b-miller
Contributor

Thanks @blue-cat-whale. I'm not sure what the issue is yet, and I'll need a little time to dig into this. Until I have a better answer, can you try setting the following three environment variables as a workaround and let me know whether you're able to import cudf.pandas afterwards?

export PTXCOMPILER_CHECK_NUMBA_CODEGEN_PATCH_NEEDED=0
export PTXCOMPILER_KNOWN_DRIVER_VERSION=12.0
export PTXCOMPILER_KNOWN_RUNTIME_VERSION=12.1
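If exporting variables in the shell is inconvenient (for example in a notebook), the same three settings can be applied from Python. This is a sketch under one assumption: because the traceback shows the version check running inside `cudf/__init__.py`, the variables need to be in place before the first `import cudf`. The dictionary name `workaround` is hypothetical.

```python
import os

# Values taken from the export lines above; they must be strings.
workaround = {
    "PTXCOMPILER_CHECK_NUMBA_CODEGEN_PATCH_NEEDED": "0",
    "PTXCOMPILER_KNOWN_DRIVER_VERSION": "12.0",
    "PTXCOMPILER_KNOWN_RUNTIME_VERSION": "12.1",
}
os.environ.update(workaround)

# Only import cuDF after the environment has been updated:
# import cudf.pandas
```

The driver and runtime values should match what the version check reports on the machine in question (12.0 and 12.1 in this thread).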

@brandon-b-miller brandon-b-miller added the 0 - Waiting on Author Waiting for author to respond to review label Jun 18, 2024
@blue-cat-whale
Author

(cudf) [wangyu@localhost test_cuda]$ python tmp_0618.py
b'12 0 12 1\n2024-06-19 10:31:35 [DEBUG] After get lib arch_name=cuda lib_name=liborion_client_common.so, version=0, file_path=/home/wangyu/.orion/lib/cuda4002000/liborion_client_common.so, ret=0.\n2024-06-19 10:31:35 [DEBUG] After get lib arch_name=cuda lib_name=cuda, version=0, file_path=/home/wangyu/.orion/lib/cuda4002000/libcuda.so, ret=0.\n2024-06-19 10:31:35 [DEBUG] Using group resource fb5f4dab-87de-4105-b01d-fd1d10a9cc69\n2024-06-19 10:31:35 [DEBUG] System initialization begin.\n2024-06-19 10:31:35 [DEBUG] Getting Orion resource ...\n2024-06-19 10:31:35 [DEBUG] Checking Unix socket at /var/tmp/orion/comm/orion.sock\n2024-06-19 10:31:35 [DEBUG] Requesting resource through /var/tmp/orion/comm/orion.sock\n2024-06-19 10:31:35 [INFO] Using Orion resource (fb5f4dab-87de-4105-b01d-fd1d10a9cc69) 8040c557-aae9-428b-a221-9ed3145a909a : 10.20.154.1:9960/1/1/12000/GPU-00000000-0000-000a-149a-0126e8010100,Allocation_id:8040c557-aae9-428b-a221-9ed3145a909a\n2024-06-19 10:31:35 [DEBUG] Architecture 66 initialization begin.\n2024-06-19 10:31:35 [INFO] \x1b[33mClient get resource list : 10.20.154.1:9960/1/1/12000/GPU-00000000-0000-000a-149a-0126e8010100,Allocation_id:8040c557-aae9-428b-a221-9ed3145a909a\x1b[0m\n2024-06-19 10:31:35 [INFO] \x1b[33mRPC mode. Because env ORION_ENABLE_LPC is not 1.\x1b[0m\n2024-06-19 10:31:35 [DEBUG] Skip orionrun initialization.\n2024-06-19 10:31:35 [DEBUG] System initialization is done.\n2024-06-19 10:31:35 [INFO] Releasing Orion resource ...\n'

P.S. The output in my previous post has also been updated; I made a mistake in the original.

@brandon-b-miller brandon-b-miller removed the 0 - Waiting on Author Waiting for author to respond to review label Jun 24, 2024
@brandon-b-miller
Contributor

Ok - this makes sense. I think it's fair to treat this as a bug, because the way cuDF parses the CUDA versions doesn't account for additional stdout and stderr output, which could be trimmed. I'll put in a PR for this.

As a temporary workaround, I believe the three environment variables above should allow cuDF to import successfully.
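To illustrate the kind of trimming described, here is one possible sketch; it is not the change made in the linked PR. It assumes the version line is the first line of stdout, as in the output reported in this thread, and the function name `parse_versions` is hypothetical.

```python
def parse_versions(stdout: bytes) -> tuple[tuple[int, int], tuple[int, int]]:
    """Read 'drv_major drv_minor run_major run_minor' from the first stdout line."""
    lines = stdout.decode(errors="replace").splitlines()
    first_line = lines[0] if lines else ""
    fields = first_line.split()
    # Reject anything that is not exactly four decimal fields.
    if len(fields) != 4 or not all(f.isdecimal() for f in fields):
        raise ValueError(f"unexpected version line: {first_line!r}")
    drv_major, drv_minor, run_major, run_minor = map(int, fields)
    return (drv_major, drv_minor), (run_major, run_minor)


sample = b"12 0 12 1\n2024-06-19 10:29:28 [DEBUG] System initialization begin.\n"
print(parse_versions(sample))  # ((12, 0), (12, 1))
```

If a library printed its log lines before the version line, this approach would raise instead of guessing; a sentinel prefix on the version line would be a sturdier design in that case.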


3 participants