cuSOLVER error encountered when running rsc.pp.pca(adata, n_comps=50) #6182

Open
hyjforesight opened this issue Dec 15, 2024 · 4 comments
Labels
? - Needs Triage (Need team to review and classify), bug (Something isn't working)

Comments

@hyjforesight

hyjforesight commented Dec 15, 2024

Describe the bug
Hello Rapids,
Thank you for developing this amazing pipeline.
I encountered a cuSOLVER error when running rapids-singlecell. Please see below or scverse/rapids_singlecell#307.
The developer of rapids-singlecell told me that the issue comes from cuML.
Could you please help me with this issue?
Thank you!
Best,
YJ

Steps/Code to reproduce bug

rsc.pp.regress_out(adata, keys=['total_counts', 'pct_counts_mt','pct_counts_rpl','pct_counts_rps'])
rsc.pp.scale(adata, max_value=10)
adata
AnnData object with n_obs × n_vars = 934583 × 5000
    obs: 'batch', 'type', 'more_type', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_rpl', 'pct_counts_rpl', 'total_counts_rps', 'pct_counts_rps'
    var: 'mt', 'rpl', 'rps', 'n_cells_by_counts', 'total_counts', 'mean_counts', 'pct_dropout_by_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
    uns: 'log1p', 'hvg'
    layers: 'counts'

rsc.pp.pca(adata, n_comps=50)
sc.pl.pca_variance_ratio(adata, log=True, n_pcs=100)
adata
RuntimeError                              Traceback (most recent call last)
File <timed exec>:1

File /environment/miniconda3/lib/python3.11/site-packages/rapids_singlecell/preprocessing/_pca.py:174, in pca(***failed resolving arguments***)
    167     else:
    168         pca_func = PCA(
    169             n_components=n_comps,
    170             svd_solver=svd_solver,
    171             random_state=random_state,
    172             output_type="numpy",
    173         )
--> 174         X_pca = pca_func.fit_transform(X)
    176 elif not zero_center:
    177     pca_func = TruncatedSVD(
    178         n_components=n_comps,
    179         random_state=random_state,
    180         algorithm=svd_solver,
    181         output_type="numpy",
    182     )

File /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/api_decorators.py:188, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    185     set_api_output_dtype(output_dtype)
    187 if process_return:
--> 188     ret = func(*args, **kwargs)
    189 else:
    190     return func(*args, **kwargs)

File /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/api_decorators.py:393, in enable_device_interop.<locals>.dispatch(self, *args, **kwargs)
    391 if hasattr(self, "dispatch_func"):
    392     func_name = gpu_func.__name__
--> 393     return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
    394 else:
    395     return gpu_func(self, *args, **kwargs)

File /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/api_decorators.py:190, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    188         ret = func(*args, **kwargs)
    189     else:
--> 190         return func(*args, **kwargs)
    192 return cm.process_return(ret)

File base.pyx:687, in cuml.internals.base.UniversalBase.dispatch_func()

File pca.pyx:510, in cuml.decomposition.pca.PCA.fit_transform()

File /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/api_decorators.py:188, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    185     set_api_output_dtype(output_dtype)
    187 if process_return:
--> 188     ret = func(*args, **kwargs)
    189 else:
    190     return func(*args, **kwargs)

File /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/api_decorators.py:393, in enable_device_interop.<locals>.dispatch(self, *args, **kwargs)
    391 if hasattr(self, "dispatch_func"):
    392     func_name = gpu_func.__name__
--> 393     return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
    394 else:
    395     return gpu_func(self, *args, **kwargs)

File /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/api_decorators.py:190, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    188         ret = func(*args, **kwargs)
    189     else:
--> 190         return func(*args, **kwargs)
    192 return cm.process_return(ret)

File base.pyx:687, in cuml.internals.base.UniversalBase.dispatch_func()

File pca.pyx:481, in cuml.decomposition.pca.PCA.fit()

RuntimeError: cuSOLVER error encountered at: file=/__w/cuml/cuml/python/cuml/build/cp311-cp311-linux_x86_64/_deps/raft-src/cpp/include/raft/linalg/detail/eig.cuh line=136: call='cusolverDnxsyevd(cusolverH, dn_params, CUSOLVER_EIG_MODE_VECTOR, CUBLAS_FILL_MODE_UPPER, static_cast<int64_t>(n_rows), eig_vectors, static_cast<int64_t>(n_cols), eig_vals, d_work.data(), workspaceDevice, h_work.data(), workspaceHost, d_dev_info.data(), stream_new)', Reason=7:CUSOLVER_STATUS_INTERNAL_ERROR
Obtained 63 stack frames
#1 in /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/../libcuml++.so: raft::cusolver_error::cusolver_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) +0x5a [0x7fdc01c5658a]
#2 in /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/../libcuml++.so: void raft::linalg::detail::eigDC<double>(raft::resources const&, double const*, unsigned long, unsigned long, double*, double*, CUstream_st*) +0x1259 [0x7fdc023cb3b9]
#3 in /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/../libcuml++.so: void ML::truncCompExpVars<double, ML::solver>(raft::handle_t const&, double*, double*, double*, double*, ML::paramsTSVDTemplate<ML::solver> const&, CUstream_st*) +0x739 [0x7fdc027d2529]
#4 in /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/../libcuml++.so(+0x11c6a7e) [0x7fdc027c1a7e]
#5 in /environment/miniconda3/lib/python3.11/site-packages/cuml/decomposition/pca.cpython-311-x86_64-linux-gnu.so(+0x430fc) [0x7fdbf93b40fc]
#6 in /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/base.cpython-311-x86_64-linux-gnu.so(+0x1009e) [0x7fdbfa2b409e]
#7 in /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/base.cpython-311-x86_64-linux-gnu.so(+0x1c396) [0x7fdbfa2c0396]
#8 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x4869 [0x515419]
#9 in /environment/miniconda3/bin/python() [0x557098]
#10 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x4869 [0x515419]
#11 in /environment/miniconda3/bin/python: _PyFunction_Vectorcall +0x173 [0x538903]
#12 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x4869 [0x515419]
#13 in /environment/miniconda3/bin/python: _PyFunction_Vectorcall +0x173 [0x538903]
#14 in /environment/miniconda3/lib/python3.11/site-packages/cuml/decomposition/pca.cpython-311-x86_64-linux-gnu.so(+0x40925) [0x7fdbf93b1925]
#15 in /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/base.cpython-311-x86_64-linux-gnu.so(+0x1009e) [0x7fdbfa2b409e]
#16 in /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/base.cpython-311-x86_64-linux-gnu.so(+0x1c396) [0x7fdbfa2c0396]
#17 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x4869 [0x515419]
#18 in /environment/miniconda3/bin/python() [0x557098]
#19 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x4869 [0x515419]
#20 in /environment/miniconda3/bin/python: _PyFunction_Vectorcall +0x173 [0x538903]
#21 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x4869 [0x515419]
#22 in /environment/miniconda3/bin/python() [0x5cb78a]
#23 in /environment/miniconda3/bin/python: PyEval_EvalCode +0x9f [0x5cae5f]
#24 in /environment/miniconda3/bin/python() [0x5e45e3]
#25 in /environment/miniconda3/bin/python() [0x51e3d7]
#26 in /environment/miniconda3/bin/python: PyObject_Vectorcall +0x31 [0x51e2c1]
#27 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x6a6 [0x511256]
#28 in /environment/miniconda3/bin/python() [0x55799f]
#29 in /environment/miniconda3/bin/python() [0x55718e]
#30 in /environment/miniconda3/bin/python: PyObject_Call +0x12c [0x54288c]
#31 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x4869 [0x515419]
#32 in /environment/miniconda3/bin/python() [0x5cb78a]
#33 in /environment/miniconda3/bin/python: PyEval_EvalCode +0x9f [0x5cae5f]
#34 in /environment/miniconda3/bin/python() [0x5e45e3]
#35 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x3738 [0x5142e8]
#36 in /environment/miniconda3/bin/python() [0x5e001a]
#37 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x321f [0x513dcf]
#38 in /environment/miniconda3/bin/python() [0x5e001a]
#39 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x321f [0x513dcf]
#40 in /environment/miniconda3/bin/python() [0x5e001a]
#41 in /environment/miniconda3/bin/python() [0x5e2656]
#42 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x38ba [0x51446a]
#43 in /environment/miniconda3/bin/python() [0x55799f]
#44 in /environment/miniconda3/bin/python() [0x55718e]
#45 in /environment/miniconda3/bin/python: PyObject_Call +0x12c [0x54288c]
#46 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x4869 [0x515419]
#47 in /environment/miniconda3/bin/python() [0x5e001a]
#48 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x321f [0x513dcf]
#49 in /environment/miniconda3/bin/python() [0x5e001a]
#50 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x321f [0x513dcf]
#51 in /environment/miniconda3/bin/python() [0x5e001a]
#52 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x321f [0x513dcf]
#53 in /environment/miniconda3/bin/python() [0x5e001a]
#54 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x321f [0x513dcf]
#55 in /environment/miniconda3/bin/python() [0x5e001a]
#56 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x321f [0x513dcf]
#57 in /environment/miniconda3/bin/python() [0x5e001a]
#58 in /environment/miniconda3/lib/python3.11/lib-dynload/_asyncio.cpython-311-x86_64-linux-gnu.so(+0x79fb) [0x7fdf0908c9fb]
#59 in /environment/miniconda3/bin/python() [0x52657b]
#60 in /environment/miniconda3/bin/python() [0x4c6caf]
#61 in /environment/miniconda3/bin/python() [0x4cbd10]
#62 in /environment/miniconda3/bin/python() [0x51e3d7]
#63 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x928f [0x519e3f]
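
For readers following the trace: the failing call is cusolverDnXsyevd, a symmetric eigensolver, reached through raft::linalg::detail::eigDC<double> from ML::truncCompExpVars. The sketch below is a CPU-only NumPy illustration of PCA computed through an eigendecomposition of the feature covariance matrix, on the assumption that this is the kind of step that fails here. It is not cuML's implementation; the function name pca_via_covariance and the random demo data are hypothetical.

```python
import numpy as np


def pca_via_covariance(X, n_components):
    """PCA through eigendecomposition of the feature covariance matrix.

    Illustrative CPU sketch only; not the cuML/RAFT code path.
    """
    X = np.asarray(X, dtype=np.float64)
    X_centered = X - X.mean(axis=0)
    # Covariance is (n_features x n_features); for the data in this
    # issue that would be 5000 x 5000, independent of the cell count.
    cov = (X_centered.T @ X_centered) / (X.shape[0] - 1)
    # Symmetric eigensolver; eigenvalues are returned in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the n_components largest eigenpairs, largest first.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order].T
    explained_variance = eigvals[order]
    scores = X_centered @ components.T
    return scores, components, explained_variance


# Small random stand-in for the real matrix (hypothetical demo data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
scores, components, explained_variance = pca_via_covariance(X_demo, n_components=3)
print(scores.shape, components.shape)
```

The point of the sketch is only that the eigensolver operates on a matrix whose side length is the number of genes, not the number of cells.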

Expected behavior
rsc.pp.pca(adata, n_comps=50) completes without raising an error.

Environment details (please complete the following information):

  • Environment location: [Cloud (featurize.cn)]
  • Linux Distro/Architecture: [Ubuntu 22.04.4 LTS]
  • GPU Model/Driver: [RTX4090 X8 and driver 560.35.03]
  • CUDA: [12.3]
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
  • Method of cuDF & cuML install: [pip]
pip install 'rapids-singlecell[rapids12]' --extra-index-url=https://pypi.nvidia.com #CUDA12
pip install rapids-singlecell
 Package                      Version
---------------------------- --------------
absl-py                      2.1.0
aiohttp                      3.7.4
anaconda-anon-usage          0.4.4
anndata                      0.11.1
anyio                        3.7.1
archspec                     0.2.3
argon2-cffi                  23.1.0
argon2-cffi-bindings         21.2.0
array_api_compat             1.9.1
arrow                        1.3.0
asttokens                    2.4.1
astunparse                   1.6.3
async-lru                    2.0.4
async-timeout                3.0.1
attrs                        23.2.0
Babel                        2.14.0
beautifulsoup4               4.12.3
bleach                       6.1.0
boltons                      23.0.0
Brotli                       1.0.9
cachetools                   5.5.0
certifi                      2024.2.2
cffi                         1.16.0
chardet                      3.0.4
charset-normalizer           2.0.4
click                        8.1.7
cloudpickle                  3.1.0
comm                         0.2.2
conda                        24.3.0
conda-content-trust          0.2.0
conda-libmamba-solver        24.1.0
conda-package-handling       2.2.0
conda_package_streaming      0.9.0
contourpy                    1.2.1
cryptography                 42.0.5
cuda-python                  12.6.2.post1
cudf-cu12                    24.10.1
cugraph-cu12                 24.10.0
cuml-cu12                    24.10.0
cupy-cuda12x                 13.3.0
cuvs-cu12                    24.10.0
cycler                       0.12.1
dask                         2024.9.0
dask-cuda                    24.10.0
dask-cudf-cu12               24.10.1
dask-expr                    1.1.14
debugpy                      1.8.1
decorator                    5.1.1
defusedxml                   0.7.1
distributed                  2024.9.0
distributed-ucxx-cu12        0.40.0
distro                       1.8.0
ecdsa                        0.19.0
executing                    2.0.1
fastjsonschema               2.19.1
fastrlock                    0.8.2
filelock                     3.13.4
flatbuffers                  24.3.25
fonttools                    4.53.0
fqdn                         1.5.1
fsspec                       2024.3.1
gast                         0.5.4
google-pasta                 0.2.0
grpcio                       1.62.2
h11                          0.14.0
h5py                         3.11.0
httpcore                     1.0.5
httpx                        0.27.0
idna                         3.4
igraph                       0.11.8
imageio                      2.36.1
importlib_metadata           8.5.0
ipykernel                    6.29.4
ipython                      8.23.0
ipython-genutils             0.2.0
isoduration                  20.11.0
jedi                         0.19.1
Jinja2                       3.1.3
joblib                       1.4.2
json5                        0.9.25
jsonpatch                    1.33
jsonpointer                  2.1
jsonschema                   4.21.1
jsonschema-specifications    2023.12.1
jupyter_client               8.6.1
jupyter_core                 5.7.2
jupyter-events               0.10.0
jupyter-lsp                  2.2.5
jupyter_server               2.14.0
jupyter_server_terminals     0.5.3
jupyterlab                   4.2.0
jupyterlab_pygments          0.3.0
jupyterlab_server            2.27.1
keras                        3.3.2
kiwisolver                   1.4.5
lazy_loader                  0.4
legacy-api-wrap              1.4.1
leidenalg                    0.10.2
libclang                     18.1.1
libcudf-cu12                 24.10.1
libmambapy                   1.5.8
libucx-cu12                  1.17.0.post1
libucxx-cu12                 0.40.0
llvmlite                     0.43.0
locket                       1.0.0
Markdown                     3.6
markdown-it-py               3.0.0
MarkupSafe                   2.1.5
matplotlib                   3.9.0
matplotlib-inline            0.1.7
mdurl                        0.1.2
menuinst                     2.0.2
mistune                      3.0.2
ml-dtypes                    0.3.2
mpmath                       1.3.0
msgpack                      1.1.0
multidict                    6.0.5
namex                        0.0.8
natsort                      8.4.0
nbclassic                    0.2.8
nbclient                     0.10.0
nbconvert                    7.16.3
nbformat                     5.10.4
nest-asyncio                 1.6.0
networkx                     3.3
notebook                     6.4.13
notebook_shim                0.2.4
numba                        0.60.0
numpy                        1.26.4
nvidia-cublas-cu12           12.1.3.1
nvidia-cuda-cupti-cu12       12.1.105
nvidia-cuda-nvrtc-cu12       12.1.105
nvidia-cuda-runtime-cu12     12.1.105
nvidia-cudnn-cu12            8.9.2.26
nvidia-cufft-cu12            11.0.2.54
nvidia-curand-cu12           10.3.2.106
nvidia-cusolver-cu12         11.4.5.107
nvidia-cusparse-cu12         12.1.0.106
nvidia-nccl-cu12             2.19.3
nvidia-nvjitlink-cu12        12.4.127
nvidia-nvtx-cu12             12.1.105
nvtx                         0.2.10
opt-einsum                   3.3.0
optree                       0.11.0
overrides                    7.7.0
packaging                    23.2
pandas                       2.2.2
pandocfilters                1.5.1
parso                        0.8.4
partd                        1.4.2
patsy                        1.0.1
pexpect                      4.9.0
pillow                       10.3.0
pip                          23.3.1
platformdirs                 3.10.0
pluggy                       1.0.0
prometheus_client            0.20.0
prompt-toolkit               3.0.43
protobuf                     4.25.3
psutil                       5.9.8
ptyprocess                   0.7.0
pure-eval                    0.2.2
pyarrow                      17.0.0
pycosat                      0.6.6
pycparser                    2.21
Pygments                     2.17.2
pylibcudf-cu12               24.10.1
pylibcugraph-cu12            24.10.0
pylibraft-cu12               24.10.0
pynndescent                  0.5.13
pynvjitlink-cu12             0.4.0
pynvml                       11.4.1
pyparsing                    3.1.2
PySocks                      1.7.1
python-dateutil              2.9.0.post0
python-json-logger           2.0.7
pytz                         2024.1
PyYAML                       6.0.1
pyzmq                        26.0.2
raft-dask-cu12               24.10.0
rapids-dask-dependency       24.10.0
rapids_singlecell            0.10.11
referencing                  0.35.0
requests                     2.31.0
rfc3339-validator            0.1.4
rfc3986-validator            0.1.1
rich                         13.7.1
rmm-cu12                     24.10.0
rpds-py                      0.18.0
ruamel.yaml                  0.17.21
scanpy                       1.10.4
scikit-image                 0.24.0
scikit-learn                 1.6.0
scikit-misc                  0.5.1
scipy                        1.14.1
seaborn                      0.13.2
Send2Trash                   1.8.3
session-info                 1.0.0
setuptools                   68.2.2
six                          1.16.0
sniffio                      1.3.1
sortedcontainers             2.4.0
soupsieve                    2.5
sshpubkeys                   3.3.1
stack-data                   0.6.3
statsmodels                  0.14.4
stdlib-list                  0.11.0
sympy                        1.12
tblib                        3.0.0
tensorboard                  2.16.2
tensorboard-data-server      0.7.2
tensorflow                   2.16.1
tensorflow-io-gcs-filesystem 0.36.0
termcolor                    2.4.0
terminado                    0.18.1
texttable                    1.7.0
threadpoolctl                3.5.0
tifffile                     2024.9.20
tinycss2                     1.3.0
toolz                        1.0.0
torch                        2.2.2
torchaudio                   2.2.2
torchvision                  0.17.2
tornado                      6.4
tqdm                         4.65.0
traitlets                    5.14.3
treelite                     4.3.0
triton                       2.2.0
truststore                   0.8.0
types-python-dateutil        2.9.0.20240316
typing_extensions            4.11.0
tzdata                       2024.1
ucx-py-cu12                  0.40.0
ucxx-cu12                    0.40.0
umap-learn                   0.5.7
uri-template                 1.3.0
urllib3                      2.1.0
wcwidth                      0.2.13
webcolors                    1.13
webencodings                 0.5.1
websocket-client             1.8.0
Werkzeug                     3.0.2
wheel                        0.41.2
wrapt                        1.16.0
yarl                         1.9.4
zict                         3.0.0
zipp                         3.21.0
zstandard                    0.19.0


@hyjforesight hyjforesight added ? - Needs Triage Need team to review and classify bug Something isn't working labels Dec 15, 2024
@dantegd
Member

dantegd commented Dec 17, 2024

Thanks for the issue, @hyjforesight. Indeed, the error seems to come from PCA in cuML specifically.

Sorry if I missed it, but any chance you could provide some additional info about the dataset you are processing?

@hyjforesight
Author

hyjforesight commented Dec 17, 2024

Hello @dantegd
Thank you for the response.
I'm processing a single-cell RNA-sequencing (scRNA-seq) dataset. The data were collected from public databases and simply merged with Scanpy. I then used rapids-singlecell for pre-processing, as shown below:

import numpy as np
import pandas as pd
import scanpy as sc
import scanpy.external as sce
import scipy
sc.settings.verbosity = 3
sc.logging.print_header()
sc.set_figure_params(dpi=100, dpi_save=600)

import matplotlib.pyplot as pl
from matplotlib import rcParams

import os

import cupy as cp
import rapids_singlecell as rsc
import warnings
warnings.filterwarnings("ignore")

# Enable managed (unified) memory; the pool allocator is left disabled
import rmm
from rmm.allocators.cupy import rmm_cupy_allocator
rmm.reinitialize(
    managed_memory=True,
    pool_allocator=False,
)
cp.cuda.set_allocator(rmm_cupy_allocator)

adata = sc.read('GC_all.h5ad')

rsc.get.anndata_to_GPU(adata)

rsc.pp.flag_gene_family(adata, gene_family_name="mt", gene_family_prefix="MT-")
rsc.pp.flag_gene_family(adata, gene_family_name="rpl", gene_family_prefix="RPL")
rsc.pp.flag_gene_family(adata, gene_family_name="rps", gene_family_prefix="RPS")

rsc.pp.calculate_qc_metrics(adata, qc_vars=['mt','rpl','rps'], log1p=False)
sc.pl.violin(adata, keys=['n_genes_by_counts', 'total_counts', 'pct_counts_mt','pct_counts_rpl','pct_counts_rps'], jitter=0.4, multi_panel=True)

sc.pl.scatter(adata, x='total_counts', y='pct_counts_mt')
sc.pl.scatter(adata, x='total_counts', y='pct_counts_rpl')
sc.pl.scatter(adata, x='total_counts', y='pct_counts_rps')
sc.pl.scatter(adata, x='total_counts', y='n_genes_by_counts')

adata = adata[adata.obs.n_genes_by_counts < 8000, :]
adata = adata[adata.obs.pct_counts_mt < 50, :]
adata = adata[adata.obs.pct_counts_rpl < 50, :]
adata = adata[adata.obs.pct_counts_rps < 50, :]

rsc.pp.filter_cells(adata, qc_var='n_genes_by_counts', min_count=100)
rsc.pp.filter_genes(adata, qc_var='n_cells_by_counts', min_count=25)

adata.layers["counts"] = adata.X.copy()

rsc.pp.normalize_total(adata, target_sum=1e4)
rsc.pp.log1p(adata)

rsc.pp.highly_variable_genes(adata, n_top_genes=5000)
sc.pl.highly_variable_genes(adata)
print(sum(adata.var.highly_variable))

adata.raw=adata

rsc.pp.regress_out(adata, keys=['total_counts', 'pct_counts_mt','pct_counts_rpl','pct_counts_rps'])
rsc.pp.scale(adata, max_value=10)
adata
AnnData object with n_obs × n_vars = 934583 × 5000
    obs: 'batch', 'type', 'more_type', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_rpl', 'pct_counts_rpl', 'total_counts_rps', 'pct_counts_rps'
    var: 'mt', 'rpl', 'rps', 'n_cells_by_counts', 'total_counts', 'mean_counts', 'pct_dropout_by_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
    uns: 'log1p', 'hvg'
    layers: 'counts'

rsc.pp.pca(adata, n_comps=50)
At this point, the error is raised.

adata is an AnnData object whose matrix is sparse, with 934583 cells and 5000 genes.
Thank you!
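
As a rough aid for sizing the problem, the arithmetic below estimates how much memory the 934583 × 5000 matrix would need if it were held dense, using only the shape reported above. This is a back-of-the-envelope sketch: whether the matrix is actually dense at the PCA step, and whether size plays any role in the error, is not established in this thread. The helper names dense_bytes and to_gib are hypothetical. For scale, an RTX 4090 has 24 GB of memory, and the script above enables managed memory.

```python
# Shapes taken from the AnnData summary in this comment.
N_OBS, N_VARS = 934_583, 5_000


def dense_bytes(n_rows, n_cols, itemsize):
    """Bytes needed to hold an n_rows x n_cols dense array."""
    return n_rows * n_cols * itemsize


def to_gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


n_elements = N_OBS * N_VARS
print(f"elements:           {n_elements:,}")
# Element count compared with the largest signed 32-bit integer.
print(f"exceeds int32 max:  {n_elements > 2**31 - 1}")
print(f"dense float32:      {to_gib(dense_bytes(N_OBS, N_VARS, 4)):.1f} GiB")
print(f"dense float64:      {to_gib(dense_bytes(N_OBS, N_VARS, 8)):.1f} GiB")
# Gene-by-gene matrix of the size an eigensolver would work on.
print(f"5000 x 5000 float64: {to_gib(dense_bytes(N_VARS, N_VARS, 8)):.2f} GiB")
```

The trace shows the double-precision path (eigDC<double>), so the float64 row is the one that would apply to a dense copy in that precision.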

@lowener
Contributor

lowener commented Dec 17, 2024

This seems similar to the error seen in issue #5555. However, it should have been solved in cuML 24.10. If it turns out to be that error, it should be fixed by CUDA toolkit 12.4.1.003, so a more recent CUDA version, such as 12.5 or higher, could be a solution.
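
Since the package list in the report shows pip-installed CUDA wheels (for example nvidia-cusolver-cu12 11.4.5.107) alongside the system toolkit, it may be worth confirming which versions the Python environment actually sees after an upgrade. The sketch below uses only the standard library and the distribution names from that list; the function name installed_versions is hypothetical. Which cuSOLVER library cuML loads at run time (pip wheel or system toolkit) is an open question here, not something this script answers.

```python
from importlib import metadata

# Distribution names copied from the package list in this issue.
PACKAGES = (
    "cuml-cu12",
    "pylibraft-cu12",
    "cupy-cuda12x",
    "nvidia-cusolver-cu12",
    "nvidia-cublas-cu12",
    "rapids_singlecell",
)


def installed_versions(names=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:24s} {version or 'not installed'}")
```

Running it before and after changing the CUDA toolkit shows whether the pip-installed library versions changed at all.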

@hyjforesight
Author

hyjforesight commented Dec 17, 2024

Hello @lowener
Thank you for the information.
My cuML version is 24.10. I updated CUDA to 12.6; however, the error persists.

rsc.pp.pca(adata, n_comps=50)
sc.pl.pca_variance_ratio(adata, log=True, n_pcs=100)
RuntimeError                              Traceback (most recent call last)
Cell In[23], line 1
----> 1 rsc.pp.pca(adata, n_comps=50)
      2 sc.pl.pca_variance_ratio(adata, log=True, n_pcs=100)
      3 adata

File /environment/miniconda3/lib/python3.11/site-packages/rapids_singlecell/preprocessing/_pca.py:174, in pca(***failed resolving arguments***)
    167     else:
    168         pca_func = PCA(
    169             n_components=n_comps,
    170             svd_solver=svd_solver,
    171             random_state=random_state,
    172             output_type="numpy",
    173         )
--> 174         X_pca = pca_func.fit_transform(X)
    176 elif not zero_center:
    177     pca_func = TruncatedSVD(
    178         n_components=n_comps,
    179         random_state=random_state,
    180         algorithm=svd_solver,
    181         output_type="numpy",
    182     )

File /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/api_decorators.py:188, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    185     set_api_output_dtype(output_dtype)
    187 if process_return:
--> 188     ret = func(*args, **kwargs)
    189 else:
    190     return func(*args, **kwargs)

File /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/api_decorators.py:393, in enable_device_interop.<locals>.dispatch(self, *args, **kwargs)
    391 if hasattr(self, "dispatch_func"):
    392     func_name = gpu_func.__name__
--> 393     return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
    394 else:
    395     return gpu_func(self, *args, **kwargs)

File /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/api_decorators.py:190, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    188         ret = func(*args, **kwargs)
    189     else:
--> 190         return func(*args, **kwargs)
    192 return cm.process_return(ret)

File base.pyx:687, in cuml.internals.base.UniversalBase.dispatch_func()

File pca.pyx:510, in cuml.decomposition.pca.PCA.fit_transform()

File /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/api_decorators.py:188, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    185     set_api_output_dtype(output_dtype)
    187 if process_return:
--> 188     ret = func(*args, **kwargs)
    189 else:
    190     return func(*args, **kwargs)

File /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/api_decorators.py:393, in enable_device_interop.<locals>.dispatch(self, *args, **kwargs)
    391 if hasattr(self, "dispatch_func"):
    392     func_name = gpu_func.__name__
--> 393     return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
    394 else:
    395     return gpu_func(self, *args, **kwargs)

File /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/api_decorators.py:190, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    188         ret = func(*args, **kwargs)
    189     else:
--> 190         return func(*args, **kwargs)
    192 return cm.process_return(ret)

File base.pyx:687, in cuml.internals.base.UniversalBase.dispatch_func()

File pca.pyx:481, in cuml.decomposition.pca.PCA.fit()

RuntimeError: cuSOLVER error encountered at: file=/__w/cuml/cuml/python/cuml/build/cp311-cp311-linux_x86_64/_deps/raft-src/cpp/include/raft/linalg/detail/eig.cuh line=136: call='cusolverDnxsyevd(cusolverH, dn_params, CUSOLVER_EIG_MODE_VECTOR, CUBLAS_FILL_MODE_UPPER, static_cast<int64_t>(n_rows), eig_vectors, static_cast<int64_t>(n_cols), eig_vals, d_work.data(), workspaceDevice, h_work.data(), workspaceHost, d_dev_info.data(), stream_new)', Reason=7:CUSOLVER_STATUS_INTERNAL_ERROR
Obtained 63 stack frames
#1 in /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/../libcuml++.so: raft::cusolver_error::cusolver_error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) +0x5a [0x7f2f513c258a]
#2 in /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/../libcuml++.so: void raft::linalg::detail::eigDC<double>(raft::resources const&, double const*, unsigned long, unsigned long, double*, double*, CUstream_st*) +0x1259 [0x7f2f51b373b9]
#3 in /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/../libcuml++.so: void ML::truncCompExpVars<double, ML::solver>(raft::handle_t const&, double*, double*, double*, double*, ML::paramsTSVDTemplate<ML::solver> const&, CUstream_st*) +0x739 [0x7f2f51f3e529]
#4 in /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/../libcuml++.so(+0x11c6a7e) [0x7f2f51f2da7e]
#5 in /environment/miniconda3/lib/python3.11/site-packages/cuml/decomposition/pca.cpython-311-x86_64-linux-gnu.so(+0x430fc) [0x7f2f468b40fc]
#6 in /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/base.cpython-311-x86_64-linux-gnu.so(+0x1009e) [0x7f2f4786b09e]
#7 in /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/base.cpython-311-x86_64-linux-gnu.so(+0x1c396) [0x7f2f47877396]
#8 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x4869 [0x515419]
#9 in /environment/miniconda3/bin/python() [0x557098]
#10 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x4869 [0x515419]
#11 in /environment/miniconda3/bin/python: _PyFunction_Vectorcall +0x173 [0x538903]
#12 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x4869 [0x515419]
#13 in /environment/miniconda3/bin/python: _PyFunction_Vectorcall +0x173 [0x538903]
#14 in /environment/miniconda3/lib/python3.11/site-packages/cuml/decomposition/pca.cpython-311-x86_64-linux-gnu.so(+0x40925) [0x7f2f468b1925]
#15 in /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/base.cpython-311-x86_64-linux-gnu.so(+0x1009e) [0x7f2f4786b09e]
#16 in /environment/miniconda3/lib/python3.11/site-packages/cuml/internals/base.cpython-311-x86_64-linux-gnu.so(+0x1c396) [0x7f2f47877396]
#17 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x4869 [0x515419]
#18 in /environment/miniconda3/bin/python() [0x557098]
#19 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x4869 [0x515419]
#20 in /environment/miniconda3/bin/python: _PyFunction_Vectorcall +0x173 [0x538903]
#21 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x4869 [0x515419]
#22 in /environment/miniconda3/bin/python() [0x5cb78a]
#23 in /environment/miniconda3/bin/python: PyEval_EvalCode +0x9f [0x5cae5f]
#24 in /environment/miniconda3/bin/python() [0x5e45e3]
#25 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x3738 [0x5142e8]
#26 in /environment/miniconda3/bin/python() [0x5e001a]
#27 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x321f [0x513dcf]
#28 in /environment/miniconda3/bin/python() [0x5e001a]
#29 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x321f [0x513dcf]
#30 in /environment/miniconda3/bin/python() [0x5e001a]
#31 in /environment/miniconda3/bin/python() [0x5e2656]
#32 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x38ba [0x51446a]
#33 in /environment/miniconda3/bin/python() [0x55799f]
#34 in /environment/miniconda3/bin/python() [0x55718e]
#35 in /environment/miniconda3/bin/python: PyObject_Call +0x12c [0x54288c]
#36 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x4869 [0x515419]
#37 in /environment/miniconda3/bin/python() [0x5e001a]
#38 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x321f [0x513dcf]
#39 in /environment/miniconda3/bin/python() [0x5e001a]
#40 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x321f [0x513dcf]
#41 in /environment/miniconda3/bin/python() [0x5e001a]
#42 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x321f [0x513dcf]
#43 in /environment/miniconda3/bin/python() [0x5e001a]
#44 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x321f [0x513dcf]
#45 in /environment/miniconda3/bin/python() [0x5e001a]
#46 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x321f [0x513dcf]
#47 in /environment/miniconda3/bin/python() [0x5e001a]
#48 in /environment/miniconda3/lib/python3.11/lib-dynload/_asyncio.cpython-311-x86_64-linux-gnu.so(+0x79fb) [0x7f32558ed9fb]
#49 in /environment/miniconda3/bin/python() [0x52657b]
#50 in /environment/miniconda3/bin/python() [0x4c6caf]
#51 in /environment/miniconda3/bin/python() [0x4cbd10]
#52 in /environment/miniconda3/bin/python() [0x51e3d7]
#53 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x928f [0x519e3f]
#54 in /environment/miniconda3/bin/python() [0x5cb78a]
#55 in /environment/miniconda3/bin/python: PyEval_EvalCode +0x9f [0x5cae5f]
#56 in /environment/miniconda3/bin/python() [0x5e45e3]
#57 in /environment/miniconda3/bin/python() [0x51e3d7]
#58 in /environment/miniconda3/bin/python: PyObject_Vectorcall +0x31 [0x51e2c1]
#59 in /environment/miniconda3/bin/python: _PyEval_EvalFrameDefault +0x6a6 [0x511256]
#60 in /environment/miniconda3/bin/python: _PyFunction_Vectorcall +0x173 [0x538903]
#61 in /environment/miniconda3/bin/python() [0x5f6c2f]
#62 in /environment/miniconda3/bin/python: Py_RunMain +0x14a [0x5f663a]
#63 in /environment/miniconda3/bin/python: Py_BytesMain +0x39 [0x5bb5c9]

CUDA is 12.6

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0

nvidia-smi information

Tue Dec 17 16:10:09 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:10:00.0 Off |                  Off |
| 30%   32C    P8             22W /  450W |     562MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:12:00.0 Off |                  Off |
| 30%   33C    P8             30W /  450W |       4MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 4090        Off |   00000000:14:00.0 Off |                  Off |
| 30%   31C    P8             11W /  450W |       4MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 4090        Off |   00000000:16:00.0 Off |                  Off |
| 30%   31C    P8             25W /  450W |       4MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA GeForce RTX 4090        Off |   00000000:18:00.0 Off |                  Off |
| 30%   32C    P8             32W /  450W |       4MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA GeForce RTX 4090        Off |   00000000:1A:00.0 Off |                  Off |
| 30%   33C    P8             13W /  450W |       4MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      5253      C   /environment/miniconda3/bin/python            478MiB |
+-----------------------------------------------------------------------------------------+

pip list

Package                      Version
---------------------------- --------------
absl-py                      2.1.0
aiohttp                      3.7.4
anaconda-anon-usage          0.4.4
anndata                      0.11.1
anyio                        3.7.1
archspec                     0.2.3
argon2-cffi                  23.1.0
argon2-cffi-bindings         21.2.0
array_api_compat             1.9.1
arrow                        1.3.0
asttokens                    2.4.1
astunparse                   1.6.3
async-lru                    2.0.4
async-timeout                3.0.1
attrs                        23.2.0
Babel                        2.14.0
beautifulsoup4               4.12.3
bleach                       6.1.0
boltons                      23.0.0
Brotli                       1.0.9
cachetools                   5.5.0
certifi                      2024.2.2
cffi                         1.16.0
chardet                      3.0.4
charset-normalizer           2.0.4
click                        8.1.7
cloudpickle                  3.1.0
comm                         0.2.2
conda                        24.3.0
conda-content-trust          0.2.0
conda-libmamba-solver        24.1.0
conda-package-handling       2.2.0
conda_package_streaming      0.9.0
contourpy                    1.2.1
cryptography                 42.0.5
cuda-python                  12.6.2.post1
cudf-cu12                    24.10.1
cugraph-cu12                 24.10.0
cuml-cu12                    24.10.0
cupy-cuda12x                 13.3.0
cuvs-cu12                    24.10.0
cycler                       0.12.1
dask                         2024.9.0
dask-cuda                    24.10.0
dask-cudf-cu12               24.10.1
dask-expr                    1.1.14
debugpy                      1.8.1
decorator                    5.1.1
defusedxml                   0.7.1
distributed                  2024.9.0
distributed-ucxx-cu12        0.40.0
distro                       1.8.0
ecdsa                        0.19.0
executing                    2.0.1
fastjsonschema               2.19.1
fastrlock                    0.8.3
filelock                     3.13.4
flatbuffers                  24.3.25
fonttools                    4.53.0
fqdn                         1.5.1
fsspec                       2024.3.1
gast                         0.5.4
google-pasta                 0.2.0
grpcio                       1.62.2
h11                          0.14.0
h5py                         3.11.0
httpcore                     1.0.5
httpx                        0.27.0
idna                         3.4
igraph                       0.11.8
imageio                      2.36.1
importlib_metadata           8.5.0
ipykernel                    6.29.4
ipython                      8.23.0
ipython-genutils             0.2.0
isoduration                  20.11.0
jedi                         0.19.1
Jinja2                       3.1.3
joblib                       1.4.2
json5                        0.9.25
jsonpatch                    1.33
jsonpointer                  2.1
jsonschema                   4.21.1
jsonschema-specifications    2023.12.1
jupyter_client               8.6.1
jupyter_core                 5.7.2
jupyter-events               0.10.0
jupyter-lsp                  2.2.5
jupyter_server               2.14.0
jupyter_server_terminals     0.5.3
jupyterlab                   4.2.0
jupyterlab_pygments          0.3.0
jupyterlab_server            2.27.1
keras                        3.3.2
kiwisolver                   1.4.5
lazy_loader                  0.4
legacy-api-wrap              1.4.1
leidenalg                    0.10.2
libclang                     18.1.1
libcudf-cu12                 24.10.1
libmambapy                   1.5.8
libucx-cu12                  1.17.0.post1
libucxx-cu12                 0.40.0
llvmlite                     0.43.0
locket                       1.0.0
Markdown                     3.6
markdown-it-py               3.0.0
MarkupSafe                   2.1.5
matplotlib                   3.9.0
matplotlib-inline            0.1.7
mdurl                        0.1.2
menuinst                     2.0.2
mistune                      3.0.2
ml-dtypes                    0.3.2
mpmath                       1.3.0
msgpack                      1.1.0
multidict                    6.0.5
namex                        0.0.8
natsort                      8.4.0
nbclassic                    0.2.8
nbclient                     0.10.0
nbconvert                    7.16.3
nbformat                     5.10.4
nest-asyncio                 1.6.0
networkx                     3.3
notebook                     6.4.13
notebook_shim                0.2.4
numba                        0.60.0
numpy                        1.26.4
nvidia-cublas-cu12           12.1.3.1
nvidia-cuda-cupti-cu12       12.1.105
nvidia-cuda-nvrtc-cu12       12.1.105
nvidia-cuda-runtime-cu12     12.1.105
nvidia-cudnn-cu12            8.9.2.26
nvidia-cufft-cu12            11.0.2.54
nvidia-curand-cu12           10.3.2.106
nvidia-cusolver-cu12         11.4.5.107
nvidia-cusparse-cu12         12.1.0.106
nvidia-nccl-cu12             2.19.3
nvidia-nvjitlink-cu12        12.4.127
nvidia-nvtx-cu12             12.1.105
nvtx                         0.2.10
opt-einsum                   3.3.0
optree                       0.11.0
overrides                    7.7.0
packaging                    23.2
pandas                       2.2.2
pandocfilters                1.5.1
parso                        0.8.4
partd                        1.4.2
patsy                        1.0.1
pexpect                      4.9.0
pillow                       10.3.0
pip                          23.3.1
platformdirs                 3.10.0
pluggy                       1.0.0
prometheus_client            0.20.0
prompt-toolkit               3.0.43
protobuf                     4.25.3
psutil                       5.9.8
ptyprocess                   0.7.0
pure-eval                    0.2.2
pyarrow                      17.0.0
pycosat                      0.6.6
pycparser                    2.21
Pygments                     2.17.2
pylibcudf-cu12               24.10.1
pylibcugraph-cu12            24.10.0
pylibraft-cu12               24.10.0
pynndescent                  0.5.13
pynvjitlink-cu12             0.4.0
pynvml                       11.4.1
pyparsing                    3.1.2
PySocks                      1.7.1
python-dateutil              2.9.0.post0
python-json-logger           2.0.7
pytz                         2024.1
PyYAML                       6.0.1
pyzmq                        26.0.2
raft-dask-cu12               24.10.0
rapids-dask-dependency       24.10.0
rapids_singlecell            0.10.11
referencing                  0.35.0
requests                     2.31.0
rfc3339-validator            0.1.4
rfc3986-validator            0.1.1
rich                         13.7.1
rmm-cu12                     24.10.0
rpds-py                      0.18.0
ruamel.yaml                  0.17.21
scanpy                       1.10.4
scikit-image                 0.25.0
scikit-learn                 1.6.0
scikit-misc                  0.5.1
scipy                        1.14.1
seaborn                      0.13.2
Send2Trash                   1.8.3
session-info                 1.0.0
setuptools                   68.2.2
six                          1.16.0
sniffio                      1.3.1
sortedcontainers             2.4.0
soupsieve                    2.5
sshpubkeys                   3.3.1
stack-data                   0.6.3
statsmodels                  0.14.4
stdlib-list                  0.11.0
sympy                        1.12
tblib                        3.0.0
tensorboard                  2.16.2
tensorboard-data-server      0.7.2
tensorflow                   2.16.1
tensorflow-io-gcs-filesystem 0.36.0
termcolor                    2.4.0
terminado                    0.18.1
texttable                    1.7.0
threadpoolctl                3.5.0
tifffile                     2024.12.12
tinycss2                     1.3.0
toolz                        1.0.0
torch                        2.2.2
torchaudio                   2.2.2
torchvision                  0.17.2
tornado                      6.4
tqdm                         4.65.0
traitlets                    5.14.3
treelite                     4.3.0
triton                       2.2.0
truststore                   0.8.0
types-python-dateutil        2.9.0.20240316
typing_extensions            4.11.0
tzdata                       2024.1
ucx-py-cu12                  0.40.0
ucxx-cu12                    0.40.0
umap-learn                   0.5.7
uri-template                 1.3.0
urllib3                      2.1.0
wcwidth                      0.2.13
webcolors                    1.13
webencodings                 0.5.1
websocket-client             1.8.0
Werkzeug                     3.0.2
wheel                        0.41.2
wrapt                        1.16.0
yarl                         1.9.4
zict                         3.0.0
zipp                         3.21.0
zstandard                    0.19.0
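
The full `pip list` above is long. A shorter listing limited to GPU-related packages may be easier to compare against the CUDA 12.6 toolkit reported by `nvcc` and `nvidia-smi`. The following is a minimal sketch using only the standard library's `importlib.metadata`; the helper name `gpu_package_versions` and the prefix list `GPU_PREFIXES` are assumptions chosen for illustration, not part of RAPIDS or rapids-singlecell.

```python
from importlib.metadata import distributions

# Hypothetical prefix list for illustration; adjust it to your environment.
GPU_PREFIXES = (
    "nvidia-", "cuda-", "cupy", "cuml", "cudf", "cugraph",
    "cuvs", "rmm", "pylibraft", "rapids", "torch",
)


def gpu_package_versions(prefixes=GPU_PREFIXES):
    """Return {name: version} for installed packages whose name starts with a prefix."""
    found = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        # Skip distributions with missing metadata; match case-insensitively.
        if name and name.lower().startswith(tuple(prefixes)):
            found[name] = dist.version
    return dict(sorted(found.items()))


# Print one "name version" pair per line, similar to `pip list`.
for name, version in gpu_package_versions().items():
    print(f"{name:30s} {version}")
```

In this environment such a listing would put the `nvidia-*-cu12` wheels (for example `nvidia-cusolver-cu12 11.4.5.107`) next to the RAPIDS 24.10 packages. Whether that combination is related to the cuSOLVER error is not established here.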

BTW, I also checked adata.X for NaN and Inf values and found none:

import cupy as cp
import numpy as np

# Check adata.X for NaN or Inf values (dense CuPy array, otherwise a sparse matrix)
print("Checking for NaN or Inf values in adata.X...")
if isinstance(adata.X, cp.ndarray):
    print(cp.any(cp.isnan(adata.X)))
    print(cp.any(cp.isinf(adata.X)))
else:
    print(np.any(np.isnan(adata.X.toarray())))
    print(np.any(np.isinf(adata.X.toarray())))
Checking for NaN or Inf values in adata.X...
False
False
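
The check above densifies a sparse matrix twice with `.toarray()`, which can be expensive for 934583 × 5000 entries. A lighter variant is sketched below, assuming `adata.X` is either a dense array or a CSR/CSC-style sparse matrix that exposes `.data` and `.toarray()`. The helper name `has_nonfinite` and the `xp` parameter are hypothetical, introduced only for this example.

```python
import numpy as np


def has_nonfinite(X, xp=np):
    """Return True if X holds any NaN or Inf value.

    X  : dense array, or a sparse matrix exposing `.data` and `.toarray()`
         (assumption: CSR/CSC-style storage, where `.data` is a flat array).
    xp : array module matching X (numpy by default; pass cupy for GPU arrays).
    """
    # For sparse input only the explicitly stored entries can be non-finite,
    # because implicit zeros are finite. This avoids densifying the matrix.
    values = X.data if hasattr(X, "toarray") else X
    return not bool(xp.isfinite(values).all())


# Small dense example on the CPU.
print(has_nonfinite(np.array([[0.0, 1.5], [2.0, 3.0]])))
print(has_nonfinite(np.array([[0.0, np.nan], [2.0, 3.0]])))
```

For data that lives on the GPU, the call would be `has_nonfinite(adata.X, xp=cp)` after `import cupy as cp`, again assuming `adata.X` is a CuPy dense array or a `cupyx.scipy.sparse` CSR/CSC matrix.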
