[BUG] Agglomerative clustering encounters cudaErrorInvalidValue: invalid argument #4424

lzhang282 · 2021-12-04T21:56:57Z

Describe the bug
Calling AgglomerativeClustering.fit() runs into a CUDA error (cudaErrorInvalidValue: invalid argument).

Environment details (please complete the following information):

Cloud: Databricks runtime 9.1LTS
Linux Distro/Architecture: [Ubuntu 18.04 amd64]
GPU Model/Driver: [V100 and driver 396.44]
CUDA: 11.0
CUML: 0.19

sample code to reproduce error

import cudf
import cupy
from cuml.cluster import AgglomerativeClustering
from cuml.datasets import make_blobs

n_samples = 10000
n_features = 2

n_clusters = 10
random_state = 0

# generate data

device_data, device_labels = make_blobs(n_samples=n_samples,
                                        n_features=n_features,
                                        centers=n_clusters,
                                        random_state=random_state,
                                        cluster_std=0.1)

device_data = cudf.DataFrame(device_data)
device_labels = cudf.Series(device_labels)

# agglomerative hierarchical clustering

hc_cuml = AgglomerativeClustering(n_clusters=n_clusters, affinity="euclidean", linkage="single",connectivity='knn',n_neighbors=10)
hc_cuml.fit(device_data)

error message

RuntimeError Traceback (most recent call last)
in
22 # agglomerative hierarchical clustering
23 hc_cuml = AgglomerativeClustering(n_clusters=n_clusters, affinity="euclidean", linkage="single",connectivity='knn',n_neighbors=10)
---> 24 hc_cuml.fit(device_data)

/databricks/python/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner_with_setters(*args, **kwargs)
407 target_val=target_val)
408
--> 409 return func(*args, **kwargs)
410
411 @wraps(func)

cuml/cluster/agglomerative.pyx in cuml.cluster.agglomerative.AgglomerativeClustering.fit()

RuntimeError: CUDA error encountered at: file=raft/src/raft/cpp/include/raft/cudart_utils.h line=205: call='cudaMemcpyAsync(dst, src, len * sizeof(Type), cudaMemcpyDefault, stream)', Reason=cudaErrorInvalidValue:invalid argument
Obtained 49 stack frames
#0 in /databricks/python/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft9exception18collect_call_stackEv+0x46) [0x7f7a61c26af6]
#1 in /databricks/python/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft10cuda_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x69) [0x7f7a61c27259]
#2 in /databricks/python/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft4copyIiEEvPT_PKS1_mP11CUstream_st+0x138) [0x7f7a61c3e088]
#3 in /databricks/python/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft9hierarchy6detail21build_dendrogram_hostIifEEvRKNS_8handle_tEPKT_S8_PKT0_mPS6_RN3rmm14device_uvectorIS9_EERNSE_IS6_EE+0x4cb) [0x7f7a61f09e8b]
#4 in /databricks/python/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft9hierarchy14single_linkageIifLNS0_15LinkageDistanceE1EEEvRKNS_8handle_tEPKT0_mmNS_8distance12DistanceTypeEPNS0_14linkage_outputIT_S6_EEim+0x6c4) [0x7f7a61efd114]
#5 in /databricks/python/lib/python3.8/site-packages/cuml/cluster/agglomerative.cpython-38-x86_64-linux-gnu.so(+0x29bdc) [0x7f7a4ded9bdc]
#6 in /databricks/python/bin/python(PyObject_Call+0x255) [0x55e8204062b5]

cjnolet · 2021-12-06T14:10:51Z

@lzhang282,

Thank you for opening this issue. There have been a few releases since cuML 0.19 that fixed several bugs in the agglomerative clustering code. I'm not able to reproduce this on the most recent version (22.02 at the time of writing). Are you able to try a more recent version?
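Before retrying, it can help to confirm which cuML version the environment actually loads. The following is a minimal sketch, not an official check: `version_tuple`, `installed_cuml_version`, and `SUGGESTED_MINIMUM` are hypothetical names, and the threshold is simply the 22.02 release mentioned above rather than a documented minimum for this fix.

```python
import re


def version_tuple(version):
    """Return the leading (major, minor) numbers of a version string."""
    # Drop any local/build suffix such as "+g1234abc" before parsing.
    public = version.split("+", 1)[0]
    numbers = re.findall(r"\d+", public)
    return tuple(int(n) for n in numbers[:2])


def installed_cuml_version():
    """Return cuml.__version__, or None if cuML cannot be imported here."""
    try:
        import cuml
    except Exception:
        # Missing package or missing CUDA libraries both end up here.
        return None
    return getattr(cuml, "__version__", None)


# Assumption: threshold taken from the version named in this thread.
SUGGESTED_MINIMUM = "22.02"

if __name__ == "__main__":
    found = installed_cuml_version()
    if found is None:
        print("cuML could not be imported in this environment")
    elif version_tuple(found) < version_tuple(SUGGESTED_MINIMUM):
        print(f"cuML {found} is older than {SUGGESTED_MINIMUM}")
    else:
        print(f"cuML {found} is at least {SUGGESTED_MINIMUM}")
```

Only the first two numeric fields are compared, because RAPIDS releases are identified by major and minor numbers (0.19, 22.02).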

lzhang282 · 2021-12-06T18:35:21Z

@cjnolet Thank you for the quick response. I am aware that there have been a few more releases after 0.19, but https://github.com/rapidsai/cloud-ml-examples/blob/main/databricks/docker/rapids-spec.txt explicitly specifies 0.19. Could you point out what needs to be replaced there in order to use the latest version? I have to create a customized image to run on Databricks. Thanks!

github-actions · 2022-01-05T19:02:40Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions · 2022-04-05T19:02:41Z

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

ilkersigirci · 2023-08-25T14:29:18Z

I encountered the same error with the latest cuML version, 23.8.0 (using a Tesla P100 16GB). It is said here that the problem occurs because of old hardware. Is this actually the case? Is there any progress on fixing it?
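To make the hardware question concrete, a report can include the GPU's compute capability and the CUDA versions the process sees (the Tesla P100 is a Pascal-generation card with compute capability 6.0). The sketch below uses CuPy for this. `describe_gpu` is a hypothetical helper, and nothing in this thread confirms that compute capability is the cause of the error.

```python
def describe_gpu(device_id=0):
    """Return GPU environment details as a dict, or None if unavailable."""
    try:
        import cupy

        device = cupy.cuda.Device(device_id)
        return {
            # Compute capability as a string, e.g. "60" for 6.0.
            "compute_capability": device.compute_capability,
            # Versions are encoded as integers, e.g. 11000 for CUDA 11.0.
            "cuda_runtime": cupy.cuda.runtime.runtimeGetVersion(),
            # Highest CUDA version supported by the installed driver.
            "cuda_driver_api": cupy.cuda.runtime.driverGetVersion(),
        }
    except Exception:
        # CuPy not installed, or no usable GPU/driver on this machine.
        return None


if __name__ == "__main__":
    info = describe_gpu()
    if info is None:
        print("No usable GPU (or CuPy) found")
    else:
        for key, value in info.items():
            print(f"{key}: {value}")
```

Pasting this output next to the cuML version gives maintainers the same details the issue template asks for under "Environment details".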

lzhang282 added the "? - Needs Triage" and "bug" labels Dec 4, 2021

github-actions bot added the inactive-30d label Jan 5, 2022

github-actions bot added the inactive-90d label Apr 5, 2022
