[BUG] Agglomerative clustering encounters cudaErrorInvalidValue: invalid argument #4424
Comments
Thank you for opening this issue. There have been several releases since cuML 0.19, and they fixed a number of bugs in the agglomerative clustering code. I'm not able to reproduce this on the most recent version (22.02 at the time of writing). Are you able to try a more recent version?
@cjnolet Thank you for the quick response. I am aware that there have been a few releases since 0.19, but the spec file https://github.com/rapidsai/cloud-ml-examples/blob/main/databricks/docker/rapids-spec.txt explicitly specifies 0.19. Could you point out what needs to be changed there in order to use the latest version? I have to build a custom image to run on Databricks. Thanks!
This issue has been labeled
I encountered the same error with the latest cuML version, 23.8.0 (on a Tesla P100 16GB). It has been said elsewhere that the problem is caused by old hardware. Is that actually the case? Has there been any progress on a fix?
Describe the bug
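Calling AgglomerativeClustering.fit() on the sample data below fails with a CUDA error (cudaErrorInvalidValue: invalid argument).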
Environment details (please complete the following information):
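One way to gather these details is sketched below in Python. It is only a sketch: the helper names are hypothetical, it assumes the installed distributions are named cuml, cudf, and cupy (pip wheels can carry other names, such as cupy-cuda11x), and it assumes nvidia-smi is on the PATH. Anything it cannot find is reported as missing rather than guessed.

```python
import platform
import shutil
import subprocess
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def gpu_summary():
    """Ask nvidia-smi for GPU name, driver version and memory; None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=name,driver_version,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10, check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    return result.stdout.strip()


def collect_environment(packages=("cuml", "cudf", "cupy")):
    """Collect the details a bug report usually asks for into a dict."""
    env = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "gpu": gpu_summary(),  # None when nvidia-smi is absent or fails
    }
    # Distribution names are an assumption; adjust them to the actual install.
    for name in packages:
        env[name] = package_version(name)
    return env


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the report makes it easier to tell whether a failure tracks the library version or the GPU model.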
sample code to reproduce error
import cudf
import cupy
from cuml.cluster import AgglomerativeClustering
from cuml.datasets import make_blobs
n_samples = 10000
n_features = 2
n_clusters = 10
random_state = 0
# generate data
device_data, device_labels = make_blobs(n_samples=n_samples,
                                        n_features=n_features,
                                        centers=n_clusters,
                                        random_state=random_state,
                                        cluster_std=0.1)
device_data = cudf.DataFrame(device_data)
device_labels = cudf.Series(device_labels)
# agglomerative hierarchical clustering
hc_cuml = AgglomerativeClustering(n_clusters=n_clusters, affinity="euclidean", linkage="single",connectivity='knn',n_neighbors=10)
hc_cuml.fit(device_data)
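As a CPU reference for what the call above is expected to compute, here is a pure-Python sketch of single-linkage clustering. It is not cuML's implementation: it uses all pairwise Euclidean distances instead of a kNN connectivity graph, so it is only practical for small inputs, and the function name is hypothetical. It merges the closest pairs of components (Kruskal's algorithm with a union-find) until the requested number of clusters remains.

```python
import math
from itertools import combinations


def single_linkage(points, n_clusters):
    """Cluster points by single linkage; returns one integer label per point."""
    n = len(points)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and len(points)")

    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first (O(n^2) memory: small inputs only).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    components = n
    for _, i, j in edges:
        if components == n_clusters:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i  # merge the two closest clusters
            components -= 1

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


if __name__ == "__main__":
    blobs = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
             (5.0, 5.0), (5.0, 5.1), (5.1, 5.0)]
    print(single_linkage(blobs, n_clusters=2))
```

On a small subsample of the blobs, the labels from a sketch like this can be compared (up to label permutation) with GPU results from a version that runs successfully.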
error message
RuntimeError Traceback (most recent call last)
in
22 # agglomerative hierarchical clustering
23 hc_cuml = AgglomerativeClustering(n_clusters=n_clusters, affinity="euclidean", linkage="single",connectivity='knn',n_neighbors=10)
---> 24 hc_cuml.fit(device_data)
/databricks/python/lib/python3.8/site-packages/cuml/internals/api_decorators.py in inner_with_setters(*args, **kwargs)
407 target_val=target_val)
408
--> 409 return func(*args, **kwargs)
410
411 @wraps(func)
cuml/cluster/agglomerative.pyx in cuml.cluster.agglomerative.AgglomerativeClustering.fit()
RuntimeError: CUDA error encountered at: file=raft/src/raft/cpp/include/raft/cudart_utils.h line=205: call='cudaMemcpyAsync(dst, src, len * sizeof(Type), cudaMemcpyDefault, stream)', Reason=cudaErrorInvalidValue:invalid argument
Obtained 49 stack frames
#0 in /databricks/python/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft9exception18collect_call_stackEv+0x46) [0x7f7a61c26af6]
#1 in /databricks/python/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft10cuda_errorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x69) [0x7f7a61c27259]
#2 in /databricks/python/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft4copyIiEEvPT_PKS1_mP11CUstream_st+0x138) [0x7f7a61c3e088]
#3 in /databricks/python/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft9hierarchy6detail21build_dendrogram_hostIifEEvRKNS_8handle_tEPKT_S8_PKT0_mPS6_RN3rmm14device_uvectorIS9_EERNSE_IS6_EE+0x4cb) [0x7f7a61f09e8b]
#4 in /databricks/python/lib/python3.8/site-packages/cuml/common/../../../../libcuml++.so(_ZN4raft9hierarchy14single_linkageIifLNS0_15LinkageDistanceE1EEEvRKNS_8handle_tEPKT0_mmNS_8distance12DistanceTypeEPNS0_14linkage_outputIT_S6_EEim+0x6c4) [0x7f7a61efd114]
#5 in /databricks/python/lib/python3.8/site-packages/cuml/cluster/agglomerative.cpython-38-x86_64-linux-gnu.so(+0x29bdc) [0x7f7a4ded9bdc]
#6 in /databricks/python/bin/python(PyObject_Call+0x255) [0x55e8204062b5]
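For triage across several reports, the fields of the RuntimeError message can be pulled apart with a regular expression. This is a sketch only: the pattern is inferred from the single message quoted in this issue, so the layout of other raft error messages is an assumption, and the function name is hypothetical.

```python
import re

# Pattern inferred from the one message in this report; other raft versions
# may format their errors differently.
CUDA_ERROR_PATTERN = re.compile(
    r"file=(?P<file>\S+)\s+"
    r"line=(?P<line>\d+):\s+"
    r"call='(?P<call>.*?)',\s+"
    r"Reason=(?P<reason>[^:]+):(?P<detail>.*)"
)


def parse_cuda_error(message):
    """Return the error fields as a dict, or None if the message does not match."""
    match = CUDA_ERROR_PATTERN.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    fields["detail"] = fields["detail"].strip()
    return fields


if __name__ == "__main__":
    message = (
        "CUDA error encountered at: "
        "file=raft/src/raft/cpp/include/raft/cudart_utils.h line=205: "
        "call='cudaMemcpyAsync(dst, src, len * sizeof(Type), "
        "cudaMemcpyDefault, stream)', "
        "Reason=cudaErrorInvalidValue:invalid argument"
    )
    print(parse_cuda_error(message))
```

Grouping reports by the parsed call and reason fields makes it easier to see whether failures on different GPUs or versions come from the same copy in raft.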