[FEA] "Precomputed" Distance Matrix in (some) Clustering Algorithms #4516
Comments
Hi, are there any updates on making precomputed matrices available for HDBSCAN?
Just attempted to perform HDBSCAN on a
hdb.fit(D)
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
ret = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "base.pyx", line 687, in cuml.internals.base.UniversalBase.dispatch_func
File "hdbscan.pyx", line 762, in cuml.cluster.hdbscan.hdbscan.HDBSCAN.fit
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/nvtx/nvtx.py", line 116, in inner
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/input_utils.py", line 380, in input_to_cuml_array
arr = CumlArray.from_input(
^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/memory_utils.py", line 87, in cupy_rmm_wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/nvtx/nvtx.py", line 116, in inner
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/array.py", line 1114, in from_input
arr = cls(X, index=index, order=requested_order, validate=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/memory_utils.py", line 87, in cupy_rmm_wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/nvtx/nvtx.py", line 116, in inner
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/array.py", line 292, in __init__
new_data = cur_xpy.asarray(data, dtype=dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cupy/_creation/from_data.py", line 88, in asarray
return _core.array(a, dtype, False, order, blocking=blocking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "cupy/_core/core.pyx", line 2379, in cupy._core.core.array
File "cupy/_core/core.pyx", line 2406, in cupy._core.core.array
File "cupy/_core/core.pyx", line 2541, in cupy._core.core._array_default
ValueError: setting an array element with a sequence.
python-BaseException
As I notice there's a
hdb.fit(D)
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 188, in wrapper
ret = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 393, in dispatch
return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/api_decorators.py", line 190, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "base.pyx", line 687, in cuml.internals.base.UniversalBase.dispatch_func
File "hdbscan.pyx", line 762, in cuml.cluster.hdbscan.hdbscan.HDBSCAN.fit
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/nvtx/nvtx.py", line 116, in inner
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/input_utils.py", line 380, in input_to_cuml_array
arr = CumlArray.from_input(
^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/memory_utils.py", line 87, in cupy_rmm_wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/nvtx/nvtx.py", line 116, in inner
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/array.py", line 1114, in from_input
arr = cls(X, index=index, order=requested_order, validate=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/memory_utils.py", line 87, in cupy_rmm_wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/nvtx/nvtx.py", line 116, in inner
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cuml/internals/array.py", line 292, in __init__
new_data = cur_xpy.asarray(data, dtype=dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/U_2021PZZZJC0001/jiaxin.guo/miniforge3/envs/py311/lib/python3.11/site-packages/cupy/_creation/from_data.py", line 88, in asarray
return _core.array(a, dtype, False, order, blocking=blocking)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "cupy/_core/core.pyx", line 2379, in cupy._core.core.array
File "cupy/_core/core.pyx", line 2406, in cupy._core.core.array
File "cupy/_core/core.pyx", line 2541, in cupy._core.core._array_default
TypeError: float() argument must be a string or a real number, not 'SparseCumlArray'
python-BaseException
Looking forward to any suggestions or a support schedule for this, as precomputed, sparse distance matrices are common in clustering algorithms.
@cjnolet, what would it take to get this made and merged in? I'm happy to take a shot at it, with no promises as to how far I get. I'm working with some data right now that would very much benefit from 'cosine', and failing that, 'precomputed' is a good option for getting a lot of different metrics working. I would just need some guidance on where to start.
Sometimes we do not have point representations in space but rather only distances between those points.
Therefore it would be great if some algorithms (I'm especially interested in HDBSCAN and Agglomerative Clustering) could work on precomputed (sparse) distance matrices, similar to the "precomputed" metric available in many sklearn algorithms.
Personally, I'm working with biological, structural data, hence I only have differences in structure but not points in space.
Several other issues also relate to this FEA: #4475, #4460 (#1192, #4409). For DBSCAN, for example, this was already implemented with issue #3302.
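To illustrate what "working on a precomputed distance matrix" means in practice, here is a minimal pure-Python sketch of single-linkage clustering cut at a fixed distance threshold. It only ever reads pairwise distances and never needs point coordinates. This is not cuML's or sklearn's implementation; the function name `single_linkage_labels`, the `threshold` parameter, and the example matrix `D` are all made up for illustration.

```python
from typing import List, Sequence


def single_linkage_labels(
    dist: Sequence[Sequence[float]], threshold: float
) -> List[int]:
    """Cluster items using only a precomputed, symmetric distance matrix.

    Two items share a cluster when they are connected by a chain of pairs
    whose distance is <= threshold (single linkage cut at a fixed height).
    Hypothetical helper for illustration, not a library API.
    """
    n = len(dist)
    if any(len(row) != n for row in dist):
        raise ValueError("distance matrix must be square")

    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i: int) -> int:
        # Walk up to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair that is close enough; only the upper triangle is read.
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Relabel roots as consecutive integers in order of first appearance.
    labels: List[int] = []
    root_to_label: dict = {}
    for i in range(n):
        root = find(i)
        labels.append(root_to_label.setdefault(root, len(root_to_label)))
    return labels


# Made-up pairwise distances between four items (e.g. structural differences).
D = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.7, 0.9],
    [0.9, 0.7, 0.0, 0.1],
    [0.8, 0.9, 0.1, 0.0],
]
labels = single_linkage_labels(D, threshold=0.3)
print(labels)
```

The same idea is what a "precomputed" metric option exposes: the estimator skips its own distance computation and consumes the matrix directly, so any dissimilarity measure can be plugged in upstream.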