[FEA] Support for Dice coefficient as a metric in UMAP #5129

Open
beckernick opened this issue Jan 12, 2023 · 3 comments · May be fixed by #5140 or #5943
Assignees
Labels
Cython / Python Cython or Python issue feature request New feature or request good first issue Good for newcomers

Comments

@beckernick
Member

beckernick commented Jan 12, 2023

I'd like to be able to use the Dice coefficient for UMAP on the GPU, as I can on the CPU. Based on this GitHub search, a small number of users choose the Dice metric.

Since RAFT supports the Dice coefficient as a metric and cuML's DistanceType enum already includes DiceExpanded, this may be as simple as adding it as a supported metric in the UMAP metric mapping dictionary.
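To illustrate the kind of change being suggested, here is a minimal sketch of a name-to-enum lookup. It is not cuML's actual code: the `DistanceType` class below is a stand-in with illustrative members, and `METRIC_MAP`, `METRIC_MAP_WITH_DICE`, and `resolve_metric` are hypothetical names. Only the enum member name `DiceExpanded` and the error message come from this issue.

```python
from enum import Enum, auto


class DistanceType(Enum):
    """Stand-in for cuML's DistanceType enum; members and values are illustrative."""
    L2SqrtExpanded = auto()
    CosineExpanded = auto()
    DiceExpanded = auto()


# Hypothetical metric-name mapping; the real dictionary lives in cuML's UMAP code.
METRIC_MAP = {
    "euclidean": DistanceType.L2SqrtExpanded,
    "cosine": DistanceType.CosineExpanded,
}


def resolve_metric(name, mapping=METRIC_MAP):
    """Look up a metric name, raising the same style of error as the traceback below."""
    key = name.lower()
    if key not in mapping:
        raise ValueError(f"Invalid value for metric: {name}")
    return mapping[key]


# Proposed change: register "dice" alongside the existing entries.
METRIC_MAP_WITH_DICE = {**METRIC_MAP, "dice": DistanceType.DiceExpanded}
```

With the original mapping, `resolve_metric("dice")` raises `ValueError`; with the extended mapping it returns `DistanceType.DiceExpanded`.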

import cuml
import umap

X, _ = cuml.datasets.make_blobs()

reducer = umap.UMAP(metric="dice")
print(reducer.fit_transform(X.get())[:5])
/home/nicholasb/miniconda3/envs/rapids-23.02/lib/python3.8/site-packages/umap/umap_.py:1802: UserWarning: gradient function is not yet implemented for dice distance metric; inverse_transform will be unavailable
  warn(
[[-1.687048   15.136897  ]
 [-1.1637341  14.760551  ]
 [ 1.4106333  14.12161   ]
 [-0.07474275 13.303681  ]
 [ 1.5589024  16.217863  ]]

reducer = cuml.manifold.umap.UMAP(metric="dice")
print(reducer.fit_transform(X)[:5])

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [14], in <cell line: 10>()
      7 print(reducer.fit_transform(X.get())[:5])
      9 reducer = cuml.manifold.umap.UMAP(metric="dice")
---> 10 print(reducer.fit_transform(X)[:5])

File ~/miniconda3/envs/rapids-23.02/lib/python3.8/site-packages/cuml/internals/api_decorators.py:548, in BaseReturnArrayDecorator.__call__.<locals>.inner_set_get(*args, **kwargs)
    545         self.do_getters_with_self_no_input(self_val=self_val)
    547     # Call the function
--> 548     ret_val = func(*args, **kwargs)
    550 return cm.process_return(ret_val)

File ~/miniconda3/envs/rapids-23.02/lib/python3.8/site-packages/cuml/internals/api_decorators.py:817, in enable_device_interop.<locals>.dispatch(self, *args, **kwargs)
    815 if hasattr(self, 'dispatch_func'):
    816     func_name = gpu_func.__name__
--> 817     return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
    818 else:
    819     return gpu_func(self, *args, **kwargs)

File ~/miniconda3/envs/rapids-23.02/lib/python3.8/site-packages/cuml/internals/api_decorators.py:359, in ReturnAnyDecorator.__call__.<locals>.inner(*args, **kwargs)
    356 @wraps(func)
    357 def inner(*args, **kwargs):
    358     with self._recreate_cm(func, args):
--> 359         return func(*args, **kwargs)

File base.pyx:656, in cuml.internals.base.UniversalBase.dispatch_func()

File umap.pyx:659, in cuml.manifold.umap.UMAP.fit_transform()

File ~/miniconda3/envs/rapids-23.02/lib/python3.8/site-packages/cuml/internals/api_decorators.py:408, in BaseReturnAnyDecorator.__call__.<locals>.inner_with_setters(*args, **kwargs)
    401 self_val, input_val, target_val = \
    402     self.get_arg_values(*args, **kwargs)
    404 self.do_setters(self_val=self_val,
    405                 input_val=input_val,
    406                 target_val=target_val)
--> 408 return func(*args, **kwargs)

File ~/miniconda3/envs/rapids-23.02/lib/python3.8/site-packages/cuml/internals/api_decorators.py:817, in enable_device_interop.<locals>.dispatch(self, *args, **kwargs)
    815 if hasattr(self, 'dispatch_func'):
    816     func_name = gpu_func.__name__
--> 817     return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
    818 else:
    819     return gpu_func(self, *args, **kwargs)

File ~/miniconda3/envs/rapids-23.02/lib/python3.8/site-packages/cuml/internals/api_decorators.py:359, in ReturnAnyDecorator.__call__.<locals>.inner(*args, **kwargs)
    356 @wraps(func)
    357 def inner(*args, **kwargs):
    358     with self._recreate_cm(func, args):
--> 359         return func(*args, **kwargs)

File base.pyx:656, in cuml.internals.base.UniversalBase.dispatch_func()

File umap.pyx:569, in cuml.manifold.umap.UMAP.fit()

File umap.pyx:465, in cuml.manifold.umap.UMAP._build_umap_params()

ValueError: Invalid value for metric: dice

(Using the 23.02.00a230112 cuda11_py38_g69db20fd6_77 nightly conda package).

@beckernick beckernick added feature request New feature or request good first issue Good for newcomers Cython / Python Cython or Python issue labels Jan 12, 2023
@beckernick beckernick self-assigned this Jan 12, 2023
@beckernick beckernick linked a pull request Jan 19, 2023 that will close this issue
@cjnolet
Member

cjnolet commented Jan 19, 2023

@beckernick the beauty of dice distance is that it can be computed w/ a dot product and norms. This means it's super easy to compute this on a dense matrix as well if that's also going to be needed.
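As a rough illustration of that point, the NumPy sketch below computes pairwise Dice dissimilarity for Boolean rows from a single matrix product plus row norms, and checks it against a set-based definition. The function names `dice_naive` and `dice_expanded` are made up for this example, and returning 0 for two all-False rows is an assumption rather than something taken from RAFT or cuML.

```python
import numpy as np


def dice_naive(x, y):
    """Set-based Dice dissimilarity between two Boolean vectors."""
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    n_both = np.count_nonzero(x & y)
    n_total = np.count_nonzero(x) + np.count_nonzero(y)
    # Assumption: two all-False vectors are treated as distance 0.
    return 0.0 if n_total == 0 else 1.0 - 2.0 * n_both / n_total


def dice_expanded(X, Y):
    """Pairwise Dice dissimilarity from one matrix product plus row norms."""
    X = np.asarray(X, dtype=bool).astype(np.float64)
    Y = np.asarray(Y, dtype=bool).astype(np.float64)
    dots = X @ Y.T                       # count of shared True entries per pair
    x_sq = np.einsum("ij,ij->i", X, X)   # squared L2 norm == count of True entries
    y_sq = np.einsum("ij,ij->i", Y, Y)
    denom = x_sq[:, None] + y_sq[None, :]
    ratio = np.zeros_like(dots)
    # Skip the division where both rows are empty (same assumption as dice_naive).
    np.divide(2.0 * dots, denom, out=ratio, where=denom > 0)
    return np.where(denom > 0, 1.0 - ratio, 0.0)
```

For Boolean data the dot product counts the shared True entries and each squared norm counts the True entries in a row, which is why the expanded form needs nothing beyond a GEMM and two norm vectors.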

@beckernick
Member Author

This seems to work locally for me on dense matrices with the existing machinery (using #5140). As a note, for sparse matrices we need to explicitly coerce the input data to Boolean type for the dice metric calculation to not throw a memory error.

We can file an issue to follow up on that, if desired. This applies to the existing Jaccard metric, too. We should probably catch this error and fail gracefully.
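For reference, a minimal sketch of the coercion described above, using SciPy's sparse matrices on the host. The `counts` matrix is made-up example data, and the commented-out UMAP call assumes the Dice support from #5140 is available; it is not run here.

```python
import numpy as np
import scipy.sparse as sp

# Small CSR matrix of counts standing in for real sparse input (example data).
counts = sp.csr_matrix(np.array([[2, 0, 3, 0],
                                 [0, 1, 0, 0],
                                 [4, 0, 0, 5]], dtype=np.float32))

# Coerce to Boolean: every stored nonzero becomes True; the sparsity pattern is kept.
binary = counts.astype(bool)

# Assumed usage once "dice" is accepted as a metric (not executed in this sketch):
# reducer = cuml.manifold.umap.UMAP(metric="dice")
# embedding = reducer.fit_transform(binary)
```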

@cjnolet
Member

cjnolet commented Jan 19, 2023

> This applies to the existing Jaccard metric, too.

Jaccard doesn't look like it's supported in the dense brute-force API either. I wonder if the test is returning all zeros or something. I'm honestly surprised the C++ is not outright throwing an error.

rapids-bot bot pushed a commit to rapidsai/raft that referenced this issue Jun 24, 2024
Adds support for the `DistanceType::DiceExpanded` for dense inputs.
1. Naive Kernel Implementation (unexpanded form)
2. Expanded form for dice distance that follows ground truth from `scipy.spatial.distance.dice`
3. Gtests in `cpp/test/distance/dist-dice.cu` 

Related to rapidsai/cuml#5129

Authors:
  - Anupam (https://github.com/aamijar)

Approvers:
  - Divye Gala (https://github.com/divyegala)

URL: #2359
2 participants