[FEA] Add Cosine Distance metric to DBSCAN #4210

Closed
dnaveenr opened this issue Sep 17, 2021 · 2 comments · Fixed by #4776

Comments

@dnaveenr

Cosine distance is not among the metrics supported by DBSCAN:
https://docs.rapids.ai/api/cuml/stable/api.html#dbscan

Pairwise distances, however, does support the cosine metric:
https://docs.rapids.ai/api/cuml/stable/api.html?highlight=pairwise%20distances#module-cuml.metrics.pairwise_distances

I think it should be fairly easy to add cosine distance, since DistanceType already includes CosineExpanded ("raft::distance::DistanceType::CosineExpanded"). The relevant places are:

  1. https://github.com/rapidsai/cuml/blob/366e71f/python/cuml/cluster/dbscan.pyx#L264
  2. https://github.com/rapidsai/cuml/blob/366e71fe2bde359643567f2756049fea4fbd2b08/python/cuml/metrics/distance_type.pxd

I tried making these changes locally, but I ran into issues building from source and wasn't successful.

I also tried DBSCAN's 'precomputed' metric, passing a square cosine distance matrix, but this runs out of memory as the number of vectors grows, since an N*N matrix is required.
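To make the memory problem concrete, here is a rough back-of-the-envelope sketch. It assumes a dense float32 matrix (4 bytes per entry) and ignores any extra working memory the library needs; the helper name `precomputed_matrix_bytes` is made up for illustration and is not part of cuML.

```python
def precomputed_matrix_bytes(n_samples: int, dtype_bytes: int = 4) -> int:
    """Bytes needed for a dense n_samples x n_samples distance matrix.

    dtype_bytes=4 assumes float32 entries (an assumption, not a cuML default).
    """
    return n_samples * n_samples * dtype_bytes


# Print the footprint for a few dataset sizes to show the quadratic growth.
for n in (10_000, 100_000, 1_000_000):
    gib = precomputed_matrix_bytes(n) / 2**30
    print(f"{n:>9,} samples -> {gib:,.1f} GiB")
```

The footprint grows quadratically with the number of vectors, so a 10x larger dataset needs 100x the memory, which is why a native cosine metric is preferable to the 'precomputed' route.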

@dnaveenr dnaveenr added ? - Needs Triage Need team to review and classify feature request New feature or request labels Sep 17, 2021
@viclafargue viclafargue removed the ? - Needs Triage Need team to review and classify label Sep 20, 2021
@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@rapids-bot rapids-bot bot closed this as completed in #4776 Jul 7, 2022
rapids-bot bot pushed a commit that referenced this issue Jul 7, 2022
closes #4210 
Added cosine distance metric for computing the epsilon neighborhood in DBSCAN. The cosine distance is computed from the L2 distance between L2-normalized vectors, and the epsilon value is adjusted accordingly.

Authors:
  - Tarang Jain (https://github.com/tarang-jain)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #4776
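
The idea in the commit message can be sketched in plain Python. This is not the code from #4776, only an illustration of the underlying identity: for unit vectors u and v, ||u - v||^2 = 2 * (1 - cos(u, v)), so a cosine epsilon of e corresponds to a Euclidean epsilon of sqrt(2 * e) on the normalized data. All function names below are hypothetical, and the data is random.

```python
import math
import random


def l2_normalize(v):
    # Scale v to unit L2 norm (assumes v is not the zero vector).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(u, v):
    # 1 - cosine similarity of the raw (unnormalized) vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_eps_to_euclidean_eps(eps_cosine):
    # For unit vectors: ||u - v||^2 = 2 * cosine_distance(u, v).
    return math.sqrt(2.0 * eps_cosine)


random.seed(0)
points = [[random.gauss(0, 1) for _ in range(8)] for _ in range(50)]
unit = [l2_normalize(p) for p in points]

eps_cos = 0.3
eps_l2 = cosine_eps_to_euclidean_eps(eps_cos)


def neighbors_cosine(i):
    # Epsilon neighborhood under cosine distance on the raw vectors.
    return {j for j in range(len(points))
            if cosine_distance(points[i], points[j]) <= eps_cos}


def neighbors_l2(i):
    # Epsilon neighborhood under Euclidean distance on the normalized vectors.
    return {j for j in range(len(unit))
            if euclidean_distance(unit[i], unit[j]) <= eps_l2}


same = all(neighbors_cosine(i) == neighbors_l2(i) for i in range(len(points)))
print("neighborhoods match:", same)
```

Because both formulations yield the same epsilon neighborhoods (up to floating-point ties exactly at the threshold), DBSCAN's core points and clusters come out the same, and no N*N matrix has to be materialized up front.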
jakirkham pushed a commit to jakirkham/cuml that referenced this issue Feb 27, 2023