Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The support for "precomputed" in DBSCAN [FEA] #3302

Closed
WangWenhao0716 opened this issue Dec 13, 2020 · 16 comments · Fixed by #3585
Closed

The support for "precomputed" in DBSCAN [FEA] #3302

WangWenhao0716 opened this issue Dec 13, 2020 · 16 comments · Fixed by #3585
Labels
Algorithm API Change For tracking changes to algorithms that might effect the API feature request New feature or request

Comments

@WangWenhao0716
Copy link

It can be observed from sklearn-dbscan that the metric can be “precomputed” and X can be a distance matrix. However, currently, cuML does not support this "precomputed" metric.
I want to know whether this feature will be added to a future version of cuML?
Thanks.

@WangWenhao0716 WangWenhao0716 added ? - Needs Triage Need team to review and classify feature request New feature or request labels Dec 13, 2020
@teju85
Copy link
Member

teju85 commented Dec 13, 2020

This should be possible. But would like to hear thoughts from @Nyrio who's currently working on dbscan.

@WangWenhao0716 I'm assuming you'd be interested in passing a dense distance matrix as input?

@WangWenhao0716
Copy link
Author

@teju85 Thanks a lot! The assumption is right!

@Jie2World
Copy link

Looking forward to good news.

@Nyrio
Copy link
Contributor

Nyrio commented Dec 14, 2020

I agree with @teju85. It is possible to support that in our implementation.

@WangWenhao0716
Copy link
Author

@Nyrio Thanks, is there a timetable to realize this feature?

@hcho3 hcho3 added Algorithm API Change For tracking changes to algorithms that might effect the API and removed ? - Needs Triage Need team to review and classify labels Dec 18, 2020
@WangWenhao0716
Copy link
Author

any update?

@teju85
Copy link
Member

teju85 commented Jan 12, 2021

@WangWenhao0716 @Jie2World sorry for the delayed response. We can get this implemented in 0.19 release timeframe. Does that sound ok to you?

@WangWenhao0716
Copy link
Author

WangWenhao0716 commented Jan 12, 2021 via email

@Jie2World
Copy link

Thanks. You made my day.

@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@teju85
Copy link
Member

teju85 commented Feb 20, 2021

This is being targeted for 0.19 release and @Nyrio is working towards it. So, certainly not stale.

@WangWenhao0716
Copy link
Author

The 0.19 nightly version is online, and I want to know whether or when the "precomputed" choice will appear in this nightly version? Thanks a lot!

@teju85
Copy link
Member

teju85 commented Mar 1, 2021

@WangWenhao0716 the work has currently not started, but for sure we'll be filing PR during this month. We'll update this issue once this feature hits nightly builds.

@WangWenhao0716
Copy link
Author

@teju85 Thanks a lot for your contribution.

rapids-bot bot pushed a commit that referenced this issue Mar 30, 2021
Closes #3302

## Notes about performance

If we don't count the cost of pre-computing the distance matrix (which is done by the user), the single-GPU code runs slightly faster when the distance matrix is pre-computed. (note: this is 2d, greater speedups expected for larger dimensions!)

![dbscan_precomputed_sg](https://user-images.githubusercontent.com/17441062/110022273-01f81280-7d2c-11eb-859f-929b691d76d5.png)
![dbscan_precomputed_timeline](https://user-images.githubusercontent.com/17441062/110022288-058b9980-7d2c-11eb-968c-eca35898f659.png)

As I have stated in a comment in the code, it works with two kernels: one that uses a coalesced reduction to compute the vertex degrees from the distance matrix, and one that uses a 2D copy fused with an unary operation to get the boolean neighborhood matrix.

_**Note:** the performance of this step could be better if `adj` was a row-major B*N matrix instead of column-major. We could fuse everything into one efficient kernel. It is something to keep in mind when we re-write `csr_adj_graph_batched`._

## Notes about MNMG

Cf #3615

Authors:
  - Louis Sugy (@Nyrio)

Approvers:
  - Tamas Bela Feher (@tfeher)
  - Thejaswi. N. S (@teju85)
  - Dante Gama Dessavre (@dantegd)

URL: #3585
@teju85
Copy link
Member

teju85 commented Mar 30, 2021

@WangWenhao0716 and @Jie2World this feature is now in 0.19 branch (Thanks @Nyrio). Can you folks try this out and give us feedback?

@WangWenhao0716
Copy link
Author

WangWenhao0716 commented Mar 30, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algorithm API Change For tracking changes to algorithms that might effect the API feature request New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants