[FEA] Need an approximate_predict function for cuml HDBSCAN #4448

sudhanshu-shukla-git · 2021-12-14T14:01:17Z

Is your feature request related to a problem? Please describe.
I wish I could use cuML HDBSCAN to predict cluster labels for new points from an existing model, similar to approximate_predict in the scikit-learn-contrib hdbscan library.

Describe the solution you'd like
Similar to approximate_predict in the scikit-learn-contrib hdbscan library:
https://hdbscan.readthedocs.io/en/latest/api.html#hdbscan.prediction.approximate_predict

Predict the cluster label of new points. The returned labels will be those of the original clustering found by clusterer, and therefore are not (necessarily) the cluster labels that would be found by clustering the original data combined with points_to_predict, hence the ‘approximate’ label.

Describe alternatives you've considered

A CPU-based solution is already available in the hdbscan library, but I need a GPU-based one.

https://hdbscan.readthedocs.io/en/latest/api.html#hdbscan.prediction.approximate_predict

github-actions · 2022-01-13T14:07:47Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

sudhanshu-shukla-git · 2022-02-28T09:43:50Z

@cjnolet Do we have any updates on this feature? When can we expect this to be released?

github-actions · 2022-03-30T10:07:49Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

whymauri · 2022-05-02T23:22:27Z

(commenting to maintain the issue as active)

github-actions · 2022-06-02T00:11:42Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

sudhanshu-shukla-git · 2022-06-02T07:06:07Z

(commenting to maintain the issue as active)

cedivad · 2022-06-17T16:03:39Z

I was also looking for this feature. I assume the models aren't binary-compatible, so we can't use a model created by cuML with, say, scikit-learn's approximate_predict?

whymauri · 2022-06-17T19:05:36Z

Technically, you can extract the required data structures from RAPIDS and inject them into SKLearn's HierarchicalLabelTree.

But you will have to do a lot of the implementation on your own.

osalem-l · 2022-06-20T11:53:37Z

> Technically, you can extract the required data structures from RAPIDS and inject them into SKLearn's HierarchicalLabelTree.
>
> But you will have to do a lot of the implementation on your own.

Could you illustrate how? I'm currently trying to figure this out.

cedivad · 2022-06-20T12:19:23Z

I've looked at SKLearn's implementation, and it seems to use a brute-force approach, calculating the distance to each centroid one by one. On a GPU, you could parallelize the distance calculations, but you would still need to check the results one by one; at best, you would spawn a "binary tree" of checking threads. I believe this task isn't very parallelizable, and maybe that's why it was deprioritized?

If so, we only need to extract the centroids from RAPIDS and use them in whatever code we want, say, a small Go HTTP server for inference on new vectors.
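A minimal sketch of the centroid-based inference described above, in plain Python for readability. It assumes per-cluster centroids have already been extracted from the fitted model; the `centroids` dict and `nearest_centroid_predict` are hypothetical names, not cuML API. Nearest-centroid assignment is a simplification: the hdbscan library's approximate_predict instead finds the nearest training point in mutual reachability space and then consults the condensed tree.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def nearest_centroid_predict(
    points: List[Point], centroids: Dict[int, Point]
) -> List[int]:
    """Assign each point the label of its closest centroid by brute force.

    `centroids` maps a cluster label to a centroid vector; it is a
    hypothetical input that would have to be extracted from a fitted model.
    """
    labels = []
    for p in points:
        # One distance per centroid; these are independent of each other.
        # min() then reduces them to the single closest label.
        labels.append(min(centroids, key=lambda lbl: math.dist(p, centroids[lbl])))
    return labels


if __name__ == "__main__":
    centroids = {0: (0.0, 0.0), 1: (10.0, 10.0)}
    print(nearest_centroid_predict([(1.0, 2.0), (9.0, 8.5)], centroids))  # [0, 1]
```

Because every distance is independent and the final `min` is a reduction, this maps reasonably well to a GPU: distances can be computed in parallel and combined with a tree-shaped parallel reduction, which is essentially the "binary tree" idea above. Note that nearest-centroid assignment never returns HDBSCAN's noise label (-1) and ignores cluster shape and density.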

RaiAmanRai · 2022-08-01T07:16:26Z

Hi @cjnolet @divyegala, are there any updates on this feature, or an approximate timeline for when it will roll out?

Would really appreciate the work.

ldsands · 2022-08-03T16:35:59Z

> Hi @cjnolet @divyegala, are there any updates on this feature, or an approximate timeline for when it will roll out?
>
> Would really appreciate the work.

Does this pull request (#4800) not add this feature? I haven't dug in deeply, but at a glance it looks like it does. At the very least, this pull request is needed before the approximate_predict feature can be implemented.

cjnolet · 2022-08-03T16:48:37Z

@sudhanshu-shukla-git @RaiAmanRai

> Does #4800 not add this feature? I haven't dug in deeply, but at a glance it looks like it does. At the very least, this pull request is needed before the approximate_predict feature can be implemented.

That pull request implements the pieces needed for fuzzy clustering, which is a stepping stone toward out-of-sample prediction (approximate_predict). We're working toward approximate_predict.
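To make "fuzzy clustering" concrete: instead of a single hard label, each point gets a vector of membership strengths across clusters. The sketch below is a simplified illustration of only a distance-based component (inverse distance to each cluster's nearest exemplar, normalized to sum to 1), loosely modeled on the distance-based membership described in the hdbscan library's soft-clustering documentation. The full approach there also combines an outlier-based component derived from the condensed tree, which is omitted here, and this is not a description of what #4800 implements. `distance_membership` and the `exemplars` input are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def distance_membership(
    point: Point, exemplars: Dict[int, List[Point]]
) -> Dict[int, float]:
    """Hypothetical soft-membership sketch: inverse distance from `point` to
    each cluster's closest exemplar, normalized so the values sum to 1."""
    scores: Dict[int, float] = {}
    for label, pts in exemplars.items():
        d = min(math.dist(point, e) for e in pts)
        if d == 0.0:
            # The point sits on an exemplar: give that cluster all the mass.
            return {lbl: 1.0 if lbl == label else 0.0 for lbl in exemplars}
        scores[label] = 1.0 / d
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


if __name__ == "__main__":
    exemplars = {0: [(0.0, 0.0)], 1: [(4.0, 0.0)]}
    # Distances 1 and 3 give scores 1 and 1/3, i.e. memberships 0.75 and 0.25.
    print(distance_membership((1.0, 0.0), exemplars))
```

Taking the argmax of the resulting vector recovers a hard label, which is one reason soft membership is a natural building block for predicting labels of new points.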

DeepTitan · 2022-08-22T19:00:09Z

I second the need for this feature; it would really help in my project.

PR for HDBSCAN approximate_predict

- [x] Building cluster_map
- [x] Modifying PredictionData class
- [x] Obtaining nearest neighbor in MR space
- [x] Computing probability
- [x] Tests

Closes #4877
Closes #4448

Authors:
- Tarang Jain (https://github.com/tarang-jain)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: #4872
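The checklist mentions obtaining the nearest neighbor in mutual reachability (MR) space. Below is a minimal pure-Python sketch of just that step, assuming the standard HDBSCAN definitions: the core distance of a point is its distance to its k-th nearest training point, and the MR distance between a and b is max(core(a), core(b), d(a, b)). The new point takes the label of its MR-nearest training point. It omits the cluster_map lookup over the condensed tree and the probability computation, so it is not the cuML implementation; the function names are hypothetical, and it is brute force for clarity.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def core_distance(x: Point, data: List[Point], k: int) -> float:
    """Distance from x to its k-th nearest point in `data`.

    If x is itself in `data`, its zero self-distance counts as the first
    neighbor (one common convention; others exist).
    """
    return sorted(math.dist(x, y) for y in data)[k - 1]


def mr_nearest_neighbor(
    query: Point, data: List[Point], labels: List[int], k: int
) -> Tuple[int, int, float]:
    """Return (label, index, mr_distance) for the training point nearest to
    `query` under mutual reachability distance."""
    core_q = core_distance(query, data, k)
    # A real implementation would compute these once at fit time.
    core_train = [core_distance(x, data, k) for x in data]
    best_i, best_d = -1, math.inf
    for i, x in enumerate(data):
        # d_mreach(q, x) = max(core(q), core(x), d(q, x))
        d = max(core_q, core_train[i], math.dist(query, x))
        if d < best_d:
            best_i, best_d = i, d
    return labels[best_i], best_i, best_d


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels = [0, 0, 0, 1, 1, 1]
    print(mr_nearest_neighbor((0.5, 0.5), data, labels, k=2))  # (0, 0, 1.0)
```

A new point near a cluster inherits that cluster's label without refitting, which matches the "approximate" behavior quoted in the issue description. If the MR-nearest neighbor was noise, its label (-1) is transferred as-is.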

sudhanshu-shukla-git added the "? - Needs Triage" and "feature request" labels Dec 14, 2021

This was referenced Jan 12, 2022

[FEA] Support prediction on new data with HDBSCAN. #4472

Closed

[FEA] request for HDBSCAN clustering #1783

Closed

github-actions bot added the inactive-30d label Jan 13, 2022

github-actions bot removed the inactive-30d label Feb 28, 2022

github-actions bot added the inactive-30d label Mar 30, 2022

whymauri mentioned this issue Apr 13, 2022

[FEA] Approximate_predict HDBSCAN Support #4699

Closed

github-actions bot removed the inactive-30d label May 3, 2022

github-actions bot added the inactive-30d label Jun 2, 2022

github-actions bot removed the inactive-30d label Jun 2, 2022

beckernick mentioned this issue Jul 7, 2022

fit and transform not working with cuML MaartenGr/BERTopic#603

Closed

beckernick mentioned this issue Jul 29, 2022

cuML: AttributeError: predict MaartenGr/BERTopic#647

Closed

tarang-jain mentioned this issue Sep 2, 2022

[FEA] approximate_predict function for HDBSCAN #4872

Merged

rapids-bot bot closed this as completed in #4872 Sep 3, 2022
