[TASK] Post HDBSCAN merge tasks #3879

Open
15 of 21 tasks
divyegala opened this issue May 20, 2021 · 6 comments
Labels
feature request New feature or request inactive-90d

Comments

@divyegala
Member

divyegala commented May 20, 2021

These are tasks for cuML's HDBSCAN implementation after the 21.06 release.

  • Move HDBSCAN out of experimental (this can be done after sections 1, 2, and 3 below are complete).

1. Necessary tech debt / cleanup (i.e. need to have)

2. Testing / Correctness verification

3. Test failures / bugs (i.e. must have)

4. Additional tech debt / cleanup (i.e. nice to have)

  • Some arrays are used as int instead of bool due to interop issues between host and device bool. Update these.
  • Investigate potential parallelization of do_labelling()

5. External

6. Additional features before blog

7. Additional features (i.e. like to have)

  • Add outlier scores
  • Sparse inputs
  • Fuzzy clustering
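For context on the "Add outlier scores" item: the standalone `hdbscan` Python package derives GLOSH outlier scores from the condensed tree, scoring a point x that falls out of cluster C at level λ(x) as (λ_max(C) - λ(x)) / λ_max(C), where λ_max(C) is the densest level reached inside C or any of its descendants. The sketch below is a minimal pure-Python illustration of that formula under an assumed tree layout. The function name `glosh_outlier_scores` and the `(parent, child, lambda_val)` row format are hypothetical and are not cuML's API.

```python
import math
from typing import Dict, List, Tuple

# Each row of a (hypothetical) condensed tree: (parent, child, lambda_val).
# Assumed convention: point ids are 0..n_points-1, cluster ids are >= n_points,
# the root cluster is n_points, and child clusters have larger ids than parents.
Row = Tuple[int, int, float]


def glosh_outlier_scores(tree: List[Row], n_points: int) -> List[float]:
    # Largest lambda seen directly under each cluster.
    deaths: Dict[int, float] = {}
    for parent, _child, lam in tree:
        deaths[parent] = max(deaths.get(parent, 0.0), lam)

    # Propagate the densest level upward so a cluster's lambda_max also
    # covers its descendants. Visiting child clusters in descending id order
    # finalizes each child before its parent is updated.
    cluster_edges = sorted((r for r in tree if r[1] >= n_points),
                           key=lambda r: r[1], reverse=True)
    for parent, child, _lam in cluster_edges:
        deaths[parent] = max(deaths[parent], deaths.get(child, 0.0))

    scores = [0.0] * n_points
    for parent, child, lam in tree:
        if child >= n_points:
            continue  # cluster-to-cluster edge, not a point
        lam_max = deaths[parent]
        if lam_max == 0.0 or not math.isfinite(lam):
            scores[child] = 0.0
        else:
            scores[child] = (lam_max - lam) / lam_max
    return scores
```

Points that stay in their cluster up to its densest level score 0, and points that drop out early score close to 1.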
@divyegala divyegala added feature request New feature or request ? - Needs Triage Need team to review and classify labels May 20, 2021
@divyegala divyegala removed the ? - Needs Triage Need team to review and classify label May 20, 2021
rapids-bot bot pushed a commit that referenced this issue Jun 16, 2021
…lity scores (#3987)

Addresses section 1 of #3879

Authors:
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Divye Gala (https://github.com/divyegala)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #3987
@cjnolet
Member

cjnolet commented Jun 23, 2021

Linking #3997

@cjnolet
Member

cjnolet commented Sep 1, 2021

HDBSCAN has officially moved out of experimental!

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this issue Oct 9, 2023
@KukumavMozolo

Hi there, I would be very interested in sparse input support. Is this planned?

@beckernick
Member

Hi @KukumavMozolo, thanks for reviving this issue! Would you be able to share any info about what kinds of use cases this might enable for you that don't currently work well with dense inputs?

@KukumavMozolo

KukumavMozolo commented Jun 17, 2024

Hi @beckernick,
Currently I am working on crash report deduplication. Essentially, this entails transforming various device metrics, like memory consumption and CPU utilization, but also crash logs, like call traces and register contents, into a very high-dimensional vector space that is also very sparse. Crash report deduplication is then the process of finding clusters in that vector space that represent sources of errors.
Obtaining a dense representation of this kind of data seems difficult to me, since slight variations of the input, e.g. a slightly different stack trace, can be a qualitatively different source of error, which makes the data hard to compress. Still, the number of possible errors at a given point in time is comparatively small.
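To illustrate the kind of input described above, here is a minimal sketch that turns call-trace frames into a sparse vector with feature hashing and compares two reports by cosine similarity. Feature hashing, the 2**20 feature-space size, the frame names, and the helpers `hash_features` and `cosine` are all assumptions for illustration; this is not how cuML handles sparse input.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable

N_FEATURES = 2 ** 20  # hypothetical size of the hashed feature space


def hash_features(tokens: Iterable[str], n_features: int = N_FEATURES) -> Dict[int, int]:
    """Map tokens (e.g. call-trace frames) to a sparse {index: count} vector."""
    counts: Counter = Counter()
    for tok in tokens:
        # crc32 is deterministic across runs, unlike Python's salted hash().
        counts[zlib.crc32(tok.encode("utf-8")) % n_features] += 1
    return dict(counts)


def cosine(a: Dict[int, int], b: Dict[int, int]) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


trace_a = hash_features(["main", "handle_request", "parse_header", "memcpy"])
trace_b = hash_features(["main", "handle_request", "parse_body", "memcpy"])
```

Each report has only a handful of nonzeros, yet storing one as a dense float32 row would take 2**20 × 4 bytes = 4 MiB. Barring hash collisions, the two traces above share three of four frames, so their cosine similarity is 3/4 = 0.75.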

Projects
None yet
Development

No branches or pull requests

4 participants