[FEA] All points membership vector for HDBSCAN #4800
tarang-jain commented Jul 5, 2022 (edited)
- All points distance membership vector
- All points outlier membership vector
- All points probability in some cluster
- All points membership vector
- Tests
b823a4b to dbb7d48
cpp/include/cuml/cluster/hdbscan.hpp
Outdated
  */
  void hdbscan(const raft::handle_t& handle,
               const float* X,
               size_t m,
               size_t n,
               raft::distance::DistanceType metric,
               HDBSCAN::Common::HDBSCANParams& params,
-              HDBSCAN::Common::hdbscan_output<int, float>& out);
+              HDBSCAN::Common::hdbscan_output<int, float>& out,
+              bool prediction_data,
Why not add `prediction_data` to `params`? Ideally we'd use `std::optional` for `prediction_data_`.
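A minimal host-only sketch of the first suggestion, assuming the flag becomes a field of the params struct so the entry-point signature keeps its original shape. `ParamsSketch`, `OutputSketch` and `hdbscan_sketch` are hypothetical stand-ins invented for illustration; they are not the real `HDBSCANParams`, `hdbscan_output` or `hdbscan` definitions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for HDBSCAN::Common::HDBSCANParams. Only the flag
// discussed in the review is modelled; the field name is an assumption.
struct ParamsSketch {
  int min_cluster_size = 5;
  bool prediction_data = false;
};

// Hypothetical stand-in for HDBSCAN::Common::hdbscan_output<int, float>.
struct OutputSketch {
  std::vector<int> labels;
  bool prediction_data_built = false;
};

// The signature needs no trailing bool: the option travels inside params.
void hdbscan_sketch(const float* X,
                    std::size_t m,
                    std::size_t n,
                    const ParamsSketch& params,
                    OutputSketch& out)
{
  (void)X;  // the placeholder below does not read the data
  (void)n;
  out.labels.assign(m, -1);  // placeholder for the actual clustering step
  out.prediction_data_built = params.prediction_data;
}
```

Callers that do not need soft clustering leave the flag at its default and are unaffected by the new option.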
cpp/src/hdbscan/hdbscan.cu
Outdated
exemplar_label_offsets.resize(n_selected_clusters_ + 1, handle.get_stream());
deaths.resize(n_clusters, handle.get_stream());
selected_clusters.resize(n_selected_clusters, handle.get_stream());
raft::copy(exemplar_idx.begin(), exemplar_idx_, n_exemplars_, handle.get_stream());
This is expensive in both memory usage and runtime. Rather than accepting raw pointers in `cache`, I would suggest creating an `allocate()` function that resizes the `device_uvector` members, and then using the getters for the underlying pointers inside `build_prediction_data` so that the computations populate the values directly. That would remove the additional memory usage.
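The allocate-then-populate pattern described in this comment could look roughly like the sketch below. It is host-only: `std::vector` stands in for `rmm::device_uvector` so the example compiles without CUDA, and the class name, getter names and `build_prediction_data_sketch` are assumptions made for illustration, not the code that was merged.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container. std::vector replaces rmm::device_uvector here,
// so there is no stream argument on resize.
class PredictionDataSketch {
 public:
  // Size every buffer once, up front, instead of copying from raw pointers.
  void allocate(std::size_t n_exemplars,
                std::size_t n_selected_clusters,
                std::size_t n_clusters)
  {
    exemplar_idx_.resize(n_exemplars);
    exemplar_label_offsets_.resize(n_selected_clusters + 1);
    deaths_.resize(n_clusters);
    selected_clusters_.resize(n_selected_clusters);
  }

  // Getters expose the underlying storage so producers can write in place.
  int* get_exemplar_idx() { return exemplar_idx_.data(); }
  int* get_exemplar_label_offsets() { return exemplar_label_offsets_.data(); }
  float* get_deaths() { return deaths_.data(); }
  int* get_selected_clusters() { return selected_clusters_.data(); }

  std::size_t n_exemplars() const { return exemplar_idx_.size(); }
  std::size_t n_offsets() const { return exemplar_label_offsets_.size(); }
  std::size_t n_deaths() const { return deaths_.size(); }

 private:
  std::vector<int> exemplar_idx_;
  std::vector<int> exemplar_label_offsets_;
  std::vector<float> deaths_;
  std::vector<int> selected_clusters_;
};

// Hypothetical producer: results land directly in the owned buffers,
// so no temporary buffer and no trailing copy are needed.
void build_prediction_data_sketch(PredictionDataSketch& pd,
                                  std::size_t n_exemplars,
                                  std::size_t n_selected_clusters,
                                  std::size_t n_clusters)
{
  pd.allocate(n_exemplars, n_selected_clusters, n_clusters);
  int* idx = pd.get_exemplar_idx();
  for (std::size_t i = 0; i < n_exemplars; ++i) {
    idx[i] = static_cast<int>(i);  // placeholder for the real exemplar search
  }
}
```

With this shape the buffers are allocated once and filled once, which is the memory saving the comment is after.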
cpp/src/hdbscan/runner.h
Outdated
@@ -255,6 +258,15 @@ void _fit_hdbscan(const raft::handle_t& handle,
 * starting at 0 even in the presence of noise (-1)
 */

if (prediction_data) {
`if(params.prediction_data)`, or you could also do `if(prediction_data_.has_value())` if you use `std::optional`.
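For the `std::optional` variant, a rough sketch of the guard might look like this. `SoftClusterData`, `OutputWithOptional` and `maybe_build_sketch` are hypothetical names, and the member layout is an assumption rather than the layout used in cuML.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical payload that is only needed for soft clustering.
struct SoftClusterData {
  std::vector<int> exemplar_idx;
};

// Hypothetical output object: the optional is empty unless the caller
// asked for prediction data before fitting.
struct OutputWithOptional {
  std::optional<SoftClusterData> prediction_data_;
};

// Returns the number of exemplars written, or 0 when the step is skipped.
std::size_t maybe_build_sketch(OutputWithOptional& out)
{
  if (out.prediction_data_.has_value()) {
    out.prediction_data_->exemplar_idx = {0, 1, 2};  // placeholder values
    return out.prediction_data_->exemplar_idx.size();
  }
  return 0;
}
```

Here the presence of the object is itself the switch, so no separate boolean has to be kept in sync with it.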
cpp/test/sg/hdbscan_test.cu
Outdated
params = ::testing::TestWithParam<SoftClusteringInputs<T, IdxT>>::GetParam();

Logger::get().setLevel(CUML_LEVEL_DEBUG);
Should probably remove this
rerun tests
…points-membership-vector-hdbscan
…:tarang-jain/cuml into fea-all-points-membership-vector-hdbscan
…points-membership-vector-hdbscan
Codecov Report
@@ Coverage Diff @@
## branch-22.10 #4800 +/- ##
================================================
+ Coverage 78.02% 78.04% +0.01%
================================================
Files 180 180
Lines 11385 11422 +37
================================================
+ Hits 8883 8914 +31
- Misses 2502 2508 +6
@gpucibot merge
- [x] All points distance membership vector
- [x] All points outlier membership vector
- [x] All points probability in some cluster
- [x] All points membership vector
- [x] Tests

Authors:
- Tarang Jain (https://github.com/tarang-jain)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#4800