Protect balanced k-means out-of-memory in some cases #1161
Conversation
…when the mesoclusters turn out to be unbalanced
```diff
- for (IdxT j = 0; j < n_rows; j++) {
-   if (labels_mptr[j] == (LabelT)i) { mc_trainset_ids[k++] = j; }
+ for (IdxT j = 0; j < n_rows && k < mesocluster_size_max; j++) {
+   if (labels_mptr[j] == LabelT(i)) { mc_trainset_ids[k++] = j; }
  }
  if (k != mesocluster_sizes[i])
```
I suggest we omit the warning when `k` is limited due to `mesocluster_size_max`.
Suggested change:

```diff
- if (k != mesocluster_sizes[i])
+ if (k != mesocluster_sizes[i] && k < mesocluster_size_max)
```
I think it's better to leave the warning, because it also gives extra information about which clusters are the offending ones and emphasizes the importance of the previous warning about the unbalanced mesoclusters. In the logs, the two go one after another, so it's easy to see the link between them.
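For readers outside the codebase, here is a minimal standalone sketch of the capped gather from the diff above. The function name and the plain `std::` types are assumptions for illustration; the actual RAFT code works with templated `IdxT`/`LabelT` pointers:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical standalone version of the capped gather: collect at most
// `mesocluster_size_max` row indices whose label equals `cluster`. The cap
// guarantees that the temporary trainset buffer allocated later has a known
// upper bound, even when one mesocluster absorbs most of the input.
std::vector<int64_t> gather_mesocluster_ids(const std::vector<uint32_t>& labels,
                                            uint32_t cluster,
                                            int64_t mesocluster_size_max)
{
  std::vector<int64_t> ids;
  ids.reserve(static_cast<size_t>(mesocluster_size_max));
  for (int64_t j = 0;
       j < static_cast<int64_t>(labels.size()) &&
       static_cast<int64_t>(ids.size()) < mesocluster_size_max;
       j++) {
    if (labels[j] == cluster) { ids.push_back(j); }
  }
  // When the cap kicks in, ids.size() is smaller than the recorded cluster
  // size; the caller can compare the two and log a warning, as discussed.
  return ids;
}
```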
Thanks Artem for the PR, this is a good workaround for the imbalanced training issue. LGTM.
Codecov Report: Base 87.99% // Head 87.99% // No change to project coverage 👍

Additional details and impacted files:

```
@@           Coverage Diff            @@
##        branch-23.02   #1161  +/- ##
========================================
  Coverage      87.99%  87.99%
========================================
  Files             21      21
  Lines            483     483
========================================
  Hits             425     425
  Misses            58      58
```

☔ View full report at Codecov.
/merge
There's no guarantee that our balanced k-means implementation always produces balanced clusters. In the first stage, when mesoclusters are trained, the biggest cluster can grow larger than half of all input data. This becomes a problem at the second stage, when in `build_fine_clusters` the mesocluster data is copied into a temporary buffer. If the size is too big, there may not be enough memory on the device. A quick workaround:

1. Expand the error reporting (RAFT_LOG_WARN)
2. Artificially limit the mesocluster size in the event of highly unbalanced clustering

Authors:
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#1161
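To make the memory concern concrete, here is a hedged back-of-the-envelope sketch. The buffer-size model (`rows * dim * sizeof(float)`), the cap heuristic (a small multiple of the average mesocluster size), and all numbers are illustrative assumptions, not RAFT's exact implementation:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

// Illustrative only: the fine-clustering stage copies one mesocluster's rows
// into a temporary device buffer of roughly rows * dim * sizeof(float) bytes.
// If a single mesocluster holds half of 100M rows at dim = 96, that is about
// 19 GB -- easily more than the device has free. Capping the rows copied per
// mesocluster bounds the allocation.
int main()
{
  const int64_t n_rows          = 100'000'000;  // total input rows (example)
  const int64_t dim             = 96;           // vector dimensionality
  const int64_t n_mesoclusters  = 1000;
  const int64_t biggest_cluster = n_rows / 2;   // highly unbalanced case

  // Hypothetical cap: a small multiple of the average mesocluster size.
  const int64_t avg_size = n_rows / n_mesoclusters;
  const int64_t capped   = std::min(biggest_cluster, 4 * avg_size);

  const auto bytes = [&](int64_t rows) { return rows * dim * int64_t{sizeof(float)}; };
  std::printf("uncapped buffer: %.2f GB\n", bytes(biggest_cluster) / 1e9);
  std::printf("capped buffer:   %.2f GB\n", bytes(capped) / 1e9);
  return 0;
}
```

Under these assumed numbers, the uncapped copy would need ~19.2 GB while the capped one needs ~0.15 GB, which is why limiting the mesocluster size is an effective out-of-memory guard even though it slightly truncates the training set for runaway clusters.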