[FEA] Kmeans auto-find K #825

cjnolet · 2022-09-14T00:53:34Z

Some important workflows need to find k automatically, using a measure of residual (the spread of point distances across all centroids) and dispersion (the spread of the centroids in relation to each other).

This requires an objective that maximizes the cluster-to-cluster distances while minimizing the point-to-cluster spread as much as possible. We should be able to do this fairly easily, especially with our new consolidated k-means implementations.
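For reference, one well-known objective with this shape is the Calinski-Harabasz index, which the port referenced later in this thread uses. The notation below is an assumption for illustration and is not taken from any code in this issue: n points x_i, k clusters C_j with sizes n_j and centroids c_j, and overall mean c.

```latex
\[
\mathrm{CH}(k) = \frac{B_k / (k - 1)}{W_k / (n - k)},
\qquad
B_k = \sum_{j=1}^{k} n_j \,\lVert c_j - c \rVert^2,
\qquad
W_k = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2
\]
```

Here B_k plays the role of the dispersion (how far the centroids are spread out) and W_k the residual (how far points sit from their own centroid), so a larger value indicates better-separated, tighter clusters. The index is defined only for 2 <= k < n.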

dantegd · 2022-09-14T20:06:11Z

Note: there was an issue opened by @jeaton32 a while ago in cuML with some code: rapidsai/cuml#818

github-actions · 2022-10-14T21:02:09Z

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

This is a port of rapidsai/cuml#818 (originally from NVGraph) which uses the Calinski-Harabasz score to find the optimal value of k.

Todo:
- [x] create histogram of cluster sizes
- [x] add googletests
- [x] expose public API

Closes #825

cc @jeaton32

Authors:
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Ben Frederickson (https://github.com/benfred)

URL: #1070
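To make the idea concrete, here is a minimal pure-Python sketch of choosing k by maximizing the Calinski-Harabasz score over a range of candidates. It is not the RAFT or cuML implementation: the function names (`kmeans`, `calinski_harabasz`, `find_k`), the farthest-point seeding, the exhaustive scan over k, and the toy data are all assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-point seeding
    (the seeding is an assumption chosen for reproducibility)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = centroid(members)
    return centers, labels


def calinski_harabasz(points, centers, labels):
    """Ratio of between-cluster dispersion to within-cluster residual.
    Assumes 2 <= k < n and a non-zero within-cluster sum of squares."""
    n, k = len(points), len(centers)
    overall = centroid(points)
    sizes = [labels.count(j) for j in range(k)]
    between = sum(sizes[j] * sq_dist(centers[j], overall) for j in range(k))
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return (between / (k - 1)) / (within / (n - k))


def find_k(points, k_values):
    """Score every candidate k and return (best_k, {k: score})."""
    scores = {}
    for k in k_values:
        centers, labels = kmeans(points, k)
        scores[k] = calinski_harabasz(points, centers, labels)
    return max(scores, key=scores.get), scores


# Toy data: three small, well-separated blobs in 2D.
offsets = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in offsets]

best_k, scores = find_k(points, range(2, 7))
print(best_k, scores)
```

Scanning every candidate k is the simplest strategy and is only practical for small ranges; a production implementation would likely search the range more cleverly and run the clustering on the GPU.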

cjnolet added the feature request New feature or request label Sep 14, 2022

cjnolet added this to VS/ML/DM Primitives Release Board Oct 12, 2022

github-actions bot added the inactive-30d label Oct 14, 2022

cjnolet mentioned this issue Dec 6, 2022: Initial port of auto-find-k #1070 (Merged, 3 tasks)

rapids-bot bot closed this as completed in #1070 Feb 21, 2023

github-project-automation bot moved this to Done in VS/ML/DM Primitives Release Board Feb 21, 2023
