[HELP-REQ] Expose KMeans init_plus_plus in pylibraft #1198

Merged

@betatim betatim commented Jan 27, 2023

Closes #1166

This adds init_plus_plus to pylibraft.

Three things to discuss:

  1. Can we come up with a better name?
  2. What size should the workspace be? I couldn't quite work out whether the size matters.
  3. I tried to add a @property.setter to KMeansParams so that I can set the seed after constructing it, but that leads to an error from Cython(?). Does anyone know more about how to do this?
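
On point 3, a minimal plain-Python sketch of the property/setter pattern may help frame the question. In plain Python the setter decorator is named after the property itself (`@seed.setter`), and the literal spelling `@property.setter` raises a `TypeError` when the class body is evaluated. The class `KMeansParamsSketch` below is hypothetical and is not the pylibraft `KMeansParams`. Whether this spelling is the cause of the Cython error, and whether the same pattern works unchanged in a `cdef class`, are assumptions the thread does not settle.

```python
class KMeansParamsSketch:
    """Hypothetical stand-in for a parameter object; not the pylibraft class."""

    def __init__(self, n_clusters=8, seed=0):
        self.n_clusters = n_clusters
        self._seed = seed

    @property
    def seed(self):
        # Getter: exposes the stored seed as a read attribute.
        return self._seed

    @seed.setter  # named after the property, not `@property.setter`
    def seed(self, value):
        # Setter: lets callers change the seed after construction.
        self._seed = int(value)


params = KMeansParamsSketch(n_clusters=5)
params.seed = 42  # set the seed after constructing the object
```

With this pattern the seed can be changed on an existing instance without rebuilding the parameter object.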

@cjnolet cjnolet left a comment

Thanks for opening this PR @betatim! Your implementation looks quite clean and concise. Just a couple of very small things.

python/pylibraft/pylibraft/test/test_kmeans.py (outdated, resolved)
python/pylibraft/pylibraft/cluster/kmeans.pyx (outdated, resolved)
raft::device_matrix_view<double, int> centroids)
{
  // XXX what should the size of this be?
  rmm::device_uvector<char> workspace(10, handle.get_stream());

Member Author (@betatim):

What is a good size for this temporary workspace? Does it even matter, or is it resized anyway to whatever size is needed inside the function(s) that use it?

Member:

In general, we have been working to remove the temporary workspace arguments from our public APIs in favor of using the "workspace_resource" in the device_resources instance. However, we can keep this one as it is until we go through and clean it up. The device_uvector is RAII-compliant (it owns the memory, can resize it, and frees it when destructed), so you can just pass in a zero-size workspace and it will be resized as needed inside the init_plus_plus function.
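
The resize-inside-the-callee behaviour described above can be illustrated with a rough plain-Python analogy. This is not RAFT or RMM code: a `bytearray` stands in for the caller-owned workspace buffer, and `running_total` is a hypothetical callee invented for illustration. The only point is that the caller may start with a zero-size buffer and let the function grow it to the size it needs.

```python
import struct

DOUBLE_SIZE = struct.calcsize("<d")  # 8 bytes per packed double


def running_total(values, workspace):
    """Hypothetical callee that uses `workspace` as scratch space.

    It stores one partial sum per input value and grows the
    caller-owned buffer in place when it is too small.
    """
    needed = DOUBLE_SIZE * len(values)
    if len(workspace) < needed:
        # Resize the caller's buffer; the caller did not have to guess a size.
        workspace.extend(bytes(needed - len(workspace)))
    total = 0.0
    for i, value in enumerate(values):
        total += value
        struct.pack_into("<d", workspace, i * DOUBLE_SIZE, total)
    return total


workspace = bytearray()  # zero-size workspace, as suggested in the review
total = running_total([1.0, 2.0, 3.0], workspace)
```

After the call, the buffer has been grown to hold three doubles, which mirrors passing a zero-size workspace in place of the arbitrary size of 10 in the snippet under review.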

@betatim betatim force-pushed the add-init-plus-plus-pylibraft branch from 95ea557 to 2a8de87 Compare February 1, 2023 08:34
@cjnolet cjnolet added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Feb 1, 2023

@cjnolet cjnolet left a comment

LGTM pending CI

@cjnolet cjnolet requested a review from a team as a code owner February 3, 2023 20:41
@github-actions github-actions bot added the ci label Feb 3, 2023
@cjnolet cjnolet changed the base branch from branch-23.02 to branch-23.04 February 4, 2023 01:31
@github-actions github-actions bot removed the ci label Feb 4, 2023
@codecov-commenter

Codecov Report

❗ No coverage uploaded for pull request base (branch-23.04@88cb31d).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-23.04    #1198   +/-   ##
===============================================
  Coverage                ?   87.99%           
===============================================
  Files                   ?       21           
  Lines                   ?      483           
  Branches                ?        0           
===============================================
  Hits                    ?      425           
  Misses                  ?       58           
  Partials                ?        0           

cjnolet commented Feb 4, 2023

/merge

@rapids-bot rapids-bot bot merged commit a39237c into rapidsai:branch-23.04 Feb 4, 2023
@betatim betatim deleted the add-init-plus-plus-pylibraft branch February 6, 2023 09:04
Labels: CMake, cpp, improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change), python

Successfully merging this pull request may close these issues.

[FEA] Expose KMeans cluster initialisation to Python
3 participants