Improve infrastructure for experimental dispatching of non existing methods in cuML #6148

dantegd · 2024-11-27T01:21:59Z

This PR adds methods in UniversalBase, so that cuML estimators that inherit from it can enable better errors, and experimentally dispatch to other libraries (sklearn, umap, hdbscan...) for methods that haven't been implemented in cuML itself.
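To make the description above concrete, here is a minimal sketch of attribute-level fallback dispatching. It is not the code in this PR: `DispatchingBase`, `_cpu_class`, `_build_cpu_model`, `CpuEstimator` and `GpuEstimator` are hypothetical names, and the attribute transfer assumes the scikit-learn convention that fitted attributes end in an underscore.

```python
class CpuEstimator:
    """Stand-in for a CPU library estimator (hypothetical)."""

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self

    def score_samples(self, X):
        # A method the accelerated class below does not implement.
        return [abs(x - self.mean_) for x in X]


class DispatchingBase:
    """Hypothetical base class; an illustration, not cuML's UniversalBase."""

    _cpu_class = None  # subclasses point this at their CPU equivalent

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup has failed.
        # Never forward private or dunder names.
        if name.startswith("_"):
            raise AttributeError(name)
        cpu_class = type(self)._cpu_class
        if cpu_class is None or not hasattr(cpu_class, name):
            raise AttributeError(
                f"{type(self).__name__!r} has no attribute {name!r}, "
                "and no CPU equivalent provides it"
            )
        return getattr(self._build_cpu_model(), name)

    def _build_cpu_model(self):
        cpu_model = type(self)._cpu_class()
        # Transfer fitted attributes (trailing underscore by convention).
        for key, value in vars(self).items():
            if key.endswith("_") and not key.startswith("_"):
                setattr(cpu_model, key, value)
        return cpu_model


class GpuEstimator(DispatchingBase):
    _cpu_class = CpuEstimator

    def fit(self, X):
        self.mean_ = sum(X) / len(X)
        return self


est = GpuEstimator().fit([1.0, 2.0, 3.0])
distances = est.score_samples([2.0, 4.0])  # forwarded to CpuEstimator
```

Because `__getattr__` runs only when regular lookup fails, methods the accelerated class implements itself are never forwarded.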

betatim · 2024-11-27T07:37:29Z

I think the only bigger picture question for this PR is: should we just implement the missing methods instead of forwarding? It would probably benefit all cuml users and be more straightforward (measured in less dunder usage :D), but it is a lot more work.

We can decide in either direction, but wanted to bring it up so that we briefly talk about it and then decide.

viclafargue

LGTM, I think that the dispatching mechanism should work well :). However, what worries me a bit is that there is no guarantee that all Scikit-Learn functions will work fine just by transferring attributes. There seem to be a lot of edge cases, and I believe that this would require some more rigorous testing. Ideally, if we want to guarantee functionality, we would have to whitelist the functions that work fine for each estimator.
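One possible shape for such a whitelist, as a rough sketch only: the table `VERIFIED_DISPATCH`, the helper `check_dispatch_allowed`, and the estimator and method names in it are all hypothetical placeholders, not anything defined in cuML.

```python
# Hypothetical allowlist: estimator name -> methods that have been tested
# to work after attributes are transferred to the CPU model.
VERIFIED_DISPATCH = {
    "ExampleEstimator": {"score_samples"},
    "OtherEstimator": set(),  # nothing verified yet
}


def check_dispatch_allowed(estimator_name, method_name):
    """Raise AttributeError unless the method is verified for the estimator."""
    allowed = VERIFIED_DISPATCH.get(estimator_name, set())
    if method_name not in allowed:
        raise AttributeError(
            f"Dispatching {method_name!r} is not verified for "
            f"{estimator_name!r}"
        )


check_dispatch_allowed("ExampleEstimator", "score_samples")  # passes silently
```

A dispatching hook could call a check like this before forwarding, so that unverified methods fail with a clear error instead of a subtly wrong result.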

viclafargue · 2024-11-27T10:13:52Z

python/cuml/cuml/internals/base.pyx

+        and creates one if necessary.
+        """
+        if not hasattr(self, "_cpu_model"):
+            self.import_cpu_model()


I think that the call to the import_cpu_model function is not necessary here as we already checked the presence of the _cpu_model_class attribute earlier.

betatim · 2024-11-27T13:07:48Z

Maybe something to add is a test that iterates over all cuml estimators and their scikit-learn equivalents and checks that all attrs exist. We could extend it to instantiate each estimator, fit it and then repeat the check.
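A sketch of what such a parity check could look like, using stand-in classes instead of real cuml and scikit-learn estimators. The helper `missing_public_attrs` and both classes are hypothetical; a real test would presumably be parametrized over actual estimator pairs.

```python
def missing_public_attrs(candidate, reference):
    """Return public names found on `reference` that `candidate` lacks.

    Both arguments may be classes or instances; passing fitted instances
    repeats the check after fitting.
    """
    public = [name for name in dir(reference) if not name.startswith("_")]
    return sorted(name for name in public if not hasattr(candidate, name))


class ReferenceEstimator:
    """Stand-in for the scikit-learn side (hypothetical)."""

    def fit(self, X):
        return self

    def predict(self, X):
        return X

    def score(self, X):
        return 0.0


class CandidateEstimator:
    """Stand-in for the cuml side (hypothetical); it lacks `score`."""

    def fit(self, X):
        return self

    def predict(self, X):
        return X


missing = missing_public_attrs(CandidateEstimator, ReferenceEstimator)
```

Since `hasattr` goes through the normal lookup machinery, running the check on instances would also exercise any `__getattr__` based forwarding.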

betatim · 2024-11-28T08:17:17Z

For the "iterate over all estimators" part: I wrote something to do that for #6107 and want to re-use it for #4753 (iterate all estimators, then filter those that accept random_state=). Should we make a separate PR that adds this functionality, which we can merge before all of these and this PR?
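The filtering step could be sketched with `inspect.signature` from the standard library. The estimator classes and the `accepts_parameter` helper below are hypothetical stand-ins; discovering the real list of estimators is the part that would live in the separate PR.

```python
import inspect


def accepts_parameter(cls, name):
    """Return True if the constructor of `cls` has a parameter `name`."""
    try:
        parameters = inspect.signature(cls).parameters
    except (TypeError, ValueError):
        # Some extension types expose no introspectable signature.
        return False
    return name in parameters


class WithSeed:
    """Hypothetical estimator that takes a seed."""

    def __init__(self, n_clusters=8, random_state=None):
        self.n_clusters = n_clusters
        self.random_state = random_state


class WithoutSeed:
    """Hypothetical estimator without a seed."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha


ALL_ESTIMATORS = [WithSeed, WithoutSeed]  # would come from discovery
seeded = [cls for cls in ALL_ESTIMATORS if accepts_parameter(cls, "random_state")]
```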

dantegd · 2024-12-02T01:56:07Z

@betatim added the pytests, but for now it was easier to manually list the estimators that support interoperability by inheriting from UniversalBase. We probably should revisit that soon for #6107 indeed, but for now I think we can keep the PRs independent.

betatim · 2024-12-02T15:06:14Z

Yeah I think we need a separate PR to build the infrastructure for discovering estimators, etc. So fine for me to not do that here.

Why do we need _experimental_dispatching as a way to turn this on? Couldn't it be on by default?

dantegd · 2024-12-03T02:57:02Z

@betatim because this change could break some third-party libraries in ways I might not expect, or behave unexpectedly for users, I think it should be opt-in while more extensive testing is done, at least for a version cycle.
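As an illustration of the opt-in behaviour discussed here, the sketch below gates forwarding behind a flag that is off by default. Only the name `_experimental_dispatching` comes from this thread; treating it as a class-level attribute, and everything else (`OptInDispatchBase`, `_cpu_class`, the stand-in classes), is an assumption made for the example.

```python
class OptInDispatchBase:
    """Hypothetical base class with opt-in forwarding."""

    _experimental_dispatching = False  # assumption: off unless enabled
    _cpu_class = None

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)
        cpu_class = type(self)._cpu_class
        if cpu_class is None or not hasattr(cpu_class, name):
            raise AttributeError(
                f"{type(self).__name__!r} has no attribute {name!r}"
            )
        if not self._experimental_dispatching:
            # Better error: tell the user the method exists elsewhere.
            raise AttributeError(
                f"{name!r} is not implemented by {type(self).__name__!r}; "
                "set _experimental_dispatching = True to forward it to "
                f"{cpu_class.__name__!r}"
            )
        return getattr(cpu_class(), name)


class CpuModel:
    def describe(self):
        return "cpu"


class GpuModel(OptInDispatchBase):
    _cpu_class = CpuModel


model = GpuModel()
model._experimental_dispatching = True  # explicit opt-in
result = model.describe()
```

With the flag left at its default, the same call raises an `AttributeError` that names the flag, which covers the "better errors" half without changing behaviour for existing users.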

copy-pr-bot · 2024-12-10T04:42:42Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

dantegd · 2024-12-10T04:43:33Z

/ok to test

dantegd · 2024-12-12T15:07:58Z

/merge

dantegd added 3 commits November 26, 2024 19:16

ENH Improve infrastructure for dispatching of non existing methods in…

70822b7

… cuML estimators

FIX accidentally deleted line

ca2a785

FIX style fixes

7142958

github-actions bot added the Cython / Python Cython or Python issue label Nov 27, 2024

betatim reviewed Nov 27, 2024

python/cuml/cuml/internals/base.pyx (outdated)

viclafargue approved these changes Nov 27, 2024

dantegd added 2 commits December 1, 2024 19:50

FIX returning the correct value and add pytests

13196ac

FIX style fixes

7080509

dantegd added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 2, 2024

bdice and others added 4 commits December 6, 2024 17:06

Update cuda-python lower bounds to 12.6.2 / 11.8.5.

e78a0fc

FIX cuml dask fixes to unblock CI

25e77ae

Merge branch 'fix-dask-2502' into fix-universalbase-methods

b395924

FIX hdbscan interop fix

5995f37

github-actions bot added conda conda issue CMake CUDA/C++ labels Dec 10, 2024

dantegd changed the base branch from branch-24.12 to branch-25.02 December 10, 2024 04:43

FIX HDBSCAN interop fix and improve getattr for efficiency and to avo…

fdf6e5a

…id potential infinite recursion in hdbscan

github-actions bot removed CMake CUDA/C++ labels Dec 11, 2024

dantegd marked this pull request as ready for review December 11, 2024 15:43

dantegd requested review from a team as code owners December 11, 2024 15:43

dantegd requested review from bdice and betatim December 11, 2024 15:43

Merge branch 'branch-25.02' into fix-universalbase-methods

c8255e5

github-actions bot removed the conda conda issue label Dec 11, 2024

wphicks reviewed Dec 11, 2024

python/cuml/cuml/internals/base.pyx

python/cuml/cuml/tests/test_public_methods_attributes.py (outdated)

dantegd added 2 commits December 11, 2024 17:35

FIX remove not needed method name check in pytest

68bdf6a

Merge branch 'fix-universalbase-methods' of github.com:dantegd/cuml i…

e1a6a0d

…nto fix-universalbase-methods

rapids-bot bot merged commit 7580d7c into rapidsai:branch-25.02 Dec 12, 2024
61 of 62 checks passed

wphicks approved these changes Dec 12, 2024
