[REVIEW][PROPOSAL] Add tags and prefered memory order tags to estimators #3113

dantegd · 2020-11-04T05:10:47Z

While benchmarking the upcoming general SHAP implementations in cuML models, I found a non-trivial penalty, in both memory and time, when data is generated in the opposite order from the one the models require. This is also true of things like HPO and pipelines.
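
To illustrate the layout mismatch described above, here is a minimal NumPy sketch. It uses host arrays as a stand-in for device arrays, and the array shape is arbitrary rather than taken from the benchmarks. Converting a row-major array to column-major order forces a full copy, which is where the extra memory and time would go.

```python
import numpy as np

# Row-major (C order) data, as many data generators produce by default.
X_c = np.arange(12, dtype=np.float32).reshape(4, 3)

# A consumer that wants column-major (F order) input has to convert it,
# which allocates a second buffer of the same size.
X_f = np.asfortranarray(X_c)

print(X_c.flags["C_CONTIGUOUS"], X_c.flags["F_CONTIGUOUS"])  # True False
print(X_f.flags["C_CONTIGUOUS"], X_f.flags["F_CONTIGUOUS"])  # False True
print(np.shares_memory(X_c, X_f))  # False: the conversion copied the data

# If the data already has the requested layout, no copy is made.
X_same = np.asfortranarray(X_f)
print(np.shares_memory(X_f, X_same))  # True
```

Generating the data directly in the layout a model prefers avoids the conversion entirely, which is the motivation for exposing that preference.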

This PR adopts the scikit-learn estimator tag system (https://scikit-learn.org/stable/developers/develop.html#estimator-tags) and adds cuML-specific tags:

preferred_input_order - whether column-major or row-major input is preferred by the estimator
X_types_gpu - similar to X_types in the standard scikit-learn tags, but for specifying the acceptable input types of an algorithm.
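
As a rough sketch of how a tag system of this kind can resolve values, the example below collects `_more_tags()` overrides along the class hierarchy. The `Base`, `ColumnMajorModel`, and `PlainModel` classes and the default tag values are hypothetical and only illustrate the pattern; this is not the actual cuML or scikit-learn implementation.

```python
import inspect


class Base:
    """Hypothetical base class; not cuML's actual Base."""

    # Tag names come from the PR description; the default values are illustrative.
    _default_tags = {"preferred_input_order": None, "X_types_gpu": ["2darray"]}

    def _get_tags(self):
        tags = dict(self._default_tags)
        # Walk the MRO from the most generic class to the most specific,
        # so that overrides in subclasses win.
        for cls in reversed(inspect.getmro(type(self))):
            if "_more_tags" in vars(cls):
                tags.update(cls._more_tags(self))
        return tags


class ColumnMajorModel(Base):
    def _more_tags(self):
        return {"preferred_input_order": "F"}


class PlainModel(Base):
    pass  # inherits the defaults


print(ColumnMajorModel()._get_tags()["preferred_input_order"])  # F
print(PlainModel()._get_tags()["preferred_input_order"])  # None
```

The appeal of this design is that an estimator only declares what differs from the defaults, and callers query one method regardless of the estimator type.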


GPUtester · 2020-11-04T05:13:01Z

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

codecov-io · 2020-11-04T06:40:53Z

Codecov Report

Merging #3113 (17d0ff6) into branch-0.17 (b205e8f) will increase coverage by 0.26%.
The diff coverage is 96.26%.

@@               Coverage Diff               @@
##           branch-0.17    #3113      +/-   ##
===============================================
+ Coverage        70.68%   70.94%   +0.26%     
===============================================
  Files              197      197              
  Lines            15564    16092     +528     
===============================================
+ Hits             11001    11417     +416     
- Misses            4563     4675     +112

Impacted Files	Coverage Δ
python/cuml/neighbors/nearest_neighbors.pyx	`90.45% <ø> (ø)`
python/cuml/neighbors/kneighbors_classifier.pyx	`94.64% <33.33%> (-1.69%)`	⬇️
python/cuml/neighbors/kneighbors_regressor.pyx	`92.75% <33.33%> (-2.71%)`	⬇️
python/cuml/cluster/dbscan.pyx	`100.00% <100.00%> (ø)`
python/cuml/cluster/kmeans.pyx	`81.81% <100.00%> (+0.27%)`	⬆️
python/cuml/common/base.pyx	`80.34% <100.00%> (+4.46%)`	⬆️
python/cuml/decomposition/pca.pyx	`92.19% <100.00%> (+0.14%)`	⬆️
python/cuml/decomposition/tsvd.pyx	`97.05% <100.00%> (+0.06%)`	⬆️
python/cuml/ensemble/randomforestclassifier.pyx	`75.47% <100.00%> (+2.11%)`	⬆️
python/cuml/ensemble/randomforestregressor.pyx	`73.56% <100.00%> (+3.22%)`	⬆️
... and 23 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b205e8f...17d0ff6. Read the comment docs.

JohnZed

I really like the idea! Two things:

Not sure it should be documented in every single init, which seems repetitive.
We have one currently frustrating exception: RF has different preferred inputs for fit and predict due to its usage of FIL (row-wise)... I would love to change that in the future but it's not there yet.

python/cuml/solvers/qn.pyx

JohnZed · 2020-11-05T03:42:39Z

Also, we should add a note about this to the estimator guide in #3040 when both are in.

mdemoret-nv · 2020-11-06T19:58:20Z

@dantegd Is this similar to SkLearn's estimator tags? If so, it might be better to do a larger tag system similar to their design (or at least make this PR compatible with the tag system design). I know many of our tests could benefit from the sklearn tag system (and we would have fewer tests that are hardcoded to skip particular estimators).
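
The test-suite idea above could look roughly like the following sketch, where tests select estimators by tag instead of hardcoding skip lists. `FakeEstimator`, `select_by_tag`, and the model names are hypothetical stand-ins; the only assumption is that each estimator exposes a `_get_tags()` method returning a dict.

```python
class FakeEstimator:
    """Stand-in exposing only a _get_tags() method (hypothetical)."""

    def __init__(self, name, tags):
        self.name = name
        self._tags = tags

    def _get_tags(self):
        return dict(self._tags)


def select_by_tag(estimators, tag, predicate):
    """Keep the estimators whose value for `tag` satisfies `predicate`."""
    return [est for est in estimators if predicate(est._get_tags().get(tag))]


estimators = [
    FakeEstimator("model_a", {"preferred_input_order": "F"}),
    FakeEstimator("model_b", {"preferred_input_order": "C"}),
    FakeEstimator("model_c", {"preferred_input_order": None}),
]

# Only run a layout-specific test on estimators that declare a preference.
with_preference = select_by_tag(
    estimators, "preferred_input_order", lambda value: value is not None
)
print([est.name for est in with_preference])  # ['model_a', 'model_b']
```

A list built this way could be fed to a test parametrization, so adding a new estimator would not require touching the skip logic.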


dantegd · 2020-11-08T01:32:21Z

@mdemoret-nv thanks for the comment! I wasn't aware of the tag addition in Scikit 0.21, so this was immensely helpful to know. I think their implementation is very solid and very useful for our needs, at least for my purposes here with the order attribute.

Right now I have just added preferred_input_order for the discussion, since it is the tag that will be used immediately; we can discuss more tags as the need arises.

@JohnZed @mdemoret-nv thoughts?

python/cuml/linear_model/elastic_net.pyx


JohnZed

Looks good! Needs a simple test. And I think the preferred order may be off for kneighbors classifier? Otherwise great

python/cuml/common/base.pyx

python/cuml/linear_model/elastic_net.pyx

JohnZed · 2020-11-11T23:14:33Z

python/cuml/ensemble/randomforestregressor.pyx

+
+    def _more_tags(self):
+        return {
+            'preferred_input_order': 'F'


F for fit, C for predict (awful, I know)... does that meet the tag definition?

dantegd · 2020-11-12

That's an excellent question. I guess for estimators that have discrepancies like this we should leave it as None. What do you think?
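
A caller could honor a `None` value as "no single preferred layout" along the lines of the sketch below. The helper name `to_preferred_order` is hypothetical, and NumPy host arrays stand in for device arrays.

```python
import numpy as np


def to_preferred_order(X, preferred_order):
    """Return X in the requested layout; leave it untouched if no preference is declared."""
    if preferred_order is None:
        # fit and predict may want different layouts, so do not convert up front.
        return X
    return np.asarray(X, order=preferred_order)


X = np.ones((5, 2), dtype=np.float32)  # C order by default

print(to_preferred_order(X, None) is X)  # True: nothing to do
print(to_preferred_order(X, "C") is X)  # True: already C-contiguous, no copy
print(to_preferred_order(X, "F").flags["F_CONTIGUOUS"])  # True: converted copy
```

With this convention, an estimator whose fit and predict paths disagree simply opts out of up-front conversion rather than advertising a layout that is wrong for one of the two calls.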

python/cuml/neighbors/kneighbors_regressor.pyx

cjnolet

Looks great. Found only minor things, mostly in the estimator guide.

cjnolet · 2020-11-18T01:31:57Z

python/cuml/manifold/t_sne.pyx

+
+    def _more_tags(self):
+        return {
+            'preferred_input_order': 'C'


Just leaving a small note (more for myself) that t_sne & UMAP both could probably accept 'F' now that the underlying KNN prim can accept it.

cjnolet · 2020-11-18T01:37:02Z

python/cuml/neighbors/kneighbors_classifier.pyx

+    def _more_tags(self):
+        return {
+            # fit and predict require conflicting memory layouts
+            'preferred_input_order': None


I think this could be fixed but I'll need to look into it. Created #3153

cjnolet · 2020-11-18T01:37:46Z

wiki/python/ESTIMATOR_GUIDE.md

@@ -4,15 +4,36 @@ This guide is meant to help developers follow the correct patterns when creating

 **Note:** This guide is long, because it includes internal details on how cuML manages input and output types for advanced use cases. But for the vast majority of estimators, the requirements are very simple and can follow the example patterns shown below in the [Quick Start Guide](#quick-start-guide).

+## Table of Contents
+
+- [Recommended Scikit-Learn Documentation](#recommended-scikit-learn-documentation)


Oooh, I like this

wiki/python/ESTIMATOR_GUIDE.md

cjnolet · 2020-11-18T01:40:35Z

wiki/python/ESTIMATOR_GUIDE.md

+         ]
+   ```
+
+7. Implement `_more_tags()` if any of the [default tags]() need to be overriden for the new estimator:


Should this be linking to something?

wiki/python/ESTIMATOR_GUIDE.md

cjnolet

Found one more small thing as I investigated #3153. I'm going to go ahead and close that issue.

cjnolet · 2020-11-18T04:12:20Z

python/cuml/neighbors/kneighbors_regressor.pyx

+    def _more_tags(self):
+        return {
+            # fit and predict require conflicting memory layouts
+            'preferred_input_order': None


I looked more closely at the kneighbors variants and they do actually require order='F' as input. We can safely set this to 'F' here and in the kneighbors_classifier.

Co-authored-by: Corey J. Nolet <[email protected]>

cjnolet

Changes LGTM!


dantegd · 2020-11-19T21:55:49Z

rerun tests

dantegd · 2020-11-20T02:01:51Z

rerun tests

dantegd added 2 commits November 3, 2020 23:05

FEA Add preferred_order class parameter to linear models

39e94a4

Merge branch 'branch-0.17' of https://github.com/rapidsai/cuml into 0…

bc99ec6

…17-fea-pref-order

dantegd added 3 - Ready for Review Ready for review by team proposal Change current process or code Cython / Python Cython or Python issue labels Nov 4, 2020

dantegd requested a review from a team as a code owner November 4, 2020 05:10

JohnZed reviewed Nov 4, 2020

View reviewed changes

python/cuml/solvers/qn.pyx Outdated Show resolved Hide resolved

dantegd added 2 commits November 7, 2020 19:23

ENH adopt tags from scikit-learn API to support preferred order attri…

7e84f8c

…bute

DOC remove attribute docstrings

564ae51

dantegd added 2 commits November 7, 2020 19:34

FIX Change straggling classes

86830dc

FIX Change straggling classes

d6d8a51

dantegd commented Nov 8, 2020

View reviewed changes

python/cuml/linear_model/elastic_net.pyx Show resolved Hide resolved

FIX Add missing self

c85acea

dantegd changed the title ~~[REVIEW][PROPOSAL] Add prefered memory order class attribute~~ [REVIEW][PROPOSAL] Add tags and prefered memory order tags to estimators Nov 9, 2020

dantegd mentioned this pull request Nov 9, 2020

Compatibility with scikit-learn API #3125

Open

dantegd added 5 commits November 9, 2020 09:29

FIX straggling attribute

2f42eaa

ENH Add device data tag for proposal

bb8e2f7

FEA Add all scikit-learn API tags to base and improve gpu input types…

9c5b63f

… tag

FEA Add preferred_order tag to cluster models

340a5e2

FEA Add preferred_order tag to most models

c66ad00

JohnZed requested changes Nov 11, 2020

View reviewed changes

dantegd added 2 commits November 14, 2020 15:42

Merge branch-0.17

0a892d1

ENH Improvements and PR review feedback

fe6efb7

dantegd added 2 commits November 14, 2020 17:36

DOC add tag documentation to estimator guide

4e5bb3f

DOC add scikit link

e0cc4f9

cjnolet requested changes Nov 18, 2020

View reviewed changes

Update wiki/python/ESTIMATOR_GUIDE.md

fe117db

Co-authored-by: Corey J. Nolet <[email protected]>

dantegd changed the title ~~[REVIEW][PROPOSAL] Add tags and prefered memory order tags to estimators~~ [REVIEW][PROPOSAL] Add tags and prefered memory order tags to estimators [skip-ci] Nov 18, 2020

dantegd and others added 7 commits November 18, 2020 17:34

Update wiki/python/ESTIMATOR_GUIDE.md

520a4c4

Co-authored-by: Corey J. Nolet <[email protected]>

Update wiki/python/ESTIMATOR_GUIDE.md

c3a9b41

Co-authored-by: Corey J. Nolet <[email protected]>

Update wiki/python/ESTIMATOR_GUIDE.md

637dde3

Co-authored-by: Corey J. Nolet <[email protected]>

Update wiki/python/ESTIMATOR_GUIDE.md

a9ba498

Co-authored-by: Corey J. Nolet <[email protected]>

ENH Rename test_fit to test_api and add tags tests

44e698a

FIX fixes from PR review

5689ef8

DOC Added entry to changelog

d713c89

dantegd added 4 - Waiting on Reviewer Waiting for reviewer to review or respond and removed 3 - Ready for Review Ready for review by team labels Nov 19, 2020

Merge branch 'branch-0.17' into 017-fea-pref-order

1b02eac

cjnolet approved these changes Nov 19, 2020

View reviewed changes

dantegd changed the title ~~[REVIEW][PROPOSAL] Add tags and prefered memory order tags to estimators [skip-ci]~~ [REVIEW][PROPOSAL] Add tags and prefered memory order tags to estimators Nov 19, 2020

dantegd added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Waiting on Reviewer Waiting for reviewer to review or respond labels Nov 19, 2020

dantegd added 2 commits November 19, 2020 12:44

FIX PEP8 fixes

5fb7ef1

Merge branch '017-fea-pref-order' of github.com:dantegd/cuml into 017…

17d0ff6

…-fea-pref-order

JohnZed approved these changes Nov 20, 2020

View reviewed changes

dantegd merged commit b3e4827 into rapidsai:branch-0.17 Nov 20, 2020
