Add autologging for scikit-learn #3287

harupy · 2020-08-16T14:35:41Z

Signed-off-by: harupy [email protected]

What changes are proposed in this pull request?

Add autologging for scikit-learn

How is this patch tested?

unite tests

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, JavaScript, plotting
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: harupy <[email protected]>

mlflow/sklearn.py

tests/sklearn/test_sklearn_autolog.py

Signed-off-by: harupy <[email protected]>

mlflow/sklearn.py

Signed-off-by: harupy <[email protected]>

mlflow/sklearn.py

Signed-off-by: harupy <[email protected]>

dbczumar

@harupy This is awesome! Left some initial comments. Great work!

mlflow/sklearn.py

tests/sklearn/test_sklearn_autolog.py

mlflow/sklearn.py

Signed-off-by: harupy <[email protected]>

mlflow/sklearn/__init__.py

Signed-off-by: harupy <[email protected]>

harupy · 2020-08-22T02:45:58Z

.github/workflows/sklearn.yml

+    branches:
+      - master
+    paths:
+      - mlflow/sklearn/**


For a pull request, run this action only when files under mlflow/sklearn have changed.

.github/workflows/sklearn.yml

Signed-off-by: harupy <[email protected]>

harupy · 2020-08-24T08:57:17Z

I found that the doc for sklearn.utils.all_estimators says it does NOT return meta estimators, but it turns out this is a mistake (see this issue report I filed).

https://scikit-learn.org/stable/modules/generated/sklearn.utils.all_estimators.html

Get a list of all estimators from sklearn. This function crawls the module and gets all classes that inherit from BaseEstimator. Classes that are defined in test-modules are not included. By default meta_estimators such as GridSearchCV are also not included.

.github/workflows/sklearn.yml

* Add autologging for scikit-learn Signed-off-by: harupy <[email protected]> * Update sklearn's version Signed-off-by: harupy <[email protected]> * Remove unrelated file Signed-off-by: harupy <[email protected]> * rename Signed-off-by: harupy <[email protected]> * rename load_model Signed-off-by: harupy <[email protected]> * Remove blank line Signed-off-by: harupy <[email protected]> * Reorder imports Signed-off-by: harupy <[email protected]> * fix Signed-off-by: harupy <[email protected]> * DRY Signed-off-by: harupy <[email protected]> * Emit warning on older versions of sklearn Signed-off-by: harupy <[email protected]> * Use warnings.warn Signed-off-by: harupy <[email protected]> * Remove unused argument Signed-off-by: harupy <[email protected]> * Revert changes on requirements Signed-off-by: harupy <[email protected]> * Use LooseVersion Signed-off-by: harupy <[email protected]> * Fix _get_all_estimators Signed-off-by: harupy <[email protected]> * rename vars Signed-off-by: harupy <[email protected]> * Use backported all_estimators Signed-off-by: harupy <[email protected]> * Fix lint errors Signed-off-by: harupy <[email protected]> * Remove print Signed-off-by: harupy <[email protected]> * simplify code Signed-off-by: harupy <[email protected]> * Create sklearn directory Signed-off-by: harupy <[email protected]> * Move _all_estimators to utils Signed-off-by: harupy <[email protected]> * Remove link Signed-off-by: harupy <[email protected]> * fix Signed-off-by: harupy <[email protected]> * Fix active_run_exists' condition Signed-off-by: harupy <[email protected]> * Verify no children Signed-off-by: harupy <[email protected]> * Add experiment_id Signed-off-by: harupy <[email protected]> * Specify stacklevel Signed-off-by: harupy <[email protected]> * Wrap fit with try-except Signed-off-by: harupy <[email protected]> * Remove use_caplog Signed-off-by: harupy <[email protected]> * rename test Signed-off-by: harupy <[email protected]> * Remove temp_tracking_uri Signed-off-by: harupy <[email protected]> * Remove unused imports Signed-off-by: harupy <[email protected]> * Add docstring for _all_estimators Signed-off-by: harupy <[email protected]> * Fix assertions Signed-off-by: harupy <[email protected]> * Wrap score with try-except Signed-off-by: harupy <[email protected]> * fix Signed-off-by: harupy <[email protected]> * Add log assertion to test_autolog_marks_run_as_failed_when_fit_fails Signed-off-by: harupy <[email protected]> * indent Signed-off-by: harupy <[email protected]> * simplify code Signed-off-by: harupy <[email protected]> * Add failure reasons Signed-off-by: harupy <[email protected]> * Fix assertion order Signed-off-by: harupy <[email protected]> * Assert metrics is empty when score fails Signed-off-by: harupy <[email protected]> * minor fix Signed-off-by: harupy <[email protected]> * Assert after with Signed-off-by: harupy <[email protected]> * Use readable class name Signed-off-by: harupy <[email protected]> * Throw when fit fails Signed-off-by: harupy <[email protected]> * Rename active_run_exists to should_start_run Signed-off-by: harupy <[email protected]> * nit Signed-off-by: harupy <[email protected]> * fix if condition Signed-off-by: harupy <[email protected]> * pass sample_weight if both fit and score have it Signed-off-by: harupy <[email protected]> * Exclude property methods from patching Signed-off-by: harupy <[email protected]> * Chunk params to avoid hitting log_batch API limit Signed-off-by: harupy <[email protected]> * Fix args handling Signed-off-by: harupy <[email protected]> * Use all_estimators if sklearn.utils.all_estimators exists Signed-off-by: harupy <[email protected]> * Fix lint errors Signed-off-by: harupy <[email protected]> * Remove useless () Signed-off-by: harupy <[email protected]> * Fix test name Signed-off-by: harupy <[email protected]> * nit Signed-off-by: harupy <[email protected]> * Use model.fit if not parametrized Signed-off-by: harupy <[email protected]> * Temporarily add sklearn job Signed-off-by: harupy <[email protected]> * Add pytest Signed-off-by: harupy <[email protected]> * rerun Signed-off-by: harupy <[email protected]> * fix config Signed-off-by: harupy <[email protected]> * Fix install Signed-off-by: harupy <[email protected]> * do not run install-common.sh Signed-off-by: harupy <[email protected]> * Disable fail-fast Signed-off-by: harupy <[email protected]> * Remove sklearn.datasets Signed-off-by: harupy <[email protected]> * Add 0.22.2 Signed-off-by: harupy <[email protected]> * Try print_changed_only=True Signed-off-by: harupy <[email protected]> * Truncate dict value Signed-off-by: harupy <[email protected]> * Add test for value truncation Signed-off-by: harupy <[email protected]> * De-hardcode tests Signed-off-by: harupy <[email protected]> * Remove set_config Signed-off-by: harupy <[email protected]> * Use try_mlflow_log for mlflow.end_run Signed-off-by: harupy <[email protected]> * Use try-catch for _all_estimators Signed-off-by: harupy <[email protected]> * Fix waring message for scoring error Signed-off-by: harupy <[email protected]> * Mark autolog as experimental Signed-off-by: harupy <[email protected]> * Mark tests for autolog as large Signed-off-by: harupy <[email protected]> * Add test_fit_takes_Xy_as_keyword_arguments Signed-off-by: harupy <[email protected]> * Emit warning message when truncating key or value Signed-off-by: harupy <[email protected]> * De-hardcode x and y names Signed-off-by: harupy <[email protected]> * Fix mangled-signatures link Signed-off-by: harupy <[email protected]> * Add comments Signed-off-by: harupy <[email protected]> * Add doc for _get_args_for_score Signed-off-by: harupy <[email protected]> * Add large option Signed-off-by: harupy <[email protected]> * Apply truncation to expected dict Signed-off-by: harupy <[email protected]> * DRY Signed-off-by: harupy <[email protected]> * nit Signed-off-by: harupy <[email protected]> * De-hardcode sklearn version Signed-off-by: harupy <[email protected]> * Fix lint Signed-off-by: harupy <[email protected]> * De-hardcode model dir Signed-off-by: harupy <[email protected]> * Fix patch target Signed-off-by: harupy <[email protected]> * Use called_once_with Signed-off-by: harupy <[email protected]> * Add a new test case to test_both_fit_and_score_contain_sample_weight Signed-off-by: harupy <[email protected]> * nit Signed-off-by: harupy <[email protected]> * Remove unused function Signed-off-by: harupy <[email protected]> * nit Signed-off-by: harupy <[email protected]> * nit Signed-off-by: harupy <[email protected]> * DRY Signed-off-by: harupy <[email protected]> * Fix func order Signed-off-by: harupy <[email protected]> * Fix lint Signed-off-by: harupy <[email protected]> * Capitalize x Signed-off-by: harupy <[email protected]> * Override unbound methods Signed-off-by: harupy <[email protected]> * Introduce throw_if_try_mlflow_log_has_emitted_warnings fixture Signed-off-by: harupy <[email protected]> * Fix test_fit_takes_Xy_as_keyword_arguments Signed-off-by: harupy <[email protected]> * Add assertions for logged data Signed-off-by: harupy <[email protected]> * Add assert_called_once_with to test_call_fit_with_arguments_score_does_not_accept Signed-off-by: harupy <[email protected]> * Split Xy Signed-off-by: harupy <[email protected]> * Move log_metric to else clause Signed-off-by: harupy <[email protected]> * Fix lint errros Signed-off-by: harupy <[email protected]> * nit Signed-off-by: harupy <[email protected]> * Add docstring for autolog Signed-off-by: harupy <[email protected]> * Move pylint disable comments Signed-off-by: harupy <[email protected]> * nit Signed-off-by: harupy <[email protected]> * Fix fixture for try_mlflow_log Signed-off-by: harupy <[email protected]> * Add test_autolog_does_not_throw_when_mlflow_logging_fails Signed-off-by: harupy <[email protected]> * Fix lint Signed-off-by: harupy <[email protected]> * Replace key_is_none with val_is_none Signed-off-by: harupy <[email protected]> * Use MAX_ENTITY_KEY_LENGTH Signed-off-by: harupy <[email protected]> * Add comment for _MIN_SKLEARN_VERSION Signed-off-by: harupy <[email protected]> * Enhance comment for prop methods exclusion Signed-off-by: harupy <[email protected]> * Add todo for wrap & patch Signed-off-by: harupy <[email protected]> * bump sklearn Signed-off-by: harupy <[email protected]> * Update action config Signed-off-by: harupy <[email protected]> * Rename is_old_version Signed-off-by: harupy <[email protected]> * test _is_supported_version Signed-off-by: harupy <[email protected]> * Fix command Signed-off-by: harupy <[email protected]> * Fix _is_supported_version Signed-off-by: harupy <[email protected]> * Add continue-on-error Signed-off-by: harupy <[email protected]> * Emit a warning if test fail on an unsupported version Signed-off-by: harupy <[email protected]> * Add warning step Signed-off-by: harupy <[email protected]> * Fix workflow Signed-off-by: harupy <[email protected]> * Use set +e Signed-off-by: harupy <[email protected]> * debug Signed-off-by: harupy <[email protected]> * Fix condition Signed-off-by: harupy <[email protected]> * Simplify workflow Signed-off-by: harupy <[email protected]> * nit Signed-off-by: harupy <[email protected]> * Add comment on why include unsupported version Signed-off-by: harupy <[email protected]> * Update doc Signed-off-by: harupy <[email protected]> * black Signed-off-by: harupy <[email protected]> * nit Signed-off-by: harupy <[email protected]> * nit Signed-off-by: harupy <[email protected]> * fix comment Signed-off-by: harupy <[email protected]> * nit Signed-off-by: harupy <[email protected]> * Fix syntax Signed-off-by: harupy <[email protected]> * lint Signed-off-by: harupy <[email protected]>

Add autologging for scikit-learn

607dc1a

Signed-off-by: harupy <[email protected]>

github-actions bot added area/models MLmodel format, model serialization/deserialization, flavors area/tracking Tracking service, tracking client APIs, autologging rn/feature Mention under Features in Changelogs. labels Aug 16, 2020

harupy added 2 commits August 17, 2020 00:13

Update sklearn's version

3a3840e

Signed-off-by: harupy <[email protected]>

Remove unrelated file

ecf6197

Signed-off-by: harupy <[email protected]>

harupy marked this pull request as draft August 16, 2020 15:19

harupy added 4 commits August 17, 2020 00:26

rename

a474042

Signed-off-by: harupy <[email protected]>

rename load_model

da17372

Signed-off-by: harupy <[email protected]>

Remove blank line

f204506

Signed-off-by: harupy <[email protected]>

Reorder imports

5bbdd9c

Signed-off-by: harupy <[email protected]>

harupy commented Aug 16, 2020

View reviewed changes

mlflow/sklearn.py Show resolved Hide resolved

harupy commented Aug 16, 2020

View reviewed changes

tests/sklearn/test_sklearn_autolog.py Show resolved Hide resolved

harupy commented Aug 16, 2020

View reviewed changes

tests/sklearn/test_sklearn_autolog.py Outdated Show resolved Hide resolved

harupy added 2 commits August 17, 2020 07:36

fix

11ee815

Signed-off-by: harupy <[email protected]>

DRY

1f1ee0e

Signed-off-by: harupy <[email protected]>

harupy commented Aug 16, 2020

View reviewed changes

mlflow/sklearn.py Outdated Show resolved Hide resolved

harupy added 9 commits August 17, 2020 13:06

Emit warning on older versions of sklearn

86f8be1

Signed-off-by: harupy <[email protected]>

Use warnings.warn

1137171

Signed-off-by: harupy <[email protected]>

Remove unused argument

c0c8ad3

Signed-off-by: harupy <[email protected]>

Revert changes on requirements

52ab963

Signed-off-by: harupy <[email protected]>

Use LooseVersion

8b15725

Signed-off-by: harupy <[email protected]>

Fix _get_all_estimators

6f856f3

Signed-off-by: harupy <[email protected]>

rename vars

6d3674c

Signed-off-by: harupy <[email protected]>

Use backported all_estimators

981777e

Signed-off-by: harupy <[email protected]>

Fix lint errors

74eaee6

Signed-off-by: harupy <[email protected]>

harupy commented Aug 17, 2020

View reviewed changes

mlflow/sklearn.py Show resolved Hide resolved

harupy added 2 commits August 18, 2020 00:48

Remove print

8140d73

Signed-off-by: harupy <[email protected]>

simplify code

22985a1

Signed-off-by: harupy <[email protected]>

dbczumar requested changes Aug 17, 2020

View reviewed changes

Fix _is_supported_version

df8dcc1

Signed-off-by: harupy <[email protected]>

harupy commented Aug 22, 2020

View reviewed changes

mlflow/sklearn/__init__.py Show resolved Hide resolved

Add continue-on-error

f667658

Signed-off-by: harupy <[email protected]>

harupy commented Aug 22, 2020

View reviewed changes

.github/workflows/sklearn.yml Outdated Show resolved Hide resolved

harupy commented Aug 22, 2020

View reviewed changes

.github/workflows/sklearn.yml Outdated Show resolved Hide resolved

harupy added 17 commits August 22, 2020 12:14

Emit a warning if test fail on an unsupported version

5cf7e11

Signed-off-by: harupy <[email protected]>

Add warning step

6ed77a8

Signed-off-by: harupy <[email protected]>

Fix workflow

3275b42

Signed-off-by: harupy <[email protected]>

Use set +e

46fa236

Signed-off-by: harupy <[email protected]>

debug

a81f550

Signed-off-by: harupy <[email protected]>

Fix condition

0b48345

Signed-off-by: harupy <[email protected]>

Simplify workflow

678154d

Signed-off-by: harupy <[email protected]>

nit

ed220e0

Signed-off-by: harupy <[email protected]>

Add comment on why include unsupported version

c7ac426

Signed-off-by: harupy <[email protected]>

Update doc

cf37f0f

Signed-off-by: harupy <[email protected]>

black

23ab97a

Signed-off-by: harupy <[email protected]>

nit

716bf23

Signed-off-by: harupy <[email protected]>

nit

a425db9

Signed-off-by: harupy <[email protected]>

fix comment

56a076e

Signed-off-by: harupy <[email protected]>

nit

c59d273

Signed-off-by: harupy <[email protected]>

Fix syntax

f955b11

Signed-off-by: harupy <[email protected]>

lint

86a0085

Signed-off-by: harupy <[email protected]>

harupy commented Aug 25, 2020

View reviewed changes

.github/workflows/sklearn.yml Show resolved Hide resolved

harupy merged commit 77ebf43 into mlflow:master Aug 25, 2020

harupy deleted the add-sklearn-autolog branch August 25, 2020 01:21

dbczumar mentioned this pull request Aug 25, 2020

Scikit-learn autologging specializations for parameter search #3323

Merged

27 tasks

harupy mentioned this pull request Aug 28, 2020

Fix example code in docstring of mllfow.sklearn.autolog #3359

Merged

27 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add autologging for scikit-learn #3287

Add autologging for scikit-learn #3287

harupy commented Aug 16, 2020 •

edited by willzhan-db

Loading

dbczumar left a comment

harupy Aug 22, 2020 •

edited

Loading

harupy commented Aug 24, 2020 •

edited

Loading

Add autologging for scikit-learn #3287

Add autologging for scikit-learn #3287

Conversation

harupy commented Aug 16, 2020 • edited by willzhan-db Loading

What changes are proposed in this pull request?

How is this patch tested?

Release Notes

Is this a user-facing change?

What component(s), interfaces, languages, and integrations does this PR affect?

How should the PR be classified in the release notes? Choose one:

dbczumar left a comment

Choose a reason for hiding this comment

harupy Aug 22, 2020 • edited Loading

Choose a reason for hiding this comment

harupy commented Aug 24, 2020 • edited Loading

harupy commented Aug 16, 2020 •

edited by willzhan-db

Loading

harupy Aug 22, 2020 •

edited

Loading

harupy commented Aug 24, 2020 •

edited

Loading