Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add autologging for scikit-learn #3287

Merged
merged 148 commits into from
Aug 25, 2020
Merged

Conversation

harupy
Copy link
Member

@harupy harupy commented Aug 16, 2020

Signed-off-by: harupy [email protected]

What changes are proposed in this pull request?

Add autologging for scikit-learn

How is this patch tested?

unite tests

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/projects: MLproject format, project running backends
  • area/scoring: Local serving, model deployment tools, spark UDFs
  • area/server-infra: MLflow server, JavaScript dev server
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, JavaScript, plotting
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

@github-actions github-actions bot added area/models MLmodel format, model serialization/deserialization, flavors area/tracking Tracking service, tracking client APIs, autologging rn/feature Mention under Features in Changelogs. labels Aug 16, 2020
@harupy harupy marked this pull request as draft August 16, 2020 15:19
harupy added 4 commits August 17, 2020 00:26
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
harupy added 2 commits August 17, 2020 07:36
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
mlflow/sklearn.py Outdated Show resolved Hide resolved
harupy added 2 commits August 18, 2020 00:48
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
Copy link
Collaborator

@dbczumar dbczumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harupy This is awesome! Left some initial comments. Great work!

mlflow/sklearn.py Outdated Show resolved Hide resolved
mlflow/sklearn.py Outdated Show resolved Hide resolved
mlflow/sklearn.py Outdated Show resolved Hide resolved
mlflow/sklearn.py Outdated Show resolved Hide resolved
mlflow/sklearn.py Show resolved Hide resolved
mlflow/sklearn.py Outdated Show resolved Hide resolved
tests/sklearn/test_sklearn_autolog.py Show resolved Hide resolved
tests/sklearn/test_sklearn_autolog.py Outdated Show resolved Hide resolved
mlflow/sklearn.py Outdated Show resolved Hide resolved
mlflow/sklearn.py Outdated Show resolved Hide resolved
branches:
- master
paths:
- mlflow/sklearn/**
Copy link
Member Author

@harupy harupy Aug 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a pull request, run this action only when files under mlflow/sklearn have changed.

.github/workflows/sklearn.yml Outdated Show resolved Hide resolved
.github/workflows/sklearn.yml Outdated Show resolved Hide resolved
harupy added 17 commits August 22, 2020 12:14
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
Signed-off-by: harupy <[email protected]>
@harupy
Copy link
Member Author

harupy commented Aug 24, 2020

I found that the doc for sklearn.utils.all_estimators says it does NOT return meta estimators, but it turns out this is a mistake (see this issue report I filed).

https://scikit-learn.org/stable/modules/generated/sklearn.utils.all_estimators.html

Get a list of all estimators from sklearn. This function crawls the module and gets all classes that inherit from BaseEstimator. Classes that are defined in test-modules are not included. By default meta_estimators such as GridSearchCV are also not included.

@harupy harupy merged commit 77ebf43 into mlflow:master Aug 25, 2020
@harupy harupy deleted the add-sklearn-autolog branch August 25, 2020 01:21
dbczumar pushed a commit to dbczumar/mlflow that referenced this pull request Aug 27, 2020
* Add autologging for scikit-learn

Signed-off-by: harupy <[email protected]>

* Update sklearn's version

Signed-off-by: harupy <[email protected]>

* Remove unrelated file

Signed-off-by: harupy <[email protected]>

* rename

Signed-off-by: harupy <[email protected]>

* rename load_model

Signed-off-by: harupy <[email protected]>

* Remove blank line

Signed-off-by: harupy <[email protected]>

* Reorder imports

Signed-off-by: harupy <[email protected]>

* fix

Signed-off-by: harupy <[email protected]>

* DRY

Signed-off-by: harupy <[email protected]>

* Emit warning on older versions of sklearn

Signed-off-by: harupy <[email protected]>

* Use warnings.warn

Signed-off-by: harupy <[email protected]>

* Remove unused argument

Signed-off-by: harupy <[email protected]>

* Revert changes on requirements

Signed-off-by: harupy <[email protected]>

* Use LooseVersion

Signed-off-by: harupy <[email protected]>

* Fix _get_all_estimators

Signed-off-by: harupy <[email protected]>

* rename vars

Signed-off-by: harupy <[email protected]>

* Use backported all_estimators

Signed-off-by: harupy <[email protected]>

* Fix lint errors

Signed-off-by: harupy <[email protected]>

* Remove print

Signed-off-by: harupy <[email protected]>

* simplify code

Signed-off-by: harupy <[email protected]>

* Create sklearn directory

Signed-off-by: harupy <[email protected]>

* Move _all_estimators to utils

Signed-off-by: harupy <[email protected]>

* Remove link

Signed-off-by: harupy <[email protected]>

* fix

Signed-off-by: harupy <[email protected]>

* Fix active_run_exists' condition

Signed-off-by: harupy <[email protected]>

* Verify no children

Signed-off-by: harupy <[email protected]>

* Add experiment_id

Signed-off-by: harupy <[email protected]>

* Specify stacklevel

Signed-off-by: harupy <[email protected]>

* Wrap fit with try-except

Signed-off-by: harupy <[email protected]>

* Remove use_caplog

Signed-off-by: harupy <[email protected]>

* rename test

Signed-off-by: harupy <[email protected]>

* Remove temp_tracking_uri

Signed-off-by: harupy <[email protected]>

* Remove unused imports

Signed-off-by: harupy <[email protected]>

* Add docstring for _all_estimators

Signed-off-by: harupy <[email protected]>

* Fix assertions

Signed-off-by: harupy <[email protected]>

* Wrap score with try-except

Signed-off-by: harupy <[email protected]>

* fix

Signed-off-by: harupy <[email protected]>

* Add log assertion to test_autolog_marks_run_as_failed_when_fit_fails

Signed-off-by: harupy <[email protected]>

* indent

Signed-off-by: harupy <[email protected]>

* simplify code

Signed-off-by: harupy <[email protected]>

* Add failure reasons

Signed-off-by: harupy <[email protected]>

* Fix assertion order

Signed-off-by: harupy <[email protected]>

* Assert metrics is empty when score fails

Signed-off-by: harupy <[email protected]>

* minor fix

Signed-off-by: harupy <[email protected]>

* Assert after with

Signed-off-by: harupy <[email protected]>

* Use readable class name

Signed-off-by: harupy <[email protected]>

* Throw when fit fails

Signed-off-by: harupy <[email protected]>

* Rename active_run_exists to should_start_run

Signed-off-by: harupy <[email protected]>

* nit

Signed-off-by: harupy <[email protected]>

* fix if condition

Signed-off-by: harupy <[email protected]>

* pass sample_weight if both fit and score have it

Signed-off-by: harupy <[email protected]>

* Exclude property methods from patching

Signed-off-by: harupy <[email protected]>

* Chunk params to avoid hitting log_batch API limit

Signed-off-by: harupy <[email protected]>

* Fix args handling

Signed-off-by: harupy <[email protected]>

* Use all_estimators if sklearn.utils.all_estimators exists

Signed-off-by: harupy <[email protected]>

* Fix lint errors

Signed-off-by: harupy <[email protected]>

* Remove useless ()

Signed-off-by: harupy <[email protected]>

* Fix test name

Signed-off-by: harupy <[email protected]>

* nit

Signed-off-by: harupy <[email protected]>

* Use model.fit if not parametrized

Signed-off-by: harupy <[email protected]>

* Temporarily add sklearn job

Signed-off-by: harupy <[email protected]>

* Add pytest

Signed-off-by: harupy <[email protected]>

* rerun

Signed-off-by: harupy <[email protected]>

* fix config

Signed-off-by: harupy <[email protected]>

* Fix install

Signed-off-by: harupy <[email protected]>

* do not run install-common.sh

Signed-off-by: harupy <[email protected]>

* Disable fail-fast

Signed-off-by: harupy <[email protected]>

* Remove sklearn.datasets

Signed-off-by: harupy <[email protected]>

* Add 0.22.2

Signed-off-by: harupy <[email protected]>

* Try print_changed_only=True

Signed-off-by: harupy <[email protected]>

* Truncate dict value

Signed-off-by: harupy <[email protected]>

* Add test for value truncation

Signed-off-by: harupy <[email protected]>

* De-hardcode tests

Signed-off-by: harupy <[email protected]>

* Remove set_config

Signed-off-by: harupy <[email protected]>

* Use try_mlflow_log for mlflow.end_run

Signed-off-by: harupy <[email protected]>

* Use try-catch for _all_estimators

Signed-off-by: harupy <[email protected]>

* Fix waring message for scoring error

Signed-off-by: harupy <[email protected]>

* Mark autolog as experimental

Signed-off-by: harupy <[email protected]>

* Mark tests for autolog as large

Signed-off-by: harupy <[email protected]>

* Add test_fit_takes_Xy_as_keyword_arguments

Signed-off-by: harupy <[email protected]>

* Emit warning message when truncating key or value

Signed-off-by: harupy <[email protected]>

* De-hardcode x and y names

Signed-off-by: harupy <[email protected]>

* Fix mangled-signatures link

Signed-off-by: harupy <[email protected]>

* Add comments

Signed-off-by: harupy <[email protected]>

* Add doc for _get_args_for_score

Signed-off-by: harupy <[email protected]>

* Add large option

Signed-off-by: harupy <[email protected]>

* Apply truncation to expected dict

Signed-off-by: harupy <[email protected]>

* DRY

Signed-off-by: harupy <[email protected]>

* nit

Signed-off-by: harupy <[email protected]>

* De-hardcode sklearn version

Signed-off-by: harupy <[email protected]>

* Fix lint

Signed-off-by: harupy <[email protected]>

* De-hardcode model dir

Signed-off-by: harupy <[email protected]>

* Fix patch target

Signed-off-by: harupy <[email protected]>

* Use called_once_with

Signed-off-by: harupy <[email protected]>

* Add a new test case to test_both_fit_and_score_contain_sample_weight

Signed-off-by: harupy <[email protected]>

* nit

Signed-off-by: harupy <[email protected]>

* Remove unused function

Signed-off-by: harupy <[email protected]>

* nit

Signed-off-by: harupy <[email protected]>

* nit

Signed-off-by: harupy <[email protected]>

* DRY

Signed-off-by: harupy <[email protected]>

* Fix func order

Signed-off-by: harupy <[email protected]>

* Fix lint

Signed-off-by: harupy <[email protected]>

* Capitalize x

Signed-off-by: harupy <[email protected]>

* Override unbound methods

Signed-off-by: harupy <[email protected]>

* Introduce throw_if_try_mlflow_log_has_emitted_warnings fixture

Signed-off-by: harupy <[email protected]>

* Fix test_fit_takes_Xy_as_keyword_arguments

Signed-off-by: harupy <[email protected]>

* Add assertions for logged data

Signed-off-by: harupy <[email protected]>

* Add assert_called_once_with to test_call_fit_with_arguments_score_does_not_accept

Signed-off-by: harupy <[email protected]>

* Split Xy

Signed-off-by: harupy <[email protected]>

* Move log_metric to else clause

Signed-off-by: harupy <[email protected]>

* Fix lint errros

Signed-off-by: harupy <[email protected]>

* nit

Signed-off-by: harupy <[email protected]>

* Add docstring for autolog

Signed-off-by: harupy <[email protected]>

* Move pylint disable comments

Signed-off-by: harupy <[email protected]>

* nit

Signed-off-by: harupy <[email protected]>

* Fix fixture for try_mlflow_log

Signed-off-by: harupy <[email protected]>

* Add test_autolog_does_not_throw_when_mlflow_logging_fails

Signed-off-by: harupy <[email protected]>

* Fix lint

Signed-off-by: harupy <[email protected]>

* Replace key_is_none with val_is_none

Signed-off-by: harupy <[email protected]>

* Use MAX_ENTITY_KEY_LENGTH

Signed-off-by: harupy <[email protected]>

* Add comment for _MIN_SKLEARN_VERSION

Signed-off-by: harupy <[email protected]>

* Enhance comment for prop methods exclusion

Signed-off-by: harupy <[email protected]>

* Add todo for wrap & patch

Signed-off-by: harupy <[email protected]>

* bump sklearn

Signed-off-by: harupy <[email protected]>

* Update action config

Signed-off-by: harupy <[email protected]>

* Rename is_old_version

Signed-off-by: harupy <[email protected]>

* test _is_supported_version

Signed-off-by: harupy <[email protected]>

* Fix command

Signed-off-by: harupy <[email protected]>

* Fix _is_supported_version

Signed-off-by: harupy <[email protected]>

* Add continue-on-error

Signed-off-by: harupy <[email protected]>

* Emit a warning if test fail on an unsupported version

Signed-off-by: harupy <[email protected]>

* Add warning step

Signed-off-by: harupy <[email protected]>

* Fix workflow

Signed-off-by: harupy <[email protected]>

* Use set +e

Signed-off-by: harupy <[email protected]>

* debug

Signed-off-by: harupy <[email protected]>

* Fix condition

Signed-off-by: harupy <[email protected]>

* Simplify workflow

Signed-off-by: harupy <[email protected]>

* nit

Signed-off-by: harupy <[email protected]>

* Add comment on why include unsupported version

Signed-off-by: harupy <[email protected]>

* Update doc

Signed-off-by: harupy <[email protected]>

* black

Signed-off-by: harupy <[email protected]>

* nit

Signed-off-by: harupy <[email protected]>

* nit

Signed-off-by: harupy <[email protected]>

* fix comment

Signed-off-by: harupy <[email protected]>

* nit

Signed-off-by: harupy <[email protected]>

* Fix syntax

Signed-off-by: harupy <[email protected]>

* lint

Signed-off-by: harupy <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/models MLmodel format, model serialization/deserialization, flavors area/tracking Tracking service, tracking client APIs, autologging rn/feature Mention under Features in Changelogs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants