Make EvalML compatible with the new Woodwork Boolean inference #3892

ParthivNaresh · 2022-12-15T20:06:17Z

Boolean inference in Woodwork has been revised, and the change is being rolled out in two stages:

1. The first stage removes the new automatic Boolean inference but keeps the ability to transform columns into Boolean with that inference by specifying the Boolean or BooleanNullable logical type. In practice, a column of 1s and 0s will not be inferred as Boolean by df.ww.init(), but df.ww.init(logical_types={"col": "Boolean"}) will transform that column into True and False. This PR is solely to make EvalML compatible with this stage in woodwork==0.22.0.
2. The second stage is longer term and will be a cross-team effort involving deeper compatibility changes in EvalML components and tests, so that inference changes do not block future releases.

codecov · 2022-12-15T20:14:13Z

Codecov Report

Merging #3892 (a510467) into main (68b661f) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3892     +/-   ##
=======================================
- Coverage   99.7%   99.7%   -0.0%     
=======================================
  Files        346     346             
  Lines      36640   36665     +25     
=======================================
+ Hits       36510   36532     +22     
- Misses       130     133      +3

Impacted Files	Coverage Δ
evalml/data_checks/class_imbalance_data_check.py	`100.0% <100.0%> (ø)`
...components/transformers/imputers/simple_imputer.py	`98.4% <100.0%> (-1.6%)`	⬇️
evalml/pipelines/components/utils.py	`96.3% <100.0%> (ø)`
...valml/tests/component_tests/test_simple_imputer.py	`100.0% <100.0%> (ø)`
evalml/tests/component_tests/test_utils.py	`99.1% <100.0%> (ø)`
...ta_checks_tests/test_class_imbalance_data_check.py	`100.0% <100.0%> (ø)`
evalml/tests/utils_tests/test_woodwork_utils.py	`100.0% <100.0%> (ø)`
evalml/data_checks/target_leakage_data_check.py	`95.0% <0.0%> (-5.0%)`	⬇️

christopherbunn

Mostly looks good! Have 1-2 q's but ready to approve once addressed.

I'm curious whether we need a test case for the scenario you described in the PR description, e.g. transforming a column of 0s and 1s into Boolean values when the type is set explicitly with df.ww.init(logical_types={"col": "Boolean"}).

christopherbunn · 2022-12-20T18:54:19Z

evalml/data_checks/class_imbalance_data_check.py

+        after_to_before_inference_mapping = {
+            new: old for old, new in zip(original_vc.keys(), new_vc.keys())
+        }
+        before_to_after_inference_mapping = {
+            old: new for new, old in after_to_before_inference_mapping.items()
+        }
+        if str(y.ww.logical_type) not in ["Boolean", "BooleanNullable"]:
+            after_to_before_inference_mapping = {
+                old: old for old in after_to_before_inference_mapping.keys()
+            }


This logic is a bit convoluted but I think I follow it. Can we move lines 159 - 162 up to before 153 and set it as an if/else for the after_to_before_inference_mapping?

Sure thing, added some comments for clarity too
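
For readers following this thread outside the codebase, the mapping logic in the diff can be sketched in isolation. This is a minimal sketch under stated assumptions, not the code that was merged: plain dicts stand in for the value counts (original_vc, new_vc), the is_boolean flag stands in for the str(y.ww.logical_type) check, and the function name build_label_mappings is hypothetical. It also assumes both value counts list their labels in the same order, since the pairing is positional.

```python
def build_label_mappings(original_vc, new_vc, is_boolean):
    """Pair class labels seen before and after type conversion.

    original_vc, new_vc: dicts of label -> count, listed in matching order
        (hypothetical stand-ins for the value counts in the diff).
    is_boolean: stand-in for the Boolean/BooleanNullable logical type check.
    """
    # Pair labels positionally: the i-th new label maps back to the i-th old one.
    after_to_before = {new: old for old, new in zip(original_vc, new_vc)}
    # Invert the pairing to go from original labels to converted labels.
    before_to_after = {old: new for new, old in after_to_before.items()}
    if not is_boolean:
        # For non-Boolean targets, each label simply maps to itself.
        after_to_before = {label: label for label in after_to_before}
    return after_to_before, before_to_after


# A 1/0 target that was explicitly converted to Boolean.
original_counts = {1: 70, 0: 30}
converted_counts = {True: 70, False: 30}
after_to_before, before_to_after = build_label_mappings(
    original_counts, converted_counts, is_boolean=True
)
print(after_to_before)  # {True: 1, False: 0}
print(before_to_after)  # {1: True, 0: False}
```

With is_boolean=False the first mapping collapses to an identity mapping over the labels, which is the branch the review comment suggests expressing as an if/else.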

ParthivNaresh · 2022-12-20T20:57:44Z

@christopherbunn Added a test to verify Boolean behaviour!

christopherbunn

LGTM

initial commit

a7045ea

ParthivNaresh added 7 commits December 15, 2022 15:27

update set_boolean_columns_to_categorical

8650f7d

reference main

acbd3bf

int

88d6b3d

final change?

191f832

remove syserr

f5948bd

Merge branch 'main' into woodwork_boolean_compat

64d50f8

reference new woodwork 0.21.1

ee1bd43

ParthivNaresh marked this pull request as ready for review December 16, 2022 17:38

auto-assign bot assigned ParthivNaresh Dec 16, 2022

ParthivNaresh requested review from eccabay, jeremyliweishih, chukarsten, bchen1116, christopherbunn and Cmancuso December 16, 2022 18:58

Merge branch 'main' into woodwork_boolean_compat

6cceae0

bchen1116 reviewed Dec 19, 2022

.github/meta.yaml (outdated, resolved)

bchen1116 reviewed Dec 19, 2022

evalml/data_checks/class_imbalance_data_check.py (outdated, resolved)

changes

f755eac

bchen1116 reviewed Dec 19, 2022

core-requirements.txt (outdated, resolved)

bchen1116 reviewed Dec 19, 2022

evalml/data_checks/class_imbalance_data_check.py (outdated, resolved)

ParthivNaresh added 2 commits December 19, 2022 16:39

changes

3043040

core req

406f2d5

christopherbunn suggested changes Dec 20, 2022

changes

6e51401

christopherbunn approved these changes Dec 20, 2022

evalml/data_checks/class_imbalance_data_check.py (resolved)

eccabay approved these changes Dec 21, 2022

.github/meta.yaml (outdated, resolved)

ParthivNaresh added 4 commits December 21, 2022 12:41

update min woodwork to 0.21.1

36868c2

Merge branch 'main' into woodwork_boolean_compat

59218e7

Merge branch 'main' into woodwork_boolean_compat

9ffd721

Merge branch 'main' into woodwork_boolean_compat

a510467

ParthivNaresh merged commit de7faaf into main Dec 22, 2022

ParthivNaresh deleted the woodwork_boolean_compat branch December 22, 2022 17:04

christopherbunn mentioned this pull request Jan 3, 2023

Release v0.65.0 #3904

Merged

tamargrey mentioned this pull request Feb 15, 2023

Remove Nullable type logic from Imputer Components and Refactor #3999

Closed

tamargrey mentioned this pull request Mar 1, 2023

Refactor imputer components to remove unnecessary logic #4038

Merged
