Make EvalML compatible with the new Woodwork Boolean inference #3892

Merged
merged 17 commits into from
Dec 22, 2022

Conversation

ParthivNaresh
Contributor

@ParthivNaresh ParthivNaresh commented Dec 15, 2022

Boolean inference in Woodwork has been revised, and the change is split into two stages:

  1. The first stage removes the new automatic Boolean inference but keeps the ability to transform columns into Boolean with that inference logic when the Boolean or BooleanNullable logical type is specified explicitly. In practice, a column of 1s and 0s will not be inferred as Boolean by df.ww.init(), but df.ww.init(logical_types={"col": "Boolean"}) will transform that column into True and False. This PR only makes EvalML compatible with this stage, as implemented in woodwork==0.22.0.
  2. The second stage is longer term: a cross-team effort involving deeper compatibility changes in EvalML components and tests, so that inference changes do not block future releases.

codecov bot commented Dec 15, 2022

Codecov Report

Merging #3892 (a510467) into main (68b661f) will decrease coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3892     +/-   ##
=======================================
- Coverage   99.7%   99.7%   -0.0%     
=======================================
  Files        346     346             
  Lines      36640   36665     +25     
=======================================
+ Hits       36510   36532     +22     
- Misses       130     133      +3     
| Impacted Files | Coverage Δ |
|---|---|
| evalml/data_checks/class_imbalance_data_check.py | 100.0% <100.0%> (ø) |
| ...components/transformers/imputers/simple_imputer.py | 98.4% <100.0%> (-1.6%) ⬇️ |
| evalml/pipelines/components/utils.py | 96.3% <100.0%> (ø) |
| ...valml/tests/component_tests/test_simple_imputer.py | 100.0% <100.0%> (ø) |
| evalml/tests/component_tests/test_utils.py | 99.1% <100.0%> (ø) |
| ...ta_checks_tests/test_class_imbalance_data_check.py | 100.0% <100.0%> (ø) |
| evalml/tests/utils_tests/test_woodwork_utils.py | 100.0% <100.0%> (ø) |
| evalml/data_checks/target_leakage_data_check.py | 95.0% <0.0%> (-5.0%) ⬇️ |

Contributor

@christopherbunn christopherbunn left a comment

Mostly looks good! Have 1-2 q's but ready to approve once addressed.

I'm curious whether we need a test case for the scenario you described in the PR description, e.g. transforming a column of 0s and 1s into Boolean values when the type is set explicitly with df.ww.init(logical_types={"col": "Boolean"}).

Comment on lines 153 to 162
after_to_before_inference_mapping = {
    new: old for old, new in zip(original_vc.keys(), new_vc.keys())
}
before_to_after_inference_mapping = {
    old: new for new, old in after_to_before_inference_mapping.items()
}
if str(y.ww.logical_type) not in ["Boolean", "BooleanNullable"]:
    after_to_before_inference_mapping = {
        old: old for old in after_to_before_inference_mapping.keys()
    }

Contributor

This logic is a bit convoluted but I think I follow it. Can we move lines 159 - 162 up to before 153 and set it as an if/else for the after_to_before_inference_mapping?

Contributor Author

Sure thing, added some comments for clarity too
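
For readers following the thread, one possible shape for the if/else restructuring suggested above is sketched here. It is not taken from the merged diff: the helper name `build_inference_mappings`, its parameters, and the `BOOLEAN_TYPES` constant are hypothetical, and plain lists stand in for the keys of `original_vc` and `new_vc`. The sketch assumes the keys on each side are unique, as value-count keys would be.

```python
from typing import Dict, Hashable, Sequence, Tuple

# Hypothetical constant: logical type names treated as Boolean.
BOOLEAN_TYPES = ("Boolean", "BooleanNullable")


def build_inference_mappings(
    original_keys: Sequence[Hashable],
    new_keys: Sequence[Hashable],
    logical_type: str,
) -> Tuple[Dict[Hashable, Hashable], Dict[Hashable, Hashable]]:
    """Return (after_to_before, before_to_after) mappings between target values.

    original_keys: values seen before the logical type was applied.
    new_keys: the same values, in the same order, after it was applied.
    """
    # Pair each pre-inference value with its post-inference counterpart.
    before_to_after = dict(zip(original_keys, new_keys))

    if logical_type in BOOLEAN_TYPES:
        # Values were rewritten (e.g. 1 -> True), so map them back.
        after_to_before = dict(zip(new_keys, original_keys))
    else:
        # Values are unchanged, so every value maps to itself.
        after_to_before = {key: key for key in new_keys}

    return after_to_before, before_to_after


bool_after, bool_before = build_inference_mappings([1, 0], [True, False], "Boolean")
int_after, int_before = build_inference_mappings([1, 0], [1, 0], "Integer")
print(bool_after, bool_before)
print(int_after, int_before)
```

Deciding the branch up front means `after_to_before` is assigned exactly once, instead of being built and then overwritten for non-Boolean targets.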

@ParthivNaresh
Contributor Author

@christopherbunn Added a test to verify Boolean behaviour!

Contributor

@christopherbunn christopherbunn left a comment

LGTM

4 participants