Statically set woodwork typing in tests #3697
Merged
Commits (25):

- e22954b Remove unnecessary ww init (eccabay)
- a0c38d9 Replace ts_data with get_ts_X_y in all cases (eccabay)
- b7598d8 Explicitly set ww types in get_ts_X_y (eccabay)
- 30acb05 Update imputer_test_data to set ww types (eccabay)
- 50768db downcast_nullable_types works for dataframe and series (eccabay)
- f1cbddf Rename get_ts_X_y and cols for ease of use (eccabay)
- 39f2db2 Update time series featurizer tests for ww and have ts featurizer exp… (eccabay)
- 9a24f19 Update data check tests to explicitly set typing (eccabay)
- 9b74832 Update X_y_binary/multi/regression to be ww instead of numpy (eccabay)
- df08067 Update X_y_categorical_classification/regression to init ww (eccabay)
- 008967f Merge branch 'main' into 3651_ww_hardening (eccabay)
- 927cf52 Update release notes (eccabay)
- aa88b20 Add test for downcast_nullable_types (eccabay)
- c5e16f7 Fix downcast_nullable_types and test (eccabay)
- 2063785 lint fix (eccabay)
- 6f6defc Small updates to reduce merge conflicts with ww 0.18.0 upgrade (eccabay)
- c6a1c47 Fix a few missing changes (eccabay)
- c35252f Merge branch 'main' into 3651_ww_hardening (eccabay)
- c26e508 Merge branch 'main' into 3651_ww_hardening (eccabay)
- e7134e8 lint fix (eccabay)
- 6ca9142 more lint (eccabay)
- eaad897 Merge branch 'main' into 3651_ww_hardening (chukarsten)
- fd108a3 Merge branch 'main' into 3651_ww_hardening (chukarsten)
- 8b7cc18 Merge branch 'main' into 3651_ww_hardening (eccabay)
- 56eb074 Update downcast_nullable_types for series consistency (eccabay)
```diff
@@ -1944,15 +1944,13 @@ def test_percent_better_than_baseline_in_rankings(
     dummy_classifier_estimator_class,
     dummy_regressor_estimator_class,
     dummy_time_series_regressor_estimator_class,
-    ts_data_binary,
+    ts_data,
     X_y_multi,
 ):
     if not objective.is_defined_for_problem_type(problem_type_value):
         pytest.skip("Skipping because objective is not defined for problem type")
 
-    X, y = ts_data_binary
-    if problem_type_value == ProblemTypes.MULTICLASS:
-        X, y = X_y_multi
+    X, _, y = ts_data(problem_type=problem_type_value)
 
     estimator = {
         ProblemTypes.BINARY: dummy_classifier_estimator_class,
```
```diff
@@ -2031,12 +2029,7 @@ class Pipeline2(DummyPipeline):
         max_iterations=3,
         objective=objective,
         additional_objectives=[],
-        problem_configuration={
-            "time_index": "date",
-            "gap": 0,
-            "max_delay": 0,
-            "forecast_horizon": 2,
-        },
+        problem_configuration=pipeline_parameters["pipeline"],
         train_best_pipeline=False,
         n_jobs=1,
     )
```
```diff
@@ -2134,9 +2127,9 @@ def test_percent_better_than_baseline_computed_for_all_objectives(
     dummy_classifier_estimator_class,
     dummy_regressor_estimator_class,
     dummy_time_series_regressor_estimator_class,
-    ts_data_binary,
+    ts_data,
 ):
-    X, y = ts_data_binary
+    X, _, y = ts_data(problem_type=problem_type)
 
     problem_type_enum = handle_problem_types(problem_type)
 
```
```diff
@@ -2270,7 +2263,7 @@ def fit(self, *args, **kwargs):
 
 
 def test_time_series_regression_with_parameters(ts_data):
-    X, y = ts_data
+    X, _, y = ts_data()
     X.index.name = "date"
     problem_configuration = {
         "time_index": "date",
```
```diff
@@ -2879,8 +2872,7 @@ def test_automl_woodwork_user_types_preserved(
 
 
 def test_automl_validates_problem_configuration(ts_data):
-    _, y = ts_data
-    X = pd.DataFrame(pd.date_range("2020-10-01", "2020-10-31"), columns=["Date"])
+    X, _, y = ts_data()
     assert (
         AutoMLSearch(X_train=X, y_train=y, problem_type="binary").problem_configuration
         == {}
```
```diff
@@ -2937,14 +2929,14 @@ def test_automl_validates_problem_configuration(ts_data):
         y_train=y,
         problem_type="time series regression",
         problem_configuration={
-            "time_index": "Date",
+            "time_index": "date",
             "max_delay": 2,
             "gap": 3,
             "forecast_horizon": 2,
         },
     ).problem_configuration
     assert problem_config == {
-        "time_index": "Date",
+        "time_index": "date",
         "max_delay": 2,
         "gap": 3,
         "forecast_horizon": 2,
```
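The two assertions in `test_automl_validates_problem_configuration` pin down a simple contract: a non-time-series search stores an empty configuration, while a time series search keeps the dictionary it was given (now keyed on the lowercase `"date"` column). The sketch below restates that contract under stated assumptions; `validate_problem_configuration` and `TIME_SERIES_KEYS` are hypothetical names, and treating all four keys used throughout these tests as required is an assumption, not evalml's implementation.

```python
# Hypothetical sketch: mirrors the assertions in
# test_automl_validates_problem_configuration; not evalml's implementation.
TIME_SERIES_KEYS = {"time_index", "gap", "max_delay", "forecast_horizon"}


def validate_problem_configuration(problem_type, problem_configuration=None):
    """Return the configuration a search would store.

    Assumptions: non-time-series problems ignore any configuration and
    store an empty dict; time series problems must supply every key in
    TIME_SERIES_KEYS.
    """
    if not problem_type.startswith("time series"):
        return {}
    config = dict(problem_configuration or {})
    missing = TIME_SERIES_KEYS - set(config)
    if missing:
        raise ValueError(f"problem_configuration is missing {sorted(missing)}")
    return config


config = {"time_index": "date", "max_delay": 2, "gap": 3, "forecast_horizon": 2}
print(validate_problem_configuration("binary"))  # {}
print(validate_problem_configuration("time series regression", config) == config)  # True
```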
```diff
@@ -3076,7 +3068,7 @@ def test_automl_rerun(AutoMLTestEnv, X_y_binary, caplog):
 
 def test_timeseries_baseline_init_with_correct_gap_max_delay(AutoMLTestEnv, ts_data):
 
-    X, y = ts_data
+    X, _, y = ts_data()
     automl = AutoMLSearch(
         X_train=X,
         y_train=y,
```
```diff
@@ -4035,9 +4027,13 @@ def test_automl_baseline_pipeline_predictions_and_scores_time_series(problem_typ
     baseline.fit(X_train, y_train)
 
     expected_predictions = y.shift(1)[4:]
-    expected_predictions = expected_predictions.astype("int64")
+    expected_predictions = expected_predictions
     if problem_type != ProblemTypes.TIME_SERIES_REGRESSION:
-        expected_predictions = pd.Series(expected_predictions, name="target_delay_1")
+        expected_predictions = pd.Series(
+            expected_predictions,
+            name="target_delay_1",
+            dtype="int64",
+        )
 
     preds = baseline.predict(X_validation, None, X_train, y_train)
     pd.testing.assert_series_equal(expected_predictions, preds)
```
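The expected predictions are built from `y.shift(1)`, so each prediction is the previous target value, which matches the `target_delay_1` name. One pandas detail explains why an explicit `int64` dtype is needed: shifting an integer series inserts a `NaN` at the front, and because `int64` cannot hold `NaN`, pandas upcasts the result to `float64`. Slicing the leading rows away removes the `NaN` but not the float dtype. The toy target below is made up for illustration, and the `4` in the slice only mirrors the test; this sketch uses `.astype(...).rename(...)` rather than the `pd.Series(..., dtype=...)` constructor from the diff.

```python
import pandas as pd

# Made-up integer target for illustration.
y = pd.Series([0, 1, 1, 0, 1, 0, 0, 1], dtype="int64")

# Lag-1 baseline: each prediction is the previous observed target value.
lagged = y.shift(1)
print(lagged.dtype)  # float64: the NaN shifted in at position 0 forces an upcast

# Dropping the leading rows removes the NaN, but the dtype is still float64,
# so cast explicitly before comparing against int64 predictions.
expected = lagged.iloc[4:].astype("int64").rename("target_delay_1")
print(expected.tolist())  # [0, 1, 0, 0]
```

Without the cast, `pd.testing.assert_series_equal` would fail on the dtype check even though the values match.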
```diff
@@ -4133,10 +4129,8 @@ def test_automl_thresholding_train_pipelines(mock_objective, threshold, X_y_bina
 def test_automl_drop_unknown_columns(columns, AutoMLTestEnv, X_y_binary, caplog):
     caplog.clear()
     X, y = X_y_binary
-    X = pd.DataFrame(X)
     for col in columns:
-        X[col] = pd.Series(range(len(X)))
-    X.ww.init()
+        X.ww[col] = pd.Series(range(len(X)))
     X.ww.set_types({col: "Unknown" for col in columns})
     automl = AutoMLSearch(
         X_train=X,
```
```diff
@@ -4534,26 +4528,18 @@ def test_baseline_pipeline_properly_initalized(
 @pytest.mark.parametrize(
     "problem_type",
     [
-        ProblemTypes.TIME_SERIES_REGRESSION,
-        ProblemTypes.TIME_SERIES_MULTICLASS,
-        ProblemTypes.TIME_SERIES_BINARY,
+        "time series regression",
+        "time series multiclass",
+        "time series binary",
     ],
 )
 def test_automl_passes_known_in_advance_pipeline_parameters_to_all_pipelines(
     problem_type,
-    ts_data_binary,
-    ts_data_multi,
     ts_data,
     AutoMLTestEnv,
 ):
-    if problem_type == ProblemTypes.TIME_SERIES_MULTICLASS:
-        X, y = ts_data_multi
-    elif problem_type == ProblemTypes.TIME_SERIES_BINARY:
-        X, y = ts_data_binary
-    else:
-        X, y = ts_data
```

Review comment on lines -4560 to -4565: Nice refactoring 👍

```diff
+    X, _, y = ts_data(problem_type=problem_type)
 
-    X.ww.init()
     X.ww["email"] = pd.Series(["[email protected]"] * X.shape[0], index=X.index)
     X.ww["category"] = pd.Series(["a"] * X.shape[0], index=X.index)
     X.ww.set_types({"email": "EmailAddress", "category": "Categorical"})
```
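This hunk replaces three fixtures (`ts_data`, `ts_data_binary`, `ts_data_multi`) with a single `ts_data` that takes the problem type as an argument, so the `if`/`elif`/`else` dispatch disappears from the test body. Below is a minimal, stdlib-only sketch of that "factory as fixture" idea. Everything in it is hypothetical: the `ProblemType` enum, the `make_ts_data_factory` helper, the plain dict/list data, and the `None` placeholder for the discarded middle return value are assumptions, not the fixture from this PR.

```python
from enum import Enum


class ProblemType(Enum):
    # Hypothetical stand-in whose values are the strings used in the
    # parametrize list above.
    TIME_SERIES_REGRESSION = "time series regression"
    TIME_SERIES_BINARY = "time series binary"
    TIME_SERIES_MULTICLASS = "time series multiclass"


def make_ts_data_factory(n_rows=10):
    """Return a callable shaped like `ts_data(problem_type=...)`.

    Assumptions: X is a dict of columns named "feature" and "date",
    y is a list, and the middle return value is a None placeholder
    (the tests discard it with `_`).
    """

    def _ts_data(problem_type="time series regression"):
        # Enum lookup accepts either the string value or an enum member.
        problem_type = ProblemType(problem_type)
        dates = [f"2021-01-{day:02d}" for day in range(1, n_rows + 1)]
        X = {"feature": list(range(n_rows)), "date": dates}
        if problem_type is ProblemType.TIME_SERIES_BINARY:
            y = [i % 2 for i in range(n_rows)]
        elif problem_type is ProblemType.TIME_SERIES_MULTICLASS:
            y = [i % 3 for i in range(n_rows)]
        else:
            y = [float(i) for i in range(n_rows)]
        return X, None, y

    return _ts_data


ts_data = make_ts_data_factory()
X, _, y = ts_data(problem_type="time series binary")
```

In a pytest suite, the factory would be returned from a `@pytest.fixture`-decorated `ts_data` function, so each test calls `ts_data(problem_type=...)` as in the diff. Accepting both strings and enum members lets string-parametrized tests and tests that pass an enum value share one fixture, assuming the real fixture normalizes its argument in a similar way.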
```diff
@@ -4588,7 +4574,7 @@ def test_automl_passes_known_in_advance_pipeline_parameters_to_all_pipelines(
         lambda d: d["Not Known In Advance Pipeline - Select Columns Transformer"][
             "columns"
         ]
-        == ["features", "date"],
+        == ["feature", "date"],
     ).all()
 
 
```
```diff
@@ -4633,10 +4619,10 @@ def test_cv_validation_scores(
 
 
 def test_cv_validation_scores_time_series(
-    ts_data_binary,
+    ts_data,
     AutoMLTestEnv,
 ):
-    X, y = ts_data_binary
+    X, _, y = ts_data(problem_type="time series binary")
     problem_configuration = {
         "time_index": "date",
         "gap": 0,
```
```diff
@@ -4678,7 +4664,7 @@ def test_search_parameters_held_automl(
     algorithm,
     batches,
     X_y_binary,
-    ts_data_binary,
+    ts_data,
 ):
     if problem_type == "binary":
         X, y = X_y_binary
```
```diff
@@ -4695,7 +4681,7 @@ def test_search_parameters_held_automl(
             },
         }
     else:
-        X, y = ts_data_binary
+        X, _, y = ts_data(problem_type="time series binary")
         problem_configuration = {
             "time_index": "date",
             "gap": 0,
```
```diff
@@ -4781,9 +4767,9 @@ def test_automl_accepts_features(
     AutoMLTestEnv,
 ):
     X, y = X_y_binary
-    X_pd = pd.DataFrame(X)
-    X_pd.columns = X_pd.columns.astype(str)
-    X_transform = X_pd.iloc[len(X) // 3 :]
+    X = pd.DataFrame(X)  # Drop ww information since setting column types fails
+    X.columns = X.columns.astype(str)
+    X_transform = X.iloc[len(X) // 3 :]
 
     if features == "with_features_provided":
         es = ft.EntitySet()
```
```diff
@@ -4839,7 +4825,7 @@ def test_automl_with_iterative_algorithm_puts_ts_estimators_first(
     is_using_windows,
 ):
 
-    X, y = ts_data
+    X, _, y = ts_data()
 
     env = AutoMLTestEnv("time series regression")
     automl = AutoMLSearch(
```
Review comment: I'm not confident this change is correct. I'm unsure how all types are handled here and whether everything does in fact become a double. If anyone knows better, please let me know.