Update split_data with new default for timeseries #3650

eccabay · 2022-08-09T18:57:55Z

For time series problems, reduces the default test_size in split_data from 0.2 to 0.1, as long as the resulting holdout is larger than the forecast horizon passed in.
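A minimal sketch of the new default rule, assuming that "greater than the forecast horizon" means the 10% holdout must contain more rows than forecast_horizon, and that otherwise the fraction is raised just enough to cover the horizon. default_test_size is a hypothetical helper for illustration, not evalml's split_data implementation.

```python
from typing import Optional


def default_test_size(
    n_rows: int, is_time_series: bool, forecast_horizon: Optional[int] = None
) -> float:
    """Hypothetical sketch of the default holdout fraction described in this PR."""
    if not is_time_series:
        # Non-time-series problems keep the previous default.
        return 0.2
    test_size = 0.1
    if forecast_horizon is not None:
        # Assumption: if 10% of the rows would not exceed the forecast horizon,
        # fall back to the fraction that exactly covers the horizon.
        test_size = max(test_size, forecast_horizon / n_rows)
    return test_size
```

With 500 rows and a forecast horizon of 10, the 10% holdout (50 rows) already exceeds the horizon, so 0.1 is kept; with only 50 rows, the sketch raises the fraction to 0.2 so the holdout still spans 10 rows.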

codecov · 2022-08-09T19:04:28Z

Codecov Report

Merging #3650 (874f81a) into main (994d33a) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3650     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        335     335             
  Lines      33790   33840     +50     
=======================================
+ Hits       33662   33712     +50     
  Misses       128     128

Impacted Files	Coverage Δ
evalml/preprocessing/utils.py	`100.0% <100.0%> (ø)`
evalml/tests/conftest.py	`97.9% <100.0%> (+0.1%)`	⬆️
...valml/tests/preprocessing_tests/test_split_data.py	`100.0% <100.0%> (ø)`


chukarsten

LGTM, just added a few suggestions, take them or leave them. Nothing blocking!

chukarsten · 2022-08-10T16:35:28Z

docs/source/release_notes.rst

@@ -2,12 +2,13 @@ Release Notes
 -------------
 **Future Releases**
    * Enhancements
-        * Add ``exclude_featurizers`` parameter to ``AutoMLSearch`` to specify featurizers that should be excluded from all pipelines :pr:`3631`
+        * Added ``exclude_featurizers`` parameter to ``AutoMLSearch`` to specify featurizers that should be excluded from all pipelines :pr:`3631`


Make sure you send a picture of this to @angela97lin

chukarsten · 2022-08-12T14:42:27Z

evalml/tests/preprocessing_tests/test_split_data.py

+    if is_binary(problem_type):
+        X, y = X_y_binary
+    if is_multiclass(problem_type):
+        X, y = X_y_multi
+    if is_regression(problem_type):
+        X, y = X_y_regression
+    problem_configuration = None
+    if is_time_series(problem_type):
+        problem_configuration = {"gap": 1, "max_delay": 7, "time_index": "ts_data"}
+
+    X = make_data_type(data_type, X)
+    y = make_data_type(data_type, y)


I suspect this is boilerplate not just for these two tests but across many of the tests. Did you want to refactor it into something that returns problem_configuration, X, and y given data_type, problem_type, etc.? I'm pretty sure I've seen this in a few other places too. It's also fine to file an issue that calls out the refactor and the modules we should move this code into.

I like this. A slight refactor to get_test_data_from_configuration works wonderfully to factor this out! We can go through and replace other instances in a future story.
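A minimal sketch of the kind of helper the reviewer describes: given a problem type and data type, it returns X, y, and problem_configuration in one call. get_test_data, the datasets dict, and the problem-type strings are hypothetical stand-ins for the X_y_* fixtures and evalml's is_* helpers; this does not reproduce the actual get_test_data_from_configuration fixture.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical problem-type groupings standing in for is_binary, is_multiclass,
# is_regression, and is_time_series.
BINARY = {"binary", "time series binary"}
MULTICLASS = {"multiclass", "time series multiclass"}
REGRESSION = {"regression", "time series regression"}
TIME_SERIES = {"time series binary", "time series multiclass", "time series regression"}


def get_test_data(
    problem_type: str,
    data_type: str,
    datasets: Dict[str, Tuple[Any, Any]],
    make_data_type: Callable[[str, Any], Any] = lambda _data_type, data: data,
) -> Tuple[Any, Any, Optional[Dict[str, Any]]]:
    """Return (X, y, problem_configuration) for a problem type (hypothetical helper)."""
    if problem_type in BINARY:
        X, y = datasets["binary"]
    elif problem_type in MULTICLASS:
        X, y = datasets["multiclass"]
    elif problem_type in REGRESSION:
        X, y = datasets["regression"]
    else:
        raise ValueError(f"Unknown problem type: {problem_type}")

    problem_configuration = None
    if problem_type in TIME_SERIES:
        # Same configuration the duplicated test code builds inline.
        problem_configuration = {"gap": 1, "max_delay": 7, "time_index": "ts_data"}

    # Convert both inputs to the requested container type (e.g. pandas or numpy).
    return make_data_type(data_type, X), make_data_type(data_type, y), problem_configuration
```

Passing the converter in as a parameter keeps the sketch independent of any particular data library; in a test suite it would be the existing make_data_type fixture.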

evalml/tests/preprocessing_tests/test_split_data.py

eccabay added 2 commits August 9, 2022 14:56

Update split_data with new default

b205986

Update release notes

96b9d2e

eccabay marked this pull request as ready for review August 9, 2022 19:27

auto-assign bot assigned eccabay Aug 9, 2022

eccabay requested review from jeremyliweishih, chukarsten, christopherbunn, fjlanasa and MichaelFu512 and removed request for jeremyliweishih August 9, 2022 19:27

fjlanasa approved these changes Aug 9, 2022


chukarsten approved these changes Aug 12, 2022


eccabay added 3 commits August 12, 2022 11:36

Merge branch 'main' into 4658_ts-holdout

364535c

Test updates from PR comments

46b232e

Merge branch 'main' into 4658_ts-holdout

3549890

chukarsten mentioned this pull request Aug 15, 2022

Refactor Additional Tests to Use get_test_data_from_configuration() #3666

Open

chukarsten and others added 3 commits August 15, 2022 12:52

Merge branch 'main' into 4658_ts-holdout

ff9f83d

Merge branch 'main' into 4658_ts-holdout

0364a0f

Added the ability of the conftest fixture to generate numpy.

874f81a

jeremyliweishih approved these changes Aug 15, 2022


chukarsten enabled auto-merge (squash) August 15, 2022 18:17

chukarsten merged commit 4ebe97a into main Aug 15, 2022

chukarsten deleted the 4658_ts-holdout branch August 15, 2022 18:21

chukarsten mentioned this pull request Aug 16, 2022

Release v0.56.0 #3653

Merged
