Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update split_data with new default for timeseries #3650

Merged
merged 8 commits into from
Aug 15, 2022
Merged

Conversation

eccabay
Copy link
Contributor

@eccabay eccabay commented Aug 9, 2022

Reduces the default test_size in split_data to be 0.1 instead of 0.2, as long as the new value is greater than the passed in forecast horizon.

@codecov
Copy link

codecov bot commented Aug 9, 2022

Codecov Report

Merging #3650 (874f81a) into main (994d33a) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3650     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        335     335             
  Lines      33790   33840     +50     
=======================================
+ Hits       33662   33712     +50     
  Misses       128     128             
Impacted Files Coverage Δ
evalml/preprocessing/utils.py 100.0% <100.0%> (ø)
evalml/tests/conftest.py 97.9% <100.0%> (+0.1%) ⬆️
...valml/tests/preprocessing_tests/test_split_data.py 100.0% <100.0%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@eccabay eccabay marked this pull request as ready for review August 9, 2022 19:27
Copy link
Contributor

@chukarsten chukarsten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just added a few suggestions, take them or leave them. Nothing blocking!

@@ -2,12 +2,13 @@ Release Notes
-------------
**Future Releases**
* Enhancements
* Add ``exclude_featurizers`` parameter to ``AutoMLSearch`` to specify featurizers that should be excluded from all pipelines :pr:`3631`
* Added ``exclude_featurizers`` parameter to ``AutoMLSearch`` to specify featurizers that should be excluded from all pipelines :pr:`3631`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure you send a picture of this to @angela97lin

Comment on lines 74 to 85
if is_binary(problem_type):
X, y = X_y_binary
if is_multiclass(problem_type):
X, y = X_y_multi
if is_regression(problem_type):
X, y = X_y_regression
problem_configuration = None
if is_time_series(problem_type):
problem_configuration = {"gap": 1, "max_delay": 7, "time_index": "ts_data"}

X = make_data_type(data_type, X)
y = make_data_type(data_type, y)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this is boiler plate not just for these two tests but also across many of the tests. Did you want to refactor it into something that returns problem_configuration, X, and y given data_type, problem_type, etc? I am pretty sure I've seen this a few other places too. I think it's also fine to file an issue to call out the refactor with the modules we should refactor this code into.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this. A slight refactor to get_test_data_from_configuration works wonderfully to factor this out! We can go through and replace other instances in a future story.

evalml/tests/preprocessing_tests/test_split_data.py Outdated Show resolved Hide resolved
@chukarsten chukarsten enabled auto-merge (squash) August 15, 2022 18:17
@chukarsten chukarsten merged commit 4ebe97a into main Aug 15, 2022
@chukarsten chukarsten deleted the 4658_ts-holdout branch August 15, 2022 18:21
@chukarsten chukarsten mentioned this pull request Aug 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants