-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update split_data with new default for timeseries #3650
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3650 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 335 335
Lines 33790 33840 +50
=======================================
+ Hits 33662 33712 +50
Misses 128 128
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just added a few suggestions, take them or leave them. Nothing blocking!
@@ -2,12 +2,13 @@ Release Notes | |||
------------- | |||
**Future Releases** | |||
* Enhancements | |||
* Add ``exclude_featurizers`` parameter to ``AutoMLSearch`` to specify featurizers that should be excluded from all pipelines :pr:`3631` | |||
* Added ``exclude_featurizers`` parameter to ``AutoMLSearch`` to specify featurizers that should be excluded from all pipelines :pr:`3631` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure you send a picture of this to @angela97lin
if is_binary(problem_type): | ||
X, y = X_y_binary | ||
if is_multiclass(problem_type): | ||
X, y = X_y_multi | ||
if is_regression(problem_type): | ||
X, y = X_y_regression | ||
problem_configuration = None | ||
if is_time_series(problem_type): | ||
problem_configuration = {"gap": 1, "max_delay": 7, "time_index": "ts_data"} | ||
|
||
X = make_data_type(data_type, X) | ||
y = make_data_type(data_type, y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this is boiler plate not just for these two tests but also across many of the tests. Did you want to refactor it into something that returns problem_configuration
, X
, and y
given data_type
, problem_type
, etc? I am pretty sure I've seen this a few other places too. I think it's also fine to file an issue to call out the refactor with the modules we should refactor this code into.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this. A slight refactor to get_test_data_from_configuration
works wonderfully to factor this out! We can go through and replace other instances in a future story.
Reduces the default
test_size
insplit_data
to be 0.1 instead of 0.2, as long as the new value is greater than the passed in forecast horizon.