Changed target name/series ID divider and added ability to return series ID column with predictions #4357

Merged: christopherbunn merged 6 commits into main from add_seriesid_pred_in_sample on Nov 2, 2023


christopherbunn (Contributor) commented Oct 25, 2023

Resolves #4359
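The PR title refers to the divider that joins a target name and a series ID into one unstacked column name. The thread doesn't say which divider was chosen or why. The sketch below assumes `|` purely for illustration, and `join_name`/`split_name` are hypothetical helpers, not evalml functions. It shows one way an underscore divider can become ambiguous: the name can't be split back reliably when the target name itself contains underscores.

```python
def join_name(target: str, series_id: str, divider: str) -> str:
    # Hypothetical: build an unstacked column name such as "total_sales|store_1"
    return f"{target}{divider}{series_id}"


def split_name(column: str, divider: str) -> tuple[str, str]:
    # Hypothetical: split on the first divider into (target name, series ID)
    target, series_id = column.split(divider, 1)
    return target, series_id


# With "_" the split happens inside the target name itself
print(split_name(join_name("total_sales", "store_1", "_"), "_"))
# With a divider that never appears in the names, the round trip is lossless
print(split_name(join_name("total_sales", "store_1", "|"), "|"))
```

The first call returns `('total', 'sales_store_1')`, and the second returns `('total_sales', 'store_1')`.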

christopherbunn force-pushed the add_seriesid_pred_in_sample branch from 08713ad to 2845145 on October 25, 2023 20:50
codecov bot commented Oct 25, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8ffa04f) 99.7% compared to head (36185ae) 99.7%.

Additional details and impacted files
```diff
@@           Coverage Diff           @@
##            main   #4357     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%
=======================================
  Files        357     357
  Lines      39869   39910     +41
=======================================
+ Hits       39749   39790     +41
  Misses       120     120
```
| Files | Coverage Δ |
|---|---|
| ...valml/pipelines/multiseries_regression_pipeline.py | 100.0% <100.0%> (ø) |
| ...valml/pipelines/time_series_regression_pipeline.py | 100.0% <100.0%> (ø) |
| evalml/pipelines/utils.py | 99.7% <100.0%> (+0.1%) ⬆️ |
| ...sts/component_tests/test_time_series_featurizer.py | 99.7% <100.0%> (+0.1%) ⬆️ |
| .../tests/component_tests/test_time_series_imputer.py | 100.0% <100.0%> (ø) |
| evalml/tests/conftest.py | 98.4% <100.0%> (+0.1%) ⬆️ |
| ...line_tests/test_multiseries_regression_pipeline.py | 100.0% <100.0%> (ø) |
| evalml/tests/pipeline_tests/test_pipeline_utils.py | 99.6% <ø> (ø) |


christopherbunn force-pushed the add_seriesid_pred_in_sample branch from 6de173a to a5049b0 on October 31, 2023 04:05
Comment on lines 132 to 147
```python
y_unstacked = y_unstacked[
    y_train_unstacked.columns.intersection(y_unstacked.columns)
]
X_unstacked = X_unstacked[
    X_train_unstacked.columns.intersection(X_unstacked.columns)
]
```
Contributor

Do you not just want the X_train/y_train columns here?
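Here is a small pandas sketch that makes the trade-off in this question concrete. The frames and the `y|0`, `y|1`, `y|2` column names are hypothetical and not evalml's actual naming. Taking the intersection keeps only the series present in both frames, in training order. Indexing directly with the training columns raises `KeyError` as soon as a training series is missing from the new data.

```python
import pandas as pd

# Hypothetical unstacked targets: one column per series, in training order
y_train_unstacked = pd.DataFrame([[1, 2, 3]], columns=["y|2", "y|0", "y|1"])
# New data where series "1" is absent
y_unstacked = pd.DataFrame({"y|0": [5], "y|2": [6]})


def align_to_training(new: pd.DataFrame, train: pd.DataFrame) -> pd.DataFrame:
    # Approach in the diff: keep shared series, ordered as in the training columns
    return new[train.columns.intersection(new.columns)]


def selecting_train_columns_raises(new: pd.DataFrame, train: pd.DataFrame) -> bool:
    # Reviewer's alternative: select the training columns directly
    try:
        new[train.columns]
    except KeyError:
        return True
    return False


print(list(align_to_training(y_unstacked, y_train_unstacked).columns))
print(selecting_train_columns_raises(y_unstacked, y_train_unstacked))
```

This prints `['y|2', 'y|0']` and then `True`. If every training series is always present in the new data, the two approaches coincide.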

```diff
@@ -133,7 +149,14 @@ def predict_in_sample(
        objective,
        calculating_residuals,
    )
    stacked_predictions = stack_data(unstacked_predictions)
    if include_series_id:
        stacked_predictions = stack_data(
```
Contributor

Are we missing tests for this branch?
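The `include_series_id` branch restacks the unstacked predictions so they keep a series ID column. evalml's `stack_data` signature isn't shown in this excerpt, so the helper below is a hypothetical pandas stand-in. It assumes unstacked columns are named `<target>|<series_id>`, with `|` as an illustrative divider. It turns wide predictions (one column per series) into long rows, each labeled with its series.

```python
import pandas as pd


def stack_with_series_id(
    unstacked: pd.DataFrame, divider: str = "|", series_id_name: str = "series_id"
) -> pd.DataFrame:
    """Hypothetical helper: wide predictions -> long rows with a series ID column.

    Assumes each column is named "<target><divider><series_id>".
    """
    long = (
        unstacked.rename_axis("time_index")
        .reset_index()
        .melt(id_vars="time_index", var_name="column", value_name="prediction")
    )
    # Split on the last divider, so a divider inside the target name is tolerated
    parts = long["column"].str.rsplit(divider, n=1)
    long["target"] = parts.str[0]
    long[series_id_name] = parts.str[1]
    # melt emits one series after another; a stable sort on time interleaves
    # the series per time step while keeping their original column order
    long = long.sort_values("time_index", kind="stable").reset_index(drop=True)
    return long[["time_index", series_id_name, "target", "prediction"]]


preds = pd.DataFrame({"sales|a": [1.0, 2.0], "sales|b": [10.0, 20.0]})
print(stack_with_series_id(preds))
```

A test for this branch would assert both that the series ID column exists and that each prediction is paired with the right series. That pairing is the part a plain re-stack without IDs can't check.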

evalml/pipelines/time_series_regression_pipeline.py (outdated, resolved)
evalml/pipelines/utils.py (outdated, resolved)
evalml/pipelines/utils.py (outdated, resolved)
christopherbunn force-pushed the add_seriesid_pred_in_sample branch from dd0346d to 4a1afa2 on October 31, 2023 21:43
christopherbunn force-pushed the add_seriesid_pred_in_sample branch from 4a1afa2 to 08694b0 on October 31, 2023 21:43
eccabay (Contributor) left a comment

Just a few comments, and I think we're missing a test!

Comment on lines 131 to 149
```python
# Order series columns to be same as expected input feature names
input_features = list(self.input_feature_names.values())[0]
X_unstacked = X_unstacked[
    [feature for feature in input_features if feature in X_unstacked.columns]
]
X_train_unstacked = X_train_unstacked[
    [
        feature
        for feature in input_features
        if feature in X_train_unstacked.columns
    ]
]
y_overlapping_features = [
    feature
    for feature in y_train_unstacked.columns
    if feature in y_unstacked.columns
]
y_unstacked = y_unstacked[y_overlapping_features]
y_train_unstacked = y_train_unstacked[y_overlapping_features]
```
Contributor

This is a really long chunk of text, where a lot of it's repeated. A few questions:

  • Is there a test case that covers this? (i.e. one that fails without this code)
  • Are X_unstacked and X_train_unstacked ever going to have different columns? It seems odd that we get those separately from each other, and so differently from how y is handled here.
  • Is the goal here to filter columns, reorder columns, or both? The comment makes me think it's re-ordering, but the code makes me think we're filtering

christopherbunn (Contributor, Author), Nov 1, 2023

  • Will push an additional test case. Actually, our current test case for predict_in_sample() errors out if the columns aren't in the right order. I could add an explicit test if you think it would be helpful.
  • This covers the forecasting case. When we're forecasting, we only pass in the dates and the series IDs. If we're using lagged features (like in the future), we can pull them from X_train even if they're not specified in the current X. We can generally expect y and y_train to be consistent, since their column names come from the same series ID values.
  • The goal is to do both, for the reason described above. I can update the comment to clarify.
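The forecasting case described in this reply can be sketched in pandas. The same list comprehension drops expected features that are absent from a frame and puts the remaining ones in the expected order. `filter_and_order`, the feature names, and the date-only forecast frame are hypothetical stand-ins for `self.input_feature_names` and the real unstacked data.

```python
import pandas as pd


def filter_and_order(frame: pd.DataFrame, input_features: list[str]) -> pd.DataFrame:
    # Keep only expected features that exist in `frame`, in the expected order
    return frame[[feature for feature in input_features if feature in frame.columns]]


# Hypothetical expected feature order recorded at fit time
input_features = ["date", "temp|a", "temp|b"]

# Training data has every feature, but in a different column order
X_train_unstacked = pd.DataFrame(
    {"temp|b": [1.0], "date": ["2023-01-01"], "temp|a": [2.0]}
)
# Hypothetical forecast-time data that carries only the date column
X_unstacked = pd.DataFrame({"date": ["2023-01-02"]})

print(list(filter_and_order(X_train_unstacked, input_features).columns))
print(list(filter_and_order(X_unstacked, input_features).columns))
```

The first call reorders the training columns to `['date', 'temp|a', 'temp|b']`. The second filters the forecast frame down to `['date']`, which is why the two frames are handled separately.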

Contributor

When you say the current test case errors out if it isn't in the right order, does that mean you reordered it manually to verify that it fails? I think we'd benefit from an explicit test case that fails if this code isn't in place, with no modification required. It'll help stop us from removing or breaking this bit in the future.

Contributor Author

Added test_multiseries_pipeline_predict_in_sample_series_out_of_order(), which covers this case.
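A check of the kind the reviewer asked for, one that fails when the reordering code is removed, can be sketched without evalml. This is not the contents of `test_multiseries_pipeline_predict_in_sample_series_out_of_order()`, which isn't shown in this thread. `reorder_series` and `check_series_order` are hypothetical names, and the no-op alignment plays the role of "this code isn't in place".

```python
import pandas as pd


def reorder_series(frame: pd.DataFrame, expected: list[str]) -> pd.DataFrame:
    # Hypothetical alignment step: filter and reorder columns to `expected`
    return frame[[col for col in expected if col in frame.columns]]


def no_alignment(frame: pd.DataFrame, expected: list[str]) -> pd.DataFrame:
    # Simulates removing the alignment code entirely
    return frame


def check_series_order(align) -> bool:
    """Return True when `align` restores the expected series order."""
    expected = ["y|a", "y|b", "y|c"]
    # Fixed out-of-order permutation, so the check is deterministic
    out_of_order = pd.DataFrame({"y|c": [3.0], "y|a": [1.0], "y|b": [2.0]})
    aligned = align(out_of_order, expected)
    return list(aligned.columns) == expected and list(aligned.iloc[0]) == [1.0, 2.0, 3.0]


print(check_series_order(reorder_series))
print(check_series_order(no_alignment))
```

This prints `True` for the aligned version and `False` for the no-op. A test built this way fails without the alignment step and needs no manual edits to demonstrate it.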

evalml/pipelines/multiseries_regression_pipeline.py (outdated, resolved)
jeremyliweishih (Collaborator) left a comment

LGTM after recent revisions

christopherbunn enabled auto-merge (squash) November 2, 2023 21:08
christopherbunn merged commit 735ca67 into main Nov 2, 2023
23 checks passed
christopherbunn deleted the add_seriesid_pred_in_sample branch November 2, 2023 21:18
MichaelFu512 mentioned this pull request Nov 2, 2023
Development

Successfully merging this pull request may close these issues.

Further refine multiseries stacking utilities
5 participants