Skip to content

Fix TSDataset.make_future to handle hierarchy, quantiles, target components #1248

Merged
merged 5 commits into from
Apr 28, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Fixed
-
- Fix `BaseReconciliator` to work on `pandas==1.1.5` ([#1229](https://github.com/tinkoff-ai/etna/pull/1229))
-
- Fix `TSDataset.make_future` to handle hierarchy, quantiles, target components ([#1248](https://github.com/tinkoff-ai/etna/pull/1248))
- Fix warning during creation of `ResampleWithDistributionTransform` ([#1230](https://github.com/tinkoff-ai/etna/pull/1230))
- Add deep copy for copying attributes of `TSDataset` ([#1241](https://github.com/tinkoff-ai/etna/pull/1241))
-
Expand Down
20 changes: 15 additions & 5 deletions etna/datasets/tsdataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -230,21 +230,23 @@ def __getitem__(self, item):
def make_future(
self, future_steps: int, transforms: Sequence["Transform"] = (), tail_steps: int = 0
) -> "TSDataset":
"""Return new TSDataset with future steps.
"""Return new TSDataset with features extended into the future.

The result dataset doesn't contain quantiles and target components.

Parameters
----------
future_steps:
number of timestamp in the future to build features for.
number of steps to extend dataset into the future.
transforms:
sequence of transforms to be applied.
tail_steps:
number of timestamp for context to build features for.
number of steps to keep from the tail of the original dataset.

Returns
-------
:
dataset with features in the future.
dataset with features extended into the.

Examples
--------
Expand Down Expand Up @@ -295,6 +297,14 @@ def make_future(
f"NaN-s will be used for missing values"
)

# remove components and quantiles
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of them are not added to raw_df, not sure that we really need to drop the explicitly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our fixtures we create some datasets, giving components and quantiles in raw_df. For example, you can look at fixtures that was used in test_make_future_removes_quantiles and test_make_future_removes_target_components. We can try to fix this fixtures, I think.

For components we can in some sense guarantee that there will be no components (because they are added using special method). But for quantiles we can't make such guarantee.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a good solution should consist of two steps:

  • Rework quantiles to be like components
  • Fix our fixtures to follow the procedure of adding quantiles and components

We can probably create some task for this.

Offtop, but I also think that we should somehow make a more formal restriction on columns that can be passed into df of TSDataset.__init__. Because we suggest that it is only endog, but we don't check this and in a lot of inner code don't follow this rule, so it can lead to confusing situations.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, the right solution is to rework quantiles. What kind of restrictions do you mean? Like allow only target?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we going to do with this issue? Leave dropping this way and fix it after fixing quantiles?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should discuss do we want this restriction or not, but the proposal is to allow in df to be only columns timestamp, segment, target. We can also generate a good error message if smth went wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we going to do with this issue? Leave dropping this way and fix it after fixing quantiles?

Probably, we can do like this, yes. This piece of code won't break anything (as I understand), but can safe us for now from some strange moments. We can add todo about fixing it after reworking quantiles. But we have a problem that we aren't really fixing todos. We can create a task on fixing quantiles and point out in that task to remove these redundant lines.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Let's leave TODO
  2. We can restrict df to have only timestamp, segment, target -- I guess it should prevent people adding exogs incorrectly. However it seems that in some places we create dataset out of df with extra columns ourselvs

# it should be done if we have quantiles and components in raw_df
# TODO: fix this after making quantiles to work like components, with special methods
if len(self.target_components_names) > 0:
df = df.drop(columns=list(self.target_components_names), level="feature")
if len(self.target_quantiles_names) > 0:
df = df.drop(columns=list(self.target_quantiles_names), level="feature")

# Here only df is required, other metadata is not necessary to build the dataset
ts = TSDataset(df=df, freq=self.freq)
for transform in transforms:
Expand All @@ -305,7 +315,7 @@ def make_future(
future_dataset = df.tail(future_steps + tail_steps).copy(deep=True)

future_dataset = future_dataset.sort_index(axis=1, level=(0, 1))
future_ts = TSDataset(df=future_dataset, freq=self.freq)
future_ts = TSDataset(df=future_dataset, freq=self.freq, hierarchical_structure=self.hierarchical_structure)

# can't put known_future into constructor, _check_known_future fails with df_exog=None
future_ts.known_future = deepcopy(self.known_future)
Expand Down
18 changes: 18 additions & 0 deletions tests/test_datasets/test_dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -620,6 +620,24 @@ def test_make_future_inherits_regressors(df_and_regressors):
assert ts_future.regressors == ts.regressors


def test_make_future_inherits_hierarchy(product_level_constant_forecast_with_quantiles):
ts = product_level_constant_forecast_with_quantiles
future = ts.make_future(future_steps=2)
assert future.hierarchical_structure is ts.hierarchical_structure


def test_make_future_removes_quantiles(product_level_constant_forecast_with_quantiles):
ts = product_level_constant_forecast_with_quantiles
future = ts.make_future(future_steps=2)
assert len(future.target_quantiles_names) == 0


def test_make_future_removes_target_components(ts_with_target_components):
ts = ts_with_target_components
future = ts.make_future(future_steps=2)
assert len(future.target_components_names) == 0


def test_make_future_warn_not_enough_regressors(df_and_regressors):
"""Check that warning is thrown if regressors don't have enough values for the future."""
df, df_exog, known_future = df_and_regressors
Expand Down