Use `use_diff_of_y` and `predict_ahead == [0]` at the same time #33

rubenpeters91 · 2022-07-25T13:55:58Z

When using use_diff_of_y, you apparently can't set predict_ahead = [0] in TimeseriesMLP. There are multiple checks for this in the code, and removing the first error leads to the model predicting a straight line. Using use_diff_of_y with any other predict_ahead works as expected.


philiproeleveld · 2023-02-21T15:43:25Z

I took some time to investigate this. The reason you can't set predict_ahead = [0] with use_diff_of_y is that predict_ahead (or lags in make_shifted_target, where the differencing is actually calculated) is used to determine the offset for the differencing. So for a predict_ahead of 3, the differencing is calculated using the third next timestamp. A predict_ahead of zero would therefore result in the "differenced" y being all zero (subtracting the value at the 0th next timestamp, i.e. itself).
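To make the all-zero result concrete, here is a small sketch that applies the differencing expression quoted later in this comment to a toy series. The helper name diff_target and the data are made up for illustration; only the expression `-1 * y.diff(-1 * lag)` comes from the thread.

```python
import pandas as pd

# Toy series; the values are arbitrary.
y = pd.Series([1.0, 3.0, 6.0, 10.0, 15.0])


def diff_target(y: pd.Series, lag: int) -> pd.Series:
    """Hypothetical helper wrapping the expression from the thread.

    For a positive lag this evaluates to y[t + lag] - y[t], indexed at t.
    """
    return -1 * y.diff(-1 * lag)


ahead_2 = diff_target(y, 2)  # difference with the second next timestamp
ahead_0 = diff_target(y, 0)  # difference with the 0th next timestamp (itself)

print(ahead_2)
print(ahead_0)
```

With a lag of 2 the last two rows are NaN because there is no second next timestamp, while a lag of 0 leaves nothing to learn because every target value is zero.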

There is, however, still a valid reason to want to use differencing with a nowcast (when predict_ahead is zero), because the X features would then be aligned with the nowcast. For example, if we have a timeseries with a frequency of 1 minute, then predict_ahead = [5] would result in a prediction:

For the difference between timestamps 00:05 and 00:00
Indexed as 00:00
Using features from X at index 00:00

Whereas predict_ahead = [0] with a manually configured differencing offset of 5 would result in a prediction:

For the difference between timestamps 00:05 and 00:00
Indexed as 00:05
Using features from X at index 00:05

The last point is the important one: the predict_ahead = [0] case would use data from the last of the two timestamps that are differenced, whereas the predict_ahead = [5] case uses data from the first of the two timestamps.
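The two cases can be compared on a toy 1-minute series. This is only a sketch of the index alignment described above, using plain pandas rather than the library's own code; the series values and variable names are assumptions for illustration.

```python
import pandas as pd

# Ten timestamps at 1-minute frequency, starting at 00:00.
idx = pd.date_range("2023-01-01 00:00", periods=10, freq="min")
y = pd.Series([float(i**2) for i in range(10)], index=idx)

# predict_ahead = [5]: y[t + 5] - y[t], indexed at t (the first timestamp).
forecast_diff = -1 * y.diff(-5)

# Nowcast with a manual offset of 5: y[t] - y[t - 5], indexed at t (the last timestamp).
nowcast_diff = y.diff(5)

# Both describe the difference between 00:05 and 00:00 ...
print(forecast_diff.loc[idx[0]])  # stored at 00:00
print(nowcast_diff.loc[idx[5]])   # stored at 00:05
```

The values are identical; only the row they are attached to changes, and with it the row of X that a model would be given as input.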

To support such a "manually configured differencing offset" for the nowcast, since we can't just use the predict_ahead/lags value of zero, you would change the calculation in make_shifted_target¹ from the current implementation:

result = pd.concat([-1 * y.diff(-1 * lag) for lag in lags], axis=1)

To something like:

result = pd.concat([y.diff(nowcast_offset) if lag == 0 else -1 * y.diff(-1 * lag) for lag in lags], axis=1)

This would at the very least require introducing a nowcast_offset parameter in make_shifted_target, and probably also in the __init__ of BaseTimeseriesRegressor.
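As a rough sketch of what that could look like in isolation, the function below wraps the proposed expression with a nowcast_offset parameter. It is a standalone illustration, not the actual make_shifted_target: the function name, signature, and the choice to raise a ValueError when the offset is missing are all assumptions.

```python
from typing import List, Optional

import pandas as pd


def make_diff_target_sketch(
    y: pd.Series, lags: List[int], nowcast_offset: Optional[int] = None
) -> pd.DataFrame:
    """Hypothetical stand-in for the differencing step, one column per lag."""
    columns = []
    for lag in lags:
        if lag == 0:
            # A lag of zero cannot define its own offset, so require one explicitly.
            if nowcast_offset is None:
                raise ValueError("nowcast_offset is required when lags contains 0")
            # y[t] - y[t - nowcast_offset], indexed at t.
            columns.append(y.diff(nowcast_offset))
        else:
            # y[t + lag] - y[t], indexed at t (the expression from the thread).
            columns.append(-1 * y.diff(-1 * lag))
    return pd.concat(columns, axis=1)


y_demo = pd.Series([1.0, 3.0, 6.0, 10.0, 15.0])
result = make_diff_target_sketch(y_demo, lags=[0, 2], nowcast_offset=1)
print(result)
```

Requiring the offset explicitly keeps the existing rejection of a bare predict_ahead = [0] in place when no offset is given.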

So it might be worth implementing this for some relatively niche application where the desired differencing offset is large and it matters significantly which timestamp from X is used, but I'm not convinced such a case will ever come up. I think the current behavior of rejecting the combination of use_diff_of_y with predict_ahead = [0] is totally acceptable.

¹ It is also necessary to shift y by the same offset in inverse_differenced_target when adding the differences back.
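The arithmetic behind that footnote can be sketched as follows. This does not reproduce inverse_differenced_target itself; it only shows, on toy data with assumed variable names, why the nowcast case needs a shifted y when the differences are added back.

```python
import pandas as pd

y = pd.Series([1.0, 3.0, 6.0, 10.0, 15.0])
offset = 2

# Nowcast differencing: y[t] - y[t - offset], indexed at t.
nowcast_diff = y.diff(offset)

# Adding back requires the value the difference was taken against,
# which is y shifted by the same offset.
restored_nowcast = nowcast_diff + y.shift(offset)

# For contrast, the forecast case: y[t + offset] - y[t], indexed at t,
# so adding the unshifted y gives the value at t + offset.
forecast_diff = -1 * y.diff(-1 * offset)
restored_forecast = forecast_diff + y

print(restored_nowcast)
print(restored_forecast)
```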
