
Use use_diff_of_y and predict_ahead == [0] at the same time #33

Open
rubenpeters91 opened this issue Jul 25, 2022 · 1 comment
Labels
Priority: Low Issues that have no or little impact on current version. No fix or workaround required Type: Feature New feature requests

Comments

@rubenpeters91
Contributor

When using use_diff_of_y, you apparently can't set predict_ahead = [0] in TimeseriesMLP. There are multiple checks for this in the code, and removing the first check (the one that raises the error) leads to predicting a straight line. Using use_diff_of_y with any other predict_ahead works as expected.

@rubenpeters91 rubenpeters91 added Priority: Medium Issues that need to be fixed, but low impact or a workaround exists Type: Feature New feature requests Priority: Low Issues that have no or little impact on current version. No fix or workaround required and removed Priority: Medium Issues that need to be fixed, but low impact or a workaround exists labels Jul 25, 2022
@philiproeleveld
Contributor

I took some time to investigate this. The reason you can't set predict_ahead = [0] with use_diff_of_y is that predict_ahead (or lags in make_shifted_target, where the differencing is actually calculated) also determines the offset used for the differencing. So for a predict_ahead of 3, the difference is taken between the value three timestamps ahead and the current value. A predict_ahead of zero would therefore make the "differenced" y all zeros, because each value is subtracted from the value 0 timestamps ahead, i.e. from itself.
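A small pandas sketch of the current differencing expression on toy data shows why lag 0 is degenerate. The series values and the helper name differenced_target are made up for illustration; only the expression inside it comes from make_shifted_target.

```python
import pandas as pd

# Toy series; the values are arbitrary and only serve to show the offsets.
y = pd.Series([10.0, 12.0, 15.0, 11.0, 14.0, 18.0, 20.0])


def differenced_target(y: pd.Series, lag: int) -> pd.Series:
    # Same expression as the current make_shifted_target, for a single lag:
    # the value at t is y[t + lag] - y[t], with NaN where t + lag is past the end.
    return -1 * y.diff(-1 * lag)


diff3 = differenced_target(y, 3)  # predict_ahead = [3]
diff0 = differenced_target(y, 0)  # predict_ahead = [0]

print(diff3.tolist())  # [1.0, 2.0, 3.0, 9.0, nan, nan, nan]
print((diff0 == 0).all())  # True: every value is subtracted from itself
```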

There is, however, still a valid reason to want differencing with a nowcast (predict_ahead of zero): the X features would then be aligned with the nowcast. For example, take a time series with a frequency of 1 minute. Then predict_ahead = [5] would result in a prediction:

  • For the difference between timestamps 00:05 and 00:00
  • Indexed as 00:00
  • Using features from X at index 00:00

Whereas predict_ahead = [0] with a manually configured differencing offset of 5 would result in a prediction:

  • For the difference between timestamps 00:05 and 00:00
  • Indexed as 00:05
  • Using features from X at index 00:05

The last point is the important one: the predict_ahead = [0] case uses data from the later of the two timestamps being differenced, whereas the predict_ahead = [5] case uses data from the earlier one.
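To make the alignment difference concrete, here is a rough pandas sketch on a made-up 1-minute series. The timestamps and values are assumptions, and it only computes the two targets, not how they would be joined with X.

```python
import pandas as pd

# Made-up 1-minute series with y = i**2, so differences are easy to check.
index = pd.date_range("2022-07-25 00:00", periods=11, freq="min")
y = pd.Series([float(i * i) for i in range(11)], index=index)

offset = 5

# predict_ahead = [5]: y[t + 5] - y[t], indexed at the EARLIER timestamp t,
# so it would sit next to X at that earlier timestamp.
forecast_target = -1 * y.diff(-offset)

# Nowcast with a manual offset of 5: y[t] - y[t - 5], indexed at the LATER
# timestamp t, so it would sit next to X at that later timestamp.
nowcast_target = y.diff(offset)

t0 = pd.Timestamp("2022-07-25 00:00")
t5 = pd.Timestamp("2022-07-25 00:05")

# Same difference (y at 00:05 minus y at 00:00), different index labels.
print(forecast_target.loc[t0], nowcast_target.loc[t5])  # 25.0 25.0

# The nowcast target is the forecast target shifted forward by `offset` rows.
print(forecast_target.shift(offset).equals(nowcast_target))  # True
```

Both targets hold the same numbers; what changes is which row of X each value ends up next to.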

To support such a "manually configured differencing offset" for the nowcast (since the predict_ahead/lags value of zero can't be used for it), you would change the calculation in make_shifted_target¹ from the current implementation:

result = pd.concat([-1 * y.diff(-1 * lag) for lag in lags], axis=1)

To something like:

result = pd.concat([y.diff(nowcast_offset) if lag == 0 else -1 * y.diff(-1 * lag) for lag in lags], axis=1)

That change would at the very least require a new nowcast_offset parameter in make_shifted_target, and probably also in the __init__ of BaseTimeseriesRegressor.
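A minimal sketch of what that could look like, assuming a hypothetical helper make_shifted_target_sketch with a nowcast_offset argument (defaulting to 1 here, an arbitrary choice) and made-up column names. It only covers the differencing step, not the rest of make_shifted_target.

```python
from typing import Sequence

import pandas as pd


def make_shifted_target_sketch(
    y: pd.Series, lags: Sequence[int], nowcast_offset: int = 1
) -> pd.DataFrame:
    """Hypothetical differenced target with a configurable nowcast offset.

    Only the differencing step, not the real make_shifted_target. For lag > 0
    a column holds y[t + lag] - y[t] (indexed at t), as in the current code.
    For lag == 0 it holds y[t] - y[t - nowcast_offset] instead of all zeros.
    """
    if nowcast_offset <= 0:
        raise ValueError("nowcast_offset must be a positive integer")
    columns = [
        y.diff(nowcast_offset) if lag == 0 else -1 * y.diff(-1 * lag)
        for lag in lags
    ]
    result = pd.concat(columns, axis=1)
    # Hypothetical column names, chosen only for readability here.
    result.columns = [f"y_diff_lag_{lag}" for lag in lags]
    return result


y = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
target = make_shifted_target_sketch(y, lags=[0, 2], nowcast_offset=2)
# Lag 0 column: NaN, NaN, 3, 5, 7, 9 (differences indexed at the later timestamp).
# Lag 2 column: 3, 5, 7, 9, NaN, NaN (same differences at the earlier timestamp).
print(target)
```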

So it might be worth implementing this for some relatively niche application where the desired differencing offset is large and the choice of timestamp from X has a significant impact, but I'm not convinced such a case will ever come up. I therefore think the current behavior of rejecting use_diff_of_y combined with predict_ahead = [0] is totally acceptable.

Footnotes

  1. It is also necessary to shift y by the same offset in inverse_differenced_target when adding the differences back.
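A rough sketch of that inverse step under the same assumptions. inverse_differenced_sketch is a hypothetical helper, not the actual inverse_differenced_target: for the nowcast, y is shifted by nowcast_offset before the predicted differences are added back, while for lag > 0 the differences are added to y as-is.

```python
import pandas as pd


def inverse_differenced_sketch(
    y: pd.Series, predicted_diff: pd.Series, lag: int, nowcast_offset: int = 1
) -> pd.Series:
    """Hypothetical inverse of the differencing, not the real implementation.

    For lag > 0, a difference indexed at t describes y[t + lag] - y[t], so the
    level prediction is y[t] + diff[t] (still indexed at t).
    For the nowcast (lag == 0), a difference indexed at t describes
    y[t] - y[t - nowcast_offset], so y is shifted by the same offset first.
    """
    base = y.shift(nowcast_offset) if lag == 0 else y
    return base + predicted_diff


y = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
# Round trip with the true nowcast differences recovers y (NaN for the first rows).
recovered = inverse_differenced_sketch(y, y.diff(2), lag=0, nowcast_offset=2)
print(recovered.tolist())  # [nan, nan, 4.0, 7.0, 11.0, 16.0]
```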
