Summary
Create periodic features so that there is no discontinuity in feature space where there shouldn't be one. E.g. the 365th day of the year should be adjacent to the 1st. I would imagine the API working similarly to the specification of categorical features, with the additional component of the minimum and maximum feature values that are equivalent (e.g. hours 0 and 24 of the day are the same).
Motivation
I primarily work with time-series forecasting, and common features we use are hour of day, day of week, or day of year. Take day of year as an example: if I use the feature as-is, the 365th day is not adjacent to the 1st day in feature space, even though it is in reality. The model has to learn that these days are likely to be similar rather than starting from that prior. I am often in the position where I have data for, say, January to May; my prediction for December would then have the day-of-year feature built off May's data when in fact it should be more like January. This would also provide an additional constraint that should help the model fit better in the case of hour of day or day of week. Another periodic feature could be an angle.
Description
From a user perspective, I imagine that we specify which features are periodic and which minimum and maximum feature values are equivalent, e.g. 0 and 24 for hours. This could be done by passing a Dict[feature_name, Tuple[minval, maxval]], in the same way as the categorical features are defined.
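To make the proposed specification concrete, here is a minimal sketch in plain Python. Nothing in it is an existing LightGBM parameter or function: the `periodic_features` dict and the `circular_distance` helper are hypothetical names used only for illustration, and the day-of-year bounds assume a 365-day year (leap years ignored).

```python
from typing import Dict, Tuple

# Hypothetical specification (not an existing LightGBM parameter):
# feature name -> (minval, maxval), where minval and maxval denote the same point.
periodic_features: Dict[str, Tuple[float, float]] = {
    "hour_of_day": (0.0, 24.0),   # hour 0 and hour 24 are equivalent
    "day_of_year": (1.0, 366.0),  # assumes a 365-day year: day 366 wraps to day 1
}


def circular_distance(a: float, b: float, bounds: Tuple[float, float]) -> float:
    """Distance between two values of a periodic feature, measured around the circle."""
    lo, hi = bounds
    period = hi - lo
    d = abs(a - b) % period
    return min(d, period - d)


# Day 365 and day 1 are one step apart on the circle, not 364.
gap_days = circular_distance(365, 1, periodic_features["day_of_year"])
# Hour 23 and hour 1 are two hours apart.
gap_hours = circular_distance(23, 1, periodic_features["hour_of_day"])
```

This is the adjacency a periodic feature would encode as a prior, rather than something the model has to learn from data.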
Internally in the tree algorithm, splitting a periodic feature would initially require two leaf boundaries to be defined for that feature, so the best pair of boundaries would be chosen. After that, I imagine the algorithm working as it currently does.
In the hour-of-day example, the optimum first split might be defined by hours 3 and 12, in which case one leaf is 3<h<12 and the other is 12<h<24 & 0<h<3.
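The two-boundary first split can be illustrated with a brute-force search. This is only a sketch of the idea, not how LightGBM finds splits (it uses histogram-based gain computation). The function name `best_periodic_split` is hypothetical, the criterion is assumed to be the summed squared error around each leaf's mean, and the leaves use half-open intervals so that boundary values land in exactly one leaf.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def _sse(values: List[float]) -> float:
    """Sum of squared errors around the mean of one leaf."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_periodic_split(
    x: Sequence[float], y: Sequence[float]
) -> Optional[Tuple[float, float, float]]:
    """Return (sse, a, b) for the boundary pair a < b that minimises total SSE.

    One leaf holds a <= x < b; the other holds everything else,
    i.e. the arc that wraps around the end of the period.
    """
    best: Optional[Tuple[float, float, float]] = None
    for a, b in combinations(sorted(set(x)), 2):  # every pair with a < b
        inside = [yi for xi, yi in zip(x, y) if a <= xi < b]
        outside = [yi for xi, yi in zip(x, y) if not (a <= xi < b)]
        if not inside or not outside:
            continue
        total = _sse(inside) + _sse(outside)
        if best is None or total < best[0]:
            best = (total, a, b)
    return best


# Toy data: the target is 1 between hours 3 and 12, and 0 otherwise.
hours = list(range(24))
target = [1.0 if 3 <= h < 12 else 0.0 for h in hours]
result = best_periodic_split(hours, target)
```

On this toy data the search recovers boundaries 3 and 12 with zero error; a single ordinary threshold could not separate the two groups, because one of them wraps around midnight.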
If linear_tree=True then only one initial split would be required to fit a linear relationship.
Closed in favor of being in #2302. We decided to keep all feature requests in one place.
Welcome to contribute this feature! Please re-open this issue (or post a comment if you are not a topic starter) if you are actively working on implementing this feature.
@candalfigomoro
I've seen that workaround; it's OK when you have complete days for the range. But in my case I often don't have, say, the second part of the year, and you can get the features creating something like the reverse pattern in the second half of the year.
Thanks though, I might try it out again and see how it performs.
References
Periodic constraints have been implemented in the pygam package.
https://pygam.readthedocs.io/en/latest/notebooks/tour_of_pygam.html