step_fourier returns all NaN #40
@mpelath Do you have a reproducible example?
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
library(recipes)
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#>
#> fixed
#> The following object is masked from 'package:stats':
#>
#> step
library(timetk)
dates <- c(ymd(20200531), ymd(20200630))
train_data <- tibble(id = c(1, 1, 2, 2, 3, 3), date = rep(dates, 3))
good_recipe <- recipe(train_data) %>% step_fourier(date, period = 12, K = 6)
good_recipe_prepped <- prep(good_recipe, train_data)
baked <- bake(good_recipe_prepped, train_data)
train_data <- train_data %>% arrange(date)
bad_recipe <- recipe(train_data) %>% step_fourier(date, period = 12, K = 6)
bad_recipe_prepped <- prep(bad_recipe, train_data)
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
baked <- bake(bad_recipe_prepped, train_data)
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
Created on 2020-06-26 by the reprex package (v0.3.0)
I took a look at this. This one is complicated...
About scaling
For the Fourier terms to come out correctly, the index has to be scaled so that consecutive observations are one unit apart before the sine and cosine are generated. In this case it is actually bad to arrange by date, because the data contains groups of date sequences. After sorting, the time difference between observations becomes zero, when it should be the difference between the first and second date within each group.
Solution
I've added an error that is now raised when this happens. Hopefully, this will point users in the right direction.
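A minimal base R sketch of why a zero scale factor ends in NaN. It assumes the index is divided by the inferred scale and by the period before the sine is taken; `fourier_sin` and its arguments are hypothetical names used for illustration, not timetk's internals.

```r
# Hypothetical stand-in for a single sine term; not the timetk implementation.
fourier_sin <- function(idx, period, k, scale) {
  x <- as.numeric(idx) / scale / period
  sin(2 * pi * k * x)
}

dates <- as.Date(c("2020-05-31", "2020-06-30"))

# Scale of 30 days: consecutive observations end up one unit apart.
scaled <- as.numeric(dates) / 30
good <- fourier_sin(dates, period = 12, k = 1, scale = 30)

# Scale of 0: the scaled index becomes Inf, and sin(Inf) is NaN.
# suppressWarnings() hides the "NaNs produced" warning seen in the reprex.
bad <- suppressWarnings(fourier_sin(dates, period = 12, k = 1, scale = 0))
```

Under these assumptions `good` is finite and `bad` is all NaN, which matches the pattern in the warnings above.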
Closing this issue. The fix will be included in version 2.1.0.
Thanks for adding the error code. This came up as an issue for me in a grouped time series.
Although it worked a month or two ago, step_fourier is now giving me NaNs for everything.
After pulling the source code and debugging, I think the issue arises when the scale is inferred:
date_to_seq_scale_factor <- function(idx) {
  tk_get_timeseries_summary(idx) %>% dplyr::pull(diff.median)
}
since tk_get_timeseries_summary returns a diff.median of zero. This is because I'm using panel data, not a single time series. My guess is that the sort order of the data is now being changed by some upstream process (possibly, but not necessarily, something in timetk). When the data is sorted by the time index alone, rather than by unit and then by time index, most of the differences between consecutive rows are zero. The scale factor is then zero.
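The effect of sort order on the median difference can be checked in base R. This is only a sketch: `median_gap` is a hypothetical stand-in for the inferred scale, and it measures the gap in days, which may not be the unit that tk_get_timeseries_summary reports.

```r
# Panel data: three units (id), each observed on the same two dates.
dates <- as.Date(c("2020-05-31", "2020-06-30"))
panel <- data.frame(id = rep(1:3, each = 2), date = rep(dates, times = 3))

# Hypothetical stand-in for the inferred scale: median gap between rows, in days.
median_gap <- function(idx) median(as.numeric(diff(idx)))

# Ordered by id, then date: gaps are 30, -30, 30, -30, 30, so the median is 30.
gap_by_unit <- median_gap(panel$date)

# Ordered by date only: gaps are 0, 0, 30, 0, 0, so the median is 0.
by_date <- panel[order(panel$date), ]
gap_by_date <- median_gap(by_date$date)
```

Keeping the rows ordered by unit and then by date, for example with `arrange(id, date)`, preserves the non-zero gap in this toy example.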
I don't know whether there is anything you can do about it. Perhaps allow the user to define the scale, or just document this pitfall when using non-univariate time series data.