step_fourier returns all NaN #40

Closed
mpelath opened this issue Jun 24, 2020 · 5 comments

mpelath commented Jun 24, 2020

Although it worked a month or two ago, step_fourier is now giving me NaNs for everything.

After pulling the source code and debugging, I think the issue arises when the scale is inferred:

date_to_seq_scale_factor <- function(idx) {
    tk_get_timeseries_summary(idx) %>% dplyr::pull(diff.median)
}

since tk_get_timeseries_summary returns a diff.median of zero. This is because I'm using panel data, not a single time series. My guess is that the sort order of the data is now being changed by some upstream process (possibly, but not necessarily, something in timetk). When the data is sorted by the time index alone, rather than by unit and then by time index, most consecutive differences are zero, so the median difference, and therefore the scale factor, is zero.
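To illustrate the point about sort order, here is a minimal base R sketch. It does not call timetk; it assumes only that the scale factor behaves like the median of consecutive differences in the date column, and it measures differences in days, which may not match the units timetk uses internally. The names `panel`, `by_id`, and `by_date` are made up for the example.

```r
# Panel data: three units, each observed on the same two dates.
dates <- as.Date(c("2020-05-31", "2020-06-30"))
panel <- data.frame(id = rep(1:3, each = 2), date = rep(dates, times = 3))

# Sorted by unit, then date: consecutive diffs are 30, -30, 30, -30, 30 days.
by_id <- panel[order(panel$id, panel$date), ]
median_by_id <- median(as.numeric(diff(by_id$date)))

# Sorted by date only: consecutive diffs are 0, 0, 30, 0, 0 days.
by_date <- panel[order(panel$date), ]
median_by_date <- median(as.numeric(diff(by_date$date)))

median_by_id    # 30
median_by_date  # 0
```

With the date-only ordering, every tie between units contributes a zero difference, so the median collapses to zero as soon as ties outnumber the real steps.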

I don't know whether there is anything you can do about it. Perhaps allow the user to define the scale, or just document this pitfall when using non-univariate time series data.

@mdancho84
Contributor

@mpelath Do you have a reproducible example?

mpelath commented Jun 26, 2020

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(recipes)
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#> 
#>     fixed
#> The following object is masked from 'package:stats':
#> 
#>     step
library(timetk)

dates <- c(ymd(20200531), ymd(20200630))
train_data <- tibble(id = c(1, 1, 2, 2, 3, 3), date = rep(dates, 3))

good_recipe <- recipe(train_data) %>% step_fourier(date, period = 12, K = 6)
good_recipe_prepped <- prep(good_recipe, train_data)
baked <- bake(good_recipe_prepped, train_data)

train_data <- train_data %>% arrange(date)
bad_recipe <- recipe(train_data) %>% step_fourier(date, period = 12, K = 6)
bad_recipe_prepped <- prep(bad_recipe, train_data)
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
baked <- bake(bad_recipe_prepped, train_data)
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced

Created on 2020-06-26 by the reprex package (v0.3.0)

@mdancho84
Contributor

I took a look at this. This one is complicated...

About scaling

In order for the Fourier terms to come out correctly, the index needs to be scaled so that the sine and cosine are generated with a unit difference between consecutive observations.

So in this case, arranging by date alone is a problem because the data contains several groups, each with its own date sequence. The time difference between consecutive observations becomes zero, when it should be the difference between the first and second date within each group.
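A rough sketch of why a zero scale factor turns every term into NaN is shown here. The helper `fourier_sketch()` is hypothetical and is not timetk's implementation; it only assumes that the index is divided by the scale factor and the period before being passed to `sin(2 * pi * term * x)`, the expression that appears in the warnings above.

```r
# Hypothetical helper (not timetk code): scale the index, then take the sine term.
fourier_sketch <- function(idx, scale, period, term = 1) {
  x <- as.numeric(idx) / scale / period  # scale == 0 makes x infinite
  sin(2 * pi * term * x)
}

dates <- as.Date(c("2020-05-31", "2020-06-30"))

# A positive scale (here 30 days) gives finite values.
ok_terms <- fourier_sketch(dates, scale = 30, period = 12)

# A zero scale gives sin(Inf), which R reports as NaN with a warning.
bad_terms <- suppressWarnings(fourier_sketch(dates, scale = 0, period = 12))

all(is.finite(ok_terms))  # TRUE
all(is.nan(bad_terms))    # TRUE
```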

Solution

I've added an error that is now raised in this case. Hopefully, this will point users in the right direction.

> bad_recipe_prepped <- prep(bad_recipe, train_data)
 Error: Problem with `mutate()` input `date_sin12_K1`.
x Time difference between observations is zero. Try arranging data to have a positive time difference between observations. If working with time series groups, arrange by groups first, then date.
ℹ Input `date_sin12_K1` is `timetk::fourier_vec(x = date, period = 12, K = 1L, type = "sin")`.
Run `rlang::last_error()` to see where the error occurred. 
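Following the advice in the error message, a possible workaround is to arrange by group first and then by date before building the recipe. The sketch below uses `dplyr::arrange()` on the same toy data as the reprex; `has_positive_time_diff()` is a hypothetical pre-check written for this example, not a timetk function, and it assumes the median of consecutive differences is what matters.

```r
library(dplyr)

# Hypothetical pre-check (not part of timetk): is the median step positive?
has_positive_time_diff <- function(dates) {
  median(as.numeric(diff(dates))) > 0
}

dates <- as.Date(c("2020-05-31", "2020-06-30"))
train_data <- data.frame(id = c(1, 1, 2, 2, 3, 3), date = rep(dates, 3))

by_date  <- train_data %>% arrange(date)      # ordering that triggers the error
by_group <- train_data %>% arrange(id, date)  # groups first, then date

check_by_date  <- has_positive_time_diff(by_date$date)
check_by_group <- has_positive_time_diff(by_group$date)

check_by_date   # FALSE
check_by_group  # TRUE
```

Running a check like this before `prep()` makes the ordering problem visible without waiting for the recipe to fail.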

@mdancho84
Contributor

Closing this issue. The fix will be included in version 2.1.0.

jam1245 commented May 6, 2021

Thanks for adding the error code. This came up as an issue for me in a grouped time series.
