recursive() error - glmnet #76

Closed
mdancho84 opened this issue Mar 23, 2021 · 5 comments
Labels
bug Something isn't working

mdancho84 commented Mar 23, 2021

This issue happens with glmnet models only.

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 'as.matrix': invalid class 'NA' to dup_mMatrix_as_dgeMatrix

Reproducible Example

library(modeltime)
library(tidymodels)
library(tidyverse)
library(lubridate)
library(timetk)

FORECAST_HORIZON <- 24

m4_extended <- m4_monthly %>%
    group_by(id) %>%
    future_frame(
        .length_out = FORECAST_HORIZON,
        .bind_data  = TRUE
    ) %>%
    ungroup()

lag_roll_transformer_grouped <- function(data){
    data %>%
        group_by(id) %>%
        tk_augment_lags(value, .lags = 1:12) %>%
        tk_augment_slidify(
            .value   = contains("lag1"),
            .f       = ~mean(.x, na.rm = T),
            .period  = c(12, 24, 36),
            .partial = TRUE
        ) %>%
        select(-value_lag1) %>%
        ungroup()
}

m4_lags <- m4_extended %>%
    lag_roll_transformer_grouped()

train_data <- m4_lags %>%
    drop_na()

future_data <- m4_lags %>%
    filter(is.na(value))

splits <- train_data %>%
    time_series_split(date, assess = FORECAST_HORIZON, cumulative = TRUE)

recipe_spec <- recipe(value ~ ., data = training(splits)) %>%
    step_timeseries_signature(date) %>%
    step_rm(matches("(.xts$)|(.iso$)|(hour)|(minute)|(second)|(am.pm)")) %>%
    step_rm(date) %>%
    step_zv(all_predictors()) %>%
    step_normalize(date_index.num) %>%
    step_dummy(all_nominal(), one_hot = TRUE)

model_fit_glmet_recursive <- workflow() %>%
    add_model(linear_reg(penalty = 0.1) %>% set_engine("glmnet")) %>%
    add_recipe(recipe_spec) %>%
    fit(training(splits)) %>%
    recursive(
        id         = "id", 
        transform  = lag_roll_transformer_grouped,
        train_tail = panel_tail(training(splits), id, FORECAST_HORIZON)
    )

modeltime_table(
    model_fit_glmet_recursive
) %>% modeltime_forecast(testing(splits))
#> Error: error in evaluating the argument 'x' in selecting a method for function 'as.matrix': invalid class 'NA' to dup_mMatrix_as_dgeMatrix
#> Warning: Unknown or uninitialised column: `.key`.
#> Error: Problem with `filter()` input `..1`.
#> x object '.key' not found
#> ℹ Input `..1` is `.model_desc == "ACTUAL" | .key == "prediction"`.

model_fit_glmet_recursive %>%
    mdl_time_forecast(new_data = testing(splits))
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'as.matrix': invalid class 'NA' to dup_mMatrix_as_dgeMatrix
Created on 2021-03-23 by the reprex package (https://reprex.tidyverse.org, v1.0.0)

@AlbertoAlmuinha

Hi @mdancho84,

I can't reproduce the error with the version from before yesterday's patch for the XGBoost issue #75. When I upgrade to the latest version, however, it does produce the error you mention, so it must be something we changed to fix that problem. In principle it should be in the modeltime_forecast() function. I haven't been able to check it yet, but my intuition tells me that the problem is here:

    # Fix - When ID is dummied
    df <- new_data
    if (!is.null(id)) {
        if (!id %in% names(new_data)) {
            df <- new_data %>%
                dplyr::bind_cols(df_id)
            fit$spec$remove_id <- TRUE
        }
    }

    # PREDICT
    data_formatted <- fit %>%
        stats::predict(
            new_data = df
        ) %>%
        dplyr::bind_cols(time_stamp_predictors_tbl) %>%
        dplyr::mutate(.key = "prediction") %>%
        dplyr::select(.key, dplyr::everything())
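The "When ID is dummied" branch exists because one-hot encoding replaces the id column with one indicator column per series, so the id is no longer available to group predictions. The toy sketch below shows that effect, using base R's model.matrix() as a stand-in for step_dummy(all_nominal(), one_hot = TRUE); the data and the df_id copy are hypothetical, and this is an illustration of the pattern, not modeltime's code.

```r
# Toy panel with two series; the data are made up for illustration.
panel <- data.frame(
  id    = factor(c("M1", "M1", "M2", "M2")),
  value = c(1.5, 2.0, 3.5, 4.0)
)

# Keep a copy of the id column before encoding (hypothetical df_id).
df_id <- panel["id"]

# One-hot encode the id with base R, standing in for
# step_dummy(all_nominal(), one_hot = TRUE). Removing the intercept ("- 1")
# gives one indicator column per level (idM1, idM2).
encoded <- as.data.frame(model.matrix(~ id + value - 1, data = panel))

# After encoding, no column is literally named "id" any more.
print("id" %in% names(encoded))

# Same pattern as the snippet: if the id is missing, bind the saved copy
# back on so predictions can still be grouped per series.
if (!"id" %in% names(encoded)) {
  encoded <- cbind(encoded, df_id)
}
print(names(encoded))
```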
Before the update, the same example runs without error:

library(modeltime)
library(tidymodels)
#> Warning: package 'dplyr' was built under R version 4.0.4
#> Warning: package 'parsnip' was built under R version 4.0.4
#> Warning: package 'rsample' was built under R version 4.0.4
#> Warning: package 'tibble' was built under R version 4.0.4
#> Warning: package 'tidyr' was built under R version 4.0.4
#> Warning: package 'tune' was built under R version 4.0.4
#> Warning: package 'workflows' was built under R version 4.0.4
library(tidyverse)
#> Warning: package 'forcats' was built under R version 4.0.4
library(lubridate)
#> Warning: package 'lubridate' was built under R version 4.0.4
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(timetk)

FORECAST_HORIZON <- 24

m4_extended <- m4_monthly %>%
    group_by(id) %>%
    future_frame(
        .length_out = FORECAST_HORIZON,
        .bind_data  = TRUE
    ) %>%
    ungroup()
#> .date_var is missing. Using: date

lag_roll_transformer_grouped <- function(data){
    data %>%
        group_by(id) %>%
        tk_augment_lags(value, .lags = 1:12) %>%
        tk_augment_slidify(
            .value   = contains("lag1"),
            .f       = ~mean(.x, na.rm = T),
            .period  = c(12, 24, 36),
            .partial = TRUE
        ) %>%
        select(-value_lag1) %>%
        ungroup()
}

m4_lags <- m4_extended %>%
    lag_roll_transformer_grouped()

train_data <- m4_lags %>%
    drop_na()

future_data <- m4_lags %>%
    filter(is.na(value))

splits <- train_data %>%
    time_series_split(date, assess = FORECAST_HORIZON, cumulative = TRUE)
#> Data is not ordered by the 'date_var'. Resamples will be arranged by `date`.
#> Overlapping Timestamps Detected. Processing overlapping time series together using sliding windows.

recipe_spec <- recipe(value ~ ., data = training(splits)) %>%
    step_timeseries_signature(date) %>%
    step_rm(matches("(.xts$)|(.iso$)|(hour)|(minute)|(second)|(am.pm)")) %>%
    step_rm(date) %>%
    step_zv(all_predictors()) %>%
    step_normalize(date_index.num) %>%
    step_dummy(all_nominal(), one_hot = TRUE)

model_fit_glmet_recursive <- workflow() %>%
    add_model(linear_reg(penalty = 0.1) %>% set_engine("glmnet")) %>%
    add_recipe(recipe_spec) %>%
    fit(training(splits)) %>%
    recursive(
        id         = "id", 
        transform  = lag_roll_transformer_grouped,
        train_tail = panel_tail(training(splits), id, FORECAST_HORIZON)
    )

modeltime_table(
    model_fit_glmet_recursive
) %>% modeltime_forecast(testing(splits))
#> # A tibble: 96 x 5
#>    .model_id .model_desc .key       .index     .value
#>        <int> <chr>       <fct>      <date>      <dbl>
#>  1         1 GLMNET      prediction 2013-07-01 10387.
#>  2         1 GLMNET      prediction 2013-07-01  2948.
#>  3         1 GLMNET      prediction 2013-07-01  9639.
#>  4         1 GLMNET      prediction 2013-07-01  1345.
#>  5         1 GLMNET      prediction 2013-08-01  9669.
#>  6         1 GLMNET      prediction 2013-08-01  2701.
#>  7         1 GLMNET      prediction 2013-08-01  9711.
#>  8         1 GLMNET      prediction 2013-08-01  1225.
#>  9         1 GLMNET      prediction 2013-09-01  8263.
#> 10         1 GLMNET      prediction 2013-09-01  2694.
#> # ... with 86 more rows


model_fit_glmet_recursive %>%
    mdl_time_forecast(new_data = testing(splits))
#> # A tibble: 96 x 3
#>    .key       .index     .value
#>    <fct>      <date>      <dbl>
#>  1 prediction 2013-07-01 10387.
#>  2 prediction 2013-07-01  2948.
#>  3 prediction 2013-07-01  9639.
#>  4 prediction 2013-07-01  1345.
#>  5 prediction 2013-08-01  9669.
#>  6 prediction 2013-08-01  2701.
#>  7 prediction 2013-08-01  9711.
#>  8 prediction 2013-08-01  1225.
#>  9 prediction 2013-09-01  8263.
#> 10 prediction 2013-09-01  2694.
#> # ... with 86 more rows

sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19041)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
#> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
#> [5] LC_TIME=Spanish_Spain.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] timetk_2.6.1         lubridate_1.7.10     forcats_0.5.1       
#>  [4] stringr_1.4.0        readr_1.4.0          tidyverse_1.3.0     
#>  [7] yardstick_0.0.7      workflows_0.2.2      tune_0.1.3          
#> [10] tidyr_1.1.3          tibble_3.1.0         rsample_0.0.9       
#> [13] recipes_0.1.15       purrr_0.3.4          parsnip_0.1.5       
#> [16] modeldata_0.1.0      infer_0.5.3          ggplot2_3.3.3       
#> [19] dplyr_1.0.5          dials_0.0.9          scales_1.1.1        
#> [22] broom_0.7.2          tidymodels_0.1.2     modeltime_0.4.2.9000
#> 
#> loaded via a namespace (and not attached):
#>  [1] fs_1.5.0           xts_0.12.1         DiceDesign_1.9     httr_1.4.2        
#>  [5] tools_4.0.3        backports_1.2.1    utf8_1.1.4         R6_2.5.0          
#>  [9] rpart_4.1-15       DBI_1.1.0          colorspace_2.0-0   nnet_7.3-14       
#> [13] withr_2.4.1        tidyselect_1.1.0   compiler_4.0.3     glmnet_4.0-2      
#> [17] rvest_0.3.6        cli_2.3.1          xml2_1.3.2         digest_0.6.27     
#> [21] rmarkdown_2.7      pkgconfig_2.0.3    htmltools_0.5.1.1  parallelly_1.23.0 
#> [25] lhs_1.1.1          dbplyr_2.0.0       highr_0.8          readxl_1.3.1      
#> [29] rlang_0.4.10       rstudioapi_0.13    shape_1.4.5        generics_0.1.0    
#> [33] zoo_1.8-9          jsonlite_1.7.2     magrittr_2.0.1     Matrix_1.2-18     
#> [37] Rcpp_1.0.6         munsell_0.5.0      fansi_0.4.2        GPfit_1.0-8       
#> [41] lifecycle_1.0.0    furrr_0.2.2        stringi_1.5.3      pROC_1.17.0.1     
#> [45] yaml_2.2.1         MASS_7.3-53        plyr_1.8.6         grid_4.0.3        
#> [49] parallel_4.0.3     listenv_0.8.0      slider_0.1.5       crayon_1.4.1      
#> [53] lattice_0.20-41    haven_2.3.1        splines_4.0.3      hms_1.0.0         
#> [57] knitr_1.30         ps_1.6.0           pillar_1.5.1       codetools_0.2-16  
#> [61] reprex_1.0.0       glue_1.4.2         evaluate_0.14      modelr_0.1.8      
#> [65] vctrs_0.3.6        foreach_1.5.1      cellranger_1.1.0   gtable_0.3.0      
#> [69] future_1.21.0      assertthat_0.2.1   xfun_0.21          gower_0.2.2       
#> [73] prodlim_2019.11.13 class_7.3-17       survival_3.2-7     timeDate_3043.102 
#> [77] iterators_1.0.13   hardhat_0.1.5      warp_0.2.0         lava_1.6.9        
#> [81] globals_0.14.0     ellipsis_0.3.1     ipred_0.9-10

After updating to the latest version, the same code fails:

library(modeltime)
library(tidymodels)
#> Warning: package 'dplyr' was built under R version 4.0.4
#> Warning: package 'parsnip' was built under R version 4.0.4
#> Warning: package 'rsample' was built under R version 4.0.4
#> Warning: package 'tibble' was built under R version 4.0.4
#> Warning: package 'tidyr' was built under R version 4.0.4
#> Warning: package 'tune' was built under R version 4.0.4
#> Warning: package 'workflows' was built under R version 4.0.4
library(tidyverse)
#> Warning: package 'forcats' was built under R version 4.0.4
library(lubridate)
#> Warning: package 'lubridate' was built under R version 4.0.4
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(timetk)

FORECAST_HORIZON <- 24

m4_extended <- m4_monthly %>%
    group_by(id) %>%
    future_frame(
        .length_out = FORECAST_HORIZON,
        .bind_data  = TRUE
    ) %>%
    ungroup()
#> .date_var is missing. Using: date

lag_roll_transformer_grouped <- function(data){
    data %>%
        group_by(id) %>%
        tk_augment_lags(value, .lags = 1:12) %>%
        tk_augment_slidify(
            .value   = contains("lag1"),
            .f       = ~mean(.x, na.rm = T),
            .period  = c(12, 24, 36),
            .partial = TRUE
        ) %>%
        select(-value_lag1) %>%
        ungroup()
}

m4_lags <- m4_extended %>%
    lag_roll_transformer_grouped()

train_data <- m4_lags %>%
    drop_na()

future_data <- m4_lags %>%
    filter(is.na(value))

splits <- train_data %>%
    time_series_split(date, assess = FORECAST_HORIZON, cumulative = TRUE)
#> Data is not ordered by the 'date_var'. Resamples will be arranged by `date`.
#> Overlapping Timestamps Detected. Processing overlapping time series together using sliding windows.

recipe_spec <- recipe(value ~ ., data = training(splits)) %>%
    step_timeseries_signature(date) %>%
    step_rm(matches("(.xts$)|(.iso$)|(hour)|(minute)|(second)|(am.pm)")) %>%
    step_rm(date) %>%
    step_zv(all_predictors()) %>%
    step_normalize(date_index.num) %>%
    step_dummy(all_nominal(), one_hot = TRUE)

model_fit_glmet_recursive <- workflow() %>%
    add_model(linear_reg(penalty = 0.1) %>% set_engine("glmnet")) %>%
    add_recipe(recipe_spec) %>%
    fit(training(splits)) %>%
    recursive(
        id         = "id", 
        transform  = lag_roll_transformer_grouped,
        train_tail = panel_tail(training(splits), id, FORECAST_HORIZON)
    )

modeltime_table(
    model_fit_glmet_recursive
) %>% modeltime_forecast(testing(splits))
#> Error: error in evaluating the argument 'x' in selecting a method for function 'as.matrix': invalid class 'NA' to dup_mMatrix_as_dgeMatrix
#> Warning: Unknown or uninitialised column: `.key`.
#> Error: Problem with `filter()` input `..1`.
#> x object '.key' not found
#> i Input `..1` is `.model_desc == "ACTUAL" | .key == "prediction"`.

Regards,

@mdancho84

That's interesting. I'll have to go through the code I added. It's very odd since I'm only adding an ID when it's necessary.

I did remove some columns to make them consistent with .first_slice and .nth_slice. Not sure if that was the error.

Will need to do a thorough review. Should be able to tackle later this week.

@mdancho84
Copy link
Contributor Author

I'm pretty sure this is what happened...
[Screenshot]

mdancho84 added the bug label on Mar 25, 2021

@mdancho84

OK, I'm getting somewhere. parsnip has a few classes that don't go through the normal predict.model_fit(), and we need to watch out for these. predict._elnet is one of them: for it, we need to manually send the call to predict.recursive or predict.recursive_panel.

Here are the classes for which parsnip currently has special prediction methods that override predict.model_fit(). The first one (predict._elnet) applies to glmnet regression models:

[Screenshot: parsnip prediction methods that override predict.model_fit()]

We can see it being invoked in the traceback.

[Screenshot: traceback showing predict._elnet being invoked]

This means the call never reaches our predict.recursive or predict.recursive_panel functions.
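The dispatch problem can be reproduced in miniature with plain S3. In the sketch below, fc_predict() and its methods are hypothetical stand-ins for predict(), predict.model_fit(), predict._elnet() and predict.recursive(); the class vectors are assumptions chosen for illustration, not taken from the parsnip or modeltime sources. It shows that whichever class S3 finds first wins, and one way to force the recursive path by checking the class explicitly before normal dispatch.

```r
# Hypothetical generic standing in for stats::predict().
fc_predict <- function(object, ...) UseMethod("fc_predict")

# Stand-in for predict.model_fit(): an ordinary one-shot prediction.
fc_predict.model_fit <- function(object, ...) "one-shot"

# Stand-in for an engine-specific override such as predict._elnet():
# it does its own setup, then hands off to the model_fit method.
fc_predict._elnet <- function(object, ...) {
  # (engine-specific argument handling would go here)
  fc_predict.model_fit(object, ...)
}

# Stand-in for predict.recursive(): the step-by-step forecast we want.
fc_predict.recursive <- function(object, ...) "recursive"

# Assumed class order: the engine class is found before "recursive".
fit <- structure(list(), class = c("_elnet", "model_fit", "recursive"))
fc_predict(fit)        # "one-shot": the recursive method is never reached

# If "recursive" came first, ordinary dispatch would find it.
fit_first <- structure(list(), class = c("recursive", "_elnet", "model_fit"))
fc_predict(fit_first)  # "recursive"

# Workaround sketch: route recursive objects explicitly before dispatch.
fc_predict_routed <- function(object, ...) {
  if (inherits(object, "recursive")) {
    return(fc_predict.recursive(object, ...))
  }
  fc_predict(object, ...)
}
fc_predict_routed(fit) # "recursive"
```

Explicit routing has the advantage of not depending on where the wrapper class ends up in the class vector, which matters when another package defines methods for classes that sit earlier in it.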

@mdancho84

A working example.

library(modeltime)
library(tidymodels)
library(tidyverse)
library(lubridate)
library(timetk)

FORECAST_HORIZON <- 24

m4_extended <- m4_monthly %>%
    group_by(id) %>%
    future_frame(
        .length_out = FORECAST_HORIZON,
        .bind_data  = TRUE
    ) %>%
    ungroup()
#> .date_var is missing. Using: date

lag_roll_transformer_grouped <- function(data){
    data %>%
        group_by(id) %>%
        tk_augment_lags(value, .lags = 1:24) %>%
        tk_augment_slidify(
            .value   = contains("lag1"),
            .f       = ~mean(.x, na.rm = T),
            .period  = c(12, 24, 36),
            .partial = TRUE
        ) %>%
        select(-value_lag1) %>%
        ungroup()
}

m4_lags <- m4_extended %>%
    lag_roll_transformer_grouped()

train_data <- m4_lags %>%
    drop_na()

future_data <- m4_lags %>%
    filter(is.na(value))

splits <- train_data %>%
    time_series_split(date, assess = FORECAST_HORIZON, cumulative = TRUE)

recipe_spec <- recipe(value ~ ., data = training(splits)) %>%
    step_timeseries_signature(date) %>%
    step_rm(matches("(.xts$)|(.iso$)|(hour)|(minute)|(second)|(am.pm)")) %>%
    step_rm(date) %>%
    step_zv(all_predictors()) %>%
    step_normalize(date_index.num, date_year) %>%
    step_dummy(all_nominal(), one_hot = TRUE)

model_fit_lm_recursive <- workflow() %>%
    add_model(linear_reg() %>% set_engine("lm")) %>%
    add_recipe(recipe_spec) %>%
    fit(training(splits)) %>%
    recursive(
        id         = "id", 
        transform  = lag_roll_transformer_grouped,
        train_tail = panel_tail(training(splits), id, FORECAST_HORIZON)
    )

model_fit_glmet_recursive <- workflow() %>%
    add_model(linear_reg(penalty = 100, mixture = 0.5) %>% set_engine("glmnet")) %>%
    add_recipe(recipe_spec) %>%
    fit(training(splits)) %>%
    recursive(
        id         = "id", 
        transform  = lag_roll_transformer_grouped,
        train_tail = panel_tail(training(splits), id, FORECAST_HORIZON)
    )

calibration_tbl <- modeltime_table(
    model_fit_lm_recursive,
    model_fit_glmet_recursive
) %>%
    modeltime_calibrate(testing(splits))

calibration_tbl %>% modeltime_accuracy()
#> # A tibble: 2 x 9
#>   .model_id .model_desc .type   mae  mape  mase smape  rmse   rsq
#>       <int> <chr>       <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1         1 LM          Test  1100.  30.5 0.158  29.8 1622. 0.855
#> 2         2 GLMNET      Test   964.  26.6 0.139  25.2 1372. 0.899

forecast_tbl <- modeltime_table(
    model_fit_lm_recursive,
    model_fit_glmet_recursive
) %>% 
    modeltime_forecast(
        new_data    = testing(splits),
        actual_data = bind_rows(training(splits), testing(splits)),
        keep_data   = TRUE
    ) %>%
    group_by(id) 

forecast_tbl %>%
    plot_modeltime_forecast(
        .facet_ncol  = 2,
        .interactive = F
    )

Created on 2021-03-25 by the reprex package (v1.0.0)
