Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in rgcca_stability: suscript out of bound #85

Open
tonyliang19 opened this issue Aug 29, 2024 · 6 comments
Open

Error in rgcca_stability: suscript out of bound #85

tonyliang19 opened this issue Aug 29, 2024 · 6 comments

Comments

@tonyliang19
Copy link

Hello, I got an error of Error in x[, y, drop = FALSE] : subscript out of bounds when using rgcca_stability.

I have used something like this:

# Some dummy input here
rgcca_input <- list(X1= X1, X2=X2, X3=X3, response=response)


cv_out <- rgcca_cv(
    blocks = rgcca_input, response = length(rgcca_input),
    par_type = "sparsity",
    prediction_model = "glm",
    validation = "kfold",
    k = 5, n_run = 1, metric = "Balanced_Accuracy)
  
  # Then fit rgcca on the cv_out
  fit <- rgcca(cv_out)
  
  # Get stable variables out
  stab <- rgcca_stability(fit)

After some debug, I noticed it goes wrong under this code of the rgcca_stability:

rgcca_res$call$blocks <- Map(function(x, y) x[, y, drop = FALSE], 
    rgcca_res$call$blocks, keepVar)

So I checked dimensions of both, and if any indices in keepVar to be wrong. These were the values of rgcca_res$call$blocks, and keepVar:

image

So, apparently the keepVar has an extra indice in each of the block, and these are always the column dimension + 1, could this be the reason that the whole rgcca_stability function is failing?

Thanks!
Tony

@tonyliang19 tonyliang19 changed the title Error in rgcca_stability: suscript out of bount Error in rgcca_stability: suscript out of bound Aug 30, 2024
@GFabien
Copy link
Collaborator

GFabien commented Nov 6, 2024

Hi Tony,

Sorry for the delay, I will have some time to spend on the package soon so I'll dive into this problem. Is it possible for you to share the date spawning the problem or to produce a minimal reproducible example? Thank you for your debugging this far!

Best,
Fabien

@tonyliang19
Copy link
Author

tonyliang19 commented Nov 7, 2024

Hi Fabien,

I try reproducing the bug, but it is sometimes showing up the error, and sometime not. Very strange??

See here for how I ran it:

image

The raw_data I have stored in a public repo here, link points to the download of 100KB

Here's the "reproducible" example:

set.seed(3)
rgcca_input <- readRDS("raw_data.rds")

cv_out <- rgcca_cv(
  blocks = rgcca_input, response = length(rgcca_input),
  par_type = "sparsity",
  prediction_model = "lda",
  validation = "kfold",
  k = 5, n_run = 1, metric = "Balanced_Accuracy")
# Then fit rgcca on the cv_out
fit <- rgcca(cv_out)
stab <- rgcca_stability(fit)

And this is the sessioInfo() , ran on a docker container:

R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] magrittr_2.0.3              here_1.0.1                  dplyr_1.1.4                
 [4] tibble_3.2.1                MultiAssayExperiment_1.30.1 SummarizedExperiment_1.34.0
 [7] Biobase_2.64.0              GenomicRanges_1.56.0        GenomeInfoDb_1.40.0        
[10] IRanges_2.38.0              S4Vectors_0.42.0            BiocGenerics_0.50.0        
[13] MatrixGenerics_1.16.0       matrixStats_1.3.0           RGCCA_3.0.3                

loaded via a namespace (and not attached):
 [1] pbapply_1.7-2           pROC_1.18.5             gridExtra_2.3           rlang_1.1.4            
 [5] e1071_1.7-14            compiler_4.4.0          callr_3.7.6             vctrs_0.6.5            
 [9] reshape2_1.4.4          stringr_1.5.1           fastmap_1.1.1           pkgconfig_2.0.3        
[13] crayon_1.5.2            XVector_0.44.0          utf8_1.2.4              rmarkdown_2.26         
[17] prodlim_2023.08.28      ps_1.7.6                UCSC.utils_1.0.0        purrr_1.0.2            
[21] xfun_0.43               reprex_2.1.1            zlibbioc_1.50.0         jsonlite_1.8.8         
[25] recipes_1.0.10          rhdf5filters_1.16.0     DelayedArray_0.30.1     Rhdf5lib_1.26.0        
[29] Deriv_4.1.3             parallel_4.4.0          R6_2.5.1                stringi_1.8.4          
[33] parallelly_1.37.1       rpart_4.1.23            lubridate_1.9.3         Rcpp_1.0.12            
[37] iterators_1.0.14        knitr_1.46              future.apply_1.11.2     BiocBaseUtils_1.6.0    
[41] Matrix_1.7-0            splines_4.4.0           nnet_7.3-19             timechange_0.3.0       
[45] tidyselect_1.2.1        yaml_2.3.8              rstudioapi_0.16.0       abind_1.4-5            
[49] timeDate_4032.109       codetools_0.2-20        processx_3.8.4          listenv_0.9.1          
[53] lattice_0.22-6          plyr_1.8.9              withr_3.0.0             evaluate_0.23          
[57] future_1.33.2           survival_3.6-4          proxy_0.4-27            pillar_1.9.0           
[61] foreach_1.5.2           generics_0.1.3          rprojroot_2.0.4         ggplot2_3.5.1          
[65] munsell_0.5.1           scales_1.3.0            globals_0.16.3          class_7.3-22           
[69] glue_1.7.0              tools_4.4.0             data.table_1.15.4       ModelMetrics_1.2.2.2   
[73] gower_1.0.1             fs_1.6.4                rhdf5_2.48.0            grid_4.4.0             
[77] ipred_0.9-14            colorspace_2.1-0        nlme_3.1-164            GenomeInfoDbData_1.2.12
[81] HDF5Array_1.32.0        cli_3.6.2               fansi_1.0.6             S4Arrays_1.4.1         
[85] lava_1.8.0              gtable_0.3.5            digest_0.6.35           caret_6.0-94           
[89] SparseArray_1.4.5       ggrepel_0.9.5           htmltools_0.5.8.1       lifecycle_1.0.4        
[93] hardhat_1.4.0           httr_1.4.7              MASS_7.3-60.2          

@llrs
Copy link
Contributor

llrs commented Nov 7, 2024

It might help to pinpoint where the error happens if once the error is raised you use traceback() (and share the output). Something like browser(expr = any(!y %in% names(x)) could be used inside the function identified with traceback() to explore the issue.

My bet is that this might be due to the random seed (for matrix factorization or similar process). Perhaps finding one seed that always trigger it could be also helpful for @GFabien.

@tonyliang19
Copy link
Author

tonyliang19 commented Nov 8, 2024

Sure, the traceback that I got is here:

traceback()
8: (function (x, y) 
   x[, y, drop = FALSE])(dots[[1L]][[1L]], dots[[2L]][[1L]])
7: mapply(FUN = f, ..., SIMPLIFY = FALSE)
6: Map(function(x, y) x[, y, drop = FALSE], rgcca_res$call$blocks, 
       keepVar)
5: rgcca_stability(fit) at reprox.R#11
4: eval(ei, envir)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("~/reprox.R")

And I could get the result reproducible with set.seed(3), quite lucky to get this seed?

@GFabien
Copy link
Collaborator

GFabien commented Nov 10, 2024

Hi!

Thank you for your debugging, it helped a lot! Can you have a look at branch https://github.com/rgcca-factory/RGCCA/tree/fix_extra_index_in_keepVar and tell me if it fixes your problem?

Best,
Fabien

@tonyliang19
Copy link
Author

Hi Fabien,

Thanks for the quick response, I have tried the new branch and worked for me!

Tony

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants