Error in rgcca_stability with null sd variables in the resampling and rows of missing values #84
Comments
Hi Elen! Thank you for reporting this issue! Could you try the branch https://github.com/rgcca-factory/RGCCA/tree/fix_extra_index_in_keepVar and tell me if it's enough? I don't think we can really be robust to this problem, but we can definitely help the user not to be lost when this happens. Best,
Hi @Tenenhaus, I thought about it and it relates to some other problems I am facing with TGCCA. The core of the problem is that we remove variables with null variance, since they do not contribute to the objective function and we might get into trouble if we try to scale them. I don't think we really need to remove those variables: they will give zeros in the associated weight vectors anyway. To handle the scaling part, we can use a small epsilon when the std is null to avoid numerical problems; since the variables would be centered, the result would be 0 / epsilon in any case. It would also solve other problems, such as defining what a constant variable is for TGCCA or multigroup RGCCA, having bootstrap samples with different variables, a sparsity constant that depends on the number of non-constant variables instead of the total number of variables, and outputs having a different number of variables than inputs. What do you think about it? Best,
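To make the epsilon idea concrete, here is a minimal base R sketch. The helper name `scale_with_epsilon` and the default `eps` are assumptions made for illustration; this is not the RGCCA implementation, only one way the proposal could look.

```r
# Hypothetical helper, not part of RGCCA: center and scale the columns of a
# block, replacing null (or undefined) standard deviations with a small epsilon.
scale_with_epsilon <- function(x, eps = 1e-8) {
  x <- as.matrix(x)
  centers <- colMeans(x, na.rm = TRUE)
  sds <- apply(x, 2, sd, na.rm = TRUE)
  # A column with fewer than two observed values has an NA sd;
  # treat it like a constant column.
  sds[is.na(sds) | sds < eps] <- eps
  centered <- sweep(x, 2, centers, "-")
  sweep(centered, 2, sds, "/")
}

# Toy block with one informative variable and one constant variable.
block <- cbind(a = c(1, 2, 3, 4), constant = c(5, 5, 5, 5))
scaled <- scale_with_epsilon(block)
```

With this approach the constant column stays in the block as a column of zeros (0 / epsilon), so the output keeps the same number of variables as the input.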
Hi Fabien!
Hi Elen!
Hi Team RGCCA,
I encountered the following error when using `rgcca_stability`:
Here is an example to reproduce the error:
When trying to reproduce the error, I found that this behavior only occurs with full rows of missing values (which can happen when blocks are not observed on fully overlapping sets of individuals). In this scenario, the error message is unclear, which makes the origin of the error difficult for the user to understand. However, when multiple variables have null sd in the bootstrap samples but there are no full rows of missing data, the situation is handled correctly and a clear and informative message is shown. The underlying issue could be in `rgcca_bootstrap_k`, or it could perhaps be caught earlier in `generate_resampling`.
Do you think this type of error could be caught to avoid misunderstandings?
Thank you :)
Elen
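As an illustration of the kind of early check asked about above, here is a minimal base R sketch. The helper `find_null_sd_variables`, its arguments, and the toy data are all hypothetical; it does not reproduce what `rgcca_bootstrap_k` or `generate_resampling` actually do, it only shows how null sd variables could be detected in a resample when some blocks have full rows of missing values.

```r
# Hypothetical helper, not part of RGCCA: for one vector of resampled row
# indices, list per block the variables whose sd is null (or undefined)
# once missing values are ignored.
find_null_sd_variables <- function(blocks, idx, tol = 1e-12) {
  lapply(blocks, function(block) {
    resampled <- as.matrix(block)[idx, , drop = FALSE]
    sds <- apply(resampled, 2, sd, na.rm = TRUE)
    colnames(resampled)[is.na(sds) | sds < tol]
  })
}

# Toy data: the second block is not observed for individuals 4 and 5.
blocks <- list(
  b1 = cbind(x1 = c(1, 2, 3, 4, 5), x2 = c(0, 0, 0, 0, 1)),
  b2 = cbind(y1 = c(2, 4, 6, NA, NA), y2 = c(1, 1, 2, NA, NA))
)
idx <- c(1, 1, 2, 4, 4)  # one possible bootstrap draw
null_sd <- find_null_sd_variables(blocks, idx)
n_null <- sum(lengths(null_sd))
if (n_null > 0) {
  message(n_null, " variable(s) have a null sd in this resample: ",
          paste(unlist(null_sd), collapse = ", "))
}
```

In this draw, `y2` only becomes constant because two of the five resampled rows are entirely missing in the second block, which is the situation described in the report.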