Specifying tau value for superblock in rgcca_cv() #86

Open
JChRoy opened this issue Nov 6, 2024 · 4 comments

Comments

JChRoy commented Nov 6, 2024

Dear Rgcca team,

As in the CPCA(n) methods, I would like to set tau = 1 for all blocks and tau = 0 for the superblock in rgcca_cv().
Is there an option in rgcca_cv() that does this, or is it necessary to adapt the source code?

Here is the code I used for now:
```r
C <- matrix(c(0, 0, 0, 0,
              0, 0, 0, 0,
              0, 0, 0, 0,
              0, 0, 0, 0),
            4, 4, byrow = TRUE)

# Trying to reproduce cpca-1

ncomp <- rgcca_cv(
  blocks = data_train, # list of 4 dataframes
  response = 4,
  scheme = "horst",
  connection = C,
  tau = c(1, 1, 1, 1),
  superblock = TRUE,
  scale = TRUE,
  scale_block = 'inertia',
  init = 'svd',
  bias = TRUE,
  tol = 1e-08,
  NA_method = 'na.ignore',
  comp_orth = TRUE,
  par_type = "ncomp",
  prediction_model = "lm",
  validation = 'kfold',
  k = 5,
  n_run = 10,
  n_cores = n_cores)
```

Thanks a lot!

Best,
Jean-Charles

Collaborator

GFabien commented Nov 6, 2024

Hi Jean-Charles,
Thank you for your interest in the package!

I am a bit confused about your matrix C. If you set it to zero, you should get an error stating that it should not contain only zeros. Indeed, it means you do not connect the blocks at all, so there is no point in performing a multiblock analysis.
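For illustration, a non-zero connection matrix for a supervised design could link the response block to every other block and leave the remaining entries at zero. The following is a minimal base-R sketch; the number of blocks and the position of the response block are assumptions taken from the call above.

```r
# Sketch: design matrix in which only the response block is connected
# to the other blocks (all names and sizes here are illustrative).
n_blocks <- 4  # assumption: four blocks, as in the call above
response <- 4  # assumption: the 4th block is the response

C <- matrix(0, n_blocks, n_blocks)
C[response, -response] <- 1  # response row: linked to every other block
C[-response, response] <- 1  # mirror the links to keep C symmetric

# The matrix must be symmetric and must not contain only zeros.
stopifnot(isSymmetric(C), any(C != 0))
print(C)
```

With this layout the predictor blocks are not connected to each other, only to the response.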

Regarding the choice of tau in rgcca_cv(), you can set tau = c(1, 1, 1, 0) if you want to set tau to 1 for all blocks except the 4th one.
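As a rough sketch of what this could look like, the snippet below builds the tau vector programmatically and passes it to rgcca_cv() on simulated data. The block names, dimensions, and random data are purely illustrative assumptions, and the call only runs if the RGCCA package is installed.

```r
# Sketch: tau = 1 for every block except the response block (tau = 0).
n_blocks <- 4  # assumption: four blocks
response <- 4  # assumption: the 4th block is the response

tau <- rep(1, n_blocks)
tau[response] <- 0

# Simulated blocks, for illustration only (not real data).
set.seed(1)
n <- 40
make_block <- function(name, p) {
  matrix(rnorm(n * p), n, p,
         dimnames = list(paste0("obs", seq_len(n)),
                         paste0(name, "_", seq_len(p))))
}
blocks <- list(
  b1 = make_block("b1", 3),
  b2 = make_block("b2", 4),
  b3 = make_block("b3", 2),
  y  = make_block("y", 1)
)

# Only attempt the cross-validation if RGCCA is available.
if (requireNamespace("RGCCA", quietly = TRUE)) {
  cv <- RGCCA::rgcca_cv(
    blocks = blocks,
    response = response,
    tau = tau,
    par_type = "ncomp",
    prediction_model = "lm",
    k = 5,
    n_run = 1,
    n_cores = 1,
    verbose = FALSE
  )
  print(cv)
}
```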

I hope it helps.

Best,
Fabien

Author

JChRoy commented Nov 7, 2024

Hi Fabien,

Thank you for your answer.
You are right: I thought rgcca_cv() would automatically add the connections between the blocks and the superblock when superblock = TRUE is set. However, the function still runs because it corrects the design matrix so that only the response block is connected to all the others, and it sets superblock to FALSE, as shown here:

> summary(ncomp)
Call: method='rgcca', superblock=FALSE, scale=TRUE, scale_block='inertia', init='svd', bias=TRUE,
tol=1e-08, NA_method='na.ignore', ncomp=c(1,1,1,1), response=4, comp_orth=TRUE 

So my question is: how are we supposed to use superblock = TRUE in rgcca_cv()? Do we just add a dataframe that is the concatenation of the blocks and set it as the response?

Best,
Jean-Charles

Collaborator

GFabien commented Nov 10, 2024

Hi Jean-Charles,

As a matter of fact, rgcca_cv() is not supposed to be used with a superblock. In the multiblock literature, a superblock analysis is more of an unsupervised approach, while an analysis with a response block (like rgcca_cv()) is more of a supervised one. However, you can indeed do as you suggest, but it will give you different results from a proper superblock analysis if you have more than one component (the deflation logic is different).
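The workaround mentioned here (appending the concatenated blocks as an extra block and treating it as the response) could be prepared along these lines. This is only a base-R sketch on simulated data; the block names and sizes are assumptions, and as noted above the result is not equivalent to a proper superblock analysis once more than one component is extracted.

```r
# Sketch: build a concatenated block and append it as the last block.
set.seed(1)
n <- 30
make_block <- function(name, p) {
  matrix(rnorm(n * p), n, p,
         dimnames = list(NULL, paste0(name, "_", seq_len(p))))
}
blocks <- list(
  b1 = make_block("b1", 3),
  b2 = make_block("b2", 2),
  b3 = make_block("b3", 4)
)

# Column-wise concatenation of all blocks.
concatenated <- do.call(cbind, blocks)

# Append it and mark it as the response block.
blocks_sb <- c(blocks, list(superblock = concatenated))
response <- length(blocks_sb)

# tau = 1 for the original blocks, tau = 0 for the concatenated block.
tau <- c(rep(1, length(blocks)), 0)

str(lapply(blocks_sb, dim))
```

The resulting `blocks_sb`, `response`, and `tau` objects could then be passed to rgcca_cv() through its `blocks`, `response`, and `tau` arguments.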

I hope it helps, don't hesitate to elaborate on the type of analysis you want to conduct if it does not meet your needs.
Best,
Fabien

Author

JChRoy commented Nov 13, 2024

Hi Fabien,
Thanks for your answers. It makes a lot of sense.
Best,
Jean-Charles
