Question about scaling propr to very large data sets (Part 2) #24
Thanks for your interest in propr! It sounds like the data can be thought of as "multi-omics data" in that you have two views of the same subjects. I'll assume A and B are both compositional. Could you please have a look at #9 and let me know if you have any further questions? Best, tpq
Hey tpq, is that even possible, or should I just try the #9 solution and then pick the pairs that I care about?
Ah, I see. Unfortunately, I have not made any functions for this case. The #9 solution would be the way to go. You may find the functions You might want to try something like...
Each j-th element in the for loop would give you proportionality between A[,j] and B. It'll also break down the 240k^2 results into 200 separate 40k^2 results, which is easier to parallelize.
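For readers following along, the loop described above could be sketched in base R roughly as follows. This is not the code from the original comment and it does not call propr: `rho_between` and `rho_in_chunks` are hypothetical helpers, and the sketch assumes A and B are already clr-transformed matrices with the same samples in the same row order. It relies on the identity rho(x, y) = 1 - var(x - y) / (var(x) + var(y)) = 2 cov(x, y) / (var(x) + var(y)).

```r
# Hypothetical helper (not part of propr): rho between every column of A
# and every column of B. Assumes both matrices are already clr-transformed
# and share the same samples (rows) in the same order.
rho_between <- function(A, B) {
  stopifnot(nrow(A) == nrow(B))
  var_a <- apply(A, 2, var)
  var_b <- apply(B, 2, var)
  # cov(A, B) is an ncol(A) x ncol(B) matrix of cross-covariances
  2 * cov(A, B) / outer(var_a, var_b, "+")
}

# Hypothetical helper: split the columns of A into blocks so that each block
# can be computed (and stored) independently, e.g. as separate cluster jobs.
rho_in_chunks <- function(A, B, chunk_size = 1000) {
  starts <- seq(1, ncol(A), by = chunk_size)
  lapply(starts, function(s) {
    cols <- s:min(s + chunk_size - 1, ncol(A))
    rho_between(A[, cols, drop = FALSE], B)
  })
}

# Toy data standing in for the real clr-transformed matrices
set.seed(1)
A <- matrix(rnorm(50), nrow = 10)  # 10 samples x 5 features
B <- matrix(rnorm(30), nrow = 10)  # 10 samples x 3 features
chunks <- rho_in_chunks(A, B, chunk_size = 2)
rho <- do.call(rbind, chunks)      # 5 x 3 matrix: rows are A features
```

Because each chunk depends only on its own columns of A, the `lapply` call could be swapped for `parallel::mclapply`, or each chunk could be written to disk instead of being bound into one large matrix.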
Great! Thanks a lot! I will give it a try and see how it goes. I'm running everything on a cluster, so hopefully it should be feasible!
Hey, works nicely, thanks a lot!
Our reason for calculating FDR on only positive values is that the negative values can be difficult to interpret (see #4). I don't feel comfortable endorsing exact p-value calculation from rho using t-approximation, simply because I don't know enough to know whether the approach is valid. Permutation is computationally expensive, but also easy to implement. Have a look at propr:::updateCutoffs.propr source code for an example.
You could, for example, compute a small null distribution of negative rho for each A[,j]. This would break up the FDR computations into parallelizable chunks.
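One way to picture the per-feature null distribution is the permutation sketch below. It is only an illustration under stated assumptions, not propr's `updateCutoffs` implementation: `rho_one` and `neg_rho_fdr` are hypothetical helpers, the inputs are assumed to be clr-transformed already, and the FDR is estimated as the mean number of permuted pairs at or below the cutoff divided by the observed number.

```r
# Hypothetical helper: rho between one clr-transformed feature `a` (a numeric
# vector over samples) and every column of the clr-transformed matrix `B`.
rho_one <- function(a, B) {
  drop(2 * cov(a, B) / (var(a) + apply(B, 2, var)))
}

# Hypothetical helper: permutation-based FDR estimate for calling pairs with
# rho <= cutoff (cutoff is expected to be negative). Shuffling `a` across
# samples breaks any real association with B while keeping its variance.
neg_rho_fdr <- function(a, B, cutoff, n_perm = 100) {
  observed <- sum(rho_one(a, B) <= cutoff)
  if (observed == 0) return(NA_real_)  # nothing called, FDR undefined
  null_counts <- replicate(n_perm, sum(rho_one(sample(a), B) <= cutoff))
  min(1, mean(null_counts) / observed)
}

# Toy data: `a` is the exact mirror image of the first column of B
set.seed(1)
B <- matrix(rnorm(200), nrow = 20)     # 20 samples x 10 features
a <- -B[, 1]
rho_a <- rho_one(a, B)                 # rho_a[1] is -1 by construction
fdr <- neg_rho_fdr(a, B, cutoff = -0.9, n_perm = 20)
```

Each A[,j] is handled independently here, so the calls can be spread across cluster jobs in the same way as the rho computation itself.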
I see. That is a bit unfortunate, as I am primarily interested in negative correlations. Is there any other metric, or any other way, I could use to look for negative correlations besides the negative r values? Or is there any way to exclude false positives?
I think the key issue is that the negative rho (like negative correlations) would be highly sensitive to the choice of the reference. They may mean something if you believe the CLR is a suitable normalization method. Without normalization, I do not know of any way to obtain true negative pairwise associations. In the case of multi-omics data analysis, however, you might be satisfied with knowing what associations exist with respect to the CLR centers. In this case, negative rho (or indeed negative Pearson) may hold some meaning for you. You might find our discussion of this topic helpful: https://www.nature.com/articles/s41592-020-01006-1
OK, thanks! I'll look at it carefully tomorrow and see if I can figure something out!
Dear Thomas |
Hi all
First off, congrats on your wonderful software. I just read about it and it looks very promising.
I have a question/request.
I have two datasets (coming from the same samples (same fastq files), but generated using different databases) and I want to compare the elements (features) of dataset A against the elements of dataset B only. This is simply because otherwise this project is not feasible.
Dataset A has 200k features and dataset B has 40k features. Thus I want to check for correlations of the form:
A[1] ~ B[1]
A[1] ~ B[2]
A[1] ~ B[3]
....
A[2] ~ B[1]
A[2] ~ B[2]
....
etc etc
Now normally (using the good ol' Spearman, apologies for speaking the name that shall not be spoken out loud) I would use something like this:
rho <- matrix(NA, nrow = ncol(x), ncol = ncol(y))
for (i in 1:ncol(x)) {
  for (j in 1:ncol(y)) {
    rho[i, j] <- cor.test(x[, i], y[, j], method = "spearman")$estimate
  }
}
(x and y are the two datasets)
but when I tried
propr(x[1]~ y[2], ivar = NA)
or
propr(x[1], y[2], ivar = NA)
it does not seem to work... So my question is: is this possible at all?
PS: my data are already clr-transformed, which is why I used 'ivar = NA'.
Thanks in advance