Question about why not to use negative rho #4

Open
maubarsom opened this issue Feb 5, 2019 · 6 comments
Labels: helpful, question

Comments

@maubarsom

Hi!

I have a quick question that I couldn't entirely figure out from the documentation/vignettes.

I created a propr object, as follows:

rho <- propr(counts.matrix, metric = "rho")

I was wondering what the actual meaning of executing rho['>', 0.9] would be. My interpretation is that it selects all pairs of features with abs(rho) > 0.9 (and adds all these features to the @pairs index). Is this correct?

In a naive implementation, I would interpret this as keeping all rows (or columns) in which at least one value has an absolute value > 0.9.

If this is correct, I might be running into a bug with the filtering. I'm running version 4.1.1 from CRAN. I just want to make sure I understand the behavior before going to the trouble of explaining the bug.

Thank you!
~Mauricio

@tpq
Owner

tpq commented Feb 5, 2019

Hi Mauricio,

Thanks for your interest in propr!

rho[">", 0.9] would select pairs with the value (not the absolute value) of rho > 0.9, and add these pairs to the @pairs index.
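To make the distinction concrete, here is a minimal base R sketch of signed versus absolute-value filtering on a plain matrix. It does not call propr at all: the matrix `rho_mat`, its values, and the feature names are made up for illustration, and the comparison only mimics the semantics described above rather than the package's internals.

```r
# Hypothetical symmetric matrix of rho values for four features (made-up numbers).
rho_mat <- matrix(c( 1.00,  0.95, -0.92,  0.10,
                     0.95,  1.00, -0.30,  0.20,
                    -0.92, -0.30,  1.00,  0.05,
                     0.10,  0.20,  0.05,  1.00),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("f", 1:4), paste0("f", 1:4)))

# Consider each unordered pair once (lower triangle, diagonal excluded).
lower <- lower.tri(rho_mat)

# Signed filter (the semantics described for rho[">", 0.9]):
# keeps only strongly positive pairs.
signed_pairs <- which(lower & rho_mat > 0.9, arr.ind = TRUE)

# Absolute-value filter: also keeps strongly negative pairs.
abs_pairs <- which(lower & abs(rho_mat) > 0.9, arr.ind = TRUE)

nrow(signed_pairs)  # 1: only (f2, f1)
nrow(abs_pairs)     # 2: (f2, f1) and (f3, f1)
```

With these numbers, the pair (f3, f1) with rho = -0.92 is dropped by the signed filter and kept by the absolute-value filter.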

I would suggest being careful when studying negative values of rho. I have found that they are not always directly analogous to negative correlations, making their interpretation difficult.

PS: If you want more control of the analysis, I recently introduced some helper functions: getMatrix and getResults to extract simple matrices from the S4 object.

Please let me know if you still suspect a bug!!

Thanks,
Thom

@maubarsom
Author

Ahh, this explains it, thank you very much! I was just assuming that the filtering considered the absolute value, but it makes sense to have a more general select 👍

Could you please elaborate a little bit on what you mean by the negative values of rho and their relationship to negative correlations?

Also, following from this: does the bootstrapping for the choice of threshold consider only positive proportionality? Or is the thresholding valid for negative values of rho as well?

Thank you for the swift reply!
Mauricio

@tpq
Owner

tpq commented Feb 7, 2019

I'll try my best to explain this succinctly, but please note that I am still trying to understand it myself!

Let's start by looking at the formula:

[Image: screenshot from 2019-02-07 10-19-19, showing the formula for rho_p]
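The screenshot itself does not survive in this text. The following is a reconstruction from the descriptions in this thread (a numerator of var(log(x / y)) and a denominator equal to the sum of the individual variances), not a copy of the original image. The symbol r for the per-sample reference (e.g., the geometric mean over the variables) follows the notation used later in the thread; the second equality is the standard expansion of var(a - b).

```latex
\rho_p(x, y)
  = 1 - \frac{\operatorname{var}\!\left(\log\frac{x}{y}\right)}
             {\operatorname{var}\!\left(\log\frac{x}{r}\right)
              + \operatorname{var}\!\left(\log\frac{y}{r}\right)}
  = \frac{2\,\operatorname{cov}\!\left(\log\frac{x}{r},\, \log\frac{y}{r}\right)}
         {\operatorname{var}\!\left(\log\frac{x}{r}\right)
          + \operatorname{var}\!\left(\log\frac{y}{r}\right)}
```

Note that the reference r cancels in the numerator, because log(x/r) - log(y/r) = log(x/y), but it does not cancel in the denominator.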

For rho_p = 1, the numerator, which is var(log(x / y)), ought to approach zero. Only one thing causes this to happen: the ratio x / y is fixed for all samples (i.e., proportional). Proportional events are always correlated.
For rho_p = -1, the numerator ought to approach 2x the denominator. This means that the variance of the ratio is twice the sum of the individual variances. It is very hard for me to imagine all events that satisfy this condition, but let us look empirically at the distribution of absolute correlations (y-axis) vs. proportionality (x-axis) (taken from Scientific Reports: 7(16252)):

[Image: screenshot from 2019-02-07 10-15-43, showing absolute correlations (y-axis) vs. proportionality (x-axis)]

We see on the right that proportional events (rho_p -> 1) are correlated events (rho -> 1). But, on the left, we see that the anti-proportional events (rho_p -> -1) are also correlated events (rho -> 1)!!!

This seems to happen when there is a strong compositional constraint on the data (e.g., we do not see this in Figure 5). Possibly, these strange events arise when the individual genes are correlated with the geometric mean center. This would shrink the denominator, driving rho_p -> -1, even if the numerator is quite small. These events seem to matter less for rho_p -> 1, because they would induce false negatives rather than false positives.

@tpq
Owner

tpq commented Feb 7, 2019

As for your other question, updateCutoffs is only valid for positive values of rho!! Sorry, I will clarify this in the documentation.

I'll also ping Ionas Erb to see if he has more to add about negative proportionality.

@ionase

ionase commented Feb 7, 2019

Hi Mauricio,

As Thom pointed out, for rho_p to be +1, the ratio between the variables has to be constant, i.e. y = m x for each sample, with m a positive constant (independent of the sample). Now for rho_p = -1 (i.e. numerator twice the denominator), one can show that y/r = m r/x, where r is the value of the reference (e.g., the geometric mean over the variables) in the sample. So y and x are reciprocal. As you can see, the reference does not cancel (as it does for rho_p being exactly +1), so these reciprocal events are not robust with respect to the choice of reference, which makes them less interesting perhaps.
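The "one can show" step can be filled in briefly: writing a = log(x/r) and b = log(y/r), the condition var(a - b) = 2(var(a) + var(b)) is equivalent to var(a + b) = 0, so log(x/r) + log(y/r) is constant across samples, which gives y/r = m r/x. The base R sketch below checks both cases numerically. It is an illustration under stated assumptions, not propr's implementation: the function `rho_p`, the simulated vectors, and the constant `m` are all hypothetical, and the references `r1` and `r2` are treated as given per-sample vectors rather than computed as a geometric mean over the variables.

```r
# rho_p from two log-ratio transformed vectors (formula as described in this thread).
rho_p <- function(ax, ay) 1 - var(ax - ay) / (var(ax) + var(ay))

set.seed(1)
n  <- 50
x  <- rlnorm(n)   # hypothetical abundances of feature x
r1 <- rlnorm(n)   # hypothetical per-sample reference
r2 <- rlnorm(n)   # a different hypothetical reference
m  <- 3           # arbitrary positive constant

# Proportional pair: y = m * x. The reference cancels in log(x / y).
y_prop <- m * x
rho_prop_r1 <- rho_p(log(x / r1), log(y_prop / r1))
rho_prop_r2 <- rho_p(log(x / r2), log(y_prop / r2))

# Reciprocal pair constructed against r1: y / r1 = m * r1 / x.
y_recip <- m * r1^2 / x
rho_recip_r1 <- rho_p(log(x / r1), log(y_recip / r1))
rho_recip_r2 <- rho_p(log(x / r2), log(y_recip / r2))

# Both values are 1 up to floating-point error, whatever the reference.
print(c(rho_prop_r1, rho_prop_r2))

# The first value is -1 up to floating-point error; the second generally is not,
# because the same data are no longer reciprocal relative to r2.
print(c(rho_recip_r1, rho_recip_r2))
```

In this sketch the proportional pair keeps rho_p = 1 under either reference, while the reciprocal pair reaches rho_p = -1 only under the reference it was constructed against, which is the lack of robustness described above.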

Cheers,
Ionas

@maubarsom
Author

Thank you so much for the great replies!!! I need to sit down and mull it over 😄

Cheers,
Mauricio

tpq closed this as completed on Aug 24, 2019
tpq changed the title from "Clarification on the behavior of the '[' select operator" to "Question about why not to use negative rho" on Sep 1, 2020
tpq added the question and helpful labels on Sep 1, 2020
tpq reopened this on Sep 1, 2020