Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with THOR p-values #124

Closed
plbaldoni opened this issue Jul 22, 2019 · 2 comments
Closed

Issue with THOR p-values #124

plbaldoni opened this issue Jul 22, 2019 · 2 comments
Assignees
Labels

Comments

@plbaldoni
Copy link

Hi,

I am running THOR with the options "--merge --no-correction --pvalue 1.0" aiming to get the set of all potential peaks and their raw p-values.

After loading the results in R, the output looks like the following:

> thor = read.table('~/Downloads/THOR_Output-diffpeaks.narrowPeak')
> 
> head(thor)
    V1     V2     V3    V4 V5 V6 V7          V8 V9 V10
1 chr1  16350  16500 Peak0  0  +  0  0.08121752  0  -1
2 chr1 761100 761300 Peak1  0  +  0 -0.40853573  0  -1
3 chr1 776950 777000 Peak2  0  -  0  0.31362374  0  -1
4 chr1 777050 777350 Peak3  0  -  0  0.01068111  0  -1
5 chr1 882750 883050 Peak4  0  +  0 -0.34060708  0  -1
6 chr1 884100 884450 Peak5  0  -  0 -0.35914791  0  -1

Which transformation has been used to produce the pValue (8th column) of the output? Is there a way I can get the raw p-values ranging on (0,1) from each peak? It looks like neither -log10 nor log10 gives me these quantities:

> summary(thor$V8)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-3.36502 -0.72130 -0.31481 -0.37389  0.04078  0.52139 
> summary(10^(-thor$V8))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
   0.3010    0.9104    2.0645    6.1711    5.2638 2317.4833 
> summary(10^(thor$V8))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.000432 0.189976 0.484389 0.779523 1.098453 3.321928 

Best wishes,
Pedro Baldoni

@fabio-t fabio-t self-assigned this Jul 29, 2019
@michael-kotliar
Copy link

Got the same problem. It looks like a complete mess with all this -log10 and raw p-values.
If I run with --no-correction flag I assume that my filtering is done by filter_by_pvalue_strand_lag function. pvalues in this case are already log transformed, otherwise there isn't any logical explanation why we log transform our pcutoff.

pv_pass = np.where(np.asarray(pvalues) >= -log10(pcutoff), True, False)

Then, when exporting the results in _output_BED function we log transform our pvalues one more time

p_tmp = -log10(pvalues[i]) if pvalues[i] > 0 else sys.maxint

@fabio-t fabio-t removed the maybe label Dec 4, 2019
@fabio-t
Copy link
Member

fabio-t commented Dec 4, 2019

Thanks all for reporting, this will be fixed in the next release. Using the --no-correction argument the p-values were indeed "double log10'd", so @plbaldoni in your original example, if you wanted a quick and dirty solution before the release, you could do:

summary(10^(-10^(-thor$V8)))

@fabio-t fabio-t mentioned this issue Mar 2, 2020
@fabio-t fabio-t closed this as completed in 2054b25 Mar 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants