Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement -minMax option of bigWigAverageOverBed. #38

Closed
ghuls opened this issue May 14, 2024 · 3 comments · Fixed by #42
Closed

Implement -minMax option of bigWigAverageOverBed. #38

ghuls opened this issue May 14, 2024 · 3 comments · Fixed by #42

Comments

@ghuls
Copy link
Contributor

ghuls commented May 14, 2024

Implement -minMax option of bigWigAverageOverBed in bigwigaverageoverbed- (as eg: -minmax)

$ bigWigAverageOverBed 
bigWigAverageOverBed v2 - Compute average score of big wig over each bed, which may have introns.
usage:
   bigWigAverageOverBed in.bw in.bed out.tab
The output columns are:
   name - name field from bed, which should be unique
   size - size of bed (sum of exon sizes
   covered - # bases within exons covered by bigWig
   sum - sum of values over all bases covered
   mean0 - average over bases with non-covered bases counting as zeroes
   mean - average over just covered bases
Options:
   -stats=stats.ra - Output a collection of overall statistics to stat.ra file
   -bedOut=out.bed - Make output bed that is echo of input bed but with mean column appended
   -sampleAroundCenter=N - Take sample at region N bases wide centered around bed item, rather
                     than the usual sample in the bed item.
   -minMax - include two additional columns containing the min and max observed in the area.
@LukasMahieu
Copy link

Seems like a minor change but would be a major improvement for us since we use max and count often. Any plans to implement this @jackh726 ?

@jackh726
Copy link
Owner

Yes, I can try to get to this over the weekend! Should be a small and easy change.

@ghuls
Copy link
Contributor Author

ghuls commented Jun 13, 2024

@jackh726 I almost have a pull request ready for bigwigaverageoverbed and pybigtools:

In [21]: ?pybig.bigWigAverageOverBed
Signature: pybig.bigWigAverageOverBed(bigwig, bed, names=None, stats=None)
Docstring:
Gets the average values from a bigWig over the entries of a bed file.

Parameters:
    bigWig (str): The path to the bigWig.
    bed (str): The path to the bed.
    names (None, bool, or int):
        If `None`, then each return value will be a single `float` or `int`,
            the calculated statistics value over an interval in the bed file.
        If `True`, then each return value will be a tuple of the value of column 4 and the
            calculated statistics value over the interval with that name in the bed file.
        If `False`, then each return value will be a tuple of the interval in the format
            `{chrom}:{start}-{end}` and the calculated statistics value over that interval.
        If `0`, then each return value will match as if `False` was passed.
        If a `1+`, then each return value will be a tuple of the value of column of this
            parameter (1-based) and the calculated statistics value over the interval.
    stats (None, str): Calculate specific statistics for each bed entry:
        - `size`: Size of bed entry (`int`)
        - `bases`: Bases covered by bigWig (`int`)
        - `sum`: Sum of values over all bases covered (`float`)
        - `mean0`: Average over bases with non-covered bases counting as zeroes (`float`)
        - `mean` or `None`: Average over just covered bases (`float`)
        - `min`: Minimum over all bases covered (`float`)
        - `max`: Maximum over all bases covered (`float`)
        - `all`: `size`, `bases`, `sum`, `mean0`, `mean`, `min` and `max`

Returns:
    This returns a generator of values. (Therefore, to save to a list, do `list(bigWigAverageOverBed(...))`)
    If no name is specified (see the `names` parameter above), then returns a generator of `float`s or `int`s
    depending on the chosen statistics to calculate.
    If a name column is specified (see above), then returns a generator of tuples `({name}, {stats})`
Type:      builtin_function_or_method


In [7]: bwaob = pybig.bigWigAverageOverBed(bigwig="Astro.bw", bed="consensus_peaks_bicnn.bed", names=True, stats="all")

In [8]: for i, entry in enumerate(bwaob):
   ...:     if i == 10:
   ...:         break
   ...:     print(entry)
   ...: 
('chr1:3094805-3095305', 500, 500, 31.629608038812876, 0.06325921607762575, 0.06325921607762575, 0.044780001044273376, 0.08956000208854675)
('chr1:3095470-3095970', 500, 347, 7.746940311975777, 0.015493880623951555, 0.022325476403388406, 0.0, 0.044780001044273376)
('chr1:3112174-3112674', 500, 197, 3.239086809568107, 0.006478173619136214, 0.01644206502318836, 0.0, 0.0298533346503973)
('chr1:3113534-3114034', 500, 168, 2.5076801106333733, 0.005015360221266746, 0.01492666732519865, 0.0, 0.01492666732519865)
('chr1:3119746-3120246', 500, 500, 18.73296721186489, 0.03746593442372978, 0.03746593442372978, 0.01492666732519865, 0.044780001044273376)
('chr1:3120272-3120772', 500, 500, 23.28560095373541, 0.04657120190747082, 0.04657120190747082, 0.01492666732519865, 0.07463333755731583)
('chr1:3121251-3121751', 500, 348, 13.463853831402957, 0.026927707662805916, 0.038689235147709646, 0.0, 0.07463333755731583)
('chr1:3134586-3135086', 500, 385, 8.045473688282073, 0.016090947376564146, 0.02089733425527811, 0.0, 0.0298533346503973)
('chr1:3165708-3166208', 500, 294, 4.388440193608403, 0.008776880387216806, 0.01492666732519865, 0.0, 0.01492666732519865)
('chr1:3166923-3167423', 500, 99, 1.4777400651946664, 0.002955480130389333, 0.01492666732519865, 0.0, 0.01492666732519865)

In [9]: bwaob = pybig.bigWigAverageOverBed(bigwig="Astro.bw", bed="consensus_peaks_bicnn.bed", names=False, stats="all")

In [10]: for i, entry in enumerate(bwaob):
    ...:     if i == 10:
    ...:         break
    ...:     print(entry)
    ...: 
('chr1\t3094805\t3095305\tchr1:3094805-3095305', 500, 500, 31.629608038812876, 0.06325921607762575, 0.06325921607762575, 0.044780001044273376, 0.08956000208854675)
('chr1\t3095470\t3095970\tchr1:3095470-3095970', 500, 347, 7.746940311975777, 0.015493880623951555, 0.022325476403388406, 0.0, 0.044780001044273376)
('chr1\t3112174\t3112674\tchr1:3112174-3112674', 500, 197, 3.239086809568107, 0.006478173619136214, 0.01644206502318836, 0.0, 0.0298533346503973)
('chr1\t3113534\t3114034\tchr1:3113534-3114034', 500, 168, 2.5076801106333733, 0.005015360221266746, 0.01492666732519865, 0.0, 0.01492666732519865)
('chr1\t3119746\t3120246\tchr1:3119746-3120246', 500, 500, 18.73296721186489, 0.03746593442372978, 0.03746593442372978, 0.01492666732519865, 0.044780001044273376)
('chr1\t3120272\t3120772\tchr1:3120272-3120772', 500, 500, 23.28560095373541, 0.04657120190747082, 0.04657120190747082, 0.01492666732519865, 0.07463333755731583)
('chr1\t3121251\t3121751\tchr1:3121251-3121751', 500, 348, 13.463853831402957, 0.026927707662805916, 0.038689235147709646, 0.0, 0.07463333755731583)
('chr1\t3134586\t3135086\tchr1:3134586-3135086', 500, 385, 8.045473688282073, 0.016090947376564146, 0.02089733425527811, 0.0, 0.0298533346503973)
('chr1\t3165708\t3166208\tchr1:3165708-3166208', 500, 294, 4.388440193608403, 0.008776880387216806, 0.01492666732519865, 0.0, 0.01492666732519865)
('chr1\t3166923\t3167423\tchr1:3166923-3167423', 500, 99, 1.4777400651946664, 0.002955480130389333, 0.01492666732519865, 0.0, 0.01492666732519865)

In [11]: bwaob = pybig.bigWigAverageOverBed(bigwig="Astro.bw", bed="consensus_peaks_bicnn.bed", names=None, stats="all")

In [12]: for i, entry in enumerate(bwaob):
    ...:     if i == 10:
    ...:         break
    ...:     print(entry)
    ...: 
(500, 500, 31.629608038812876, 0.06325921607762575, 0.06325921607762575, 0.044780001044273376, 0.08956000208854675)
(500, 347, 7.746940311975777, 0.015493880623951555, 0.022325476403388406, 0.0, 0.044780001044273376)
(500, 197, 3.239086809568107, 0.006478173619136214, 0.01644206502318836, 0.0, 0.0298533346503973)
(500, 168, 2.5076801106333733, 0.005015360221266746, 0.01492666732519865, 0.0, 0.01492666732519865)
(500, 500, 18.73296721186489, 0.03746593442372978, 0.03746593442372978, 0.01492666732519865, 0.044780001044273376)
(500, 500, 23.28560095373541, 0.04657120190747082, 0.04657120190747082, 0.01492666732519865, 0.07463333755731583)
(500, 348, 13.463853831402957, 0.026927707662805916, 0.038689235147709646, 0.0, 0.07463333755731583)
(500, 385, 8.045473688282073, 0.016090947376564146, 0.02089733425527811, 0.0, 0.0298533346503973)
(500, 294, 4.388440193608403, 0.008776880387216806, 0.01492666732519865, 0.0, 0.01492666732519865)
(500, 99, 1.4777400651946664, 0.002955480130389333, 0.01492666732519865, 0.0, 0.01492666732519865)

In [13]: bwaob = pybig.bigWigAverageOverBed(bigwig="Astro.bw", bed="consensus_peaks_bicnn.bed", names=None, stats="max")

In [14]: for i, entry in enumerate(bwaob):
    ...:     if i == 10:
    ...:         break
    ...:     print(entry)
    ...: 
0.08956000208854675
0.044780001044273376
0.0298533346503973
0.01492666732519865
0.044780001044273376
0.07463333755731583
0.07463333755731583
0.0298533346503973
0.01492666732519865
0.01492666732519865

In [15]: bwaob = pybig.bigWigAverageOverBed(bigwig="Astro.bw", bed="consensus_peaks_bicnn.bed", names=None, stats="mean0")

In [16]: for i, entry in enumerate(bwaob):
    ...:     if i == 10:
    ...:         break
    ...:     print(entry)
    ...: 
0.06325921607762575
0.015493880623951555
0.006478173619136214
0.005015360221266746
0.03746593442372978
0.04657120190747082
0.026927707662805916
0.016090947376564146
0.008776880387216806
0.002955480130389333

In [17]: bwaob = pybig.bigWigAverageOverBed(bigwig="Astro.bw", bed="consensus_peaks_bicnn.bed", names=None, stats="mean")

In [18]: for i, entry in enumerate(bwaob):
    ...:     if i == 10:
    ...:         break
    ...:     print(entry)
    ...: 
0.06325921607762575
0.022325476403388406
0.01644206502318836
0.01492666732519865
0.03746593442372978
0.04657120190747082
0.038689235147709646
0.02089733425527811
0.01492666732519865
0.01492666732519865

In [19]: bwaob = pybig.bigWigAverageOverBed(bigwig="Astro.bw", bed="consensus_peaks_bicnn.bed", names=None, stats=None)

In [20]: for i, entry in enumerate(bwaob):
    ...:     if i == 10:
    ...:         break
    ...:     print(entry)
    ...: 
0.06325921607762575
0.022325476403388406
0.01644206502318836
0.01492666732519865
0.03746593442372978
0.04657120190747082
0.038689235147709646
0.02089733425527811
0.01492666732519865

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants