summary
Usage:
bwtool summary - provide some summary stats for each region in a bed file
or at regular intervals.
usage:
bwtool summary loci input.bw[:chr:start-end] output.txt
where:
-"loci" corresponds to either (a) a bed file with regions to summarize or
(b) a size of interval to summarize genome-wide.
options:
-with-quantiles output 10%/25%/75%/90% quantiles as well, surrounding the median. With -total, this essentially provides a boxplot.
-with-sum-of-squares output the sum of squared deviations from the mean along with the other fields
-with-sum output the sum as well
-keep-bed if the loci bed file is given, keep as many of its fields as possible
-total only output a single summary, as if all of the regions were pasted together
-header put in a header (fields are easy to forget)
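The sketch below shows what the output fields mean for a single region. It assumes that missing bases are simply skipped, which is what the num_data column in the examples below suggests. The names summarize and summarize_total are hypothetical and are not part of bwtool. The quantile interpolation method is also an assumption, so bwtool's own quantiles may differ.

```python
import math
from statistics import median


def quantile(sorted_vals, q):
    """Quantile of an already sorted list.
    Assumption: linear interpolation between neighbours; bwtool may differ."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize(values):
    """Summary fields for one region; None marks a base with no data."""
    data = sorted(v for v in values if v is not None)  # skip missing bases
    n = len(data)
    if n == 0:
        return {"num_data": 0}
    mean = sum(data) / n
    return {
        "num_data": n,
        "min": data[0],
        "max": data[-1],
        "mean": mean,
        "median": median(data),
        "sum": sum(data),                                       # like -with-sum
        "sum_of_squares": sum((v - mean) ** 2 for v in data),  # like -with-sum-of-squares
        "quantiles": {q: quantile(data, q) for q in (0.10, 0.25, 0.75, 0.90)},  # like -with-quantiles
    }


def summarize_total(regions):
    """Rough analogue of -total: pool every region's values as if pasted together."""
    return summarize([v for region in regions for v in region])
```

For example, `summarize([1.0, None, 3.0, 2.0, None])` skips the two missing bases. It gives num_data 3, mean 2.0, median 2.0, and a sum of squared deviations of 2.0.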
Using the same example as on the aggregate page, we can first get the summary of each 10 bp interval: bases 1-10, 11-20, 21-30, and 31-36 (only 6 bp in the last case). For this demonstration we'll use the -header option, but often this option isn't necessary, especially if the summary will be post-processed:
$ bwtool summary 10 main.bigWig /dev/stdout -header -with-sum
#chrom start end size num_data min max mean median sum
chr 0 10 10 10 1.00 6.00 4.00 5.00 40.00
chr 10 20 10 10 0.00 10.00 4.00 3.50 40.00
chr 20 30 10 6 1.00 4.00 2.33 2.00 14.00
chr 30 36 6 6 2.00 6.00 4.33 4.00 26.00
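The start, end, and size columns above follow BED's zero-based, half-open convention. The sketch below reproduces them for a fixed interval size and also prints the equivalent 1-based base ranges. It assumes the example chromosome "chr" is 36 bases long, as the last row ends at 36. The function names fixed_windows and to_one_based are hypothetical helpers, not bwtool code.

```python
def fixed_windows(chrom_size, step):
    """Half-open, zero-based (start, end) windows of `step` bases, BED style.
    The final window is truncated at the chromosome end, so it may be shorter."""
    for start in range(0, chrom_size, step):
        yield start, min(start + step, chrom_size)


def to_one_based(start, end):
    """Convert a half-open zero-based (start, end) to 1-based closed (first, last)."""
    return start + 1, end


# Assumption: "chr" is 36 bases long, as the last row of the output ends at 36.
for start, end in fixed_windows(36, 10):
    first, last = to_one_based(start, end)
    # With half-open intervals the size is simply end - start (no +1 needed).
    print(f"chr {start} {end} size={end - start} (bases {first}-{last})")
```

The half-open form is why the size column needs no off-by-one correction. For example, the row "chr 20 30" covers bases 21-30 and has size 30 - 20 = 10.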
Everything should be relatively self-explanatory here, although it's worth mentioning that these examples sometimes switch between referring to bases as 1-10 and 0-10. This is because BED format uses zero-based, half-open intervals, while WIG format uses 1-based coordinates. The former is convenient for calculating the size of a region (simply end minus start), while the latter is easier when drawing pictures. In bases 21-30, the num_data column is 6 because 4 bases have missing data. If one prefers to treat missing data as zero, the -fill=0 option can be used:
$ bwtool summary 10 main.bigWig /dev/stdout -header -with-sum -fill=0
#chrom start end size num_data min max mean median sum
chr 0 10 10 10 1.00 6.00 4.00 5.00 40.00
chr 10 20 10 10 0.00 10.00 4.00 3.50 40.00
chr 20 30 10 10 0.00 4.00 1.40 1.50 14.00
chr 30 36 6 6 2.00 6.00 4.33 4.00 26.00
In the 20-30 row, num_data, min, mean, and median change accordingly. The sum is unchanged, because the filled-in zeros add nothing to it.
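The next sketch makes the effect of -fill=0 concrete. It assumes that filling simply substitutes the fill value for every missing base before the statistics are computed. The per-base values are hypothetical, picked only so that they reproduce the 20-30 row of both tables above; the real contents of main.bigWig may differ. The name window_stats is also illustrative, not part of bwtool.

```python
from statistics import median


def window_stats(values, fill=None):
    """(num_data, min, max, mean, median, sum) for one window; None = missing base.
    Assumption: with `fill` set, missing bases take that value instead of being skipped."""
    if fill is not None:
        values = [fill if v is None else v for v in values]
    data = [v for v in values if v is not None]
    if not data:
        return 0, None, None, None, None, None
    n = len(data)
    return n, min(data), max(data), sum(data) / n, median(data), sum(data)


# Hypothetical values for bases 21-30, chosen only to match the printed rows.
bases_21_30 = [1.0, 2.0, None, 2.0, None, 2.0, 3.0, None, 4.0, None]

print(window_stats(bases_21_30))            # missing bases skipped
print(window_stats(bases_21_30, fill=0.0))  # analogous to -fill=0
```

With these values, the first call gives num_data 6, min 1.0, mean about 2.33, and median 2.0. The second gives num_data 10, min 0.0, mean 1.4, and median 1.5. In both cases the sum is 14.0, consistent with the two tables.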