-
Notifications
You must be signed in to change notification settings - Fork 22
aggregate
Usage message:
bwtool aggregate - produce plot data as averages surrounding given regions
seen in the bigWig.
usage:
bwtool aggregate left:right a.bed,b.bed,... x.bw,y.bw,... output.txt
bwtool aggregate left:right bed.lst bigWig.lst output.txt
where:
left:right is a description of the range to use surrounding the
centers/starts. A value of 200:300 will create plot points
for -200 bp to +300 bp from the center of all the regions.
bed.lst/bigWig.lst are files containing lists of files to be used
in place of a comma-separated list.
options:
-starts use starts of bed regions as opposed to the middles
-ends use ends of bed regions as opposed to the middles
-firstbase in this case the zero base is used so output
has left+right+1 lines
-expanded output medians and standard deviations instead of just
averages
-cluster=k cluster with k-means with given parameter (2-10 are best)
-cluster-sets=file.bed
write out the original bed with the cluster label along
with the accompanying region used for the clustering
-long-form=[label1,label2,...,labeln]
output "long form" where each line is just the position
and one of the values and the first column is the name of
the file.
-verbose=n output some progress info every n regions loaded
Suppose we have the following data in a bigWig (main.bigWig, blue graph) and a bed (agg1.bed, red regions):
The files (wig and bed) would look like:
variableStep chrom=chr span=1
1 1.0
2 2.0
3 5.0
4 6.0
5 5.0
6 3.0
7 3.0
8 5.0
9 5.0
10 5.0
11 6.0
12 6.0
13 0.0
14 2.0
15 3.0
16 3.0
17 10.0
18 4.0
19 4.0
20 2.0
21 2.0
22 2.0
23 1.0
variableStep chrom=chr span=1
28 2.0
29 3.0
30 4.0
31 6.0
32 6.0
33 4.0
34 4.0
35 4.0
36 2.0
and
chr 0 4 R1
chr 9 19 R2
chr 28 35 R3
respectively. Aggregate will align the regions on their centers by default (below left), but also possible is to align the data on the start coordinates (below center) using the the -starts option, or on the end coordinates using the -ends option (below right).
As you can see, the size of the region is unimportant. What is important is how they are aligned and the size of the left:right aggregate specification. In the case the size of the bed region is odd, the center of the region is taken as the floor() of the calculated midpoint coordinate. In this way, the alignment is always between bases. In a simple case, one could ask for the 3 bases left and right of each of these centerings as follows:
Region/Values | Left | Right | Left (-starts) | Right (-starts) | Left (-ends) | Right (-ends) |
---|---|---|---|---|---|---|
R1 bases | -1,1,2 | 3,4,5 | -3,-2,-1 | 1,2,3 | 2,3,4 | 5,6,7 |
R1 vals | NA,1.0,2.0 | 5.0,6.0,5.0 | NA,NA,NA | 1.0,2.0,5.0 | 2.0,5.0,6.0 | 5.0,3.0,3.0 |
R2 bases | 12,13,14 | 15,16,17 | 6,7,8 | 9,10,11 | 17,18,19 | 20,21,22 |
R2 vals | 6.0,0.0,2.0 | 3.0,3.0,10.0 | 3.0,5.0,5.0 | 5.0,6.0,6.0 | 10.0,4.0,4.0 | 2.0,2.0,2.0 |
R3 bases | 29,30,31 | 32,33,34 | 26,27,28 | 29,30,31 | 33,34,35 | 36,37,38 |
R3 vals | 3.0,4.0,6.0 | 6.0,4.0,4.0 | NA,NA,2.0 | 3.0,4.0,6.0 | 4.0,4.0,4.0 | 2.0,NA,NA |
The aggregate calculation at its most basic, does an averaging. In this case, when running with each type of alignment:
Alignment | Base from center | Calculation | Result |
---|---|---|---|
default | -3 | (6.0 + 3.0) ÷ 2 | 4.5 |
default | -2 | (1.0 + 0.0 + 4.0) ÷ 3 | 1.66667 |
default | -1 | (2.0 + 2.0 + 6.0) ÷ 3 | 3.33333 |
default | 1 | (5.0 + 3.0 + 6.0) ÷ 3 | 4.66667 |
default | 2 | (6.0 + 3.0 + 4.0) ÷ 3 | 4.33333 |
default | 3 | (5.0 + 10.0 + 4.0) ÷ 3 | 6.33333 |
-starts | -3 | 3.0 ÷ 1 | 3.0 |
-starts | -2 | 5.0 ÷ 1 | 5.0 |
-starts | -1 | (2.0 + 5.0) ÷ 2 | 3.5 |
-starts | 1 | (1.0 + 5.0 + 3.0) ÷ 3 | 3.0 |
-starts | 2 | (2.0 + 6.0 + 4.0) ÷ 3 | 4.0 |
-starts | 3 | (5.0 + 6.0 + 6.0) ÷ 3 | 5.66667 |
-ends | -3 | (2.0 + 10.0 + 4.0) ÷ 3 | 5.33333 |
-ends | -2 | (5.0 + 4.0 + 4.0) ÷ 3 | 4.33333 |
-ends | -1 | (6.0 + 4.0 + 4.0) ÷ 3 | 4.66667 |
-ends | 1 | (5.0 + 2.0 + 2.0) ÷ 3 | 3.0 |
-ends | 2 | (3.0 + 2.0) ÷ 2 | 2.5 |
-ends | 3 | (3.0 + 2.0) ÷ 2 | 2.5 |
After running the program, we get:
$ bwtool agg 3:3 agg1.bed main.bigWig /dev/stdout
-3 4.500000
-2 1.666667
-1 3.333333
1 4.666667
2 4.333333
3 6.333333
bwtool agg 3:3 agg1.bed main.bigWig /dev/stdout -starts
-3 3.000000
-2 5.000000
-1 3.500000
1 3.000000
2 4.000000
3 5.666667
$ bwtool agg 3:3 agg1.bed main.bigWig /dev/stdout -ends
-3 5.333333
-2 4.333333
-1 4.666667
1 3.000000
2 2.500000
3 2.500000
From this point, it may be worthwhile to examine a few other options. -firstbase, does nothing really substantial other than center on a base instead of a base border. The output is almost identical to adding an additional base to the right part of the left:right specification, except the base distance numbers are slightly different, e.g.:
$ bwtool agg 3:4 agg1.bed main.bigWig /dev/stdout -ends
-3 5.333333
-2 4.333333
-1 4.666667
1 3.000000
2 2.500000
3 2.500000
4 3.000000
$ bwtool agg 3:3 agg1.bed main.bigWig /dev/stdout -ends -firstbase
-3 5.333333
-2 4.333333
-1 4.666667
0 3.000000
1 2.500000
2 2.500000
3 3.000000
-expanded prints three additional columns to the output. These are:
- Median.
- Standard deviation.
- Size (number of values used).
For example:
$ bwtool agg 3:3 agg1.bed main.bigWig /dev/stdout -expanded
-3 4.500000 4.500000 1.500000 2
-2 1.666667 1.000000 1.699673 3
-1 3.333333 2.000000 1.885618 3
1 4.666667 5.000000 1.247219 3
2 4.333333 4.000000 1.247219 3
3 6.333333 5.000000 2.624669 3