extract
Andy Pohl edited this page Dec 7, 2013 · 4 revisions
The extract program provides ways to pull data out of a bigWig beyond those already seen in the matrix, window, and paste programs. The first style, bed, returns comma-separated values for the given bed regions, which may be of arbitrary length. The second style, jsp, was written to accommodate custom programs and may not be of general interest; like paste, its output has a vertical format. The usage:
bwtool extract - extract data from the bigWig in other ways than matrix, paste, or window
usage:
bwtool extract <style> regions.bed in.bw out.txt
where "style" is one of:
bed - this will do something similar to bwtool matrix without the left:right specif-
ication and data only coming from the defined bed region, meaning region sizes
are also allowed to be variably-sized. If a six-field bed is given and the
region is on the minus strand, then the extracted data is reversed prior to
outputting. The output format is the original bed up to the first six fields,
tab-delimited, followed by a field indicating the length of the data to follow,
followed by the data, separated by commas (or tabs if option -tabs is used).
jsp - with similar effect as "bed" in terms of stranded bed input and reversing data
or not, the output is a bit more minimal and has a vertical structure simlar to
bwtool paste. Values are separated line-by-line and regions are preceded by a
line starting with # and stating the name of the region from the bed. If only
three fields are used in the bed or bed name is ".", then the region is numbered.
options:
-tabs output tabs instead of commas in output.
-locus-name in jsp output, output the region locus in genome browser coordinate
form (i.e. chrom:(chromStart+1)-chromEnd instead of the bed name
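The coordinate form described for -locus-name can be illustrated with a small Python sketch. This is not part of bwtool; the function name `locus_name` is hypothetical and only mirrors the `chrom:(chromStart+1)-chromEnd` rule stated in the usage text, which converts a zero-based, half-open bed interval to one-based genome browser coordinates.

```python
def locus_name(chrom, chrom_start, chrom_end):
    """Format a bed interval in genome browser coordinate form.

    Bed starts are zero-based, so the start is shifted by one;
    the end is left unchanged, as in chrom:(chromStart+1)-chromEnd.
    """
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


if __name__ == "__main__":
    # Second region of the example bed used on this page.
    print(locus_name("chr", 9, 19))  # chr:10-19
```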
Reusing the example from aggregate, suppose there's a bed that looks like:
chr 0 4 R1
chr 9 19 R2
chr 28 35 R3
Running the program with the bed extraction style will give a file like:
$ bwtool extract bed beds/agg1.bed main.bw /dev/stdout
chr 0 4 R1 4 1.00,2.00,5.00,6.00
chr 9 19 R2 10 5.00,6.00,6.00,0.00,2.00,3.00,3.00,10.00,4.00,4.00
chr 28 35 R3 7 3.00,4.00,6.00,6.00,4.00,4.00,4.00
The bed is repeated in the output, with two new fields appended: (1) the number of data points extracted, and (2) the data points themselves.
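For downstream work, the bed-style output can be read back with a few lines of Python. The following is a minimal sketch, not something shipped with bwtool: the function name `parse_extract_bed_line` is hypothetical, and it assumes the default output (tab-delimited fields, comma-separated data, no -tabs option) with every data value numeric.

```python
def parse_extract_bed_line(line):
    """Split one line of `bwtool extract bed` output.

    Returns (bed_fields, values), where bed_fields is the list of
    original bed columns as strings and values is a list of floats.
    Assumes default output: tab-delimited fields, comma-separated data.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError("expected at least three bed fields, a count, and data")
    # The last two fields are the data length and the data itself;
    # everything before them is the repeated bed record.
    *bed_fields, count, data = fields
    values = [float(v) for v in data.split(",")]
    if len(values) != int(count):
        raise ValueError(f"expected {count} values, found {len(values)}")
    return bed_fields, values


if __name__ == "__main__":
    # First line of the example output above, with tabs between fields.
    example = "chr\t0\t4\tR1\t4\t1.00,2.00,5.00,6.00"
    bed_fields, values = parse_extract_bed_line(example)
    print(bed_fields)  # ['chr', '0', '4', 'R1']
    print(values)      # [1.0, 2.0, 5.0, 6.0]
```

Taking the count and data from the end of the line, rather than from fixed positions, lets the same function handle input beds with anywhere from three to six fields.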