Andy Pohl edited this page Dec 7, 2013 · 4 revisions

The extract program provides ways to pull data out of a bigWig beyond those already covered by the matrix, window, and paste programs. The first style, bed, returns comma-separated values for each region in a given bed file, and those regions may be of arbitrary length. The second style, jsp, was written to accommodate custom programs and may not be of general interest; like paste, its output is laid out vertically. The usage:

bwtool extract - extract data from the bigWig in other ways than matrix, paste, or window
usage:
   bwtool extract <style> regions.bed in.bw out.txt
where "style" is one of:
   bed  - this will do something similar to bwtool matrix without the left:right specif-
          ication and data only coming from the defined bed region, meaning region sizes
          are also allowed to be variably-sized.  If a six-field bed is given and the
          region is on the minus strand, then the extracted data is reversed prior to
          outputting. The output format is the original bed up to the first six fields,
          tab-delimited, followed by a field indicating the length of the data to follow,
          followed by the data, separated by commas (or tabs if option -tabs is used).
   jsp  - with similar effect as "bed" in terms of stranded bed input and reversing data
          or not, the output is a bit more minimal and has a vertical structure similar to
          bwtool paste.  Values are separated line-by-line and regions are preceded by a
          line starting with # and stating the name of the region from the bed. If only
          three fields are used in the bed or bed name is ".", then the region is numbered.
options:
   -tabs         output tabs instead of commas in output.
   -locus-name   in jsp output, output the region locus in genome browser coordinate
                 form (i.e. chrom:(chromStart+1)-chromEnd) instead of the bed name.
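Both styles share the same rules for stranded input and region naming. A minimal Python sketch can make them concrete by mimicking the jsp layout on an in-memory signal. This is not bwtool's code: the `format_jsp` helper and the dict standing in for the bigWig are hypothetical. Exact details of the real output are also assumptions here: no space after `#`, numbering starting at 1, two decimal places, and every base having data.

```python
def format_jsp(regions, signal, locus_name=False):
    """Hypothetical emulation of the jsp layout (not bwtool's implementation).

    regions: list of BED rows, each a list of string fields (chrom, start, end, ...).
    signal:  dict mapping chrom -> list of per-base values (0-based), standing in
             for the bigWig. Assumes every base in each region has a value.
    """
    out = []
    for i, fields in enumerate(regions, start=1):
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        values = signal[chrom][start:end]
        # A six-field BED on the minus strand has its data reversed before output.
        if len(fields) >= 6 and fields[5] == "-":
            values = values[::-1]
        if locus_name:
            # Genome browser coordinates: 1-based start, end unchanged.
            label = f"{chrom}:{start + 1}-{end}"
        elif len(fields) >= 4 and fields[3] != ".":
            label = fields[3]
        else:
            # Three-field BED or name ".": number the region (1-based is an assumption).
            label = str(i)
        out.append(f"#{label}")
        out.extend(f"{v:.2f}" for v in values)  # one value per line
    return "\n".join(out)
```

For example, `format_jsp([["chr", "0", "2", "R1"]], {"chr": [1.0, 2.0]})` yields a `#R1` header line followed by `1.00` and `2.00` on their own lines.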

Examples

Reusing the example from aggregate, there is a bed that looks like:

chr	0	4	R1
chr	9	19	R2
chr	28	35	R3

Running the program with the bed extraction style gives output like:

$ bwtool extract bed beds/agg1.bed main.bw /dev/stdout
chr	0	4	R1	4	1.00,2.00,5.00,6.00
chr	9	19	R2	10	5.00,6.00,6.00,0.00,2.00,3.00,3.00,10.00,4.00,4.00
chr	28	35	R3	7	3.00,4.00,6.00,6.00,4.00,4.00,4.00

The bed fields are repeated in the output, followed by two new fields: (1) the number of data points extracted, and (2) the data points themselves, separated by commas.
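Because the length field precedes the data, downstream scripts can read this output back and check it. Below is a minimal Python sketch of such a reader, run on the example output above. `parse_extract_bed` is a hypothetical helper, not part of bwtool. It only handles the default comma-separated form: with `-tabs` the data and bed fields can no longer be told apart without knowing the bed's column count. Treating missing values as the string `NA` is an assumption.

```python
import math

# The example output from above, verbatim (tab-delimited bed fields, comma-separated data).
EXAMPLE = (
    "chr\t0\t4\tR1\t4\t1.00,2.00,5.00,6.00\n"
    "chr\t9\t19\tR2\t10\t5.00,6.00,6.00,0.00,2.00,3.00,3.00,10.00,4.00,4.00\n"
    "chr\t28\t35\tR3\t7\t3.00,4.00,6.00,6.00,4.00,4.00,4.00\n"
)


def parse_extract_bed(lines):
    """Parse default (comma-separated) `bwtool extract bed` output.

    Returns a list of (bed_fields, values) tuples. Hypothetical helper.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        # The last two fields are the length and the comma-separated data.
        bed, n = fields[:-2], int(fields[-2])
        raw = fields[-1].split(",") if n > 0 else []
        # Assumption: missing data is written as "NA"; map it to NaN.
        values = [math.nan if v == "NA" else float(v) for v in raw]
        if len(values) != n:
            raise ValueError(f"length field {n} does not match {len(values)} values")
        records.append((bed, values))
    return records


if __name__ == "__main__":
    # Print the mean signal of each region, labelled by its bed name.
    for bed, values in parse_extract_bed(EXAMPLE.splitlines()):
        print(bed[3], sum(values) / len(values))
```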
