Andy Pohl edited this page Oct 29, 2013 · 4 revisions

The sax program runs the SAX (Symbolic Aggregate approXimation) discretization algorithm on bigWig data. It converts a real-valued signal into a string of characters. The alphabet size can range from 2 to 20 characters, and which character replaces a given value depends on the overall distribution of the data. The point of the conversion is to make it possible to apply methods and algorithms designed for DNA or protein sequences. I say "possible" because this is a very rough method: proper motif finding needs things like substitution matrices based on evolutionary models, and this conversion does not help with that. It simply makes the conversion, for better or worse.
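For readers unfamiliar with SAX, here is a minimal Python sketch of the textbook discretization step: z-normalize the values, cut the standard normal distribution into equal-probability bins, and give each bin one letter. This illustrates the general technique, not bwtool's own code. The function names, the lowercase letters `a`, `b`, ..., and the omission of SAX's usual piecewise aggregate approximation (averaging) step are all assumptions made for illustration.

```python
import bisect
import string
from statistics import NormalDist, mean, pstdev

def sax_breakpoints(alphabet_size):
    """Cut points splitting N(0, 1) into alphabet_size equal-probability bins."""
    if not 2 <= alphabet_size <= 20:
        raise ValueError("alphabet_size must be between 2 and 20")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]

def sax_symbols(values, alphabet_size, mu=None, sigma=None):
    """Map each value to a letter via z-normalization and Gaussian breakpoints.

    Hypothetical helper: no PAA averaging step; letters 'a', 'b', ... assumed.
    """
    mu = mean(values) if mu is None else mu
    sigma = pstdev(values) if sigma is None else sigma
    cuts = sax_breakpoints(alphabet_size)
    letters = string.ascii_lowercase[:alphabet_size]
    out = []
    for v in values:
        # A flat signal (sigma == 0) maps every value to z = 0.
        z = (v - mu) / sigma if sigma > 0 else 0.0
        # Bin index = number of cut points <= z.
        out.append(letters[bisect.bisect_right(cuts, z)])
    return "".join(out)

print(sax_symbols([1, 2, 3, 4], 2))  # aabb
```

Because the breakpoints come from the normal distribution, each letter is roughly equally frequent only when the z-normalized data are close to Gaussian; heavily skewed signals such as sparse coverage will pile up in a few letters.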

Usage for sax:

bwtool sax - Implementation of SAX algorithm on bigWig data region.
usage:
   bwtool sax alphabet-size input.bw[:chr:start-end] output.sax
where:
   alphabet-size is from 2-20
options:
   -bed4                  when set, disable the FASTA output in favor of BED4
   -add-wig-out           in the case of BED4 output, add an additional
                          column that shows the original data
   -mean=val              force z-normalization to use fixed mean
   -std=val               force z-normalization to use fixed standard
                          deviation
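The `-mean` and `-std` options matter when several regions should be encoded on a common scale. The sketch below assumes they simply replace the sample statistics in the usual z = (x - mean) / std; `z_normalize` is a hypothetical helper, not part of bwtool. Two regions with the same shape but different levels become indistinguishable under per-region normalization, while a fixed mean and standard deviation keep them apart.

```python
from statistics import mean, pstdev

def z_normalize(values, fixed_mean=None, fixed_std=None):
    """Z-normalize values, optionally using a fixed mean/std instead of the sample's own."""
    mu = mean(values) if fixed_mean is None else fixed_mean
    sd = pstdev(values) if fixed_std is None else fixed_std
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return [(v - mu) / sd for v in values]

region_a = [0.0, 1.0, 2.0]
region_b = [10.0, 11.0, 12.0]

# Per-region statistics: both regions normalize to the same z-scores.
print(z_normalize(region_a))
print(z_normalize(region_b))

# Fixed statistics (analogous to -mean=6 -std=5): the level difference survives.
print(z_normalize(region_a, fixed_mean=6, fixed_std=5))  # [-1.2, -1.0, -0.8]
print(z_normalize(region_b, fixed_mean=6, fixed_std=5))  # [0.8, 1.0, 1.2]
```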