Andy Pohl edited this page Oct 29, 2013 · 4 revisions
The sax program runs the SAX (Symbolic Aggregate approXimation) discretization algorithm on bigWig data. It converts the data from a real-valued signal into a sequence of characters. The alphabet can contain between 2 and 20 characters. Which character replaces a given value depends on the overall distribution of the data. The point of doing this at all is to then potentially use methods and algorithms designed for DNA or protein sequences. I say potentially because this is a very rough method: proper motif-finding needs things like substitution matrices based on evolutionary models, and this doesn't really help with that. It just makes a conversion, for better or worse.
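To make the conversion concrete, here is a minimal Python sketch of the symbol-assignment step of SAX: z-normalize the values, then assign a letter according to which equal-probability slice of the standard normal distribution each value falls in. This is not bwtool's code. The function name `sax_symbols`, the use of uppercase letters starting at `A`, and the use of the population standard deviation are assumptions made for illustration, and the sketch leaves out the piecewise-averaging step that the general SAX algorithm can apply before assigning symbols.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev
import string


def sax_symbols(values, alphabet_size, mean=None, std=None):
    """Map real values to letters (hypothetical helper, not part of bwtool)."""
    if not 2 <= alphabet_size <= 20:
        raise ValueError("alphabet_size must be between 2 and 20")
    # Use the data's own mean and standard deviation unless fixed ones are given.
    mu = fmean(values) if mean is None else mean
    sigma = pstdev(values, mu) if std is None else std
    if sigma == 0:
        raise ValueError("standard deviation is zero; cannot z-normalize")
    # Breakpoints split the standard normal into equal-probability regions.
    normal = NormalDist()
    breakpoints = [normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]
    # Assumed alphabet: A, B, C, ... (the letters bwtool emits may differ).
    letters = string.ascii_uppercase[:alphabet_size]
    return "".join(letters[bisect_right(breakpoints, (v - mu) / sigma)] for v in values)


print(sax_symbols([0, 0, 0, 10], 2))
```

Passing `mean` and `std` explicitly in this sketch mirrors the idea behind fixing the z-normalization parameters: the same raw value then maps to the same letter regardless of which region it came from.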
Usage for sax:
bwtool sax - Implementation of SAX algorithm on bigWig data region.
usage:
bwtool sax alphabet-size input.bw[:chr:start-end] output.sax
where:
alphabet-size is from 2-20
options:
   -bed4          when set, disable the FASTA output in favor of BED4
   -add-wig-out   in the case of BED4 output, add an additional column that shows the original data
   -mean=val      force z-normalization to use fixed mean
   -std=val       force z-normalization to use fixed standard deviation
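
For scripted runs, the usage above can be assembled into an argument list programmatically. The following Python sketch only builds the command; it assumes bwtool is installed and on the PATH if you actually run it. The helper name `build_sax_command`, the file names, and the placement of options before the positional arguments are assumptions for illustration, not something the usage text specifies.

```python
def build_sax_command(alphabet_size, bigwig, output, region=None,
                      bed4=False, add_wig_out=False, mean=None, std=None):
    """Build an argv list for `bwtool sax` (hypothetical helper)."""
    if not 2 <= alphabet_size <= 20:
        raise ValueError("alphabet-size is from 2-20")
    if add_wig_out and not bed4:
        # The usage text describes -add-wig-out only for BED4 output.
        raise ValueError("-add-wig-out applies to BED4 output; set bed4=True")
    cmd = ["bwtool", "sax"]
    if bed4:
        cmd.append("-bed4")
    if add_wig_out:
        cmd.append("-add-wig-out")
    if mean is not None:
        cmd.append(f"-mean={mean}")
    if std is not None:
        cmd.append(f"-std={std}")
    # An optional region is appended to the bigWig path as input.bw:chr:start-end.
    if region is not None:
        chrom, start, end = region
        bigwig = f"{bigwig}:{chrom}:{start}-{end}"
    cmd += [str(alphabet_size), bigwig, output]
    return cmd


cmd = build_sax_command(5, "input.bw", "output.sax", region=("chr1", 1000, 2000))
print(" ".join(cmd))
# To run it: import subprocess; subprocess.run(cmd, check=True)
```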