smudgeplot cutoff
Kamil S Jaroň edited this page Aug 8, 2019 · 3 revisions
This is a very simple program to guess appropriate cutoffs for extraction of the genomic kmers from kmer spectra.
The lower boundary (L) is estimated as the first local minimum of the kmer histogram, multiplied by a coefficient of 1.25 and rounded. The aim is to filter out, ideally, all the erroneous kmers while retaining as many heterozygous (1n) kmers from the genome as possible.
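The rule for L can be illustrated with a minimal sketch. This is not the code used by smudgeplot itself: the function name `lower_cutoff`, the representation of the histogram as a list of `(coverage, count)` pairs sorted by coverage, and the exact definition of a local minimum (a count lower than the previous one and not higher than the next one) are all assumptions made for illustration.

```python
def lower_cutoff(hist, coefficient=1.25):
    """Guess L from a kmer histogram (hypothetical helper, not smudgeplot's code).

    hist: list of (coverage, count) pairs sorted by increasing coverage (assumed format).
    """
    for i in range(1, len(hist) - 1):
        prev_count = hist[i - 1][1]
        count = hist[i][1]
        next_count = hist[i + 1][1]
        # Assumed definition of the first local minimum: the count dropped
        # compared to the previous coverage and does not drop any further.
        if count < prev_count and count <= next_count:
            # Note: Python's round() rounds exact halves to the nearest even integer.
            return round(hist[i][0] * coefficient)
    raise ValueError("no local minimum found in the histogram")


# Toy histogram: the error peak decays until coverage 4, then the genomic peak rises.
example_hist = [(1, 1000), (2, 400), (3, 120), (4, 60), (5, 90), (6, 200), (7, 150)]
print(lower_cutoff(example_hist))  # prints 5 (4 * 1.25 = 5.0)
```

Multiplying by a coefficient above 1 places L slightly to the right of the valley between the error peak and the first genomic peak, which trades a few genuine low-coverage kmers for a cleaner removal of errors.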
The upper boundary (U) is calculated as the 0.998 quantile of kmers with coverage > 1.
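The rule for U can be sketched in the same spirit. Again, this is an illustration under assumptions rather than smudgeplot's implementation: the name `upper_cutoff` is hypothetical, the histogram is assumed to be a list of `(coverage, count)` pairs sorted by coverage, and the quantile is interpreted as the smallest coverage at which the cumulative number of kmers (ignoring coverage 1) reaches 99.8% of the total.

```python
def upper_cutoff(hist, quantile=0.998):
    """Guess U from a kmer histogram (hypothetical helper, not smudgeplot's code).

    hist: list of (coverage, count) pairs sorted by increasing coverage (assumed format).
    """
    # Only kmers with coverage > 1 contribute to the quantile.
    kept = [(coverage, count) for coverage, count in hist if coverage > 1]
    total = sum(count for _, count in kept)
    if total == 0:
        raise ValueError("histogram contains no kmers with coverage > 1")

    running = 0
    for coverage, count in kept:
        running += count
        # Assumed interpretation: first coverage whose cumulative share
        # of kmers reaches the requested quantile.
        if running >= quantile * total:
            return coverage
    return kept[-1][0]


# Toy histogram: 1000 kmers with coverage > 1, a handful of them at high coverage.
example_hist = [(1, 5000), (2, 500), (3, 300), (4, 197), (5, 2), (6, 1)]
print(upper_cutoff(example_hist))  # prints 5
```

In the toy data the cumulative count reaches 997 at coverage 4 and 999 at coverage 5, so coverage 5 is the first value covering at least 99.8% of the kmers with coverage > 1.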
These values are only proxies. If you know that your coverage is high, you can choose a higher L. U, on the other hand, does not have to be small at all; there is no harm in keeping practically all high-coverage kmers. Alternatively, you can choose L and U manually by looking at the kmer histogram; see the page on choosing L and U for details.
usage: smudgeplot.py cutoff [-h] infile boundary
Calculate meaningful values for lower/upper kmer histogram cutoff.
positional arguments:
infile Name of the input kmer histogram file (default "kmer.hist").
boundary Which boundary to compute: L (lower) or U (upper)
optional arguments:
-h, --help show this help message and exit