Kamil S Jaroň edited this page Aug 8, 2019 · 3 revisions

This is a very simple program that estimates appropriate cutoffs for extracting genomic kmers from a kmer spectrum.

The lower boundary (L) is estimated as the coverage of the first local minimum of the kmer histogram, multiplied by a coefficient of 1.25 and rounded. The aim is to filter out preferably all the erroneous kmers while retaining as many heterozygous (1n) kmers from the genome as possible.
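
The estimate of L described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not necessarily how smudgeplot implements it: the function names `first_local_minimum` and `lower_cutoff` are hypothetical, the histogram is assumed to be given as parallel lists of coverages and kmer counts, and a "local minimum" is assumed to mean a count strictly lower than both of its neighbours.

```python
def first_local_minimum(counts):
    """Return the index of the first count strictly lower than both neighbours.

    Returns None if the histogram has no such point.
    """
    for i in range(1, len(counts) - 1):
        if counts[i] < counts[i - 1] and counts[i] < counts[i + 1]:
            return i
    return None


def lower_cutoff(coverages, counts, coefficient=1.25):
    """Estimate L: coverage at the first local minimum times the coefficient, rounded.

    `coverages` and `counts` are the two columns of a kmer histogram
    (assumed layout: coverage, number of kmers with that coverage).
    """
    i = first_local_minimum(counts)
    if i is None:
        raise ValueError("no local minimum found in the kmer histogram")
    return int(round(coverages[i] * coefficient))


# Toy histogram: an error peak at low coverage, a dip, then the genomic peak.
toy_coverages = [1, 2, 3, 4, 5, 6, 7, 8]
toy_counts = [1000, 400, 120, 60, 90, 200, 350, 300]
toy_L = lower_cutoff(toy_coverages, toy_counts)
```

In the toy histogram the dip sits at coverage 4, so the sketch gives L = 5.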

The upper boundary (U) is calculated as the 0.998 quantile of kmers with coverage > 1.
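
A minimal sketch of the quantile calculation follows. It assumes that the quantile is taken over the histogram counts (the number of kmers observed at each coverage), and that U is the first coverage at which the cumulative fraction reaches 0.998. The function name `upper_cutoff` is hypothetical, and the program itself may treat ties or round the result differently.

```python
def upper_cutoff(coverages, counts, quantile=0.998):
    """Estimate U as the given quantile of kmers with coverage > 1.

    `coverages` and `counts` are the two columns of a kmer histogram
    (assumed layout: coverage, number of kmers with that coverage).
    """
    # Drop kmers seen only once; they are dominated by sequencing errors.
    kept = [(cov, n) for cov, n in zip(coverages, counts) if cov > 1]
    total = sum(n for _, n in kept)
    if total == 0:
        raise ValueError("histogram has no kmers with coverage > 1")

    running = 0
    for cov, n in kept:
        running += n
        # First coverage at which the cumulative fraction reaches the quantile.
        if running >= quantile * total:
            return cov
    # Fallback, only reachable if `quantile` is set above 1.
    return kept[-1][0]


# Toy histogram: 1000 kmers with coverage > 1, of which 999 have coverage <= 4.
toy_U = upper_cutoff([1, 2, 3, 4, 5], [10000, 500, 400, 99, 1])
```

With the toy input, the cumulative fraction first reaches 0.998 at coverage 4, so the sketch returns 4; the 10000 kmers with coverage 1 do not influence the result.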

These values are only proxies. If you know that your coverage is high, you can choose a higher L. At the same time, U does not need to be small at all; there is no harm in keeping practically all of the kmers. Alternatively, you can choose L and U manually by looking at the kmer histogram; see choosing L and U for details.

Usage

usage: smudgeplot.py cutoff [-h] infile boundary

Calculate meaningful values for lower/upper kmer histogram cutoff.

positional arguments:
  infile      Name of the input kmer histogram file (default "kmer.hist").
  boundary    Which boundary to compute L (lower) or U (upper)

optional arguments:
  -h, --help  show this help message and exit