
Effect of k-mer size

Ryan Wick edited this page Sep 7, 2015 · 3 revisions

The structure of an assembly graph is highly dependent on the k-mer size used for assembly. Small k-mers result in shorter contigs with many connections, while large k-mers can result in longer contigs with fewer connections.
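To make this trade-off concrete, here is a minimal sketch of a toy de Bruijn graph built from a single made-up sequence that contains one 7-base repeat. It is only an illustration: the sequence and the function name `de_bruijn_stats` are invented for this example, and real assemblers such as Velvet and SPAdes do far more (read errors, coverage, reverse complements, graph simplification).

```python
from collections import defaultdict

# Hypothetical toy sequence: the 7-base repeat GATTACA occurs twice,
# each time with different flanking bases.
toy_sequence = "CCGT" + "GATTACA" + "TGGCA" + "GATTACA" + "AACG"


def de_bruijn_stats(sequence, k):
    """Return (node_count, branching_node_count) for a toy de Bruijn graph.

    Nodes are (k-1)-mers; each k-mer in the sequence adds an edge from its
    first k-1 bases to its last k-1 bases.
    """
    successors = defaultdict(set)
    nodes = set()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        left, right = kmer[:-1], kmer[1:]
        nodes.update((left, right))
        successors[left].add(right)
    # A node with more than one distinct successor is a branch point.
    branching = sum(1 for targets in successors.values() if len(targets) > 1)
    return len(nodes), branching


for k in (4, 6, 8, 10):
    node_count, branching = de_bruijn_stats(toy_sequence, k)
    print(f"k={k}: {node_count} nodes, {branching} branching nodes")
```

While k - 1 is no longer than the repeat, both copies of the repeat collapse onto shared nodes and the graph branches where they diverge. Once k - 1 exceeds the 7-base repeat, every node in this toy sequence is unique and the branch disappears.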

When assembling 100 bp reads in Velvet, a k-mer size of 61 is a good starting point; adjust it up or down as needed. SPAdes conducts assembly multiple times using different k-mer sizes, so you can look at the FASTG files for each assembly (in folders named like K21, K33, etc.) to find the best graph for viewing in Bandage.
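If you want to collect the per-k-mer graphs from a SPAdes run in one go, something like the following sketch may help. It assumes only what is described above: folders named `K` followed by a number, each containing one or more `.fastg` files. The function name `find_kmer_graphs` is made up for this example, and the exact file names inside each folder are not assumed.

```python
import re
from pathlib import Path


def find_kmer_graphs(spades_output_dir):
    """Map each k-mer size to the FASTG files found in its K<size> folder."""
    graphs = {}
    for folder in Path(spades_output_dir).iterdir():
        match = re.fullmatch(r"K(\d+)", folder.name)
        if folder.is_dir() and match:
            # Collect every .fastg file directly inside the folder.
            graphs[int(match.group(1))] = sorted(folder.glob("*.fastg"))
    # Return the entries ordered by k-mer size.
    return dict(sorted(graphs.items()))


# Example usage (the directory name is a placeholder):
# for k, files in find_kmer_graphs("spades_output").items():
#     print(k, [f.name for f in files])
```

Each file found this way can then be opened in Bandage to compare the graphs side by side.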

If your graph consists of many separate disconnected subgraphs (i.e. there are many small groups of contigs that have no connections to the rest of the graph), then your k-mer size may be too large. Alternatively, if your graph is connected (i.e. all contigs are tied together in a single graph structure) but is very dense and tangled, then your k-mer size may be too small.
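Bandage shows disconnected subgraphs visually, but the same check can be sketched in code. The example below counts connected components with a simple union-find. It assumes you already have a list of node names and a list of connected node pairs; how you extract those from your graph file is not covered here, and `count_components` is a hypothetical helper, not part of Bandage.

```python
def count_components(node_names, edges):
    """Count connected components in an undirected graph using union-find."""
    parent = {name: name for name in node_names}

    def find(name):
        # Follow parent links to the root, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in edges:
        # Merge the two components that this edge connects.
        parent[find(a)] = find(b)

    return len({find(name) for name in node_names})


# Four contigs with a single connection form three separate subgraphs.
nodes = ["1", "2", "3", "4"]
links = [("1", "2")]
print(count_components(nodes, links))
```

A component count that is large relative to the number of nodes suggests the k-mer size may be too large, while a single dense component with many edges per node suggests it may be too small.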

Example

For this example I assembled a Salmonella genome from 100 bp Illumina reads. Which graph is best depends on your priorities and which sequences you are interested in, though the 61-mer and 71-mer graphs are both pretty good.

51-mer assembly

This k-mer size is too small, resulting in a complex and tangled graph with 4618 nodes and 6070 edges.

51-mer assembly graph

61-mer assembly

This graph is better than the 51-mer graph – it is much less complex (1357 nodes and 1768 edges) but still has very few dead ends.

61-mer assembly graph

71-mer assembly

While the complexity of the graph has improved (it has 611 nodes and 765 edges), it now shows many more dead ends.

71-mer assembly graph

81-mer assembly

Compared to the 71-mer graph, the complexity has slightly improved (it has 490 nodes and 512 edges), but the graph has broken into many disconnected parts.

81-mer assembly graph

91-mer assembly

This graph has 2386 nodes and 304 edges and mostly consists of disconnected nodes. This k-mer size is definitely too large.

91-mer assembly graph
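The node and edge counts from the five assemblies above can be put side by side with a short script. The edges-per-node ratio used here is just one convenient summary chosen for this sketch, not a measure reported by Bandage, and it does not capture dead ends.

```python
# (nodes, edges) for each k-mer size, taken from the example above.
example_graphs = {
    51: (4618, 6070),
    61: (1357, 1768),
    71: (611, 765),
    81: (490, 512),
    91: (2386, 304),
}


def edges_per_node(graphs):
    """Return the edges-per-node ratio for each k-mer size."""
    return {k: edges / nodes for k, (nodes, edges) in graphs.items()}


ratios = edges_per_node(example_graphs)
for k, ratio in ratios.items():
    nodes, edges = example_graphs[k]
    print(f"{k}-mer: {nodes} nodes, {edges} edges, {ratio:.2f} edges per node")
```

The ratio stays near 1.3 for the 51-mer and 61-mer graphs, drops towards 1 by the 81-mer graph, and falls to roughly 0.13 for the 91-mer graph, where most nodes have no connections at all.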