I suspect this is because the -t option filters out k-mers whose count falls below a certain threshold (1 by default). From the README:
One can optionally request, by specifying the ‑t option, that FastK produce a sorted table of all canonical k‑mers along with their counts. If an integer follows then only those k‑mers that occur ‑t or more times where the default threshold is 1. In those applications where low count k‑mers are not needed this can save significant time and space as most such k‑mers are error‑mers.
So, while the histogram of each individual database will be correct (the histogram is stored in its entirety), Fastmerge cannot merge the histograms themselves. It needs the tables of k-mer/count pairs, and those are affected by -t: any k-mer that falls below the threshold within a split never reaches the merged result.
The solution would be to build a single database from the whole library. It will not use much more memory, and the compute should scale very reasonably. You will just need more disk space: for such a large genome it could be more than a TB, but the run should not take all that long and you should be able to free the space afterwards.
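The effect described above can be reproduced with a small toy model. This is only a sketch of the counting logic, not FastK's or Fastmerge's implementation: the function names (`count_kmers`, `thresholded_table`, `merge_tables`, `histogram`) and the example reads are hypothetical, and canonical k-mers are ignored for brevity. It assumes each split keeps only k-mers with a count of at least `t` in its table, and that the merged histogram is rebuilt from those tables.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer in a list of reads (no canonicalisation in this toy)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def thresholded_table(counts, t):
    """Keep only k-mers seen at least t times, as a per-split table would."""
    return {kmer: c for kmer, c in counts.items() if c >= t}


def merge_tables(tables):
    """Sum the counts of k-mers across the per-split tables."""
    merged = Counter()
    for table in tables:
        merged.update(table)
    return merged


def histogram(counts):
    """Map each count value to the number of distinct k-mers with that count."""
    return Counter(counts.values())


# Hypothetical example data: two splits of one library.
split_a = ["ACGT"] * 4 + ["GGGC"] * 4 + ["TTTA"]
split_b = ["ACGT"] * 5 + ["GGGC"] + ["TTTA"]
k, t = 3, 4

# Counting the whole library at once keeps the low-count k-mers.
true_hist = histogram(count_kmers(split_a + split_b, k))

# Counting per split, thresholding, then merging loses them.
tables = [thresholded_table(count_kmers(s, k), t) for s in (split_a, split_b)]
merged_hist = histogram(merge_tables(tables))

print(sorted(true_hist.items()))    # [(2, 2), (5, 2), (9, 2)]
print(sorted(merged_hist.items()))  # [(4, 2), (9, 2)]
```

In this toy run the merged histogram starts at 4 even though the full library contains k-mers seen only twice, and the k-mers from `GGGC` are reported with a count of 4 rather than their true count of 5, because their single occurrence in the second split was dropped before merging.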
Dear Gene,
I have a large sequencing library that I needed to split into 10 smaller files so I could run FastK on different nodes. Following the instructions in the README, I ran FastK on the split files with the following command:
This produced a *.hist and a *.ktab file for each *.split.fastq file. I looked at the k-mer count histogram for each split file:
Histex -G library_split_01.hist > library_split_01.histogram
I then merged the split files using Fastmerge, and generated histograms for the merged k-mer database:
I noticed that there are no k-mers with a count lower than 4 in the merged library histogram. I repeated the process a few times, combining different files, and the merged histograms consistently lack smaller k-mer counts (i.e., they start at 4 or 5). I’m unsure if this behavior is expected, as I do not understand why there are no single-occurrence k-mers. Is this a bug, or am I misunderstanding or misusing the tool?
Thanks for your assistance!