Thanks for the testing and the files. The cov.txt file is actually fine; the coverage/error column is just there to assist with the plotting.
It looks like the default mixture model just doesn't fit this data that well. I'll have a play around with some other types of mixture for the second component to see if we can offer an alternative in these cases.
I'll also think about adding an auto option rather than running ska cov separately; that might be easier after fixing #45. Of course, you do end up counting k-mers twice this way if you run on every read set, but it's still only ~60s per sample.
johnlees changed the title from "Error with ska cov, and possible improvement" to "Feature requests: different mixture model; combined ska build & cov" on Jul 20, 2023
Hi John,
Thanks a lot for implementing the calculation of the optimal kmer count threshold (ska cov).
I tested it using ska 0.3.1 and a paired-end sample (fastq files available here: https://drive.google.com/drive/folders/1HVO-6mOd7bh7CPOjXhA3lWAT0GA_8SC8?usp=sharing). My command line was:
Given the distribution of kmer counts, the cutoff should be around 4-6, but ska outputs a value of 17. I think this high threshold is related to the "Error" message below, obtained for low kmer sizes during the model fitting.
This is the beginning of the file 'cov.txt':
Also, would it be possible to calculate the estimated cutoff 'on the fly' while extracting the kmers from the fastq files? As it is, we have to run ska twice (first to estimate the cutoff, then to build the kmer dictionary), which kind of defeats the purpose (twice the runtime).
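To make the request concrete, here is a rough sketch of the "count once, then pick the cutoff and filter from the same counts" idea. It is not how ska works internally: it swaps the mixture model for a simple first-valley heuristic on the count histogram (the kind of reading that suggests a cutoff of 4-6 by eye), it assumes all counts fit in memory, and it ignores reverse complements and split k-mers. The function names `count_kmers`, `first_valley_cutoff` and `build_with_auto_cutoff` and the `fallback` parameter are all made up for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across all reads in a single pass (toy version)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def first_valley_cutoff(hist):
    """Return the count at the first local minimum of the histogram.

    hist[i] is the number of distinct k-mers seen exactly i + 1 times.
    Returns None when the histogram has no valley. This is a heuristic
    stand-in, not the mixture model that ska cov fits.
    """
    for i in range(1, len(hist) - 1):
        if hist[i] <= hist[i - 1] and hist[i] < hist[i + 1]:
            return i + 1  # convert list index back to a k-mer count
    return None


def build_with_auto_cutoff(reads, k, fallback=1):
    """Count k-mers once, then derive the cutoff and filter from those counts."""
    counts = count_kmers(reads, k)
    # Histogram of counts: how many distinct k-mers were seen 1, 2, 3, ... times.
    freq = Counter(counts.values())
    max_count = max(freq, default=0)
    hist = [freq.get(c, 0) for c in range(1, max_count + 1)]
    # Hypothetical fallback: keep everything if no valley is found.
    cutoff = first_valley_cutoff(hist) or fallback
    kept = {kmer for kmer, c in counts.items() if c >= cutoff}
    return cutoff, kept
```

On a made-up histogram such as `[1000, 400, 150, 60, 40, 55, 90, 140, 180, 200, 190, 150]`, `first_valley_cutoff` picks the dip between the error peak and the coverage peak. A real histogram would probably need smoothing first, since noise can create spurious early minima.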
Thanks and best,
Romain