Report incorrectly reporting compressed read length rather than raw #1906
Comments
I'm not sure what version of Canu you're running, but the 2.1.1 release does not do any trimming for -pacbio-hifi data by default. I suggest updating to that if you have not. If you have, what is the full command you're using?
The version I used is this one:
And the command I used is:
Javier
That won't do trimming. Which part of the report are you referring to as trimmed reads? I expect you're looking at reads in homopolymer-compressed space, not trimmed reads.
Yes, I was referring to the homopolymer-compressed space, my mistake. As far as I understand, this basically reduces (or compresses) every homopolymer run, right? But now the question is how this affects the assembly process, and whether this compression, while controlling for sequencing errors, sacrifices what may be actual genome sequence. I may be misunderstanding this, but if you can clarify, it would be greatly appreciated. This is the section of the
Thanks,
Javier
Ah yes, that is just the report of the reads being loaded. It should report something like:
so you know the reads haven't been trimmed. It looks like it is reporting compressed reads, which are about 70% of the length of the original reads. This looks like a bug in the tip version that is not present in the 2.1.1 release. The report shouldn't show the compressed lengths, as this is confusing to users. This shouldn't affect the assembly; the reads will get uncompressed after consensus. However, unless you encounter a specific bug in 2.1.1, I'd recommend using the verified release version for a production assembly. I'll leave this open though so we fix the incorrect report.
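For anyone else confused by the shorter lengths, here is a minimal Python sketch of what homopolymer compression does to a read. It is only an illustration of the idea, not Canu's actual implementation; the function names and the toy read are made up for this example. Each run of identical bases is collapsed to a single base, so the compressed read is shorter even though nothing has been trimmed. A ratio of about 0.7 applied to ~16 kb reads gives roughly 11 kb, which matches the lengths in the report.

```python
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse every run of identical bases into a single base.

    Hypothetical helper for illustration only; not taken from Canu.
    """
    # groupby yields one (base, run) pair per maximal run of equal characters.
    return "".join(base for base, _run in groupby(seq))


def compression_ratio(seq: str) -> float:
    """Return compressed length divided by raw length."""
    if not seq:
        raise ValueError("empty sequence")
    return len(homopolymer_compress(seq)) / len(seq)


# Toy read (made up): the runs are AAA, CC, G, TTTT, G, A.
read = "AAACCGTTTTGA"
compressed = homopolymer_compress(read)
print(compressed)               # ACGTGA
print(len(read), len(compressed))  # 12 6
print(compression_ratio(read))  # 0.5
```

The toy read compresses more strongly than real data because it was chosen to be run-heavy; the ~70% figure above is the one observed for the reads in this issue.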
Thank you for the clarification! I'll use the verified release then. Best
Fixed! To confirm, it was just a reporting issue.
Dear developers,
I have HiFi reads of a Drosophila genome with an average read length of ~16 kb and ran HiCanu with -pacbio-hifi to assemble the genome. But I noticed that the median size of my reads was ~11 kb after trimming. Is this striking reduction in length typical? Or is there a way to tweak the filtering parameters to make Canu more permissive?
Thank you!
Javier