NextSeq Quality Trimming Issue #758
Comments
Hi, I think the report looks fine.

First, note that the FastQC chart is somewhat misleading when you run FastQC on reads that have variable lengths, such as those that you get after quality trimming or adapter removal. The reason is that you cannot see how many bases each of the boxplots is based on. For example, the statistics for lengths 140-143 are very likely based on a lower number of reads.

Second, the quality trimming algorithm that Cutadapt uses sometimes does not trim as much as a human would when looking at the quality values. In particular, when the last (most 3') base of the read has a quality value above the threshold, the read is not trimmed at all, no matter how low the quality values are that come before it. You may see some of these cases in that chart.

I think the whisker going below the trimming threshold can be explained by the box plots showing merged statistics for multiple read lengths. For example, if one read of length 143 has a high-quality base at the end that is preceded by a low-quality one, it doesn’t get trimmed, but both the low- and high-quality base end up in the same boxplot for lengths 140-143.
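To make the second point concrete, here is a minimal Python sketch of a running-sum 3' quality trimmer of the kind described in the Cutadapt documentation (the BWA-style algorithm). It is an illustration only, not Cutadapt's actual code: the function name `quality_trim_3prime` is made up for this example, the qualities are assumed to be already-decoded Phred integers, and options such as `--nextseq-trim` are not modelled.

```python
def quality_trim_3prime(qualities, cutoff):
    """Return how many bases to keep from the 5' end of a read.

    Hypothetical helper for illustration. Walks from the 3' end,
    accumulating (cutoff - quality), and cuts where that running sum
    is largest. Stops as soon as the running sum becomes negative.
    """
    running_sum = 0
    best_sum = 0
    keep = len(qualities)  # default: keep the whole read
    for i in reversed(range(len(qualities))):
        running_sum += cutoff - qualities[i]
        if running_sum < 0:
            # Bases seen so far are, on balance, above the cutoff: stop looking.
            break
        if running_sum > best_sum:
            best_sum = running_sum
            keep = i
    return keep


# Low-quality tail: the last three bases are removed.
print(quality_trim_3prime([30, 30, 30, 5, 5, 10], cutoff=20))

# Same read, but the final base is Q35: the loop stops at the first step,
# so nothing is trimmed and the two Q5 bases stay in the read.
print(quality_trim_3prime([30, 30, 30, 5, 5, 35], cutoff=20))
```

In the second call, the Q5 bases survive trimming and would be counted in whatever FastQC position bin they fall into, which is one way a whisker can drop below the Q20 threshold.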
Thank you so much for this feedback and explanation! It makes sense that the lower whisker would fall below the trimming threshold when there are a variety of read lengths. Thanks for taking the time to clarify this for me! Very much appreciated.
Sure, no problem! Closing this now, feel free to comment or open a new issue if there are further questions.
Hello,
I am using cutadapt to trim barcodes and filter reads (based on length and quality) from NextSeq1000 data. I am using cutadapt v4.5, which is currently installed as a conda environment. Below is the command line that I used:
The output is attached with the summary posted below:
It looks like some reads were quality trimmed. When I process the trimmed datasets using FastQC, I see that the blue mean line is well above my Q20 threshold; however, the lower whiskers extend below Q20, which I wasn't expecting. See the attached doc with the FastQC per base sequence quality plots. This might be a simple misinterpretation on my end, but I wanted to ask. Thanks in advance!
Fastqc_charts.docx
cutadapt_output_summary.txt