Removal of both 5' and 3' adaptors in paired-end reads #831

Open
CorbinMach opened this issue Feb 5, 2025 · 2 comments

CorbinMach commented Feb 5, 2025

Hello,

I am very sorry if my question is wrong/incorrect, maybe someone can just point me in the right direction. I am using cutadapt for trimming some PCR primers from my sequencing results. Our samples look something like this:

CGTCCATAGCGCAAATCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCTTCACTGGGCTTGTCA

Using the primers you see, I add Illumina sequencing adaptors, followed by indexing/universal primers for sequencing. I then sequence using 151 nt paired-end reads. Correspondingly, I have two FASTQ files, containing the R1 and R2 reads. I now want to cut the specific primers you see here from the R1 read, as well as use the paired-end trimming to make sure that each sequence got read twice during sequencing. I am using Cutadapt 4.3 on Ubuntu.

The primers I want to trim:
5': CGTCCATAGCGCAAATC
3': CTTCACTGGGCTTGTCA

The command I think I have to use:
cutadapt -g CGTCCATAGCGCAAATC -G TGACAAGCCCAGTGAAG -a CTTCACTGGGCTTGTCA -A GATTTGCGCTATGGACG --pair-filter=both -o 1.fastq -p 2.fastq Input1 Input2

For some reason, my output for R1 always looks like this:
CGTCCATAGCGCAAATCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

and output of R2 like this:
TGACAAGCCCAGTGAAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

Even if I try to only do single-end trimming with one of the files, I still get these results (for example only input R1 with the same -a and -g flags as above). I also tried this:
cutadapt -a CTTCACTGGGCTTGTCA -A GATTTGCGCTATGGACG --pair-filter=both -o 1.fastq -p 2.fastq Input1 Input2

which yields a similar result to the command that contains the -g and -G flags. Can anyone help me solve this? I am confident I could technically solve this by just manually deleting the leading 17 nt of each read, but I feel like there must be a way to do this with cutadapt.

Best,
Corbin

marcelm (Owner) commented Feb 6, 2025

Hi, there’s a section in the documentation about this.

You don’t need the --pair-filter option. This is only relevant when you use an option that filters the reads such as --discard-untrimmed. And if you use --discard-untrimmed, you should not use --pair-filter=both, but leave it at the default (which is the same as --pair-filter=any) because you want the entire pair to be discarded if any of the two reads was untrimmed (that is, the pair is only kept if both primers were found).

Happy to help further if the above didn’t help, but please read that section first.
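
For readers following along: a likely reason the original command left the 5' primer in place is that cutadapt removes at most one adapter per read by default (`-n`/`--times` defaults to 1), so `-g` and `-a` given separately compete with each other. The sketch below assumes the documentation section referred to above is the one on linked adapters (`-a ADAPTER1...ADAPTER2`), which this thread does not confirm. It only builds and prints a candidate command from the primers in the question; `revcomp` and `linked_primer_command` are hypothetical helper names, and the file names are taken from the original command.

```python
import shlex


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


# Sequences as they appear, left to right, in the amplicon from the question.
FWD = "CGTCCATAGCGCAAATC"     # expected at the start of R1
RC_REV = "CTTCACTGGGCTTGTCA"  # expected towards the end of R1


def linked_primer_command(in1: str, in2: str, out1: str, out2: str) -> list[str]:
    """Build a cutadapt argument list using linked adapters (assumed approach)."""
    r1_adapter = f"{FWD}...{RC_REV}"                    # R1: 5' primer ... 3' primer
    r2_adapter = f"{revcomp(RC_REV)}...{revcomp(FWD)}"  # R2: same, other strand
    return [
        "cutadapt",
        "-a", r1_adapter,
        "-A", r2_adapter,
        "--discard-untrimmed",  # default pair filter then drops pairs with any untrimmed read
        "-o", out1, "-p", out2,
        in1, in2,
    ]


cmd = linked_primer_command("Input1", "Input2", "1.fastq", "2.fastq")
print(shlex.join(cmd))
```

Check the printed command against the cutadapt documentation before running it: whether each half of a linked adapter is anchored or required depends on the option and notation used.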

CorbinMach (Author) commented

Thank you so much. I was only searching in the User guide. This (kind of) worked. For some reason, specifically my R2 reads have more errors in the PCR sequence, so they are not as cleanly removed as the R1 read primers. Still, I was able to remove most of them from both reads using -e 0.2 in addition to the method described in the link above.

Once again, thank you for linking me to the correct documentation. This solved my problem (and finally stopped me from trimming each read individually, twice).
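
As a rough illustration of what `-e 0.2` changes for these 17 nt primers: the sketch below assumes cutadapt's rule for a full-length match, where the number of allowed errors is the error rate times the adapter length, rounded down, and that the default error rate is 0.1. `max_errors` is a hypothetical helper, not part of cutadapt.

```python
import math


def max_errors(adapter_length: int, error_rate: float) -> int:
    """Allowed errors for a full-length adapter match (assumed rule: floor(length * rate))."""
    return math.floor(adapter_length * error_rate)


PRIMER_LENGTH = 17  # both primers in the question are 17 nt long

for rate in (0.1, 0.2):
    print(f"-e {rate}: up to {max_errors(PRIMER_LENGTH, rate)} error(s) in a 17 nt primer")
```

Under that assumption, raising the rate from 0.1 to 0.2 goes from one tolerated error to three per primer, which would be consistent with the noisier R2 primers being found more often, at the cost of a higher chance of spurious matches.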
