mOTUs4 finds an incorrect mismatch in the paired reads input #132

Open
jun-yeong-heo opened this issue Jan 14, 2025 · 2 comments

@jun-yeong-heo

Dear mOTUs developers,

I'm posting my question here because I couldn't find another place to ask about mOTUs4; I hope that's all right.

I recently read the mOTUs4 paper and thought it was great.
I'm now trying to apply mOTUs4 to my own data, but I've run into trouble at the very first step, so I'm asking here.

I'm using kneaddata to perform host removal and QC trimming.
However, when I pass the paired reads produced by kneaddata to mOTUs4, it reports read headers that supposedly differ between the forward and reverse files, and the run does not proceed.
I searched the input files for the headers mOTUs4 flagged, and every one of them is present in both the forward and the reverse reads.
I also tried to repair the reads with BBMap, but it found no unmatched headers between the two files.
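Finding every flagged header in both files shows that the two files contain the same *set* of read names, which is not the same as the names appearing in the same *order*. A minimal sketch of the set check with standard shell tools is shown here; `fastq_name_set` and `same_name_set` are hypothetical helpers, and the name normalization (dropping the leading `@`, anything after the first whitespace, and a trailing `/1` or `/2`) is an assumption about how the headers look.

```shell
# Hypothetical helper: print the normalized read names of a FASTQ file, sorted.
# Assumption: a name is the header line minus "@", minus anything after the
# first whitespace, minus a trailing /1 or /2.
fastq_name_set() {
  awk 'NR % 4 == 1 { sub(/^@/, ""); sub(/[ \t].*$/, ""); sub(/\/[12]$/, ""); print }' "$1" |
    LC_ALL=C sort
}

# Hypothetical helper: print "same" if R1 and R2 contain exactly the same
# read names (in any order), otherwise "different".
same_name_set() {
  local a b
  a=$(mktemp) || return 1
  b=$(mktemp) || return 1
  fastq_name_set "$1" > "$a"
  fastq_name_set "$2" > "$b"
  if cmp -s "$a" "$b"; then
    echo "same"
  else
    echo "different"
  fi
  rm -f "$a" "$b"
}
```

For example, `same_name_set sample_001_1_kneaddata_paired_1.fastq sample_001_1_kneaddata_paired_2.fastq` prints `same` even when the two files list the reads in different orders, so a passing set check does not rule out an ordering problem.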

I'm wondering whether something goes wrong when mOTUs4 reads the paired files, or whether I'm missing something.
Could you give me some advice?
I'm running mOTUs4 in a conda environment, and the commands I used are as follows:


mkdir -p ./motus_test

INPUT_DIR="/bio/home/hjy/data/Shotgun/kneaddata_out/sample_001"

python $MOTUS4_HOME/motus.py profile \
-f ${INPUT_DIR}/sample_001_1_kneaddata_paired_1.fastq \
-r ${INPUT_DIR}/sample_001_1_kneaddata_paired_2.fastq \
-n sample_001_TEST \
-o ./motus_test \
-t 12

I'm attaching the error log mOTUs4 produced about the differing read headers in my data:
motus4_test.log

I'm not sure how mOTUs works internally, so this may be a naive question, but I'd appreciate your help.

Best regards,
Jun-Yeong

@hjruscheweyh
Member

Hi @jun-yeong-heo

Thank you for using mOTUs. This is the right place to post questions.

mOTUs checks that the read header names are identical between the two paired-end files so that it can later count correctly at the insert level. In your case, that check fails.

Internally, mOTUs takes the names of the first 1000 reads of R1 and R2 and checks whether they're identical. In your case, it complains about 176 read headers that don't seem to match.
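The position-by-position comparison described above can be approximated with a short shell sketch. This is not mOTUs' code: `fastq_names` and `count_header_mismatches` are hypothetical helpers, and the header normalization (dropping `@`, anything after the first whitespace, and a trailing `/1` or `/2`) is an assumption that may differ from what mOTUs actually does.

```shell
# Hypothetical helper: print the normalized read name of each of the first
# N records (N = $2) of a FASTQ file ($1).
# Assumption: names are compared without "@", without anything after the
# first whitespace, and without a trailing /1 or /2.
fastq_names() {
  head -n "$(( 4 * $2 ))" "$1" |
    awk 'NR % 4 == 1 { sub(/^@/, ""); sub(/[ \t].*$/, ""); sub(/\/[12]$/, ""); print }'
}

# Hypothetical helper: count the positions among the first N records
# (default 1000) where the R1 name and the R2 name differ.
count_header_mismatches() {
  local n tmp1 tmp2
  n="${3:-1000}"
  tmp1=$(mktemp) || return 1
  tmp2=$(mktemp) || return 1
  fastq_names "$1" "$n" > "$tmp1"
  fastq_names "$2" "$n" > "$tmp2"
  # paste pairs line i of R1 with line i of R2; appending "" forces a
  # string comparison so that numeric-looking names are not compared as numbers.
  paste "$tmp1" "$tmp2" | awk -F '\t' '$1 "" != $2 "" { c++ } END { print c + 0 }'
  rm -f "$tmp1" "$tmp2"
}
```

For example, `count_header_mismatches "${INPUT_DIR}/sample_001_1_kneaddata_paired_1.fastq" "${INPUT_DIR}/sample_001_1_kneaddata_paired_2.fastq"` prints `0` when the first 1000 records pair up by name; any other number means some of those positions pair two different reads.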

Could you send me the first 100 read headers from each file of your dataset, please (see the commands below)?

cat ${INPUT_DIR}/sample_001_1_kneaddata_paired_1.fastq | paste - - - - | cut -f 1 | head -n 100
cat ${INPUT_DIR}/sample_001_1_kneaddata_paired_2.fastq | paste - - - - | cut -f 1 | head -n 100

Thanks,
Hans

@hjruscheweyh hjruscheweyh self-assigned this Jan 14, 2025
@jun-yeong-heo
Author

Dear @hjruscheweyh

Thank you for your quick reply.
Your advice solved my problem perfectly.

The problem was that kneaddata had written the output paired reads in unsorted order, so the read names in R1 and R2 did not line up when mOTUs compared the first 1000 reads.
After sorting the reads with BBMap, my jobs completed successfully.
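BBMap is what resolved the problem here. As an alternative illustration only, the sketch below re-orders a FASTQ file by read name using `paste`, `awk`, `sort`, `cut` and `tr`. Applying it to both R1 and R2 puts mates on matching lines, but only under the assumption that both files contain exactly the same read names; if some reads have lost their mate, a dedicated re-pairing tool such as BBMap is the safer choice. `sort_fastq_by_name` is a hypothetical helper, and the name normalization is an assumption.

```shell
# Hypothetical helper: write the records of FASTQ file $1 to $2, sorted by
# normalized read name.
# Assumptions: 4-line records, no tab characters inside records, and names
# normalized by dropping "@", anything after the first whitespace, and a
# trailing /1 or /2 (so R1 and R2 sort into the same order).
sort_fastq_by_name() {
  local tab
  tab=$(printf '\t')
  # 1) join each 4-line record into one tab-separated line
  # 2) prepend the normalized name as a sort key
  # 3) sort bytewise on that key, drop it, and split back into 4 lines
  paste - - - - < "$1" |
    awk -F '\t' -v OFS='\t' '{ k = $1; sub(/^@/, "", k); sub(/[ \t].*$/, "", k); sub(/\/[12]$/, "", k); print k, $0 }' |
    LC_ALL=C sort -t "$tab" -k1,1 |
    cut -f 2- |
    tr '\t' '\n' > "$2"
}
```

For example, run `sort_fastq_by_name sample_001_1_kneaddata_paired_1.fastq sorted_1.fastq` and the same for the `_2` file, then pass the two sorted files to `motus.py profile` with `-f` and `-r`. For large inputs, GNU `sort` writes temporary files (under `TMPDIR` by default), so plan for free disk space comparable to the input size.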

Many thanks,
Jun-Yeong

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants