
GUAVA pipeline stops at "Remove duplicates" #7

Open
jarleba opened this issue Nov 29, 2018 · 10 comments

@jarleba

jarleba commented Nov 29, 2018

Hi, everything seems to be working fine with my GUAVA now, but on a couple of samples the pipeline stops at "Remove duplicates". The program doesn't close or anything; it just says "Remove duplicates" forever, and my CPU isn't doing anything. As far as I can tell, the mapping went fine. Has anyone else experienced this?

@MayurDivate
Owner

Hi @jarleba,

Would you mind sharing the log file? You should find it in the output folder.

Mayur

@jarleba
Author

jarleba commented Nov 30, 2018

Here you go. :) It produces an unfinished result file as well.

E_C3_TKD181002942_H7NK7DSXX_L2_1_log.txt
E_C3_TKD181002942_H7NK7DSXX_L2_1_Result.xlsx

@MayurDivate
Owner

@jarleba

I have checked the log file.
I could not find any errors; I think it is taking much longer because you are analyzing 50+ million reads.
On top of that, it is 150 bp sequencing data.
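For reference, a quick way to double-check the read count from the FASTQ itself (a minimal sketch, assuming an uncompressed file; sample_R1.fq is a placeholder name):

    # each FASTQ record is 4 lines, so the read count is line count / 4
    echo $(( $(wc -l < sample_R1.fq) / 4 ))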

Try using multiple cores and extra RAM, and let it finish.

By the way, how long did you allow it to run?

If you don't mind, could you please share some sample data with me via Google Drive,
so that I can also test GUAVA on 150 bp data?

Thanks,
Mayur

@jarleba
Author

jarleba commented Jan 2, 2019 via email

@MayurDivate
Owner

@jarleba

I think you should wait until it finishes.
Just share a few thousand reads (sample data).

cheers,
Mayur

@jarleba
Author

jarleba commented Jan 4, 2019

I have let it run for a couple of days now, but I noticed that the CPU stops processing when the pipeline is at "Remove duplicates". It's really strange.

I ran the analysis with a sample of just 10k reads, and the whole analysis seems to work with that. I will share the 10k .fq files as well as the log files.
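For reference, a 10k-read subset can be taken from the top of each FASTQ along these lines (a minimal sketch with placeholder file names, not necessarily how the shared files were made):

    # 10,000 reads x 4 lines per FASTQ record = 40,000 lines;
    # take the same range from both mates so the pairs stay in sync
    head -n 40000 sample_R1.fq > 10k_R1.fq
    head -n 40000 sample_R2.fq > 10k_R2.fq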

https://1drv.ms/f/s!AjzE0xKMj_y5g9Uko3gZQ1WrpdJDdA

Cheers,
Jarle

@RocWeng

RocWeng commented Mar 18, 2019

Hi Mayur,

I also encountered the same problem!

GUAVA ran to completion on your sample data, and also on my "truncated data". The truncated data was generated by randomly extracting 1 million read pairs from the original data, which contains 45 million read pairs. However, when running the original, larger data, GUAVA silently stopped at the "Remove duplicates" step without any error message, and CPU usage showed 0%. It stayed stuck for a day until I closed it, so neither _aligned_duplicate_filtered.bam nor _aligned_duplicate_filtered.bam_matrix.txt was generated.
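For context, one common way to build such a truncated set is seqtk sample with the same seed on both mates so the pairs stay matched (a minimal sketch with placeholder file names, not necessarily the exact commands used):

    # randomly sample 1,000,000 read pairs; the -s seed must be
    # identical for R1 and R2 so the output files remain paired
    seqtk sample -s100 original_R1.fastq 1000000 > truncated_R1.fastq
    seqtk sample -s100 original_R2.fastq 1000000 > truncated_R2.fastq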

I used the System Monitor tool on Ubuntu to check the state of the stuck process, and it showed "xxx/miniconda2/bin/java -Xms512 -Xmx1g -jar xxx/miniconda2/share/picard-2.18.7-0/picard.jar MarkDuplicates REMOVE_DUPLICATES=true I=xxx/xxx_aligned_csrt.bam O=xxx/xxx_aligned_duplicate_filtered.bam M=xxx/xxx_aligned_duplicate_filtered.bam_matrix.txt" (where xxx is a file path or name).

I then copied this command line and ran it independently. It took only 11.37 minutes to finish and correctly generated the filtered .bam and matrix files.
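For anyone reproducing this workaround, the standalone invocation looks roughly like the following (a sketch with placeholder paths; the heap is raised above the -Xmx1g that GUAVA passed, per the "extra RAM" suggestion above):

    # run Picard MarkDuplicates by hand on the coordinate-sorted BAM
    java -Xms512m -Xmx8g -jar picard.jar MarkDuplicates \
        REMOVE_DUPLICATES=true \
        I=sample_aligned_csrt.bam \
        O=sample_aligned_duplicate_filtered.bam \
        M=sample_aligned_duplicate_filtered.bam_matrix.txt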

My PC is an HP Z640 workstation with two CPUs (24 threads in total) and 32 GB of RAM.

You may want to download the 45 million read pairs for testing:
https://1drv.ms/f/s!AoApRo91m_bCiypUpIz3FPKrtf57

Cheers,
Roc

@MayurDivate
Owner

Hi Roc,

When we tested GUAVA on large datasets from various papers, it worked fine. However, when the computer went into sleep mode, it would stop processing data.

Please make sure that your computer does not go to sleep and stop GUAVA automatically.
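On Ubuntu with GNOME, automatic suspend on AC power can also be disabled from the command line (one possible way, as a sketch):

    # stop the desktop from suspending the machine while idle on AC power
    gsettings set org.gnome.settings-daemon.plugins.power sleep-inactive-ac-type 'nothing'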

Thanks,
Mayur

@RocWeng

RocWeng commented Mar 19, 2019

Hi Mayur,

I am sure the Automatic Suspend function on my computer is off. I even kept moving the mouse the whole time GUAVA was running to rule out any chance of the computer sleeping, but it still silently stopped at the "Remove duplicates" step.

Would you mind running my data on your computer to test it, or pointing me to someone else's ATAC-seq .fastq data (> 45 million reads) so that I can test it on my computer?

Thanks,
Roc

@prm123342

I am having the same problem. Have there been any recent fixes for this, other than just waiting for the run to finish? My wall time has been 5 days so far. It likewise stops at "Remove duplicates" and shows no memory use from that point onward.
