
GUAVA pipeline stops at "Remove duplicates" #7

Open
jarleba opened this issue Nov 29, 2018 · 10 comments

@jarleba

jarleba commented Nov 29, 2018

Hi, everything seems to be working fine with my GUAVA now, but on a couple of samples the pipeline stops at "Remove duplicates". The program doesn't close or anything; it just says "Remove duplicates" forever, and my CPU isn't doing anything. As far as I can tell, the mapping went fine. Has anyone else experienced this?

@MayurDivate
Owner

Hi @jarleba,

Would you mind sharing the log file? You should find it in the output folder.

Mayur

@jarleba
Author

jarleba commented Nov 30, 2018

Here you go. :) It produces an unfinished result file as well.

E_C3_TKD181002942_H7NK7DSXX_L2_1_log.txt
E_C3_TKD181002942_H7NK7DSXX_L2_1_Result.xlsx

@MayurDivate
Owner

@jarleba

I have checked the log file.
I could not find any errors; I think it is taking much longer because you are analyzing 50+ million reads.
On top of that, it is 150 bp sequencing data.
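For reference, a quick way to double-check the read count from the FASTQ itself (a minimal sketch, assuming an uncompressed file; sample_R1.fq is a placeholder name):

    # each FASTQ record is 4 lines, so the read count is line count / 4
    echo $(( $(wc -l < sample_R1.fq) / 4 ))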

Try using multiple cores and extra RAM, and let it finish.

By the way, how long did you allow it to run?

If you don't mind, could you please share some sample data with me via Google Drive,
so that I can also test GUAVA on 150 bp data?

Thanks,
Mayur

@jarleba
Author

jarleba commented Jan 2, 2019 via email

@MayurDivate
Owner

@jarleba

I think you should wait until it finishes.
Just share a few thousand reads (sample data).

cheers,
Mayur

@jarleba
Author

jarleba commented Jan 4, 2019

I have let it run for a couple of days now, but I noticed that the CPU stops processing when the pipeline is at "Remove duplicates". It's really strange.

I ran the analysis with a sample of just 10k reads, and the whole analysis seems to work with that. I will share the 10k .fq files as well as the log files.
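For reference, a 10k-read subset can be taken from the top of each FASTQ along these lines (a minimal sketch with placeholder file names, not necessarily how the shared files were made):

    # 10,000 reads x 4 lines per FASTQ record = 40,000 lines;
    # take the same range from both mates so the pairs stay in sync
    head -n 40000 sample_R1.fq > 10k_R1.fq
    head -n 40000 sample_R2.fq > 10k_R2.fq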

https://1drv.ms/f/s!AjzE0xKMj_y5g9Uko3gZQ1WrpdJDdA

Cheers,
Jarle

@RocWeng

RocWeng commented Mar 18, 2019

Hi Mayur,

I also encountered the same problem!

GUAVA ran to completion on your sample data, and also on my "truncated data". The truncated data was generated by randomly extracting 1 million read pairs from the original data, which contains 45 million read pairs. However, when running the original, larger data, GUAVA silently stopped at the "Remove duplicates" step without any error message, and CPU usage showed 0%. It stayed stuck for a day until I closed it, so neither _aligned_duplicate_filtered.bam nor _aligned_duplicate_filtered.bam_matrix.txt was generated.
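For context, one common way to build such a truncated set is seqtk sample with the same seed on both mates so the pairs stay matched (a minimal sketch with placeholder file names, not necessarily the exact commands used):

    # randomly sample 1,000,000 read pairs; the -s seed must be
    # identical for R1 and R2 so the output files remain paired
    seqtk sample -s100 original_R1.fastq 1000000 > truncated_R1.fastq
    seqtk sample -s100 original_R2.fastq 1000000 > truncated_R2.fastq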

I used the System Monitor tool on Ubuntu to check the state of the stuck process, and it showed "xxx/miniconda2/bin/java -Xms512 -Xmx1g -jar xxx/miniconda2/share/picard-2.18.7-0/picard.jar MarkDuplicates REMOVE_DUPLICATES=true I=xxx/xxx_aligned_csrt.bam O=xxx/xxx_aligned_duplicate_filtered.bam M=xxx/xxx_aligned_duplicate_filtered.bam_matrix.txt" (where xxx is a file path or name).

I then copied this command line and ran it independently. It took only 11.37 minutes to finish and correctly generated the filtered .bam and matrix files.
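For anyone reproducing this workaround, the standalone invocation looks roughly like the following (a sketch with placeholder paths; the heap is raised above the -Xmx1g that GUAVA passed, per the "extra RAM" suggestion above):

    # run Picard MarkDuplicates by hand on the coordinate-sorted BAM
    java -Xms512m -Xmx8g -jar picard.jar MarkDuplicates \
        REMOVE_DUPLICATES=true \
        I=sample_aligned_csrt.bam \
        O=sample_aligned_duplicate_filtered.bam \
        M=sample_aligned_duplicate_filtered.bam_matrix.txt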

My PC is an HP Z640 workstation with two CPUs (24 threads in total) and 32 GB of RAM.

You may want to download the 45 million read pairs for testing:
https://1drv.ms/f/s!AoApRo91m_bCiypUpIz3FPKrtf57

Cheers,
Roc

@MayurDivate
Owner

Hi Roc,

When we tested GUAVA on large datasets from various papers, it worked fine. However, when the computer went into sleep mode, it would stop processing data.

Please make sure that your computer does not go to sleep and stop GUAVA automatically.
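On Ubuntu with GNOME, automatic suspend on AC power can also be disabled from the command line (one possible way, as a sketch):

    # stop the desktop from suspending the machine while idle on AC power
    gsettings set org.gnome.settings-daemon.plugins.power sleep-inactive-ac-type 'nothing'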

Thanks,
Mayur

@RocWeng

RocWeng commented Mar 19, 2019

Hi Mayur,

I am sure the Automatic Suspend function on my computer is off. I even kept moving the mouse the whole time GUAVA was running to rule out any chance of the computer sleeping, but it still silently stopped at the "Remove duplicates" step.

Would you mind running my data on your computer to test it, or pointing me to someone else's ATAC-seq .fastq data (> 45 million reads) so that I can test it on my computer?

Thanks,
Roc

@prm123342

I am having the same problem. Have there been any recent fixes for this, other than just waiting for the run to finish? My wall time has been 5 days so far. It likewise stops at "Remove duplicates" and shows no memory use from that point onward.
