Error correct PE #346

OrsonMM · 2024-08-02T16:37:06Z

Dear MaSuRCA team,

I am using MaSuRCA to assemble a guinea pig 🐹 genome; the genome size reported in NCBI is 3.1 Gb.
The sequencing data are paired-end (PE), with an insert size of 150.

In the first run I got a warning about JF_SIZE, and the job was later cancelled:

Currently Loaded Modulefiles:
 1) python/3.9.4   2) SPAdes/3.15.3   3) masurca/4.1.1
Started at Thu Aug  1 15:02:23 -05 2024
Verifying PATHS...
jellyfish OK
runCA OK
createSuperReadsForDirectory.perl OK
SOAPdenovo-63mer OK
creating script file for the actions...done.
execute assemble.sh to run assembly
[Thu Aug  1 15:02:24 -05 2024] Processing pe library reads
[Thu Aug  1 15:23:34 -05 2024] Average PE read length 109
[Thu Aug  1 15:23:34 -05 2024] Using kmer size of 65 for the graph
[Thu Aug  1 15:23:34 -05 2024] MIN_Q_CHAR: 33
WARNING: JF_SIZE set too low, increasing JF_SIZE to at least 1059107674, this automatic increase may be not enough!
[Thu Aug  1 15:23:34 -05 2024] Creating mer database for Quorum
[Thu Aug  1 15:39:59 -05 2024] Error correct PE
[Thu Aug  1 16:05:42 -05 2024] Estimating genome size
slurmstepd: error: *** JOB 688 ON n001 CANCELLED AT 2024-08-01T16:21:14 ***
[Thu Aug  1 16:21:14 -05 2024] Interrupted

Could you please explain the units of JF_SIZE? The guide says "jellyfish hash size, set this to about 10x the genome size".

Is it in Mb? That is my question.
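To make the question concrete, here is a minimal sketch of how I currently read that rule. It assumes that JF_SIZE is a plain integer count (not Mb) and that the genome size is given in bases; both are my assumptions, not something the guide confirms. The function name `suggested_jf_size` is only illustrative.

```python
def suggested_jf_size(genome_size_bases: int, factor: int = 10) -> int:
    """Apply the guide's "about 10x the genome size" rule of thumb.

    Assumption: JF_SIZE is a plain count and the genome size is in bases.
    """
    return factor * genome_size_bases


# 3.1 Gb, the guinea pig genome size reported in NCBI
genome_size = 3_100_000_000
print(f"JF_SIZE = {suggested_jf_size(genome_size)}")
```

If this reading is right, the value for my genome would be far above the minimum of 1059107674 mentioned in the warning.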

In the second run, I modified JF_SIZE and the whole process ran, but the log still reports "Error correct PE":

Currently Loaded Modulefiles:
 1) masurca/4.1.1
Started at Thu Aug  1 16:45:05 -05 2024
Verifying PATHS...
jellyfish OK
runCA OK
createSuperReadsForDirectory.perl OK
SOAPdenovo-63mer OK
creating script file for the actions...done.
execute assemble.sh to run assembly
[Thu Aug  1 16:45:05 -05 2024] Processing pe library reads
[Thu Aug  1 16:53:07 -05 2024] Average PE read length 109
[Thu Aug  1 16:53:08 -05 2024] Using kmer size of 65 for the graph
[Thu Aug  1 16:53:08 -05 2024] MIN_Q_CHAR: 33
[Thu Aug  1 16:53:08 -05 2024] Creating mer database for Quorum
[Thu Aug  1 17:40:14 -05 2024] Error correct PE
[Thu Aug  1 18:55:39 -05 2024] Estimating genome size
[Thu Aug  1 19:10:04 -05 2024] Estimated genome size: 2258330106
[Thu Aug  1 19:10:04 -05 2024] Creating k-unitigs with k=65
[Thu Aug  1 19:41:05 -05 2024] Computing super reads from PE
[Thu Aug  1 20:24:14 -05 2024] SOAPdenovo
[Fri Aug  2 03:06:28 -05 2024] Gap closing
[Fri Aug  2 03:36:38 -05 2024] Removing duplicated contained contigs
[Fri Aug  2 03:38:37 -05 2024] Rescaffolding
All contigs loaded.
[Fri Aug  2 04:58:12 -05 2024] Assembly success. Output sequence is in SOAP_assembly/asm2.scafSeq2
Finished at Fri Aug  2 04:58:12 -05 2024

The statistics for this assembly are very bad:
prueba01_asm2.txt

PS: the MaSuRCA configuration file:
configuration_02.txt

The log shows MIN_Q_CHAR=33 by default. What does this variable mean?
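My guess is that MIN_Q_CHAR is the lowest ASCII code of the quality characters in the reads, so 33 would match the standard Phred+33 FASTQ encoding. This is an assumption on my part, not something I found in the guide. A small sketch of decoding quality characters under that assumption (the helper `phred_scores` is hypothetical):

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores.

    Assumption: the offset plays the role of MIN_Q_CHAR (33 for Phred+33).
    """
    return [ord(char) - offset for char in quality]


# '!' is ASCII 33, so it decodes to a score of 0 with offset 33
print(phred_scores("!5I"))
```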

Do you have any recommendation to eliminate the error? 🐛🐛🐛

Thanks.
