Error correct PE #346

Open
OrsonMM opened this issue Aug 2, 2024 · 0 comments

Dear MaSuRCA team,

I am using MaSuRCA to assemble a guinea pig genome 🐹; the genome size reported in NCBI is 3.1 Gb.
The sequencing data are paired-end (PE), with an insert size of 150.

In the first run I ran into a problem with JF_SIZE:

Currently Loaded Modulefiles:
 1) python/3.9.4   2) SPAdes/3.15.3   3) masurca/4.1.1
Started at Thu Aug  1 15:02:23 -05 2024
Verifying PATHS...
jellyfish OK
runCA OK
createSuperReadsForDirectory.perl OK
SOAPdenovo-63mer OK
creating script file for the actions...done.
execute assemble.sh to run assembly
[Thu Aug  1 15:02:24 -05 2024] Processing pe library reads
[Thu Aug  1 15:23:34 -05 2024] Average PE read length 109
[Thu Aug  1 15:23:34 -05 2024] Using kmer size of 65 for the graph
[Thu Aug  1 15:23:34 -05 2024] MIN_Q_CHAR: 33
WARNING: JF_SIZE set too low, increasing JF_SIZE to at least 1059107674, this automatic increase may be not enough!
[Thu Aug  1 15:23:34 -05 2024] Creating mer database for Quorum
[Thu Aug  1 15:39:59 -05 2024] Error correct PE
[Thu Aug  1 16:05:42 -05 2024] Estimating genome size
slurmstepd: error: *** JOB 688 ON n001 CANCELLED AT 2024-08-01T16:21:14 ***
[Thu Aug  1 16:21:14 -05 2024] Interrupted

Could you please explain the units of JF_SIZE? The guide says "jellyfish hash size, set this to about 10x the genome size", but is the value in Mb? That is my question.
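
To make the question concrete: the warning in the log above prints JF_SIZE as a plain integer (1059107674), which suggests the value is an absolute count and not Mb. Under that assumption, which I have not confirmed, the "10x the genome size" rule from the guide works out as in this small sketch. The helper name `jf_size_from_genome` is only for illustration and is not part of MaSuRCA.

```python
def jf_size_from_genome(genome_size_bp: int, factor: int = 10) -> int:
    """Return a candidate JF_SIZE as a plain integer.

    Assumption: JF_SIZE is an absolute count (not Mb), and the guide's
    "about 10x the genome size" rule is applied to the size in bases.
    """
    return genome_size_bp * factor


genome_size_bp = 3_100_000_000  # 3.1 Gb, the size reported in NCBI
jf_size = jf_size_from_genome(genome_size_bp)
print(f"JF_SIZE = {jf_size}")  # JF_SIZE = 31000000000
```

If this reading is right, the value would be far above the 1059107674 minimum that the warning asked for.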

In the second run I modified JF_SIZE and the run went through the whole process, but it reported "Error correct PE":

Currently Loaded Modulefiles:
 1) masurca/4.1.1
Started at Thu Aug  1 16:45:05 -05 2024
Verifying PATHS...
jellyfish OK
runCA OK
createSuperReadsForDirectory.perl OK
SOAPdenovo-63mer OK
creating script file for the actions...done.
execute assemble.sh to run assembly
[Thu Aug  1 16:45:05 -05 2024] Processing pe library reads
[Thu Aug  1 16:53:07 -05 2024] Average PE read length 109
[Thu Aug  1 16:53:08 -05 2024] Using kmer size of 65 for the graph
[Thu Aug  1 16:53:08 -05 2024] MIN_Q_CHAR: 33
[Thu Aug  1 16:53:08 -05 2024] Creating mer database for Quorum
[Thu Aug  1 17:40:14 -05 2024] Error correct PE
[Thu Aug  1 18:55:39 -05 2024] Estimating genome size
[Thu Aug  1 19:10:04 -05 2024] Estimated genome size: 2258330106
[Thu Aug  1 19:10:04 -05 2024] Creating k-unitigs with k=65
[Thu Aug  1 19:41:05 -05 2024] Computing super reads from PE
[Thu Aug  1 20:24:14 -05 2024] SOAPdenovo
[Fri Aug  2 03:06:28 -05 2024] Gap closing
[Fri Aug  2 03:36:38 -05 2024] Removing duplicated contained contigs
[Fri Aug  2 03:38:37 -05 2024] Rescaffolding
All contigs loaded.
[Fri Aug  2 04:58:12 -05 2024] Assembly success. Output sequence is in SOAP_assembly/asm2.scafSeq2
Finished at Fri Aug  2 04:58:12 -05 2024

The stats for this result are very bad:
prueba01_asm2.txt

P.S. MaSuRCA configuration file:
configuration_02.txt

MIN_Q_CHAR=33 is set by default; what does this variable mean?
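
My guess, which is an assumption and not something the guide states, is that MIN_Q_CHAR is the FASTQ quality offset: 33 when the lowest quality character in the reads has an ASCII code below 64 (Phred+33), and 64 otherwise. A minimal sketch of that kind of check follows. The function `min_quality_offset` is a hypothetical helper, not MaSuRCA code, and it assumes plain four-line FASTQ records.

```python
from typing import Iterable


def min_quality_offset(fastq_lines: Iterable[str]) -> int:
    """Guess the quality offset (33 or 64) from the lowest quality character.

    Assumes four-line FASTQ records, so every 4th line holds the qualities.
    """
    lowest = None
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:  # quality line of the current record
            for ch in line.rstrip("\n"):
                code = ord(ch)
                if lowest is None or code < lowest:
                    lowest = code
    if lowest is None:
        raise ValueError("no quality lines found")
    # ASCII codes below 64 can only occur with a Phred+33 encoding
    return 33 if lowest < 64 else 64


example = ["@read1", "ACGT", "+", "II5#"]
print(min_quality_offset(example))  # 33
```

If that is what the variable means, 33 would simply reflect the standard encoding of current Illumina reads.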

Any recommendations for eliminating the error would be appreciated. 🐛🐛🐛

Thanks.
