Seg-fault during index formation #3

rwhetten · 2022-09-12T01:12:57Z

I cloned the repo and compiled the code, but I get a segmentation fault when trying to index a fragmented genome with 1.75 million scaffolds. The executable works fine to make an index of GRCh38 (including all alternate scaffolds, so 63 Gb total), so it doesn't appear to be the software itself.
Is there a limit on the number of scaffolds in an assembly for indexing? Alternatively, are there characters that might cause problems if present in scaffold names?
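One way to check the scaffold-name question before indexing is to scan the FASTA headers for anything other than letters, digits, and underscores. The sketch below is only a diagnostic aid: the function name and the definition of a "safe" header are assumptions made for illustration, and it says nothing about which characters miniprot itself accepts.

```python
import re
from typing import Iterable, List

# Assumption: a "safe" header contains only letters, digits, and underscores.
SAFE_HEADER = re.compile(r"^[A-Za-z0-9_]+$")

def unusual_headers(fasta_lines: Iterable[str]) -> List[str]:
    """Return header lines (without '>') that contain any other character."""
    flagged = []
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\r\n")
            if not SAFE_HEADER.match(header):
                flagged.append(header)
    return flagged
```

Passing an open file handle for the genome FASTA to `unusual_headers` would list every header containing a space, dot, or other punctuation.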

lh3 · 2022-09-12T02:06:44Z

Please try the latest version from github HEAD. There was a bug, though I am not sure if that would lead to segfault.

rwhetten · 2022-09-12T12:56:18Z

I used git pull, make clean, and make; then tried the index building job again. It ran for longer this time, and wrote the following to stderr:
[M::[email protected]*0.99] read 22104357184 bases in 1755249 contigs
[M::[email protected]*0.99] 174414660 blocks
[M::[email protected]*14.65] collected syncmers
/var/spool/slurm/slurmd/job5292490/slurm_script: line 22: 777006 Segmentation fault
The command used was ~/miniprot -t16 -d $INDEX $GENOME; RAM use reached 100 GB and the runtime was 21 minutes.

lh3 · 2022-09-12T17:46:25Z

One potential cause is memory. The Ensembl version of GRCh38 has many ambiguous bases. Although the total contig length is 63 Gb, there is only ~3.2 Gb of actual sequence. Your assembly is 7 times larger. I guess it will take 120-150 GB of memory for indexing.
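The difference between total contig length and actual sequence can be measured directly from a FASTA file. The following is a minimal sketch, assuming that "actual sequence" means A, C, G, and T in either case; the function name is hypothetical and this is not how miniprot accounts for memory.

```python
from typing import Iterable, Tuple

def count_bases(fasta_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total_bases, acgt_bases) summed over all sequence lines."""
    total = 0
    acgt = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue  # skip header lines
        seq = line.strip().upper()
        total += len(seq)
        # Count only unambiguous bases; N and other IUPAC codes are excluded.
        acgt += sum(seq.count(base) for base in "ACGT")
    return total, acgt
```

Comparing the second value between two assemblies gives a rough sense of how much larger one indexing job is than the other.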

rwhetten · 2022-09-12T20:12:03Z

The node that was running the job had 370 GB of RAM allocated, and the output doesn't indicate an out-of-memory error in any way I recognize. The exit code was 139, and RAM use peaked at 100.5 GB. Would non-alphanumeric, non-underscore characters (such as space or dot) in scaffold names be a problem?
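For reference, shells commonly report a process killed by a signal as 128 plus the signal number, so exit code 139 corresponds to signal 11. A small sketch of that decoding, with a hypothetical helper name:

```python
import signal

def signal_from_exit_code(code: int) -> str:
    """Map a shell exit status above 128 to the name of the terminating signal."""
    if code <= 128:
        raise ValueError("exit codes up to 128 do not encode a signal")
    return signal.Signals(code - 128).name
```

On Linux, `signal_from_exit_code(139)` gives `SIGSEGV`, which matches the "Segmentation fault" line in the Slurm output; a process killed with SIGKILL would typically show 137 instead.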
Thinking of work-arounds - is there any way to merge indexes of genome subsets into a single index after they are created? I could split the genome into 8 subsets and index them separately. If indexes can't be joined, I could align them separately, with the loss of some information.
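The splitting step of that work-around could be sketched as below: scaffolds are assigned greedily, longest first, to whichever subset currently has the smallest total length. The function name and its inputs are assumptions for illustration, and the sketch does not address whether separately built indexes can be merged.

```python
import heapq
from typing import Dict, List

def split_scaffolds(lengths: Dict[str, int], n_subsets: int = 8) -> List[List[str]]:
    """Greedily assign scaffolds to n_subsets bins of roughly equal total length."""
    # Heap of (total_length_so_far, bin_index); the smallest bin is popped first.
    heap = [(0, i) for i in range(n_subsets)]
    heapq.heapify(heap)
    bins: List[List[str]] = [[] for _ in range(n_subsets)]
    # Place the longest scaffolds first so the bins end up well balanced.
    for name, length in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        bins[idx].append(name)
        heapq.heappush(heap, (total + length, idx))
    return bins
```

Each returned list of names can then be written to its own FASTA file and indexed as a separate job.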

lh3 · 2022-09-22T21:53:22Z

The segmentation fault should be caused by #4, which has been fixed. Let me know if you still have the problem. I am closing this issue for now.

twrightsman mentioned this issue Sep 22, 2022

Indexing is much slower on fragmented assemblies #10

Closed

lh3 added the bug label Sep 22, 2022

lh3 closed this as completed Sep 22, 2022
