Seg-fault during index formation #3
Comments
Please try the latest version from GitHub HEAD. There was a bug, though I am not sure whether it would lead to a segfault.
I ran git pull, make clean, and make, then tried the index-building job again. It ran longer this time and wrote the following to stderr:
One potential cause is memory. The Ensembl version of GRCh38 has many ambiguous bases. Although the total contig length is 63 Gb, only ~3.2 Gb of that is actual sequence. Your assembly is 7 times larger. I guess indexing it will take 120-150 GB of memory.
The node running the job had 370 GB of RAM allocated, and the output doesn't indicate an out-of-memory error in any way I recognize. The exit code was 139, and RAM use peaked at 100.5 GB. Would non-alphanumeric, non-underscore characters (such as a space or dot) in scaffold names be a problem?
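For readers unfamiliar with the exit code mentioned above: shells conventionally report 128 + N when a process is killed by signal N, so 139 corresponds to signal 11 (SIGSEGV) on Linux, whereas a process killed by the kernel's out-of-memory killer usually receives SIGKILL and shows up as 137. The helper below is a minimal sketch of that decoding; the function name `describe_exit_code` is made up for illustration and is not part of the tool discussed here.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper).

    By shell convention, a status above 128 means the process was
    terminated by signal number (status - 128).
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited normally with status {code}"


print(describe_exit_code(139))
```

A status that decodes to SIGSEGV points to an invalid memory access inside the program rather than to the scheduler or kernel killing the job for exceeding its memory allocation, which is consistent with the peak RAM figure reported above.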
The segmentation fault was most likely caused by #4, which has been fixed. Let me know if you still have the problem. I am closing this issue for now.
I cloned the repo and compiled the code, but I get a segmentation fault when trying to index a fragmented genome with 1.75 million scaffolds. The executable works fine when building an index of GRCh38 (including all alternate scaffolds, so 63 Gb total), so the build itself doesn't appear to be broken.
Is there a limit on the number of scaffolds in an assembly for indexing? Alternatively, are there characters that might cause problems if present in scaffold names?
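When asking questions like these, it can help to report the exact record count and which unusual characters actually occur in the headers. The following is a minimal sketch that inventories FASTA header lines; the function name `summarize_fasta_headers` and the file name in the usage comment are assumptions for illustration. It says nothing about what the indexer itself accepts, it only describes the input.

```python
from collections import Counter


def summarize_fasta_headers(lines):
    """Count FASTA records and tally header characters outside [A-Za-z0-9_].

    `lines` is any iterable of text lines, e.g. an open file handle.
    Returns (record_count, longest_header_length, Counter_of_odd_characters).
    """
    n_records = 0
    max_len = 0
    odd_chars = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        n_records += 1
        header = line[1:].rstrip("\r\n")
        max_len = max(max_len, len(header))
        odd_chars.update(
            ch for ch in header
            if not (ch.isascii() and (ch.isalnum() or ch == "_"))
        )
    return n_records, max_len, odd_chars


# Small in-memory demo; for a real assembly, pass an open file instead,
# e.g. `with open("assembly.fa") as fh:` (hypothetical file name).
demo = [">scaffold_1\n", "ACGT\n", ">scaffold 2.1\n", "NNNN\n"]
count, longest, odd = summarize_fasta_headers(demo)
print(count, longest, dict(odd))
```

Because the function streams line by line, it should handle an assembly with millions of scaffolds without loading the file into memory.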