Taking very long to load GFA1 from file #23

Closed
fawaz-dabbaghieh opened this issue Nov 22, 2021 · 4 comments

Comments

@fawaz-dabbaghieh

I was trying to load a GFA1 file with gfapy, but I had to kill the process because it had been running for over 15 minutes without finishing. I am not sure what could be wrong.

The GFA is a de Bruijn graph produced by convertToGFA.py, a script that converts the contigs from bcalm2 output into a valid GFA1 file.
This graph has 944785 nodes and 2419232 edges.

Minimal example here:

import gfapy
import time

input_file = "sk1_y12_yeast_k43.gfa"

start = time.perf_counter()
graph = gfapy.Gfa.from_file(input_file)
print(f"it took {time.perf_counter() - start} seconds to load the file")

Is it supposed to take this long?

@ggonnella
Owner

Difficult to tell; it depends on the graph and on the system. But the graph is indeed relatively large, and Gfapy is currently written entirely in Python, so it has its limits...

Maybe you could try setting vlevel=0 in the from_file call? This disables validation, so loading should be faster.
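
A minimal sketch of that suggestion, reusing the file name and timing approach from the example in the first post. The helper names `timed_load` and `load_gfa` are hypothetical, introduced only for illustration; the only gfapy call used is `gfapy.Gfa.from_file` with its `vlevel` keyword. How much time this saves on this particular graph has not been measured here.

```python
import time


def timed_load(loader, path, **kwargs):
    """Run loader(path, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = loader(path, **kwargs)
    return result, time.perf_counter() - start


def load_gfa(path, vlevel=0):
    """Load a GFA file with gfapy at the given validation level.

    vlevel=0 turns validation off, as suggested above.
    """
    # gfapy is a third-party package; it is imported lazily so that
    # timed_load stays usable without it.
    import gfapy
    return gfapy.Gfa.from_file(path, vlevel=vlevel)


# Usage, with the file from the first post:
# graph, seconds = timed_load(load_gfa, "sk1_y12_yeast_k43.gfa", vlevel=0)
# print(f"vlevel=0: loaded in {seconds:.1f} seconds")
```

Keeping the timing helper separate from the loader makes it easy to run the same measurement at a different validation level, for example on a smaller subset of the file, and compare the two.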

@fawaz-dabbaghieh
Author

I see! Thank you for the very quick response!

@ggonnella
Owner

Alternatively, you could consider using my library textformats (not yet published, but publicly available), which also has a Python interface and a GFA1 specification (file https://github.com/ggonnella/textformats/spec/gfa/gfa1.yaml). It is written in Nim and is much faster for large files.

@ggonnella
Owner

However, since it is a generic library, it does not offer all the graph operations that gfapy does.
