Slow speed making Casanovo impractical for most shotgun data. #251

Closed
mhoopmann opened this issue Oct 13, 2023 · 6 comments · Fixed by #256
Labels
question Further information is requested

Comments

@mhoopmann

Casanovo is working great for amino acid sequence identification, but it's going really slowly in my hands. I do have an NVIDIA card, nothing special (Quadro P400), but it takes days to complete the analysis for the number of spectra I can acquire in a single hour. At this rate, I can't keep pace with how fast I am collecting spectra. Is there something I'm doing wrong? Or perhaps a way to improve the algorithm speed? Casanovo performance seems way too slow for practical use, and I really, really want to use this software.

@bittremieux
Collaborator

Running Casanovo on CPU-only is very slow unfortunately, and if there's a mismatch in the CUDA version Casanovo might inadvertently fall back to that. You have an older GPU, so we first need to ensure that it's actually being used.

Can you share the Casanovo log file? That contains some information on whether a GPU was found. Additionally, can you check what the output of watch nvidia-smi is while running Casanovo and what its GPU resource consumption is?
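
For anyone who wants to script the utilization check instead of watching the terminal, here is a minimal sketch. It assumes nvidia-smi is on the PATH; the helper name query_gpu_utilization is made up for illustration and is not part of Casanovo. It returns None when nvidia-smi is missing or fails, so it is safe to run on a machine without an NVIDIA driver.

```python
import shutil
import subprocess


def query_gpu_utilization():
    """Return one CSV line per GPU (name, utilization, memory used), or None.

    Hypothetical helper for illustration only; it just wraps nvidia-smi.
    """
    if shutil.which("nvidia-smi") is None:
        # No NVIDIA driver tools found on this machine.
        return None
    try:
        result = subprocess.run(
            [
                "nvidia-smi",
                "--query-gpu=name,utilization.gpu,memory.used",
                "--format=csv,noheader",
            ],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    lines = query_gpu_utilization()
    if lines is None:
        print("nvidia-smi not available or failed")
    else:
        for line in lines:
            print(line)
```

If utilization stays near 0% while Casanovo is running, that suggests the work is happening on the CPU instead.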

@bittremieux added the "question" label (Further information is requested) on Oct 13, 2023
@mhoopmann
Author

Thanks for the quick response, Wout! Here's the log file: https://regis-web.systemsbiology.net/PublicDatasets/mikeh/Casanovo/20210901-HeLa-01.log

Yes, the nvidia-smi output looks very lackluster while running Casanovo (see image below). Any suggestions? Or perhaps some guidelines for minimum hardware requirements? Thanks much!
[image: screenshot of the nvidia-smi output]

@bittremieux
Collaborator

Indeed, it doesn't seem like the Casanovo process is running or even registered on the GPU. I suspect that this is because your GPU is a bit older, but the log file unfortunately is not conclusive. We also don't have a similar GPU to test compatibility.

@mhoopmann
Author

mhoopmann commented Oct 17, 2023

Thanks for putting me in the right direction. I'm making progress. For anyone else who might have the same issues, here are the steps:

  1. remove pytorch
  2. update nvidia drivers to the latest
  3. reinstall pytorch using the appropriate command at https://pytorch.org/get-started/locally/

The key for me was actually the removal of pytorch rather than trying to update any of the packages in place. Updating the OS drivers definitely required the removal and reinstallation of pytorch/cuda on my system.

A simple test to see if GPU compute is working is the following python code:

import torch
a=torch.cuda.is_available()
print(f'CUDA: {a}')
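
A slightly longer variant of the same test may help when debugging a version mismatch. This is only a sketch: the function name describe_gpu is made up for illustration, and it relies on nothing beyond standard PyTorch attributes (torch.__version__, torch.version.cuda, torch.cuda.is_available, torch.cuda.get_device_name). It also reports cleanly if PyTorch is not installed at all.

```python
def describe_gpu():
    """Collect basic facts about the PyTorch/CUDA setup into a dict.

    Hypothetical helper for illustration; not part of Casanovo.
    """
    info = {"torch_installed": False, "cuda_available": False}
    try:
        import torch
    except ImportError:
        # PyTorch is missing, e.g. between the removal and reinstall steps.
        return info
    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # CUDA version PyTorch was built against; None for CPU-only builds.
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


for key, value in describe_gpu().items():
    print(f"{key}: {value}")
```

A cuda_build of None points to a CPU-only PyTorch install, which would explain a silent fallback to the CPU regardless of the driver version.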

The results: Casanovo is going along faster. nvidia-smi indicates 100% usage, and the temperature is rising. I'll have to find a new GPU to really try to open the throttle. I am still concerned with the following warning:

".conda\envs\casanovo_env\lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:224: PossibleUserWarning: The dataloader, predict_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance."

If I'm guessing correctly, n_workers is used to set num_workers. It is set to 0, as far as I can tell from the debug log:
"2023-10-17 12:24:41,189 DEBUG [casanovo/MainProcess] casanovo.main : n_workers = 0"

But the code in dataloaders.py (line 74) indicates that the value should be 12:

self.n_workers = n_workers if n_workers is not None else os.cpu_count()

Am I interpreting this correctly? Or should I not even be concerned? Is there a way to set this parameter without changing the code?

@bittremieux
Collaborator

The number of workers is platform-dependent. Specifically, on Windows, only a single worker thread can be used for data loading. This is maybe slightly sub-optimal, but shouldn't make that much difference in the end.
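
To illustrate how a platform-dependent rule like this could produce n_workers = 0 in the log even though dataloaders.py falls back to os.cpu_count(), here is a rough sketch. It is an assumption about the general shape of the logic, not Casanovo's actual code, and resolve_n_workers is a hypothetical name. In PyTorch, a DataLoader with num_workers=0 loads data in the main process, which is what triggers the Lightning warning quoted above.

```python
import os
import platform


def resolve_n_workers(requested=None, system=None):
    """Pick a DataLoader worker count (illustrative sketch, not Casanovo code).

    requested: user-supplied worker count, or None to choose automatically.
    system: platform name; defaults to platform.system().
    """
    system = platform.system() if system is None else system
    if system == "Windows":
        # Assumed rule: force main-process loading on Windows.
        return 0
    if requested is not None:
        return requested
    # os.cpu_count() can return None, so fall back to one worker.
    return os.cpu_count() or 1


print(resolve_n_workers(system="Windows"))
print(resolve_n_workers(requested=4, system="Linux"))
```

Under this assumed rule, the platform check runs before the os.cpu_count() fallback is ever reached, so the value of 12 suggested by the warning is never used on Windows.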

Great that you managed to get the GPU working. What's the spectrum throughput on that system? When a GPU is available, this might no longer be the bottleneck, even with an older GPU. Other parts of the Casanovo code can be pretty slow, and this is an active action point for us.

@mhoopmann
Author

mhoopmann commented Oct 18, 2023

Thanks Wout, good to know.
I went big for my latest test: 114,000 spectra from a single run. It's still running (been about 27 hours). I suspect it will finish in 36 hours. This is a huge improvement; without the GPU, it would be 10 days. 10 days was not practical, but 36 hours I can work with. A new GPU or two, or renting some cloud GPUs, should allow me to scale up if I can get the budget. Any other speed improvements to the algorithm would be a huge bonus.
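
For reference, the throughput implied by these numbers can be worked out with a few lines of arithmetic. Note that both durations are estimates from the comment above (36 hours is the projected GPU finish time, 10 days is the guessed CPU-only time), so the results are rough.

```python
def spectra_per_hour(n_spectra, hours):
    """Average throughput in spectra per hour."""
    return n_spectra / hours


N_SPECTRA = 114_000          # spectra in the single run
GPU_HOURS = 36               # projected total runtime with the GPU
CPU_HOURS = 10 * 24          # estimated runtime without the GPU (10 days)

gpu_rate = spectra_per_hour(N_SPECTRA, GPU_HOURS)
cpu_rate = spectra_per_hour(N_SPECTRA, CPU_HOURS)
speedup = gpu_rate / cpu_rate

print(f"GPU: {gpu_rate:.0f} spectra/hour ({gpu_rate / 3600:.2f} spectra/second)")
print(f"CPU: {cpu_rate:.0f} spectra/hour ({cpu_rate / 3600:.2f} spectra/second)")
print(f"Estimated speedup: {speedup:.1f}x")
```

That works out to a bit under one spectrum per second on the GPU, roughly a 6.7x improvement over the CPU-only estimate.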
