gpu_type in WDLs not appropriate #414

Open
Neu970 opened this issue Aug 7, 2024 · 5 comments

Comments


Neu970 commented Aug 7, 2024

Dear Team lilab-bcb/cumulus,

I wanted to bring to your attention that several of the WDL protocols you offer, such as Cellbender/remove_background, are no longer functioning properly. Specifically, Google Cloud reports insufficient quota to run the protocol with the "nvidia-tesla-t4" GPU.

The error message I received is as follows:

"Task cellbender.run_cellbender_remove_background_gpu:0:20 failed. The job was stopped before the command finished. PAPI error code 9. Could not start instance custom-4-8192 due to insufficient quota. Cromwell retries exhausted, task failed. Backend info: Execution failed: allocating: selecting resources: selecting region and zone: no available zones: us-east1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low, us-west1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low, us-central1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low."
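For anyone triaging the same failure, the per-region details can be pulled out of the "Backend info" text with a small script. This is only a sketch: the regular expression is written against the exact wording quoted above, the function name `parse_quota_failures` is made up for illustration, and reading "(0/0 available)" as "free/limit" is an assumption rather than documented behavior.

```python
import re

# Excerpt of the "Backend info" text quoted in the error message above.
backend_info = (
    "no available zones: "
    "us-east1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low, "
    "us-west1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low, "
    "us-central1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low."
)

# Pattern tailored to the wording above; other error formats will not match.
QUOTA_PATTERN = re.compile(
    r"(?P<region>[a-z]+-[a-z]+\d+): (?P<requested>\d+) (?P<metric>[A-Z0-9_]+) "
    r"\((?P<free>\d+)/(?P<limit>\d+) available\) quota too low"
)


def parse_quota_failures(message):
    """Return one dict per region that was rejected for low quota."""
    return [
        {
            "region": m["region"],
            "metric": m["metric"],
            "requested": int(m["requested"]),
            # Assumption: the two numbers are free quota and quota limit.
            "free": int(m["free"]),
            "limit": int(m["limit"]),
        }
        for m in QUOTA_PATTERN.finditer(message)
    ]


failures = parse_quota_failures(backend_info)
for failure in failures:
    print(failure["region"], failure["metric"], failure["free"], failure["limit"])
```

If every region reports a limit of 0, as here, the project appears to have no T4 quota at all, which would point to a quota setting rather than a temporary shortage.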

Additionally, protocols like STARsolo require adjustments to memory and CPU values, as the default settings are no longer adequate.

A couple of months ago, these protocols worked without any issues, but they seem to have stopped functioning recently. I hope this information helps you investigate whether this is a temporary problem or something more persistent. I greatly appreciate your efforts in developing these WDLs, as their value is immense, but at the moment, they are not very useful due to these issues.

Thank you for your attention to this matter.

Best regards,
Rod

Member

yihming commented Sep 10, 2024

Hi @Neu970 ,

Thanks for reaching out, and thank you for reporting this issue!

Your GPU quota issue with the cellbender workflow seems related to low GPU availability in those regions, but I'll test on my side to check whether it is caused by anything on our end.

For the STARsolo workflow, could you please give an example of the issue? I'm asking because the workflow runs well in my analysis tasks with the default 32 vCPUs and 120GB memory. If your data require more computing resources, please note that you can increase them by setting your own values for the num_cpu and memory inputs. (Please look for these inputs in https://cumulus.readthedocs.io/en/stable/starsolo.html#workflow-inputs)
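As a rough illustration of overriding these two inputs, the sketch below builds the relevant fragment of a workflow inputs JSON. Several things here are assumptions and should be checked against the linked documentation: the `starsolo_workflow` prefix, the `"240G"` memory string format, and the values themselves, which are arbitrary. The helper `starsolo_overrides` is hypothetical.

```python
import json


def starsolo_overrides(num_cpu, memory, workflow="starsolo_workflow"):
    """Build an inputs fragment overriding the CPU and memory defaults.

    The workflow prefix is an assumption; use the name shown in your
    own workspace or inputs template.
    """
    return {
        f"{workflow}.num_cpu": num_cpu,
        f"{workflow}.memory": memory,
    }


# Illustrative values only: double the defaults mentioned above.
inputs = starsolo_overrides(num_cpu=64, memory="240G")
print(json.dumps(inputs, indent=2))
```

The printed keys can be merged into the rest of the inputs JSON or entered in the matching fields of the workflow configuration page.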

Hope it helps!

Sincerely,
Yiming

Author

Neu970 commented Sep 11, 2024

Hi Yiming,
Problem solved. After talking with the Terra team and following your suggestions, I tried different combinations of memory and CPUs (values for the num_cpu and memory inputs) until it worked.
Regards
Thanks

Author

Neu970 commented Sep 11, 2024 via email

Member

yihming commented Sep 12, 2024

Hi @Neu970 ,

This is a backward compatibility issue. I just fixed it in the master branch, which you use in your jobs. Please check it out now, and let me know if the issue still persists.

Sincerely,
Yiming

Author

Neu970 commented Sep 15, 2024 via email
