gpu_type in wlds no appropriate #414

Neu970 · 2024-08-07T09:58:21Z

Dear Team lilab-bcb/cumulus,

I wanted to bring to your attention that several of the WDL protocols you offer, such as Cellbender/remove_background, are no longer functioning properly. Specifically, Google Cloud reports insufficient quota when I run the protocol with the "nvidia-tesla-t4" GPU.

The error message I received is as follows:

"Task cellbender.run_cellbender_remove_background_gpu:0:20 failed. The job was stopped before the command finished. PAPI error code 9. Could not start instance custom-4-8192 due to insufficient quota. Cromwell retries exhausted, task failed. Backend info: Execution failed: allocating: selecting resources: selecting region and zone: no available zones: us-east1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low, us-west1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low, us-central1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low."
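To see at a glance which regions were rejected and why, the backend message above can be split into one record per region. This is only a minimal sketch: the pattern is inferred from this single message, the function name `parse_quota_errors` is made up for illustration, and reading the two numbers in "(0/0 available)" as available and limit is an assumption.

```python
import re

# Pattern inferred from the one error message quoted above; other
# Cromwell/PAPI versions may word this differently.
QUOTA_RE = re.compile(
    r"(?P<region>[a-z]+-[a-z]+\d+): (?P<requested>\d+) (?P<metric>[A-Z0-9_]+) "
    r"\((?P<available>\d+)/(?P<limit>\d+) available\) quota too low"
)


def parse_quota_errors(message):
    """Return one dict per region reported as 'quota too low'."""
    rows = []
    for match in QUOTA_RE.finditer(message):
        rows.append({
            "region": match.group("region"),
            "metric": match.group("metric"),
            "requested": int(match.group("requested")),
            # Assumption: the pair is "available/limit"; the message does not say.
            "available": int(match.group("available")),
            "limit": int(match.group("limit")),
        })
    return rows


if __name__ == "__main__":
    backend_info = (
        "no available zones: us-east1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low, "
        "us-west1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low, "
        "us-central1: 1 NVIDIA_T4_GPUS (0/0 available) quota too low."
    )
    for row in parse_quota_errors(backend_info):
        print(row["region"], row["metric"], row["available"], row["limit"])
```

If every region reports a second number of 0, as here, the project may have no T4 quota at all in those regions, rather than a quota that is merely in use at the moment.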

Additionally, protocols like STARsolo require adjustments to the memory and CPU values, as the default settings are no longer adequate.

A couple of months ago, these protocols worked without any issues, but they seem to have stopped functioning recently. I hope this information helps you investigate whether this is a temporary problem or something more persistent. I greatly appreciate your efforts in developing these WDLs, as their value is immense, but at the moment, they are not very useful due to these issues.

Thank you for your attention to this matter.

Best regards,
Rod

yihming · 2024-09-10T20:41:47Z

Hi @Neu970,

Thanks for reaching out, and thank you for reporting this issue!

Your GPU quota issue with the cellbender workflow seems to be related to the low availability of GPUs in those regions. I'll still test on my side to see whether anything on our end contributes to it.

For the STARsolo workflow, could you please give an example of the issue? I'm asking because the workflow runs well in my analysis tasks with the default 32 vCPUs and 120GB memory. If your data require more computing resources, please note that you can increase them by setting your own values for the num_cpu and memory inputs. (Please look for these inputs in https://cumulus.readthedocs.io/en/stable/starsolo.html#workflow-inputs)
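As a rough illustration of overriding those two inputs, the sketch below builds a small inputs fragment as JSON. The `starsolo_workflow.` key prefix is taken from the workflow name that appears in Cromwell paths, and the string form of the memory value ("240G") is an assumption; the helper name and the numbers are only examples, so please check the linked documentation for the exact input names and formats.

```python
import json


def starsolo_resource_overrides(num_cpu, memory):
    """Build an inputs fragment that overrides the default resources.

    Assumptions (not confirmed here): inputs are keyed as
    'starsolo_workflow.<input>' and memory is a string such as '240G'.
    """
    return {
        "starsolo_workflow.num_cpu": num_cpu,
        "starsolo_workflow.memory": memory,
    }


if __name__ == "__main__":
    # Example values only: double the defaults mentioned above.
    overrides = starsolo_resource_overrides(num_cpu=64, memory="240G")
    print(json.dumps(overrides, indent=2))
```

On Terra, the same values can be typed directly into the workflow's input form instead of uploading JSON.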

Hope it helps!

Sincerely,
Yiming

Neu970 · 2024-09-11T05:57:32Z

Hi Yiming,
Problem solved. After talking with the Terra team and following your suggestions, I tried different combinations of the num_cpu and memory inputs until it worked.
Thanks and regards

Neu970 · 2024-09-11T05:58:03Z

Hi Yiming,

First, I resolved this issue by changing the computing resources. But I have another issue and I hope you can help. I am running the STARsolo WDL (Source: github.com/lilab-bcb/cumulus/STARsolo:master) on the Terra platform. A couple of weeks ago I used the same protocol with the same files, and now it doesn't work. The generate_count_config step works without problems, but for starsolo_count the Terra job manager reports:

"Task starsolo_count.run_starsolo:NA:1 failed. Job exit code 1. Check gs://fc-a17dc2ff-9944-43db-a126-8b87eeb0b279/submissions/5af1c8ff-ad3f-4d8e-bdc4-183410b1260f/starsolo_workflow/893360c8-347c-4b14-ba07-9456bb9f67d3/call-starsolo_count/shard-0/starsolo_count/262fa48e-7704-412e-a993-bc5abf946d8d/call-run_starsolo/stderr for more information. PAPI error code 9. Please check the log file for more details: gs://fc-a17dc2ff-9944-43db-a126-8b87eeb0b279/submissions/5af1c8ff-ad3f-4d8e-bdc4-183410b1260f/starsolo_workflow/893360c8-347c-4b14-ba07-9456bb9f67d3/call-starsolo_count/shard-0/starsolo_count/262fa48e-7704-412e-a993-bc5abf946d8d/call-run_starsolo/run_starsolo.log."

In run_starsolo.log, it looks as if strato cannot recognize the arguments:

Average throughput: 251.2MiB/s
2024/09/11 04:47:18 Localization script execution complete.
2024/09/11 04:47:58 Done localization.
2024/09/11 04:48:14 Running user action: docker run -v /mnt/local-disk:/cromwell_root --entrypoint=/bin/bash ***@***.***:f6f598545121fd36bba4df78979d821fdb4292804d48b8a7a31a54a1c95b819c /cromwell_root/script
usage: strato cp [-h] [-r] [-m] [--ionice] [--profile PROFILE] [--quiet] [--dryrun] filenames [filenames …]
strato cp: error: unrecognized arguments: --backend gs://fc-a17dc2ff-9944-43db-a126-8b87eeb0b279/SRR15931900/GSM5585219_1.fastq.gz GSM5585219_0/
strato exists --backend gcp gs://fc-a17dc2ff-9944-43db-a126-8b87eeb0b279/SRR15931900/GSM5585219/
strato cp --backend gcp -m gs://fc-a17dc2ff-9944-43db-a126-8b87eeb0b279/SRR15931900/GSM5585219_1.fastq.gz GSM5585219_0/
Traceback (most recent call last):
  File "<stdin>", line 29, in <module>
  File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['strato', 'exists', '--backend', 'gcp', 'gs://fc-a17dc2ff-9944-43db-a126-8b87eeb0b279/SRR15931900/GSM5585219/']' returned non-zero exit status 2.

If you think this should be reported in the lilab-bcb/cumulus issues section, let me know and I will send it to you.

Again, thank you very much for your assistance.

Best regards,
Neurod
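The usage line in this log lists the options that the installed `strato cp` accepts, and `--backend` is not among them, even though the script passes it. A small check like the one below makes that mismatch explicit. It is only a sketch that inspects the usage text printed in the log; the helper name `usage_supports_flag` is hypothetical and it does not call strato itself.

```python
import re


def usage_supports_flag(usage_text, flag):
    """Return True if a long option such as '--backend' appears in the usage text."""
    # Collect every long option mentioned, e.g. --ionice, --profile, --quiet.
    options = set(re.findall(r"--[A-Za-z][\w-]*", usage_text))
    return flag in options


# Usage line copied from the run_starsolo.log excerpt in this thread.
USAGE = (
    "usage: strato cp [-h] [-r] [-m] [--ionice] [--profile PROFILE] "
    "[--quiet] [--dryrun] filenames [filenames ...]"
)

if __name__ == "__main__":
    print(usage_supports_flag(USAGE, "--backend"))  # False
    print(usage_supports_flag(USAGE, "--quiet"))    # True
```

A result of False for `--backend` is consistent with the script and the strato version in the Docker image being out of step with each other.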

yihming · 2024-09-12T16:55:38Z

Hi @Neu970,

This is a backward compatibility issue. I just fixed it in the master branch, which you use in your jobs. Please check it out now, and let me know if the issue still persists.

Sincerely,
Yiming

Neu970 · 2024-09-15T15:45:31Z

Dear Yiming,

Thank you for your support and help; everything went smoothly. I have submitted various analyses with different samples and haven't encountered any errors. I truly appreciate your quick response and how efficiently you solved the problem.

Best regards,
Rodrigo