move away from multi-regional #410

Open
bgorissen opened this issue Jul 8, 2024 · 1 comment

Comments

@bgorissen

Since October 2022, reading data from a bucket in us-multi is no longer free:

Reading data in a Cloud Storage bucket located in a multi-region from a Google Cloud service located in a region on the same continent will no longer be free; instead, such moves will be priced the same as general data moves between different locations on the same continent.

It seems like the cumulus workflows have not been updated after this change:

  1. The zones argument includes us-central, us-east, and us-west by default.
  2. The pipeline pulls resources from gs://regev-lab which is requester-pays and us-multi.
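To see which zones are affected, here is a rough sketch that flags zones whose region differs from the bucket's location. The helper names and the example zone string are hypothetical and not taken from the cumulus code. It assumes zones are passed as a space-separated string and that a bucket location is either a multi-region such as `US` or a single region such as `US-CENTRAL1` (dual-regions are not handled).

```python
def zone_to_region(zone: str) -> str:
    """Drop the trailing zone letter: 'us-central1-a' -> 'us-central1'."""
    return zone.rsplit("-", 1)[0]


def billable_zones(bucket_location: str, zones: str) -> list[str]:
    """Return the zones whose region is not identical to the bucket location.

    A multi-region location such as 'US' never equals a region name, so every
    zone is flagged, which matches the pricing change quoted above.
    """
    location = bucket_location.lower()
    return [z for z in zones.split() if zone_to_region(z) != location]


# Illustrative zone string only; the real workflow default may differ.
example_zones = "us-central1-a us-central1-b us-east1-b us-west1-a"
print(billable_zones("US", example_zones))
print(billable_zones("US-CENTRAL1", example_zones))
```

With a multi-region bucket every zone is flagged; with a bucket in us-central1 only the us-east and us-west zones are.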

I just used the workflow for cellranger count, and the costs were 50% higher than necessary due to data transfer costs. An individual user can avoid transfer fees by mirroring the resources and setting the genome_file parameter to a URL. Perhaps the choice should be explicit by making zones a required parameter, and perhaps the resources could be made available from a bucket in us-central.
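For anyone wanting to apply the workaround, a minimal sketch of generating a Cromwell inputs JSON with both overrides follows. The `cellranger_workflow` prefix, the fully qualified input keys, and the mirror bucket path are assumptions for illustration; check the WDL you are running for the exact input names.

```python
import json


def build_overrides(genome_url: str, zones: str,
                    workflow: str = "cellranger_workflow") -> dict:
    """Build the two input overrides discussed above.

    The key names are assumed, not confirmed against the WDL.
    """
    return {
        f"{workflow}.zones": zones,              # keep compute in one region
        f"{workflow}.genome_file": genome_url,   # mirrored reference, same region
    }


overrides = build_overrides(
    # Hypothetical mirror bucket located in us-central1.
    genome_url="gs://my-mirror-bucket/refdata/genome.tar.gz",
    zones="us-central1-a us-central1-b us-central1-c us-central1-f",
)
print(json.dumps(overrides, indent=2))
```

These entries would be merged into the rest of the inputs file; the bucket holding the input data should sit in the same region as the listed zones for reads to stay free.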

@yihming
Member

yihming commented Sep 10, 2024

Hi @bgorissen,

Thanks a lot for reporting this price issue!

For a general-purpose workflow, it's hard to guarantee that the zones in use always match the location of the Google bucket holding the input data. I believe this should be handled by Cromwell, the underlying workflow execution engine; only at that level can such consistency be enforced.

What I can do on our side is add a dedicated section to our docs page highlighting this pricing issue, so that users can adjust their settings themselves.

The gs://regev-lab bucket is maintained by the Broad Institute, and our team doesn't have management permission for it. I'll let them know about this pricing issue.

I don't know how you run your workflows on GCP, but I'd like to share that GCP Batch will replace the Google Life Sciences API in July 2025. With GCP Batch, if you deploy it within a single region, you no longer need to specify zones in your workflow inputs, and your jobs will run only within that region.
