Since October 2022, reading data from a bucket in the US multi-region is no longer free:
> Reading data in a Cloud Storage bucket located in a multi-region from a Google Cloud service located in a region on the same continent will no longer be free; instead, such moves will be priced the same as general data moves between different locations on the same continent.
It seems the Cumulus workflows have not been updated to account for this change:

- The `zones` argument includes us-central, us-east, and us-west zones by default.
- The pipeline pulls resources from `gs://regev-lab`, which is requester-pays and lives in the US multi-region.
I just used the workflow for `cellranger count`, and the costs were 50% higher than necessary because of data transfer charges. An individual user can avoid the transfer fees by mirroring the resources and setting the `genome_file` parameter to the mirrored URL (see the sketch below). Perhaps the choice should be made explicit by requiring the `zones` parameter, and perhaps the resources could also be served from a bucket in us-central.
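For reference, here is a minimal sketch of the mirroring workaround using the `google-cloud-storage` Python client. The destination bucket name and object prefix are hypothetical placeholders; the billing project is whichever project you want charged for the requester-pays reads.

```python
from google.cloud import storage


def mirror_requester_pays(billing_project: str,
                          dst_bucket_name: str = "my-us-central1-bucket",
                          prefix: str = "") -> None:
    """Copy reference files from the requester-pays gs://regev-lab bucket
    into a regional bucket you own, so compute in that region reads locally.
    """
    client = storage.Client(project=billing_project)
    # user_project names the project billed for requester-pays access.
    src = client.bucket("regev-lab", user_project=billing_project)
    dst = client.bucket(dst_bucket_name)
    for blob in client.list_blobs(src, prefix=prefix):
        # Server-side copy; very large objects may need Blob.rewrite instead.
        src.copy_blob(blob, dst, blob.name)
```

You would then point `genome_file` at the copy in your own regional bucket instead of the original `gs://regev-lab` URL.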
For a general-purpose workflow, it's hard to guarantee that the zones in use always match the location of the Google bucket the input data come from. I believe this mechanism would have to be handled by Cromwell, the underlying workflow execution engine; only at that level can such consistency be enforced.
What I can do on our side is add a dedicated section to our docs highlighting this pricing issue, so that users can adjust on their own.
The `gs://regev-lab` bucket is maintained by the Broad Institute, and our team doesn't have management permission for it. I'll bring this pricing issue to their attention.
I don't know how you run your workflows on GCP, but I'd like to share that with GCP Batch, which will replace the Google Life Sciences API in July 2025, deploying within a single region means you no longer need to specify zones in your workflow input, and your jobs will be executed only within that region (see the sketch below).
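To illustrate the point (this is not Cumulus's actual code): with the `google-cloud-batch` Python client, the region is part of the request path itself, so there is no separate zones list to keep in sync with your bucket's location. A minimal sketch, with the project ID and region as placeholder values:

```python
from google.cloud import batch_v1


def submit_hello_job(project_id: str, region: str = "us-central1") -> batch_v1.Job:
    """Submit a trivial GCP Batch job pinned to a single region."""
    client = batch_v1.BatchServiceClient()

    # A single task that runs a shell script.
    runnable = batch_v1.Runnable()
    runnable.script = batch_v1.Runnable.Script(text="echo 'hello from Batch'")

    task = batch_v1.TaskSpec(runnables=[runnable])
    group = batch_v1.TaskGroup(task_count=1, task_spec=task)

    job = batch_v1.Job(task_groups=[group])
    job.logs_policy = batch_v1.LogsPolicy(
        destination=batch_v1.LogsPolicy.Destination.CLOUD_LOGGING
    )

    # The region is baked into the parent path: every task in this job
    # runs in `region`, with no per-job zones input to maintain.
    parent = f"projects/{project_id}/locations/{region}"
    return client.create_job(parent=parent, job=job, job_id="hello-batch-job")
```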