Multi-region support google-batch #280

Open
rivershah opened this issue Dec 4, 2023 · 7 comments

Comments

@rivershah

google-cls-v2 has powerful multi-region support, either through wildcard matching or by providing a list of regions. It appears that google-batch lacks this feature.

Multi-region support is a much-loved and heavily used feature of google-cls-v2. Can you please verify that google-batch indeed does not have it? If it does not, would it be possible to work with the Google Batch developers to introduce this feature before Cloud Life Sciences is removed? Thank you.
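To make "wildcard matching" concrete, here is a minimal sketch of how a pattern such as `us-*` could be expanded into a list of regions. This is not dsub's actual implementation; the `expand_regions` helper and the short `KNOWN_REGIONS` list are hypothetical and only illustrate the idea.

```python
from fnmatch import fnmatchcase

# Hypothetical sample of region names, for illustration only.
KNOWN_REGIONS = ["us-central1", "us-east1", "us-west1", "europe-west1", "asia-east1"]


def expand_regions(patterns, known_regions=KNOWN_REGIONS):
    """Return the known regions matching any pattern, in listed order, without duplicates."""
    matched = []
    for region in known_regions:
        if region in matched:
            continue
        # A plain name like "us-east1" matches itself; "us-*" matches every US region.
        if any(fnmatchcase(region, pattern) for pattern in patterns):
            matched.append(region)
    return matched


print(expand_regions(["us-*"]))  # ['us-central1', 'us-east1', 'us-west1']
```

An explicit list works the same way, since a name without wildcard characters only matches itself.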

@mbookman
Contributor

mbookman commented Dec 4, 2023

Thanks for the input @rivershah.

A cursory look indicates that Batch supports this via the LocationPolicy:

https://cloud.google.com/batch/docs/reference/rest/v1alpha/projects.locations.jobs#locationpolicy

We'll test it out and look to wire it up if it works as expected.

@rivershah
Author

rivershah commented Dec 4, 2023

@mbookman Thanks for looking. My understanding of the docs is that the Batch API will raise an error if multiple regions are specified:

Only one region or multiple zones in one region is supported now

It appears that multi-region support may not have been ported over to Batch. I await the results of your experiments, as I can't get my jobs to schedule when I specify multiple regions: the VMs enter an error state.

For context, multi-region support is so useful because it greatly simplifies job submission for hard-to-find resources such as high-memory nodes and GPU accelerators. A large parallel GPU dsub TSV submission will typically find resources across geographically widely separated regions.
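As a rough sketch of the constraint quoted above, the check below builds the location part of a Batch job request and rejects anything that spans more than one region. It assumes the `regions/<region>` and `zones/<zone>` naming used for `allowedLocations` in the LocationPolicy reference, and that a zone name is its region plus a one-letter suffix. The helper names are hypothetical, and the real API does its own validation.

```python
def regions_of(allowed_locations):
    """Collect the distinct regions referenced by a list of allowed locations."""
    regions = set()
    for location in allowed_locations:
        kind, _, name = location.partition("/")
        if kind == "regions" and name:
            regions.add(name)
        elif kind == "zones" and name:
            # Assumption: a zone is "<region>-<letter>", e.g. "us-central1-a".
            regions.add(name.rsplit("-", 1)[0])
        else:
            raise ValueError(f"unrecognized location: {location!r}")
    return regions


def location_policy(allowed_locations):
    """Build the allocationPolicy.location fragment, enforcing a single region."""
    regions = regions_of(allowed_locations)
    if len(regions) != 1:
        raise ValueError(
            f"expected locations in exactly one region, got {sorted(regions)}"
        )
    return {"allocationPolicy": {"location": {"allowedLocations": list(allowed_locations)}}}


# Multiple zones in one region pass the check.
print(location_policy(["zones/us-central1-a", "zones/us-central1-c"]))
```

Passing `["regions/us-central1", "regions/us-east1"]` raises `ValueError` here, which mirrors the behaviour described in this comment.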

@anngregory

Does dsub work with google batch?

@rivershah
Author

@mbookman Happy new year! I am still fairly sure that Batch, as implemented on Google's side, does not support submitting a job across US-wide regions, which google-cls-v2 does allow. This would be a major feature regression. I don't think this is a dsub limitation.

Can you please verify whether what I am saying is correct? If so, we will need help determining whether this feature can be implemented in Batch.

@rivershah
Author

@mbookman @wnojopra As google-cls-v2 is headed for removal soon, I am requesting that we look at this feature regression. Thank you.

@mbookman
Contributor

mbookman commented Apr 8, 2024

Hey @rivershah !

Sorry about the delay in following up. We did check in with the Batch team regarding this. The lack of multi-region support is presently intentional, in the sense that it was not considered to have high utility. It would be great if we could get more input from you on your use case and where you see it giving value.

One of the key drivers of this feature not being added to Batch is the 2022 change in Cloud pricing, under which reading data in a multi-region bucket from a service located in a single region incurs Data Transfer Out charges (formerly known as egress charges).

https://cloud.google.com/storage/pricing-announce#network

Reading data in a Cloud Storage bucket located in a multi-region from a Google Cloud service located in a region on the same continent will no longer be free; instead, such moves will be priced the same as general data moves between different locations on the same continent.

Northern America to Northern America: $0.02/GB

Prior to those pricing changes, accessing data in a US multi-region bucket from any of the US regions was free. So the Cloud view is that people will generally want to use regional buckets and regional VMs.
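For a sense of scale, here is a back-of-the-envelope estimate using the $0.02/GB rate quoted above. The 500 GB figure is purely an illustrative assumption, and actual billing may differ (tiers, discounts, other SKUs).

```python
# Rate quoted in the pricing announcement for Northern America to Northern America.
RATE_USD_PER_GB = 0.02


def transfer_cost_usd(gigabytes, rate=RATE_USD_PER_GB):
    """Estimate the Data Transfer Out charge for reading `gigabytes` of data."""
    return gigabytes * rate


# Hypothetical job that reads 500 GB from a multi-region bucket.
print(f"${transfer_cost_usd(500):.2f}")  # $10.00
```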

So can you share your use case where this pricing change has not impacted you and where you'd get high value from multiple regions?

Thanks.

@rivershah
Author

rivershah commented Jul 30, 2024

Hi @mbookman,

Apologies for the delay. The multi-region feature is crucial for several reasons:

  • Hardware Flexibility: Users can't predict the availability of accelerator hardware and preemptible machines in advance. Multi-region support allows Google Batch layers to optimize and find suitable machines.
  • Artifact Registry: Multi-region artifact registries have multi-region optimized pricing, making region flexibility beneficial.
  • Resource Availability: For machine learning, having access to GPUs across multiple regions is more valuable than saving on egress charges. This flexibility helps ensure that we can scale and schedule resources efficiently.

3 participants