-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding Google Storage Requester pays feature to Golang SDK. #33236
base: master
Are you sure you want to change the base?
Conversation
Setting UserProject on Google Storage Bucket operations to enable requester pays feature. Requester pays project ID will come from environment variable named `BILLING_PROJECT_ID` More information about Google storage requester pays feature here https://cloud.google.com/storage/docs/requester-pays
Assigning reviewers. If you would like to opt out of this review, comment R: @lostluck for label go. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
Reminder, please take a look at this pr: @lostluck @chamikaramj |
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @jrmccluskey for label go. Available commands:
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #33236 +/- ##
============================================
- Coverage 58.92% 57.41% -1.51%
+ Complexity 3112 1475 -1637
============================================
Files 1133 970 -163
Lines 174889 154431 -20458
Branches 3343 1076 -2267
============================================
- Hits 103045 88661 -14384
+ Misses 68495 63569 -4926
+ Partials 3349 2201 -1148
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This largely looks good, and it does seem that the storage
package code handles an empty string there properly, despite not being in the documentation.
https://pkg.go.dev/cloud.google.com/go/storage#BucketHandle.UserProject
The main concern I have is whether the environment variable is automatically set in GCE VMs or not.
Have you tried this on Google Cloud Dataflow? (it's OK if you haven't, or your usage of Beam doesn't require it, but it's something I'll need to verify to ensure the feature works well...).
Once you get back to me on that, I'm happy to merge this in.
I think the only thing I don't really love is using an environment variables for this, as it's not necessarily going to be set or not, and there doesn't seem to be any standardization around it within Beam or GCP (outside of a few scattered unrelated examples). It would probably be better to do something as a configurable hook at Pipeline submission time, that is triggered when a worker is initialized. |
Hi @lostluck, thank you for your feedback. I will change the implementation to get the billing project from the pipeline variable instead of using the environment variable. |
If it were me, I'd add a function to the GCS package that sets the value when the fs is constructed. eg. Then do something like the harness opts do for the side input cache: Defines and installs the hook URN.
Has a method to be called at before pipeline submission to enable the hook, and call https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/util/harnessopts/cache.go This would affect the whole pipeline (just as this change does), but if a pipeline author wants to do trickier bits where the billing requester is changing, then we can figure out other options. |
Reminder, please take a look at this pr: @jrmccluskey @ahmedabu98 |
@lostluck check if i understand your suggestion: in file sdks/go/pkg/beam/io/filesystem/gcs/gcs.go i will register a hook to get billing project information:
and on beam/sdks/go/pkg/beam/util/harnessopts/gcs.go i will create a function like this?
I have some questions:
|
Setting UserProject on Google Storage Bucket operations to enable requester pays feature.
Requester pays project ID will come from environment variable named
BILLING_PROJECT_ID
More information about Google storage requester pays feature here https://cloud.google.com/storage/docs/requester-pays
fixes #30747