[Feature Request]: Set quota project in beam.io.ReadFromBigQuery #31126

Open
6 of 16 tasks
shahar1 opened this issue Apr 28, 2024 · 9 comments

@shahar1
Contributor

shahar1 commented Apr 28, 2024

What would you like to happen?

This issue relates to the Python SDK, but it is probably relevant to other SDKs as well:
We have a use case where queries initiated by beam.io.ReadFromBigQuery should be billed to a specific GCP project ID.
Because we use a custom container, the only option for now is to set the environment variable GOOGLE_CLOUD_QUOTA_PROJECT in the Dockerfile, but that affects all other GCP services as well.
It would be best to make it configurable via the connector (e.g., beam.io.ReadFromBigQuery(..., quota_project_id='some-project-id')).
When implementing, you could take inspiration from a similar feature in beam.io.WriteToBigQuery: #16186.
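
To make the request concrete, here is a minimal sketch of the precedence a connector-level setting might follow: an explicit per-connector value wins, and the process-wide GOOGLE_CLOUD_QUOTA_PROJECT variable is only a fallback. The helper name `resolve_quota_project` and its parameters are hypothetical and are not part of Beam.

```python
import os

# Environment variable named in this issue; it applies to the whole process.
QUOTA_PROJECT_ENV = "GOOGLE_CLOUD_QUOTA_PROJECT"


def resolve_quota_project(explicit=None, environ=None):
    """Hypothetical helper: choose the quota project for a single connector.

    An explicit per-connector value takes precedence. Otherwise fall back to
    the process-wide environment variable, and return None if neither is set.
    """
    environ = os.environ if environ is None else environ
    if explicit:
        return explicit
    return environ.get(QUOTA_PROJECT_ENV) or None
```

With this ordering, a value passed to one connector would not change what other GCP clients in the same container see, which is the limitation of the Dockerfile workaround described above.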

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@brucearctor
Contributor

@shahar1 this level of customization might make sense.

Let's explore your specific concern for a moment. I might have other questions, but I imagine it is worth understanding your needs/use case:

What quotas are getting hit that are problematic? Or, what are the specific billing charges you are looking to attribute elsewhere?
You are running on Dataflow? Or other? [ not critical, but curious ]
What read method? [ BQ Storage Read API? ]
You want to run the compute in one GCP project, but use BQ from another? If this unloads, writes to GCS and then into Dataflow [ that is another way that can occur ], do you intend to specify the project [ and the bucket within it ] that the data is written to?

@brucearctor
Contributor

Also, I wonder whether implementing this issue would help with #30747

@shahar1
Contributor Author

shahar1 commented Apr 30, 2024

> @shahar1 this level of customization might make sense.
>
> Let's explore your specific concern for a moment. I might have other questions, but I imagine it is worth understanding your needs/use case:
>
> What quotas are getting hit that are problematic? Or, what are the specific billing charges you are looking to attribute elsewhere?
> You are running on Dataflow? Or other? [ not critical, but curious ]
> What read method? [ BQ Storage Read API? ]
> You want to run the compute in one GCP project, but use BQ from another? If this unloads, writes to GCS and then into Dataflow [ that is another way that can occur ], do you intend to specify the project [ and the bucket within it ] that the data is written to?

Thanks for your response! Here are the answers to your questions:

  1. I'd like to attribute the query execution to another project. In my case, BigQuery is in project A and beam.io.ReadFromBigQuery runs in project B. I'd like to bill project B for the queries (for that matter, it could also be project C).
  2. We use Dataflow and the direct runner (when implemented, it should be a general solution and not Dataflow-specific).
  3. I use both methods - if I'm not wrong, in both cases you could set the quota_project_id via ClientOptions.
  4. Yup, you got the idea correctly :)
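
As a rough illustration of point 3, the sketch below passes a quota project to a BigQuery client through `ClientOptions`. It uses the plain google-cloud-bigquery client rather than Beam's internals. The helper names are made up for this example, and it is an assumption that a connector-level implementation would wire the option through in a similar way.

```python
def quota_client_options(quota_project_id=None):
    """Return ClientOptions fields as a plain dict (empty if no quota project)."""
    return {"quota_project_id": quota_project_id} if quota_project_id else {}


def make_bigquery_client(project, quota_project_id=None):
    """Create a BigQuery client, optionally with a quota project.

    Illustrative helper, not Beam code. The Google libraries are imported
    lazily so this module can be loaded without them installed.
    """
    from google.api_core.client_options import ClientOptions
    from google.cloud import bigquery

    options = ClientOptions(**quota_client_options(quota_project_id))
    return bigquery.Client(project=project, client_options=options)
```

Calling `make_bigquery_client("project-a", quota_project_id="project-b")` needs valid credentials at runtime. Whether the quota project alone attributes query costs as intended in this thread is not verified here.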

As for #30747 - it is related, but the implementation might differ, as GCS is a project resource rather than a service.

@brucearctor
Contributor

@shahar1 sounds like you've got a decent idea/design in mind, which could be supported.

Are you interested in contributing? Feel free to start and include me on PRs, if that's the case.

@shahar1
Contributor Author

shahar1 commented Apr 30, 2024

> @shahar1 sounds like you've got a decent idea/design in mind, which could be supported.
>
> Are you interested in contributing? Feel free to start and include me on PRs, if that's the case.

I'd be happy to try!
I first need to learn how development works here (I'm coming from the Airflow community).

@shahar1
Contributor Author

shahar1 commented Apr 30, 2024

.take-issue

@brucearctor
Contributor

This should be pretty good --> https://github.com/apache/beam/blob/master/CONTRIBUTING.md

If you find a problem [ or something that is outdated ], let's work through it and fix the docs along the way.

@Abacn Abacn removed the typescript label May 10, 2024
@eminik

eminik commented Sep 12, 2024

Are there any updates on this?

@shahar1
Contributor Author

shahar1 commented Sep 12, 2024

> Are there any updates on this?

I haven't managed to work on it yet.
If you (or anyone else) find this feature useful and want to take over, please let me know.
