fix(bigquery): allow insert to target dataset/table in another project #28097

Merged
dazuma merged 2 commits into main from bq-feat-insert-project
Dec 13, 2024

Conversation

alvarowolfx
Contributor

@alvarowolfx alvarowolfx commented Dec 11, 2024

Allow the insertAll API to target tables in a dataset that lives in a separate project, that is, a project other than the one configured on the main client and used for billing.

I haven't added integration tests, since our CI pipelines don't have write access to two different projects. But I tested locally with the following code:

$ export GOOGLE_CLOUD_PROJECT=projectA
$ bq mk --table test_dataset.ruby-test-001 name:STRING,value:NUMERIC

require "google/cloud/bigquery"

bigquery = Google::Cloud::Bigquery.new
project_id = "projectB"
dataset_id = "test_dataset"
table_id = "ruby-test-001"
dataset  = bigquery.dataset dataset_id, project_id: project_id
table    = dataset.table table_id

row_data = [
  { name: "Alice", value: 5  },
  { name: "Bob",   value: 10 }
]
response = table.insert row_data

if response.success?
  puts "Inserted rows successfully"
else
  puts "Failed to insert #{response.error_rows.count} rows"
end

inserter = table.insert_async do |result|
  if result.error?
    puts result.error
  else
    puts "inserted #{result.insert_count} rows with #{result.error_count} errors"
  end
end

inserter.insert row_data

inserter.stop.wait!

Follow up on #27681 and #27368

@alvarowolfx alvarowolfx requested a review from dazuma December 11, 2024 20:55
@alvarowolfx alvarowolfx added the api: bigquery Issues related to the BigQuery API. label Dec 11, 2024
@dazuma dazuma added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 11, 2024
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Dec 11, 2024
@@ -2810,7 +2810,7 @@ def insert_async table_id, skip_invalid: nil, ignore_unknown: nil, max_bytes: 10
 ensure_service!

 # Get table, don't use Dataset#table which handles NotFoundError
-gapi = service.get_table dataset_id, table_id, metadata_view: view
+gapi = service.get_project_table project_id, dataset_id, table_id, metadata_view: view
Member

I don't understand how this changes things. Under what circumstance would the dataset's project ID be different from the project ID of the service backing the dataset?

Member

Oh, I see, it's now possible for the client (whose project is tied to the billing account) to get and construct a Dataset object from a different project. So this is actually a bug fix rather than a feature: in such a case, we're making sure the correct project gets set when doing an insert rows operation. Is that correct?

Contributor Author

@dazuma This is a very common scenario among BigQuery users: operations run in project A modify data stored in project B. So you need to target tables in a dataset that lives in a separate project, other than the project configured on the main client, which is used, for example, for billing.

Another example: a central project for an org holds all the long-term stored data, but each department isolates its operational costs from the others by using individual projects to run queries, load jobs, etc.

This is the same need that was mentioned in #27681 and #27368.
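
To make the two-project scenario concrete, here is a minimal sketch of how the target of an insert changes depending on which project ID is used. The `insert_all_path` helper is hypothetical and is not part of the library; it only mirrors the shape of the `tabledata.insertAll` REST resource path. The "without the fix" line reflects how the problem is described in this thread, not a trace of the actual library code.

```ruby
# Hypothetical helper (not a library method): builds the resource path
# that a tabledata.insertAll request would target.
def insert_all_path project_id, dataset_id, table_id
  "projects/#{project_id}/datasets/#{dataset_id}/tables/#{table_id}/insertAll"
end

client_project  = "projectA" # runs the operations and is billed
dataset_project = "projectB" # stores the data

# Without the fix, as described in this thread, the client's project is used,
# so the request points at a table that does not live there.
before_fix = insert_all_path client_project, "test_dataset", "ruby-test-001"

# With the fix, the dataset's own project is used.
after_fix = insert_all_path dataset_project, "test_dataset", "ruby-test-001"

puts before_fix
puts after_fix
```

The only difference between the two paths is the project segment, which is why the change is small even though the affected use case is common.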

Contributor Author

> Oh, I see, it's now possible for the client (whose project is tied to the billing account) to get and construct a Dataset object from a different project. So this is actually a bug fix rather than a feature: in such a case, we're making sure the correct project gets set when doing an insert rows operation. Is that correct?

Yeah, it might make sense to change it to a fix instead of a feature.

Member

If this is indeed a bug fix (and I think it should be: I would expect a new feature to include an addition to the public interface, and this PR doesn't make any public interface changes at all), then I'd like to see new unit tests that show the fixed case. You can mock out the backend service, as many of the unit tests do, but just test that the correct project ID gets passed down to the backend service.
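
The kind of unit test requested here could look roughly like the sketch below. It is not the test that was added to the PR. `RecordingService` and `SketchTable` are hypothetical stand-ins written for illustration, and `insert_rows` is an assumed method name and signature rather than the library's real backend call. The point is only the pattern: replace the backend with an object that records its arguments, then assert on the project ID it received.

```ruby
# Hypothetical stand-in for the backend service: it records every call
# instead of talking to BigQuery. Not the library's real service class.
class RecordingService
  attr_reader :project, :calls

  def initialize project
    @project = project # the client's (billing) project
    @calls = []
  end

  # Assumed method name and signature, for illustration only.
  def insert_rows project_id, dataset_id, table_id, rows
    @calls << { project_id: project_id, dataset_id: dataset_id,
                table_id: table_id, row_count: rows.size }
    :ok
  end
end

# Simplified stand-in for a table that remembers which project owns it.
class SketchTable
  def initialize service, project_id, dataset_id, table_id
    @service = service
    @project_id = project_id
    @dataset_id = dataset_id
    @table_id = table_id
  end

  # Behavior under test: pass the table's own project down to the
  # service instead of falling back to the service's project.
  def insert rows
    @service.insert_rows @project_id, @dataset_id, @table_id, rows
  end
end

service = RecordingService.new "projectA"
table = SketchTable.new service, "projectB", "test_dataset", "ruby-test-001"
table.insert [{ name: "Alice", value: 5 }, { name: "Bob", value: 10 }]

call = service.calls.first
puts "target project: #{call[:project_id]}"
```

In a real test suite the recording object would typically be replaced by the mocking facility the existing unit tests already use, with an expectation on the project ID argument.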

@alvarowolfx alvarowolfx changed the title feat(bigquery): insert rows in another project fix(bigquery): allow insert rows to target dataset/table in another project Dec 12, 2024
@alvarowolfx alvarowolfx changed the title fix(bigquery): allow insert rows to target dataset/table in another project fix(bigquery): allow insert to target dataset/table in another project Dec 12, 2024
@alvarowolfx alvarowolfx requested a review from dazuma December 13, 2024 17:54
Member

@dazuma dazuma left a comment

Thanks much!

@dazuma dazuma merged commit 0430f6c into main Dec 13, 2024
13 checks passed
@dazuma dazuma deleted the bq-feat-insert-project branch December 13, 2024 19:31
@github-actions github-actions bot added the release-please:force-run To run release-please label Dec 13, 2024
@release-please release-please bot removed the release-please:force-run To run release-please label Dec 13, 2024
Labels
api: bigquery Issues related to the BigQuery API.
3 participants