Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ingest): Add config option to set Bigquery credential in source config #3786

Merged
merged 5 commits into from
Dec 27, 2021

Conversation

treff7es
Copy link
Contributor

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable)

@github-actions
Copy link

github-actions bot commented Dec 22, 2021

Unit Test Results

     45 files  ±0       45 suites  ±0   44m 3s ⏱️ -1s
   647 tests +3     595 ✔️ +3  52 💤 ±0  0 ±0 
1 453 runs  +9  1 385 ✔️ +9  68 💤 ±0  0 ±0 

Results for commit e1e80fe. ± Comparison against base commit 1a65ea7.

♻️ This comment has been updated with latest results.

"project_id": "project-id-1234567",
"private_key_id": "d0121d0000882411234e11166c6aaa23ed5d74e0",
"private_key": "-----BEGIN PRIVATE KEY-----\nMIIyourkey\n-----END PRIVATE KEY-----",
"client_email": "[email protected]",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing!!!


1. Setup a ServiceAccount (https://cloud.google.com/iam/docs/creating-managing-service-accounts#iam-service-accounts-create-console)
and assign the previously created role to this service account
2. Download a service account JSON keyfile:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 ":" in this line :)

"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/test%suppproject-id-1234567.iam.gserviceaccount.com"
}
```
3. Here you have two options:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rephrase as:

To provide credentials to the source, you can either

  1. Set an environment variable
  2. Provide credential fields in the source recipe


*or*

Set credential config in your source based on the credential json file like this:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set credential config in your source based on the credential json file. For example:


Set credential config in your source based on the credential json file like this:
```yml
credential:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if this should also be in the example recipe :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, I will add it.

logger.debug(
f"Creating temporary credential file at {self.credentials_path}"
)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = self.credentials_path
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this strictly required? It is strange we need to set an env variable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are using various APIs (sqlalchemy, BigQueryClient (https://cloud.google.com/bigquery/docs/reference/libraries#setting_up_authentication) and GCPLoggingClient (https://cloud.google.com/logging/docs/reference/libraries#setting_up_authentication), and with this way, we can authenticate with all of these.
I think it is simple and safe as this environment variable only lives during the life of the python app but works for all the APIs.


# We can't use close as it is not called if the ingestion is not successful
def __del__(self):
if self.config.credentials_path:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!


with open(config.credentials_path) as jsonFile:
json_credential = json.load(jsonFile)
jsonFile.close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last q: Did you try setting the credentials using the config method manually?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I tried it out a couple of times on my computer.

server: "http://localhost:8080"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Copy link
Collaborator

@jjoyce0510 jjoyce0510 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you so much @treff7es !

Copy link
Contributor

@shirshanka shirshanka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@shirshanka shirshanka merged commit 5df5150 into datahub-project:master Dec 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants