Scheduled ingestion of CaDeT assets #123
Comments
We think yes, GitHub Actions is the way to go here. It will be a proof of concept for other ingestion types, and it will let us add custom transformers if we need to (e.g. to assign domains in ministryofjustice/find-moj-data#343).
TODO:
To allow GitHub Actions to access this bucket, we need
An OIDC assumable role looks something like this:
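As a rough sketch only, the trust policy for a role assumable by GitHub Actions over OIDC could be built like this. The account ID is a placeholder, the helper name `build_trust_policy` is hypothetical, and the sketch assumes the GitHub OIDC provider (`token.actions.githubusercontent.com`) is already registered in the AWS account; the real role definition may differ.

```python
import json

# Host of GitHub's OIDC identity provider, as registered in AWS IAM.
GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def build_trust_policy(account_id: str, repo: str) -> dict:
    """Build a trust policy letting workflows in `repo` assume the role.

    `account_id` is a placeholder value; `repo` is "owner/name".
    """
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # The token must have been issued for AWS STS.
                    "StringEquals": {f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com"},
                    # Any branch, tag or pull request in the repository.
                    "StringLike": {f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:*"},
                },
            }
        ],
    }


if __name__ == "__main__":
    # 123456789012 is a made-up account ID used purely for illustration.
    policy = build_trust_policy("123456789012", "ministryofjustice/data-catalogue")
    print(json.dumps(policy, indent=2))
```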
If we want to restrict it to builds of the main branch, then we can use
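One way to express a main-branch restriction, assuming GitHub's documented subject-claim format for branch builds (`repo:OWNER/REPO:ref:refs/heads/BRANCH`), is an exact match on the `sub` claim in the role's trust policy. The helper names below are hypothetical. Note that jobs which use a GitHub environment carry a different subject (`repo:OWNER/REPO:environment:NAME`), so this condition would not match them.

```python
# Host of GitHub's OIDC identity provider, used as the condition key prefix.
GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def branch_subject(repo: str, branch: str = "main") -> str:
    """Return the OIDC subject claim GitHub issues for a build of `branch`."""
    return f"repo:{repo}:ref:refs/heads/{branch}"


def branch_condition(repo: str, branch: str = "main") -> dict:
    """Return an IAM condition block that only matches builds of `branch`."""
    return {
        "StringEquals": {
            f"{GITHUB_OIDC_HOST}:sub": branch_subject(repo, branch),
        }
    }


if __name__ == "__main__":
    print(branch_condition("ministryofjustice/data-catalogue"))
```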
Policy datahubReadCaDeTBucket is already defined in the analytical-platform repo.
We want to schedule DataHub ingestions using GitHub Actions (ministryofjustice/data-catalogue#123). To do this, GitHub Actions needs to be able to assume a role via OIDC. See https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-idp_oidc.html This role needs read-only access to the bucket that contains CaDeT outputs.
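For illustration, read-only access to a single bucket usually comes down to listing the bucket and getting its objects. The sketch below is not the actual datahubReadCaDeTBucket policy; the bucket name and the helper `build_read_only_bucket_policy` are hypothetical.

```python
import json


def build_read_only_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy granting list and read access to one S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "example-cadet-bucket" is a made-up name, not the real bucket.
    print(json.dumps(build_read_only_bucket_policy("example-cadet-bucket"), indent=2))
```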
Access for data-catalogue GitHub Actions
We want to schedule DataHub DBT ingestions using GitHub Actions (ministryofjustice/data-catalogue#123). To do this, GitHub Actions needs to be able to assume a role via OIDC and use it to access the S3 bucket containing the outputs from DBT. See https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-idp_oidc.html We already had IRSAs (IAM roles for service accounts), which can be assumed by DataHub itself, but these assume you are running an application in a Kubernetes pod on AWS, whereas in this case we are going to run the ingestion from GitHub Actions.
Where and how should this ingestion be scheduled?
DataHub actions API?
Should the ingestion recipes live in GitHub?
Do it live