
Update Airflow AWS MWAA deployment docs #3860

Merged
merged 15 commits into main from kedro-airflow-docs-deployment-cloud-providers
May 20, 2024

Conversation

DimedS
Contributor

@DimedS DimedS commented May 8, 2024

Description

This PR introduces a section on deploying to Airflow AWS MWAA.

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

DimedS added 2 commits May 8, 2024 17:01
Signed-off-by: Dmitry Sorokin <[email protected]>
Signed-off-by: Dmitry Sorokin <[email protected]>
@ankatiyar
Contributor

Also a reminder to add quotes ("") around the dataset factory names in the docs.

docs/source/deployment/airflow.md (outdated, resolved)
client_kwargs:
  aws_access_key_id: *********************
  aws_secret_access_key: ******************************************
```
Contributor

I haven't tried this yet, but do the credentials in the local/credentials.yml work on the cloud platform? This file ideally shouldn't be uploaded to the cloud at all. Maybe setting up env variables?

Contributor Author

I tried that approach with the local environment, and it works. We can change the environment from local to prod, but am I correct in understanding that there isn't a big difference?

For setting environment variables, I am using an env_var.sh file that is located in the same S3 bucket as the conf folder, which is currently packed into plugins.zip. Therefore, I don't think it makes much sense to use environment variables for security reasons because, ultimately, they will be stored in the same S3 bucket. However, this S3 bucket is secure and does not have public access, so I think it's not a problem.
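A minimal sketch of what such an `env_var.sh` could look like (the variable names and values below are illustrative placeholders, not taken from the PR):

```shell
# env_var.sh -- hypothetical example of environment variables sourced
# inside the MWAA container before running Kedro; all values are placeholders.
export AWS_DEFAULT_REGION="us-east-1"
export KEDRO_ENV="local"
```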

s3fs
```

6. Archive your `conf` folder into a `conf.zip` file and upload it to `s3://your_S3_bucket` for later use in the Airflow container. This file will be unzipped into the `plugins` folder within the MWAA Airflow container.
Contributor

Worth noting that the `kedro package` command will create an archived `conf-<package-name>.tar.gz` file in the `dist/` folder, without the `local` conf, and that archive should be used here ^

Contributor Author

Do you have any suggestions on how this file should be used? Should we simply unpack it to extract the conf folder, or should we run kedro run directly with the conf archive?

Contributor

It doesn't need to be unpacked at all; `OmegaConfigLoader` can handle `.tar.gz` and `.zip` confs.
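To illustrate (paths are made up; the commented `kedro run` line assumes a packaged Kedro project and a Kedro version whose CLI accepts `--conf-source`):

```shell
# Build a sample conf archive, similar in shape to what `kedro package` produces
mkdir -p conf/base
printf 'example: value\n' > conf/base/catalog.yml
tar -czf conf-new_kedro_project.tar.gz conf
# Kedro can consume the archive directly, without unpacking:
#   kedro run --conf-source=conf-new_kedro_project.tar.gz
tar -tzf conf-new_kedro_project.tar.gz   # list contents to verify
```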

Contributor Author

I changed that, but I'm not sure what to do about `export KEDRO_LOGGING_CONFIG="plugins/logging.yml"`. I currently copy `logging.yml` separately, but maybe you know a better solution.
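One possible approach (a sketch, not necessarily the best solution) is to point `KEDRO_LOGGING_CONFIG` at wherever `logging.yml` ends up after the archive is extracted:

```shell
# Create an illustrative logging config (a real one comes from the project)
printf 'version: 1\ndisable_existing_loggers: false\n' > logging.yml
# Tell Kedro where to find it; on MWAA this could point into the
# plugins directory where plugins.zip is extracted (an assumption)
export KEDRO_LOGGING_CONFIG="$PWD/logging.yml"
```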

docs/source/deployment/airflow.md (outdated, resolved)
@DimedS DimedS marked this pull request as ready for review May 15, 2024 15:59
Signed-off-by: Dmitry Sorokin <[email protected]>
Member

@merelcht merelcht left a comment

Generally looks good 👍

Maybe it's helpful to add links to the different "how to" sections in the introduction of the page, so users won't have to scroll to see what all the different options are.

docs/source/deployment/airflow.md (4 outdated comments, resolved)
```

### Deployment on MWAA
1. Archive your three files: `new_kedro_project-0.1-py3-none-any.whl` and `conf-new_kedro_project.tar.gz`, located in `new-kedro-project/dist`, and `logging.yml`, located in `new-kedro-project/conf/`, into a file called `plugins.zip` and upload it to `s3://your_S3_bucket`. This archive will later be unpacked to the `/plugins` folder in the working directory of the Airflow container.
Member

I think we should provide the command/instructions on how to archive, because not all users might know how to do that.

Contributor Author

Good point, I've added it!


### Deployment on MWAA
1. Archive your three files: `new_kedro_project-0.1-py3-none-any.whl` and `conf-new_kedro_project.tar.gz`, located in `new-kedro-project/dist`, and `logging.yml`, located in `new-kedro-project/conf/`, into a file called `plugins.zip` and upload it to `s3://your_S3_bucket`. This archive will later be unpacked to the `/plugins` folder in the working directory of the Airflow container.
2. Create a new `requirements.txt` file, add a line that installs the Kedro project wheel archived in the previous step, and upload it to `s3://your_S3_bucket`:
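For example, the `requirements.txt` could contain a single line pointing pip at the wheel inside the unpacked plugins folder (using `/usr/local/airflow/plugins`, MWAA's plugins directory, as an assumption here):

```
/usr/local/airflow/plugins/new_kedro_project-0.1-py3-none-any.whl
```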
Member

I'm not quite sure I understand this. What do you mean with "add the command to install your Kedro project archived in the previous step"?

Contributor Author

Agree, I rephrased that.

DimedS and others added 3 commits May 17, 2024 10:54
Co-authored-by: Merel Theisen <[email protected]>
Signed-off-by: Dmitry Sorokin <[email protected]>
Signed-off-by: Dmitry Sorokin <[email protected]>
Signed-off-by: Dmitry Sorokin <[email protected]>
@DimedS
Contributor Author

DimedS commented May 17, 2024

Generally looks good 👍

Maybe it's helpful to add links to the different "how to" sections in the introduction of the page, so users won't have to scroll to see what all the different options are.

Thank you for the review, @merelcht . I agree and have added an Introduction section.

@DimedS DimedS requested review from ankatiyar and merelcht May 17, 2024 10:44
Member

@merelcht merelcht left a comment

The added introduction is great and makes the guide easier to follow! 👍

docs/source/deployment/airflow.md (outdated, resolved)
Contributor

@ankatiyar ankatiyar left a comment

Great work @DimedS!

@DimedS DimedS merged commit 56961af into main May 20, 2024
10 checks passed
@DimedS DimedS deleted the kedro-airflow-docs-deployment-cloud-providers branch May 20, 2024 16:40
Member

@astrojuanlu astrojuanlu left a comment

Thanks for this detailed guide @DimedS

The prose is necessarily long because there are several steps involved. One thing I'd consider for the future is that, rather than directing users up and down in the document ("Complete steps 1-4 from..."), it would be good to produce self-contained documents, even if they are redundant. Some pre-requisites for that:

  1. Understand if there's something we can improve in Kedro to actually simplify the process (Deployment support/Kedro deployer #2058 (comment))
  2. Improve the usability of the documentation (https://github.com/kedro-org/kedro-viz/issues/1474; it's already difficult to navigate these long pages)
  3. If needed, use Sphinx includes to reuse content and avoid duplication

Other than that, I haven't tested these steps but I'm thrilled to see we have more detailed docs now 💯

Successfully merging this pull request may close these issues:

kedro-airflow: Testing and Docs for kedro-airflow deployment on Major Cloud providers