-
Notifications
You must be signed in to change notification settings - Fork 909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update Airflow AWS MWAA deployment docs #3860
Conversation
Signed-off-by: Dmitry Sorokin <[email protected]>
Signed-off-by: Dmitry Sorokin <[email protected]>
Signed-off-by: Dmitry Sorokin <[email protected]>
Also reminder to add quotes to the dataset factory names in the docs "" |
client_kwargs: | ||
aws_access_key_id: ********************* | ||
aws_secret_access_key: ****************************************** | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tried this yet, but do the credentials in the local/credentials.yml
work on the cloud platform? This file ideally shouldn't be uploaded to the cloud at all. Maybe setting up env variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that approach with the local
environment, and it works. We can change the environment from local
to prod
, but am I correct in understanding that there isn't a big difference?
For setting environment variables, I am using an env_var.sh
file that is located in the same S3 bucket as the conf
folder, which is currently packed into plugins.zip
. Therefore, I don't think it makes much sense to use environment variables for security reasons because, ultimately, they will be stored in the same S3 bucket. However, this S3 bucket is secure and does not have public access, so I think it's not a problem.
docs/source/deployment/airflow.md
Outdated
s3fs | ||
``` | ||
|
||
6. Archive your `conf` folder into a `conf.zip` file and upload it to `s3://your_S3_bucket` for later use in the Airflow container. This file will be unzipped into the `plugins` folder within the MWAA Airflow container. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth noting that the kedro package
command will create an archived conf-<package-name>.tar.gz
file in the dist/
folder without the local conf which should be used here ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any suggestions on how this file should be used? Should we simply unpack it to extract the conf
folder, or should we run kedro run
directly with the conf
archive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't need to be unpacked at all, OmegaConfigLoader
can handle .tar.gz
and .zip
confs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed that but I don't know how to be with export KEDRO_LOGGING_CONFIG="plugins/logging.yml"
, I currently copy logging.yml
separetely, but maybe you know better solution.
Co-authored-by: Ankita Katiyar <[email protected]> Signed-off-by: Dmitry Sorokin <[email protected]>
Signed-off-by: Dmitry Sorokin <[email protected]>
Signed-off-by: Dmitry Sorokin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good 👍
Maybe it's helpful to add links to the different "how to" sections in the introduction of the page, so users won't have to scroll to see what all the different options are.
docs/source/deployment/airflow.md
Outdated
``` | ||
|
||
### Deployment on AWAA | ||
1. Archive your three files: `new_kedro_project-0.1-py3-none-any.wh`l and `conf-new_kedro_project.tar.gz` located in `new-kedro-project/dist`, and `logging.yml` located in `new-kedro-project/conf/` into a file called `plugins.zip` and upload it to `s3://your_S3_bucket`. This archive will be later unpacked to the `/plugins` folder in the working directory of the Airflow container. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should provide the command/instructions on how to archive, because not all users might know how to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I've added it!
docs/source/deployment/airflow.md
Outdated
|
||
### Deployment on AWAA | ||
1. Archive your three files: `new_kedro_project-0.1-py3-none-any.wh`l and `conf-new_kedro_project.tar.gz` located in `new-kedro-project/dist`, and `logging.yml` located in `new-kedro-project/conf/` into a file called `plugins.zip` and upload it to `s3://your_S3_bucket`. This archive will be later unpacked to the `/plugins` folder in the working directory of the Airflow container. | ||
2. Create a new `requirements.txt` file, add the command to install your Kedro project archived in the previous step, and upload it to `s3://your_S3_bucket`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure I understand this. What do you mean with "add the command to install your Kedro project archived in the previous step"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I rephrased that.
Co-authored-by: Merel Theisen <[email protected]> Signed-off-by: Dmitry Sorokin <[email protected]>
Signed-off-by: Dmitry Sorokin <[email protected]>
Signed-off-by: Dmitry Sorokin <[email protected]>
Thank you for the review, @merelcht . I agree and have added an Introduction section. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The added introduction is great and makes the guide easier to follow! 👍
Co-authored-by: Merel Theisen <[email protected]> Signed-off-by: Dmitry Sorokin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @DimedS!
Signed-off-by: Dmitry Sorokin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this detailed guide @DimedS ⭐
The prose is necessarily long because there are several steps involved. One thing I'd consider for the future is that, rather than directing users up and down in the document ("Complete steps 1-4 from..."), it would be good to produce self-contained documents, even if they are redundant. Some pre-requisites for that:
- Understand if there's something we can improve in Kedro to actually simplify the process Deployment support/Kedro deployer #2058 (comment)
- Improve the usability of the documentation https://github.com/kedro-org/kedro-viz/issues/1474 (it's already difficult to navigate these long pages)
- If needed, use Sphinx includes to reuse content and avoid duplication
Other than that, I haven't tested these steps but I'm thrilled to see we have more detailed docs now 💯
Description
This PR introduces a section on deploying to Airflow AWS MWAA.
Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a
Signed-off-by
line in the commit message. See our wiki for guidance.If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
RELEASE.md
file